Math 221 Notes - Chad Davis (Last Update July 2022)
Contents

Foreword

2 Vectors in R^n
  2.1 2-Dimensional Vectors
  2.2 Vectors in Euclidean Space
    2.2.1 Vector Forms of Solutions to Linear Systems
  2.3 Linear Combinations and Vector Equations
  2.4 Matrix Equations
  2.5 Spanning Sets
    2.5.1 Geometry of Spanning Sets in R^2
    2.5.2 The Span Theorem
  2.6 Linear Independence
    2.6.1 The Homogeneous Equation
    2.6.2 Linear Independence of Vectors
    2.6.3 Special Cases of Linear Independence
  2.7 Linear Transformations
    2.7.1 Transformations
    2.7.2 Linear Transformations

4 Subspaces
  4.1 Introduction to Subspaces
  4.2 Column and Null Space
  4.3 Bases for Subspaces
  4.4 Dimension of Subspaces
    4.4.1 The Basis Theorem and Three Useful Corollaries for Calculating Bases and Dimensions
  4.5 The Rank-Nullity Theorem
  4.6 Row Space
    4.6.1 The Canonical Basis for Col(A)
  4.7 The Invertible Matrix Theorem

5 Determinants
  5.1 Calculation of Determinants
  5.2 Properties of Determinants
    5.2.1 Determinants of Triangular Matrices
    5.2.2 Determinants of Transposes
    5.2.3 Elementary Row Operations
    5.2.4 Determinants and Invertibility
    5.2.5 The Multiplicative Property of Determinants
  5.3 Cramer's Rule
    5.3.1 Cramer's Rule
    5.3.2 A New Formula for A⁻¹
  5.4 Determinants as Areas and Volumes
    5.4.1 Determinants as Areas and Volumes
    5.4.2 Linear Transformations
Foreword
Linear algebra is the study of linear transformations of vector spaces. A linear transformation is a function
that has sets of vectors as its domain and range. A linear transformation is an abstract mathematical con-
struction. The amazing thing about linear algebra is that, under the right conditions, these abstract objects
can always be represented by something concrete: an array of numbers known as a matrix.
The focus of this course is matrices and their various properties. Learning about matrices in a tangible
sense will allow us to learn about the abstract linear transformations. This is a remarkably wonderful thing.
We will learn various ways to manipulate matrices and learn how to perform algebraic operations on them,
such as addition and multiplication. We will see how all of these things relate to linear transformations in
a meaningful way. This is what makes linear algebra such a beautiful subject. We are able to study and
learn about complicated, abstract mathematics via objects that we can manipulate in a completely hands
on manner.
Chapter 1

Solving Linear Systems with Matrices
In this chapter, we introduce linear systems and matrices. You have been solving systems of linear equations
since high school. Matrices are a natural way to encode all the relevant data in a linear system and can
be manipulated to obtain solutions to such systems in a very efficient manner. There are many real world
problems that can be modelled using linear systems, so we see immediately that matrices find application in
many areas.
Before we jump into the content, we fix some notation. The symbol R stands for the set of all real numbers,
Q stands for the set of all rational numbers, Z stands for the set of all integers, and C stands for the set of
all complex numbers.
If S is a set, then the notation x ∈ S means that x is in the set S; the symbol “∈” is read “is an element
of.” For example, x ∈ R means that x is a real number.
1.1 Linear Systems

We begin with a type of problem you are likely to have seen before.
Example 1.1.1
Wentworth Music sells two types of guitars: Fender Stratocasters for $350 each and Gibson Les Pauls
for $600 each. Last year, they sold 230 guitars and made $100,000 from their total sales. How many
of each guitar did they sell?
Solution
Let x1 be the total number of Stratocasters sold and let x2 be the total number of Les Pauls sold. Since a total of 230 guitars were sold, we have

x1 + x2 = 230. (1.1)
We are also given that the total revenue Wentworth Music made off of guitars last year is $100,000.
Since each Stratocaster is sold for $350 and each Les Paul is sold for $600, we get another equation,

350x1 + 600x2 = 100,000. (1.2)
Presumably, you have seen how to simultaneously solve equations of this type before. You likely have
seen a number of different ways to do it. Some methods for solving these equations include graphing
both equations and determining the point where they intersect; doing substitution by solving for one
variable in one equation, substituting this into the other, and solving for the remaining variable; or you
might have seen how to solve such equations by subtracting multiples of one equation from the other
to eliminate variables. Even though substitution might be the method of preference to do this, I am
going to use the latter method.
Multiplying Equation 1.1 by 350 and subtracting it from Equation 1.2 gives

250x2 = 19,500 =⇒ x2 = 78.

To solve for x1, we subtract 600 times Equation 1.1 from Equation 1.2 to get

−250x1 = −38,000 =⇒ x1 = 152.

Therefore, Wentworth Music sold 152 Stratocasters and 78 Les Pauls last year. ♦
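As a quick check (not part of the original notes), the same two-equation system can be handed to a numerical solver. The code below is a minimal sketch using NumPy; the variable names are illustrative.

import numpy as np

# Coefficient matrix and right-hand side for the system
#     x1 +    x2 =     230
#  350x1 + 600x2 = 100,000
A = np.array([[1.0, 1.0],
              [350.0, 600.0]])
b = np.array([230.0, 100000.0])

x = np.linalg.solve(A, b)   # solves A @ x = b for a square, invertible A
print(x)                    # [152.  78.]

This agrees with the elimination above: 152 Stratocasters and 78 Les Pauls.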
The equations in the previous examples are special types of equations called linear equations.
Let k be a positive integer. A linear equation in k variables is one that can be written in the
form,
a1 x1 + a2 x2 + . . . + ak xk = b
where x1 , x2 , . . . , xk are variables and a1 , a2 , . . . , ak , b are real (or complex) numbers. The values
a1 , a2 , . . . , ak are called the coefficients of the linear equation and b is called the constant term.
The above form is called the standard form of a linear equation.
Identifying linear equations is straightforward: the variables may only be multiplied by coefficients and combined using addition and subtraction. No other functions of the variables, such as powers, products, quotients, or trigonometric functions, may be present.
Example 1.1.2
The equation x1 − 3 = 6 − 2x2 is a linear equation in k = 2 variables. Its standard form is,
x1 + 2x2 = 9.
Example 1.1.3
The equation x1^2 + x2 = 1 is not linear as it involves the quadratic term x1^2. Likewise, the following equations are not linear:

x1 + sin(x2) = 5,   x1 + ln(x2) + x3 = 1,   x1x2 = 1,   −(x1 + x2)/x3 = 1.
♦
Example 1.1.4

Show that the equation (−4x1 + 3x3)/5 − 2x2 − 1 = 3 is a linear equation by writing it in standard form, and identify its coefficients and constant term.

Solution

Don't be fooled by the division! We can write the given equation as follows,

(−4x1 + 3x3)/5 − 2x2 − 1 = 3 =⇒ −(4/5)x1 − 2x2 + (3/5)x3 = 4,

which is the standard form of a linear equation. Its coefficients are −4/5, −2, and 3/5, and the constant term is 4. ♦
The main goal of this chapter is to devise a method for solving not one linear equation, but for solving a
collection of linear equations simultaneously, just as we did in Example 1.1.1.
Let k and n be positive integers. A linear system (or a system of linear equations) in k variables
with n equations is a collection of n linear equations in the same k variables. Symbolically, a linear
system is a collection of equations of the form,

a11 x1 + a12 x2 + . . . + a1k xk = b1
a21 x1 + a22 x2 + . . . + a2k xk = b2
. . .
an1 x1 + an2 x2 + . . . + ank xk = bn
where x1 , . . . , xk are variables and aij , bi are real (or complex) numbers for each i ∈ {1, 2, . . . , n} and
j ∈ {1, 2, . . . , k}. The aij ’s are called the coefficients of the linear system and the bi ’s are called the
constant terms or the constant coefficients.
The coefficients aij are read “a eye jay.” For example, the a12 entry is read “a one two” and not “a twelve.” Moreover, the coefficients are always indexed first by the equation and then by the variable they multiply. That is, the i picks out the equation the coefficient appears in and the j picks out the variable that coefficient multiplies.
Example 1.1.5

Determine the coefficients and constant terms of the linear system from Example 1.1.1:

x1 + x2 = 230
350x1 + 600x2 = 100,000

The first equation is x1 + x2 = 230; the second is 350x1 + 600x2 = 100,000. Therefore, the coefficients of this linear system are,

a11 = 1, a12 = 1, a21 = 350, a22 = 600,

and the constant terms are b1 = 230 and b2 = 100,000. ♦
Obviously, the coefficients are dependent on how the linear system is written down. If you’re asked to write
down the coefficients for a linear system, then you do this based on how the linear system is given to you.
Example 1.1.6
Determine the coefficients and constant terms for the following linear system.
−7x1 − √2 x2 + (1/3)x3 = 5/7
3x2 − 4x4 = 2
Solution
The first thing we notice is that this is a linear system in 4 variables with 3 equations. The fourth
variable is missing in the first equation. This means the coefficient on x4 in the first equation is a zero.
Similarly for the coefficients on x1 and x3 in the second equation. Thus, the coefficients for this linear
system are,
a11 = −7, a12 = −√2, a13 = 1/3, a14 = 0,
A solution to a linear system in k variables is a k-tuple of real (or complex) numbers (s1, s2, . . . , sk) such that, when we make the substitutions
x1 = s1 , x2 = s2 , . . . , xk = sk ,
then each of the equations in the linear system is satisfied. That is, the left-hand side of each equation evaluates to the corresponding constant term on the right-hand side.

The solution set of a linear system is the set of all solutions to that linear system; that is, it is a set of k-tuples, each of which is a solution to the linear system.
Example 1.1.7
Consider once more the linear system from Example 1.1.1,

x1 + x2 = 230
350x1 + 600x2 = 100,000.
We saw that x1 = 152 and x2 = 78 is a solution to this system because, when we substitute these values into each equation in the system, we have,

152 + 78 = 230;
350(152) + 600(78) = 53,200 + 46,800 = 100,000.

This means (x1, x2) = (152, 78) is a solution to the given linear system. In fact, it is the only solution
to this linear system. This means that the solution set to this linear system is exactly {(152, 78)} .
♦
Example 1.1.8
Solution
To verify this is true, we substitute the values into each equation and see if all of them are satisfied.
For the first,
−(13) − 2(0) − 7(−4) = 28 − 13 = 15;
This shows that (x1 , x2 , x3 , x4 ) = (13, 0, −2, −4) is a solution to the linear system. To verify the other,
we calculate:
−21 − 2(−4) − 7(−4) = −21 + 8 + 28 = 36 − 21 = 15;
This shows (x1, x2, x3, x4) = (21, −4, 2, −4) is also a solution to the linear system. This means the solution set for the system contains at least the 4-tuples (13, 0, −2, −4) and (21, −4, 2, −4). We will see later that, in fact, there are infinitely many different solutions to this linear system. ♦
In the previous two examples, the linear systems in question had at least one solution. This need not always be the case.
Example 1.1.9

Show that the following linear system has no solution.

3x1 + x2 = 1
6x1 + 2x2 = 5
Solution
We can use the same method as in Example 1.1.1 to try and find a solution. If we try to subtract twice
the first equation from the second, we wind up with:
(6x1 + 2x2) − 2(3x1 + x2) = 5 − 2(1) =⇒ 0 = 3.
Clearly it is not possible that 3 = 0. This shows that the linear system has no solution.
If you don’t like this method, try substitution. First, solve for x2 in the first equation:
3x1 + x2 = 1 =⇒ x2 = 1 − 3x1 .
Substituting this into the second equation gives 6x1 + 2(1 − 3x1) = 5, which simplifies to 2 = 5. Once again, this is not possible, which implies that the system is not solvable. ♦
The previous examples show that linear systems may have solutions or they may not. This prompts the
following definition.
A linear system is called consistent if it has at least one solution. It is called inconsistent if it does
not have a solution.
For example, the linear systems in Examples 1.1.1 and 1.1.8 are consistent since they have exactly one and
at least two solutions respectively. The linear system in Example 1.1.9 is inconsistent.
1.1.2 The Geometry of Solution Sets: Linear Systems in Two Variables and
Beyond
Let’s consider the linear system in Example 1.1.1:
x1 + x2 = 230
350x1 + 600x2 = 100,000
We can graph these two equations in an x1 -x2 plane, which is the same as an x-y plane with the axes
relabelled. Doing so certainly yields two lines:
[Figure: the lines x1 + x2 = 230 and 350x1 + 600x2 = 100,000 graphed in the x1-x2 plane; they intersect at the single point (152, 78).]
We see that the intersection point of these two lines is (152, 78) which is exactly the solution to the linear
system. This should make sense if we think about what a solution is: If (x1 , x2 ) = (s1 , s2 ) satisfies a linear
equation then (s1 , s2 ) is a point on the line represented by that equation. Since a solution to a linear system
must satisfy every linear equation in the system, it must be a point that lies on each line concurrently. In
the case of this example, it should now be clear why there is exactly one solution to the linear system: There
is only one point of intersection between these two lines.
Let’s look at another two examples to see what else can happen.
Example 1.1.10
Graph the following linear system and, from the graph, determine the number of solutions the linear
system has. Moreover, state whether the system is consistent or inconsistent.
−2x1 + x2 = 4
−2x1 + x2 = 0
Solution
If we write these two equations in the usual y = mx + b form we typically see for linear equations, we
have
x2 = 2x1 + 4, x2 = 2x1 ,
which shows the lines have the same slope with different x2 -intercepts. Therefore, the lines are parallel.
Their graph is,
[Figure: the parallel lines −2x1 + x2 = 4 and −2x1 + x2 = 0 graphed in the x1-x2 plane; they never intersect.]
Since these lines are parallel, we know there is no intersection point. Since they never intersect, there
is no solution to this linear system. This is an example of an inconsistent linear system. ♦
Note
If you graph the linear system in Example 1.1.9, you will also get two parallel lines with different x2 -
intercepts, which is why the system is inconsistent. We will see a bit later on that this is not the only way a linear system can turn out to be inconsistent.
Example 1.1.11
Graph the following linear system and, from the graph, determine how many solutions the linear
system has. Moreover, state whether the system is consistent or inconsistent.
x1 + x2 = 5
−x1 − x2 = −5
Solution
Writing each equation in slope-intercept form, we have

x1 + x2 = 5 =⇒ x2 = 5 − x1,
−x1 − x2 = −5 =⇒ x2 = 5 − x1.
This means that these two equations represent the same line. Therefore, if we graph both of them, we
get the same two lines sitting on top of one another:
[Figure: the lines x1 + x2 = 5 and −x1 − x2 = −5 graphed in the x1-x2 plane; they coincide as a single line.]
I have graphed the first equation with a thicker line so you can see the two lines sitting on top of one
another.
What is the solution set in this case? Here, the two lines share every point. This means
every point on the line is a solution to the linear system. Since there are infinitely many points on a
line, this means that this linear system has infinitely many solutions. Since there is at least one if
there are infinitely many, it follows that this linear system is consistent. ♦
The previous three examples show linear systems that have either,

i) Exactly one solution;

ii) No solutions; or

iii) Infinitely many solutions.
If we do the thought experiment and keep in mind what straight lines look like, it is not much of a stretch
to see that these are the only three possibilities for solution sets. Even if more linear equations are added to
the linear system, these are still the only possibilities. For instance, here is a graph of a linear system with
four equations in two variables that has exactly one solution:
[Figure: the four lines −x1 + 2x2 = 1, −x1 + x2 = 0, x1 + x2 = 2, and 5x1 + x2 = 6 graphed in the x1-x2 plane; all four pass through the point (1, 1).]
Here, we see that the four lines simultaneously intersect at the point (1, 1). This is the solution to the linear
system.
Similarly, here is a graph of a linear system with three equations in two variables that has no solution:

[Figure: the three lines −x1 + 2x2 = 3, −x1 + x2 = 0, and x1 + x2 = 2 graphed in the x1-x2 plane; they intersect pairwise but have no common point.]
Note that even though there are common points of intersection between pairs of lines, there is not a single
point where they all meet simultaneously. This means the linear system has no solution.
What about linear systems in three variables? The first thing we need to ask ourselves is “What does a
linear equation in three variables look like?” If we graph such an equation in a 3 dimensional coordinate
axes, we get a plane, which looks like a flat sheet of paper. As in the case of two variables, a solution to a
linear system in three variables is a point where all the planes in the linear system intersect simultaneously.
Keeping in mind that, like lines, planes don't bend, it is somewhat easy to see that there are only a few ways
a set of planes can intersect:

i) The planes intersect in exactly one common point;

ii) The planes lie on top of one another and, hence, intersect in infinitely many different points;

iii) The planes intersect along a line and, hence, intersect in infinitely many different points;

iv) There is no point lying on all of the planes, so they do not intersect simultaneously.
In any event, we see that, as in the case of linear systems in two variables, there are once again only three possibilities for the number of solutions: one, none, or infinitely many.
What about linear equations in four variables? Unfortunately, such equations can not be graphed or visual-
ized since they lie in a space that is a dimension higher than we live in. If you want to try to do the thought
experiment, though, a linear equation in four variables describes a hyperplane: a flat, three-dimensional object sitting inside four-dimensional space. And, once again, if you start looking at common intersection points in four-dimensional space of such
objects, they can only intersect in either exactly one, none, or infinitely many points, corresponding to one,
none, or infinitely many solutions.
In fact, this same observation is true for all linear systems! Any collection of k linear equations in n variables
(called n-dimensional hyperplanes) intersects in either exactly one point in n-dimensional space, infinitely many points in n-dimensional space, or no points in n-dimensional space. This leads to the following, which we
state as a fact, but whose truth will become obvious once we introduce our methods for solving linear systems
with matrices.
Fact: Number of Solutions of Linear Systems

Every linear system has exactly one of the following:

i) Exactly one solution;

ii) Infinitely many solutions;

iii) No solutions.
A matrix A of size n × k is a rectangular array of numbers arranged in n rows and k columns. We write A = [aij], where the aij's are numbers called the entries of A. The quantity n × k is referred to as the dimension or size of A.
We refer to specific entries in a matrix as follows: the (i, j)-entry of A is the number in the ith row and jth
column of A. This means the (i, j)-entry of A is aij .
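If you follow along in software, note that most array libraries index from 0 rather than 1. The sketch below (not part of the notes; the matrix is just an illustration) shows how the (i, j) convention above maps onto NumPy indexing.

import numpy as np

# An illustrative 2 x 3 matrix, written row by row.
A = np.array([[5, -2, 0],
              [1,  7, 3]])

n, k = A.shape                # number of rows and columns: (2, 3)

def entry(A, i, j):
    """Return the (i, j)-entry of A using the 1-based convention of the notes."""
    return A[i - 1, j - 1]    # NumPy arrays are 0-indexed

print(n, k)                   # 2 3
print(entry(A, 2, 3))         # a23 = 3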
Example 1.1.12
−5 −5 0 8
The (2, 3)-entry of A is the number in the 2nd row and 3rd column. In this case, a23 = 2. Write
down and identify some of the other entries in this matrix!
Example 1.1.13
0 1 −2
Solution
This matrix has 3 rows and 3 columns, so its size is 3 × 3. For the entries:
i) a21 is the entry in the second row and first column. In this case, a21 = 4.
ii) a11 is the entry in the first row and first column, so a11 = −1.
iii) a32 is the entry in the third row, second column, so a32 = 1.
iv) a23 is the entry in the second row, third column, so a23 = −7
v) a43 would be the entry in the fourth row and third column of A. Since A does not have four rows,
there is no (4, 3)-entry, so a43 does not exist.
The coefficient matrix of a linear system in k variables with n equations is the n × k matrix whose (i, j)-entry is the coefficient aij. The augmented matrix of the system is the n × (k + 1) matrix obtained by adjoining the constant terms b1, b2, . . . , bn to the coefficient matrix as an extra column on the right, separated from the coefficients by a vertical bar.

In both cases, the first k columns of the matrix correspond to the coefficients of the variables x1, x2, . . . , xk.

We draw the vertical line in the augmented matrix so that we do not confuse the augmented matrix with the coefficient matrix. Think of the vertical bar as representing the equals signs in the linear system. This
is not standard notation so, if you are reading other references, make sure you are aware of their notational
convention.
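As a small illustration (mine, not the notes'), the coefficient and augmented matrices of the system from Example 1.1.1 can be assembled in NumPy by stacking the constant terms onto the coefficient matrix as an extra column.

import numpy as np

# Coefficients and constant terms of
#     x1 +    x2 =     230
#  350x1 + 600x2 = 100,000
coeff = np.array([[1, 1],
                  [350, 600]])
consts = np.array([[230],
                   [100000]])

augmented = np.hstack([coeff, consts])   # adjoin the constants as a final column
print(coeff.shape, augmented.shape)      # (2, 2) (2, 3)
print(augmented)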
Example 1.1.14
Write down the coefficient and augmented matrices for the linear systems in Examples 1.1.1 and 1.1.6
and state their sizes.
Solution

The coefficient matrix and augmented matrix for the linear system in Example 1.1.1,

x1 + x2 = 230
350x1 + 600x2 = 100,000,

are, respectively,

[ 1    1   ]          [ 1    1   |    230   ]
[ 350  600 ]          [ 350  600 | 100,000  ]

The coefficient matrix has size 2 × 2 and the augmented matrix has size 2 × 3.

For the linear system in Example 1.1.6, which has 3 equations in 4 variables, the coefficient matrix has size 3 × 4 and the augmented matrix has size 3 × 5. For instance, the equation 3x2 − 4x4 = 2 contributes the rows

0 3 0 −4          and          0 3 0 −4 | 2

to the coefficient and augmented matrices, respectively. ♦
Example 1.1.15

Consider the following augmented matrix:

[ 1    12    −1     3  |  13  ]
[ 1   −√2     9    12  | −√2  ]
[ π    −4    1/2    9  |  10  ]

State the size of this matrix and write down the corresponding linear system.
Solution
This matrix has 3 rows and 5 columns, so is 3 × 5. Since the matrix is augmented, the values in the
last column are the constant terms in the corresponding linear system. The rest of the values are
the coefficients on the variables in the linear system, of which there are four which we can denote by
x1 , x2 , x3 , and x4 . Thus, the corresponding linear system is,
x1 + 12x2 − x3 + 3x4 = 13
x1 − √2 x2 + 9x3 + 12x4 = −√2
πx1 − 4x2 + (1/2)x3 + 9x4 = 10
♦
1.2 Row Reducing Matrices: The Key to Solving Linear Systems
In Example 1.1.1, we solved the linear system by subtracting multiples of one equation from another to
eliminate variables. While it might be easier to do substitution in this example, the reason I chose to use
this elimination method is because it provides a concrete example of the aforementioned row operations. Let's
see how this works.
The linear system in Example 1.1.1 and its corresponding augmented matrix are given below

x1 + x2 = 230                          [ 1    1   |    230   ]
350x1 + 600x2 = 100,000      ⇐⇒        [ 350  600 | 100,000  ]
Denote the first row by R1 . R1 represents the equation x1 + x2 = 230. The second row, R2 , represents
the equation 350x1 + 600x2 = 100, 000. To find a solution to this linear system, the first thing we did was
subtract 350 times the first equation from the second. The corresponding operation on the matrix is to
replace R2 with 350 times R1 subtracted from R2 . This is denoted by R2 ⇒ R2 − 350R1 . The resulting
linear system and corresponding augmented matrix is,

R2 ⇒ R2 − 350R1 :    x1 + x2 = 230               [ 1   1   | 230    ]
                     250x2 = 19,500      ⇐⇒      [ 0   250 | 19,500 ]
The second row of the new matrix represents the equation 250x2 = 19500. We solve for x2 by dividing by
250. The corresponding operations performed on the matrix is to replace R2 with (1/250) times R2 . This is
written R2 ⇒ (1/250)R2. The resulting linear system and corresponding augmented matrix is,

R2 ⇒ (1/250)R2 :    x1 + x2 = 230          [ 1  1 | 230 ]
                    x2 = 78        ⇐⇒      [ 0  1 | 78  ]
We can now solve for x1 by subtracting the second equation from the first. This is written as R1 ⇒ R1 − R2
and the resulting linear system and corresponding augmented matrix is

R1 ⇒ R1 − R2 :    x1 = 152           [ 1  0 | 152 ]
                  x2 = 78      ⇐⇒    [ 0  1 | 78  ]
Now look! This new linear system clearly has solution (x1 , x2 ) = (152, 78), which is precisely the solution to
the original linear system!
Keeping this example in mind, it should be clear that there is more going on here than meets the eye. Indeed, by doing
matrix manipulations similar to what we did above, we can find the solution set to any linear system that
we’re given! This is a beautiful process and the rest of the chapter is dedicated to fleshing it out.
The elementary row operations that can be performed on an n × k matrix A are the following.
1. Row Replacement: Add a non-zero multiple of one row to another. If c is a non-zero number,
then replacing row i by row i plus c times row j is denoted by Ri ⇒ Ri + cRj . The arrow is
pronounced “is replaced by” so the expression Ri ⇒ Ri + cRj is read “row i is replaced by row
i plus c times row j.”
2. Scaling Rows: Multiply one row by a non-zero number. If c is a non-zero number, then row
i replaced by c times row i is denoted Ri ⇒ cRi . This is read “row i is replaced by c times row
i.”
3. Swapping Rows: Interchange two rows. Interchanging row i with row j is denoted Ri ⇔ Rj .
This is read “row i is replaced by row j and row j is replaced by row i.”
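The three operations are easy to express in code. The following is a minimal sketch (not part of the notes; the helper names are my own) that applies them to rows of a NumPy array, using the 1-based row numbering of the notes.

import numpy as np

def replace(A, i, j, c):
    """Ri => Ri + c*Rj."""
    A = A.astype(float).copy()
    A[i - 1] += c * A[j - 1]
    return A

def scale(A, i, c):
    """Ri => c*Ri for a non-zero constant c."""
    A = A.astype(float).copy()
    A[i - 1] *= c
    return A

def swap(A, i, j):
    """Ri <=> Rj."""
    A = A.astype(float).copy()
    A[[i - 1, j - 1]] = A[[j - 1, i - 1]]
    return A

# Re-doing the reduction of the augmented matrix from Example 1.1.1.
A = np.array([[1, 1, 230], [350, 600, 100000]])
A = replace(A, 2, 1, -350)    # R2 => R2 - 350*R1
A = scale(A, 2, 1 / 250)      # R2 => (1/250)*R2
A = replace(A, 1, 2, -1)      # R1 => R1 - R2
print(A)                      # [[  1.   0. 152.]
                              #  [  0.   1.  78.]]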
Example 1.2.1
Perform the following row operations on the matrix in the order given:
Solution
ii) In the new matrix, we need to replace the third row with itself plus 5 times the second row. This yields,

R3 ⇒ R3 + 5R2 :
[ −2/5   0    1   ]        [ −2/5   0     1    ]
[  1     5   1/2  ]   ∼    [  1     5    1/2   ]
[  8    −2  −1/3  ]        [  13    23   13/6  ]
[ −2     3    1   ]        [ −2     3     1    ]

since the new third row is [8 + 5(1), −2 + 5(5), −1/3 + 5(1/2)] = [13, 23, 13/6].
After all four row operations have been performed, we are left with the matrix,
−262/5 −92 −23/3
−5 −25 −5/2
13 23 13/6
−2 3 1
Exercise
Perform the row operations on A in Example 1.2.1 in the reverse order. Do you get the same matrix?
Some matrices can be transformed into others using elementary row operations and others can not. For
instance, consider the matrices,

A1 = [ 1  0 ]        A2 = [ 1  2 ]
     [ 0  1 ]             [ 0  1 ]
It is clear we can transform A1 into A2 by doing the row operation R1 ⇒ R1 + 2R2 . You can also do the
reverse: change A2 back into A1 by doing R1 ⇒ R1 − 2R2.
It is impossible to transform B1 into B2 using row operations alone. This means we can classify matrices by
whether or not they can be transformed into one another using row operations. This motivates the following
definition.
Two matrices that can be transformed into one another using a finite sequence of row operations are
called row equivalent. If A and B are row equivalent matrices, we write A ∼ B.
Exercise: Challenge
Prove that row equivalence is an equivalence relation on the set of all n × k matrices. (Ignore this
exercise if you don’t know what an equivalence relation is).
Example 1.2.2
Example 1.2.3

The matrices

A = [ −4  −2  −1   3 ]        B = [ 1  0  0  −1 ]
    [  2   1   1  −3 ]            [ 0  1  0   2 ]
    [  1   3   1   2 ]            [ 0  0  1  −3 ]

are row equivalent, that is, A ∼ B. Show this is true by finding the sequence of row operations that transform A into B.
Solution
There are many ways to proceed that will give the same answer. However, the way I’m going to perform
the operations is foreshadowing for the algorithm we see in the next section.
Start in the top left corner of both matrices. The (1,1)-entry of B is a 1. To transform A into B,
we must make the (1,1)-entry of A into 1. This can be done by either swapping the first and third
rows or by multiplying the first row by −1/4. However, when doing row operations, it is useful to keep
fractions out of the calculations unless it is absolutely necessary. Therefore, instead of multiplying row
1 by −1/4, we will swap the first and third rows.
1 3 1 2
R1 ⇔ R3 : A ∼ 2 1 1 −3 = A1 .
−4 −2 −1 3
We now need the two entries below the 1 in the (1,1)-position to be zero. We can obtain this by doing
two row replacements.
1 3 1 2
R2 ⇒ R2 − 2R1 : A1 ∼ 0 −5 −1 −7 = A2 ,
−4 −2 −1 3
1 3 1 2
R3 ⇒ R3 + 4R1 : A2 ∼ 0 −5 −1 −7 = A3 .
0 10 3 11
Now we need a 0 below the -5 in the (2,2)-position of A3 . We can obtain this by doing the following
row replacement.
1 3 1 2
R3 ⇒ R3 + 2R2 A3 ∼ 0 −5 −1 −7 = A4
0 0 1 −3
We now need zeroes in the (2, 3) and (1, 3) positions. This can be done with two row replacements.
1 3 1 2
R2 ⇒ R2 + R3 : A4 ∼ 0 −5 0 −10 = A5
0 0 1 −3
1 3 0 5
R1 ⇒ R1 − R3 : A5 ∼ 0 −5 0 −10 = A6 .
0 0 1 −3
Now we need a 0 above the −5 in the (2,2)-position. We can avoid fractions by first multiplying the
second row by −1/5, and then doing row replacement.
1 3 0 5
R2 ⇒ −(1/5)R2 : A6 ∼ 0 1 0 2 = A7
0 0 1 −3
1 0 0 −1
R1 ⇒ R1 − 3R2 : A7 ∼ 0 1 0 2 = B.
0 0 1 −3
We have now transformed A into B by performing a finite sequence of elementary row operations. This
shows that A is row equivalent to B; that is, A ∼ B. ♦
The matrix B in the previous example is in a special form called reduced row echelon form. The steps
followed in this example illustrate a standard algorithm used to transform a given matrix into its reduced
row echelon form.
The routine for solving linear systems using a matrix is contingent on using row operations to transform
the matrix into a special form called reduced row echelon form. Before we can define this, we need some
terminology.
Definition 1.2.3
Let A be an n × k matrix.

1. A zero row of A is a row whose entries are all 0; a non-zero row is a row containing at least one non-zero entry.

2. The leading entry of a non-zero row is the leftmost non-zero entry in that row.
Example 1.2.4
The leading entries of the following matrix are the 5 in the first row, the 7 in the third row, and the −1 in the fourth row (the second row is a zero row and so has no leading entry):

5 −2 0 1 0
0 0 0 0 0
0 0 7 0 0
0 −1 0 −1/2 2
A is in echelon form if it satisfies the following two conditions:

1. All zero rows are below all non-zero rows; that is, all zero rows are at the bottom of the matrix.

2. Each leading entry of a row is in a column to the right of the leading entry of the row above it.

A is in reduced row echelon form (abbreviated RREF) if it satisfies the additional two conditions:

3. The leading entry in every non-zero row is a 1.

4. Each leading 1 is the only non-zero entry in its column.
If A and B are row equivalent matrices and B is in echelon form, then B is called an echelon form
of A.
Reduced row echelon form is a special type of echelon form. This means every matrix in reduced row echelon
form is also in echelon form. The converse is not true: there are matrices in echelon form that are not in
reduced row echelon form.
Example 1.2.5
Determine which of the following six matrices are in reduced row echelon form, echelon form only, or
not in echelon form at all.
i) A1 =
[ 1 3 0 ]
[ 0 3 5 ]
[ 0 0 0 ]

ii) A2 =
[ 1 0 4 3 1  ]
[ 0 1 3 9 10 ]

iii) A3 =
[ 0 1 0 0 0 ]
[ 1 0 0 0 0 ]
[ 0 0 1 0 0 ]
[ 0 0 0 1 0 ]
[ 0 0 0 0 1 ]

iv) A4 =
[ 1 0 0 0 0 ]
[ 0 1 0 2 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 1 0 ]

v) A5 =
[ 1 1 0 0 2 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 0 0 ]

vi) A6 =
[ 1 3 0 0  ]
[ 0 4 0 0  ]
[ 0 0 0 17 ]
[ 0 0 0 0  ]
Solution
i) The leading entries in A1 are the (1, 1) and (2, 2)-entries. We can see that each leading entry is in a column to the right of the one above it and that the only zero row is at the bottom of the matrix. Therefore, A1 is in echelon form. It is not in reduced row echelon form, though. This is because the leading entry in the (2, 2)-position is a 3 and not a 1, so condition 3 is violated. Condition 4 is violated as well because the entry above the (2, 2)-position is not a zero.
ii) The leading entries of A2 are in the (1, 1)- and (2, 2)-positions. They obey condition 2, 3, and 4,
and there are no zero rows so condition 1 is vacuously satisfied. Therefore, A2 is in reduced row
echelon form.
iii) The leading entries of A3 are in the (2, 1), (1, 2), (3, 3), (4, 4), and (5, 5)-positions. We can see that the leading entry in the (2, 1)-position is in a column to the left of the leading entry above it in the (1, 2)-position. This means condition 2 is violated, so A3 is not in echelon form.
iv) Condition 1 is violated by the third row of A4, which is a zero row with a non-zero row below it. Therefore, this matrix is also not in echelon form.
v) The leading entries of A5 are in the (1, 1) and (2, 2)-positions. They obey condition 2 and they are both 1's; however, condition 4 is not satisfied because there is a non-zero entry in the (1, 2)-position.
Finally, all the zero rows are at the bottom of the matrix. Therefore, this matrix is in echelon
form, but not in reduced row echelon form.
vi) The leading entries of A6 are in the (1, 1), (2, 2), and (3, 4)-positions, so they satisfy condition 2,
and condition 1 is also satisfied as the only zero row is at the bottom of the matrix. However,
there are leading entries that are not 1, and there is a non-zero entry in the (1, 2)-position, so
both conditions 3 and 4 are violated. Therefore, this matrix is in echelon form, but not in reduced
row echelon form.
♦
A given matrix A is row equivalent to infinitely many different echelon forms. Indeed, this is because you
can scale a row by any non-zero real number and remain in echelon form. For example,

[ 1  0 ]       [ m  0 ]
[ 0  0 ]   ∼   [ 0  0 ]
for any real number m. The reduced row echelon form, on the other hand, is unique.
Fact: RREF Is Unique

Every n × k matrix A is row equivalent to exactly one n × k matrix B in reduced row echelon form.
This fact is certainly something that needs to be proved, but we will omit the proof at this time.
The fact permits us to talk about the reduced row echelon form of a matrix A. For language, we say “put
A into RREF” or “row reduce A to RREF” to mean “use elementary row operations on A to transform A
into RREF.”
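If you want to check a row reduction by machine, computer algebra systems can produce the RREF directly. Here is a minimal SymPy sketch (not part of the notes), applied to the matrix A of Example 1.2.3.

from sympy import Matrix

# The matrix A from Example 1.2.3.
A = Matrix([[-4, -2, -1, 3],
            [ 2,  1,  1, -3],
            [ 1,  3,  1, 2]])

# rref() returns the (unique) reduced row echelon form together with a
# tuple of the pivot column indices (0-based).
R, pivot_cols = A.rref()
print(R)            # Matrix([[1, 0, 0, -1], [0, 1, 0, 2], [0, 0, 1, -3]])
print(pivot_cols)   # (0, 1, 2), i.e. columns 1, 2, 3 in the numbering of the notes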
Example 1.2.6
In Example 1.2.3, B is the RREF of A and the example explicitly shows how to transform A into its
RREF using elementary row operations.
When determining if a matrix is in echelon form, we need to look at its leading entries as we did in Example
1.2.5. The leading entries of a matrix in echelon form and the positions they lie in are important.
Let A be a matrix in echelon form.

1. A pivot position of A is a position containing a leading entry.

2. A pivot of A is a leading entry of A; that is, the non-zero number occupying a pivot position.

3. A pivot column (resp. pivot row) of A is a column (resp. row) of A that contains a pivot.
Example 1.2.7
The matrices A1, A2, A5, A6 of Example 1.2.5 are all in echelon form. The pivots of each matrix are its leading entries:

A1 =
[ 1 3 0 ]
[ 0 3 5 ]
[ 0 0 0 ]

A2 =
[ 1 0 4 3 1  ]
[ 0 1 3 9 10 ]

A5 =
[ 1 1 0 0 2 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 0 0 ]

A6 =
[ 1 3 0 0  ]
[ 0 4 0 0  ]
[ 0 0 0 17 ]
[ 0 0 0 0  ]
The first and second columns of A1 , A2 , and A5 are pivot columns. The first, second, and fourth
columns of A6 are pivot columns. The pivot positions of A1 , A2 , and A5 are the (1, 1) and (2, 2)-
positions. The pivot positions of A6 are the (1, 1), (2, 2), and (3, 4)-positions. ♦
Given any n × k matrix A, whether it is in echelon form or not, we will still refer to columns/rows of A
as pivot columns/rows if, once we transform A to echelon form, the corresponding column/row contains a
pivot. We are allowed to do this due to the following result.
Theorem 1.2.1
Suppose that A is an n × k matrix. Then, the pivot positions in any two echelon forms of A are the
same.
Proof
The first thing we note is that row operations are reversible in the sense that if B is obtained from A
using a single row operation, then A can be obtained from B by applying a row operation that reverses
the first one. I leave it as an exercise to determine how to reverse the three different types of row
operations. What this means is that if A can be transformed into B using a sequence of row operations,
then you can transform B into A using a (generally different) sequence of row operations. In other
words, if A is row equivalent to B, then B is row equivalent to A.
Now, suppose that B and C are echelon forms of A and that B has at least one pivot in a different
position than C. Note that, since B and C are in echelon form, all values in a column below a pivot
must be zero. Both B and C can be transformed into reduced row echelon form using row operations
as follows:
1. For any pivot that is not 1, multiply the corresponding row by one over the pivot value;
2. Zero out all entries in a pivot column above the pivot using row replacement. Note that this will
not affect any of the other pivot positions because every entry directly to the left of the pivot
you’re working with, and all entries below, are necessarily zero.
Proceeding as described, transform B and C to reduced row echelon form with row operations and
denote these by B ∗ and C ∗ . It is clear that the row operations described above do not change any pivot
positions; that is, B and B ∗ have the same pivot positions, as do C and C ∗ . Therefore, since B and C
differ in at least one pivot position, B∗ and C∗ cannot be the same matrix; they differ in at least one
entry. But, A is row equivalent to both B ∗ and C ∗ , which implies A is row equivalent to two different
matrices in reduced row echelon form. This contradicts RREF Is Unique. Hence, it follows that the
pivot positions in different echelon forms of A are the same.
Note
Another way to state Theorem 1.2.1 is that the pivot positions are invariant between echelon forms of
a matrix.
This theorem implies that the pivot columns of a matrix don’t change as we pass between different row
equivalent echelon forms. This has a number of useful consequences going forward. In particular, this means
we can refer to the pivot positions/columns of a matrix A whether it is in echelon form or not.
Example 1.2.8
In Example 1.2.3, B is the RREF of A. The first three columns of B contain a pivot. Therefore, we
refer to the first three columns of A as pivot columns even though it is not in echelon form.
The reduced row echelon form of a matrix is the most important tool we have for solving linear systems and
also for solving many of the problems we encounter in this course. Therefore, it is important that we have an
efficient method for transforming a given matrix into its RREF. Gauss-Jordan Elimination is an algorithm
that gives us exactly this.
Step 1. Start in the left-most non-zero column and, if necessary, interchange rows so that the
top entry is non-zero. This is a pivot position. If desired, scale the first row to make this first entry a 1.
Step 2. Use row replacement to create zeros in each position below the top pivot position.
Step 3. Ignore the row that contains the pivot in step 1 and any rows above this row. Apply steps
1 and 2 to the matrix that results from ignoring these rows. Repeat this process until there are no
more non-zero columns to apply it to. The resulting matrix will be in echelon form.
To put the matrix into RREF, perform the following additional step:
Step 4. Begin at the right most pivot. Working upwards, use replacement to create zeros in every
entry above each pivot. Use scaling to make sure each pivot is equal to 1.
Note
The process of using Gauss-Jordan elimination to put a matrix into echelon form or RREF is referred
to as row reduction. We say “Row reduce A to echelon form” to mean “Use Gauss-Jordan elimination
to put A into echelon form”.
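For readers who like to see the algorithm as code, here is a minimal Python sketch of the steps above (mine, not the author's); it favours clarity over numerical robustness, and the function name is an invention for this illustration.

import numpy as np

def gauss_jordan(A, tol=1e-12):
    """Row reduce A to reduced row echelon form and report the pivot columns."""
    A = A.astype(float).copy()
    n, k = A.shape
    pivot_cols = []
    r = 0                                          # row where the next pivot should land
    for c in range(k):                             # Step 1: scan columns left to right
        rows = np.nonzero(np.abs(A[r:, c]) > tol)[0]
        if rows.size == 0:
            continue                               # no pivot in this column
        A[[r, r + rows[0]]] = A[[r + rows[0], r]]  # swap a non-zero entry to the top
        A[r] /= A[r, c]                            # scale so the pivot is 1
        for i in range(n):                         # Steps 2 and 4: clear the rest of the column
            if i != r:
                A[i] -= A[i, c] * A[r]
        pivot_cols.append(c)
        r += 1                                     # Step 3: move down and repeat
        if r == n:
            break
    return A, pivot_cols

# The matrix from Example 1.2.9 below.
A = np.array([[0, -4, -4, 1, -7, -8],
              [3, -1, -7, 1, -4, -14],
              [1,  1, -1, 3,  4,  10],
              [2,  2, -2, 0,  2,  -4]])
R, pivots = gauss_jordan(A)
print(R)        # matches the RREF A7 computed by hand in Example 1.2.9
print(pivots)   # [0, 1, 3], i.e. the first, second, and fourth columns

This version clears the entries above and below each pivot in a single pass, which combines steps 2 and 4; since the RREF is unique, the end result is the same.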
The steps followed in Example 1.2.3 are exactly those outlined in Gauss-Jordan Elimination. We give another
example outlining all of the steps explicitly.
Example 1.2.9
Use Gauss-Jordan Elimination to row reduce the following matrix A to echelon form. Once in echelon
form, identify the pivot columns. Then, continue with Gauss-Jordan Elimination to put the matrix
in reduced row echelon form.
A =
[ 0  −4  −4   1  −7   −8 ]
[ 3  −1  −7   1  −4  −14 ]
[ 1   1  −1   3   4   10 ]
[ 2   2  −2   0   2   −4 ]
Solution
Step 1. Starting in the left-most non-zero column, the first entry is a zero, so we need to swap two rows so that the top entry is non-zero. Theoretically, you can swap with any row you want so long as
this entry becomes non-zero. But, since you know this is a pivot position and that the pivot eventually
needs to be 1, you may as well swap with the row that makes this entry a 1. We see there is a 1 in the
(3, 1)-position, so swap row 1 with row 3.
R1 ⇔ R3 : A ∼
[ 1   1  −1   3   4   10 ]
[ 3  −1  −7   1  −4  −14 ]
[ 0  −4  −4   1  −7   −8 ]
[ 2   2  −2   0   2   −4 ]
= A1.
Step 2. We start at the pivot we just made. It is a 1 in the (1, 1)-position. We need to make everything
below this pivot a 0. Do two row replacements to do this:
R2 ⇒ R2 − 3R1
R4 ⇒ R4 − 2R1 : A1 ∼
[ 1   1  −1   3    4    10 ]
[ 0  −4  −4  −8  −16   −44 ]
[ 0  −4  −4   1   −7    −8 ]
[ 0   0   0  −6   −6   −24 ]
= A2.
Note the row operations are executed in order from the top down.
Step 3. Ignore the row that contains the pivot we were working from, and any rows above it. In this case, we only ignore the first row of A2 and look at the following submatrix:

B =
[ 0  −4  −4  −8  −16  −44 ]
[ 0  −4  −4   1   −7   −8 ]
[ 0   0   0  −6   −6  −24 ]
The second column of this sub-matrix is the left-most non-zero column, which corresponds to the (2,2)-
position in A2 . Now repeat steps 1 and 2 starting at this pivot position. If we divide row 2 by −4, we
get a 1 in this position and we start with this row operation.
R2 ⇒ (−1/4)R2 : A2 ∼
[ 1   1  −1   3   4   10 ]
[ 0   1   1   2   4   11 ]
[ 0  −4  −4   1  −7   −8 ]
[ 0   0   0  −6  −6  −24 ]
= A3.
This row operation isn’t necessary at this time, but all the pivots must become 1 eventually anyway, so
we may as well do it now. To get zeroes below this new pivot, we do one row replacement.
R3 ⇒ R3 + 4R2 : A3 ∼
[ 1  1  −1   3   4   10 ]
[ 0  1   1   2   4   11 ]
[ 0  0   0   9   9   36 ]
[ 0  0   0  −6  −6  −24 ]
= A4.
Now we're back at step 3. Ignoring the row containing the pivot in B, the new sub-matrix is

C =
[ 0  0  0   9   9   36 ]
[ 0  0  0  −6  −6  −24 ]
The fourth column is the left-most non-zero column. The 9 that is in the (3, 4)-position of A4 is the
new pivot position. Again, we can scale by 1/9 to turn it into 1 and then do a row replacement to get
a zero below:
R3 ⇒ (1/9)R3
R4 ⇒ R4 + 6R3 : A4 ∼
[ 1  1  −1  3  4  10 ]
[ 0  1   1  2  4  11 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
= A5.
Note again that the row operations are performed top down. If you do them in the reverse order, you
won’t get the same matrix.
Now, if we ignore the third row, and all rows above, we see that there are no more non-zero columns
to apply the algorithm to. This means step 3 is over and the resulting matrix is in echelon form, as
you can check against the definition. The pivot positions are the (1, 1), (2, 2), and (3, 4)-positions, and the first, second, and fourth columns are the pivot columns.

A5 =
[ 1  1  −1  3  4  10 ]
[ 0  1   1  2  4  11 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
Step 4. We start at the right-most pivot, which is the 1 in the (3, 4)-position. The pivot is already a 1
so we don’t need to do any scaling. Therefore, we only need to make zeroes above this pivot by doing
row replacement.
R2 ⇒ R2 − 2R3
R1 ⇒ R1 − 3R3 : A5 ∼
[ 1  1  −1  0  1  −2 ]
[ 0  1   1  0  2   3 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
= A6.
Proceed to the next pivot to the left. This is the 1 in the (2,2)-position. We use row replacement to
make all entries above it zero.
R1 ⇒ R1 − R2 : A6 ∼
[ 1  0  −2  0  −1  −5 ]
[ 0  1   1  0   2   3 ]
[ 0  0   0  1   1   4 ]
[ 0  0   0  0   0   0 ]
= A7.
The next pivot to the left is the 1 in the (1, 1)-position. There are no entries above this pivot. Therefore
the algorithm terminates and we are left with the RREF of A. ♦
There is a bit of an art to getting good with this algorithm and there are certain places where you can
combine steps to get the answer quicker. There is nothing wrong with doing this but, when you’re first
learning, it is good practice to go through all the steps in full so you understand exactly how the algorithm
works. After this, you can start doing shortcuts.
1.3 Solving Linear Systems with Matrices
Theorem 1.3.1
Let A be an augmented matrix corresponding to a linear system. Let B be a matrix that is obtained
from A by performing a finite sequence of row operations on A. Then, the solution set to the linear
system represented by B is the same as the solution set to the linear system represented by A.
Proof
The argument I’ll present is more of a sketch of a proof as opposed to rigorous proof itself. That said,
I don’t think it should be too hard to see why this is true.
B is obtained from A using row operations, of which there are three types. Consider how these row operations affect the linear system A represents.
1. Interchanging rows changes the order in which we write the equations in the linear system down.
This doesn’t change the linear system itself, so the solution set doesn’t change.
2. Scaling a row is equivalent to multiplying an equation in the linear system by some non-zero
constant. Once again, this won’t change the solution set as you could always divide the constant
back out.

3. Row replacement corresponds to adding a non-zero multiple of one equation to another equation. Any solution of the original system satisfies both of the equations involved, so it also satisfies the new equation; and since the operation can be undone by subtracting the same multiple back off, no solutions are gained or lost.
The above explanations show that the elementary row operations are equivalent to operations we can
perform on equations in a linear system that do not change the solution set, whence the theorem
follows.
This theorem tells us that if we want to solve a linear system, we can perform row operations on the corre-
sponding augmented matrix and the solution set will remain unchanged. The strategy, therefore, is to use
row operations to transform the augmented matrix into one that represents a linear system whose solution
is evident. This form is reduced row echelon form! We give an example.
Example 1.3.1
Use a matrix to find the solution set to the following linear system,
−4x1 − 2x2 − x3 = 3
2x1 + x2 + x3 = −3
x1 + 3x2 + x3 = 2
Solution

The augmented matrix for this linear system is

A =
[ −4  −2  −1 |  3 ]
[  2   1   1 | −3 ]
[  1   3   1 |  2 ]

From Example 1.2.3, the RREF of A is
A′ =
[ 1  0  0 | −1 ]
[ 0  1  0 |  2 ]
[ 0  0  1 | −3 ]

The linear system A′ represents is
x1 = −1
x2 = 2
x3 = −3
It is fairly clear that the solution to this linear system is (x1 , x2 , x3 ) = (−1, 2, −3). By Theorem 1.3.1,
this is the solution to the original linear system. ♦
Recall that a linear system has either,

i) Exactly one solution;

ii) No solutions; or

iii) Infinitely many solutions.
The number of solutions to a linear system can be determined by looking at the reduced row echelon form
of the corresponding augmented matrix. We saw an example of what the reduced row echelon form looks
like when there is exactly one solution in Example 1.3.1. Let’s see what happens in the other two cases.
Example 1.3.2
Determine the solution set to the following linear system by putting its augmented matrix in reduced
row echelon form.
x1 + x2 = 2
x1 − x2 = 0
x1 − 2x2 = −3
Solution

The augmented matrix for this linear system is

A =
[ 1   1 |  2 ]
[ 1  −1 |  0 ]
[ 1  −2 | −3 ]

We use Gauss-Jordan Elimination to put this matrix into reduced row echelon form.
Start in the (1, 1)-position. This entry is already a 1 so we don’t need to do any scaling. We need
to create zeroes below this entry using row replacement. To do this, we do the following two row
operations.
1 1 2
R2 ⇒ R2 − R1
: A ∼ 0 −2 −2 = A1 .
R3 ⇒ R3 − R1
0 −3 −5
From here, we see the next pivot is in the (2, 2)-position. This entry is a -2. Since dividing row 2 by
−2 does not introduce fractions, let’s do that. After, do a row replacement to get a zero underneath.
The two row operations we perform are:
1 1 2
R2 ⇒ (−1/2)R2
: A1 ∼ 0 1 1 = A2 .
R3 ⇒ R3 + 3R2
0 0 −2
The matrix is now in echelon form. To go to reduced row echelon form, start at the pivot in the (2, 2)-
position and work to the right. One can see there is only one row operation necessary to go to reduced
row echelon form:
1 0 1
R1 ⇒ R1 − R2 : A2 ∼ 0 1 1 = A3 .
0 0 −2
A3 is the reduced row echelon form of A. The linear system this augmented matrix represents is:
x1 = 1
x2 = 1
0x1 + 0x2 = −2
The last equation simplifies to −2 = 0. Clearly, this equation is nonsense and is never satisfied. This
means it is impossible to find a solution to the linear system A3 represents; i.e. it has no solutions.
Since the solution set of this linear system is the same as the original, it follows that the original linear
system has no solution. ♦
The previous example is one in which the linear system has no solution. The reduced row echelon form of
the corresponding augmented matrix has a row of the form
0 0 0 ... 0 | b
where b is a non-zero number. We will see that this is always how it works.
Example 1.3.3
Use a matrix to find the solution set to the following linear system,
Solution

Row reducing the augmented matrix for this system (its reduced row echelon form is displayed in Example 1.3.5) gives the equivalent linear system,

x1 − 2x3 − x5 = −5
x2 + x3 + 2x5 = 3
x4 + x5 = 4
These three variables x1 , x2 , and x4 can be solved in terms of x3 and x5 :
x1 = −5 + 2x3 + x5 , x2 = 3 − x3 − 2x5 , x4 = 4 − x5 .
This means that if the variables x1, x2, and x4 have the forms specified above, then values for x3 and x5 can be chosen arbitrarily to obtain a solution to the linear system. For example, take x3 = x5 = 0.
Then,
x1 = −5 + 2(0) + 0 = −5;
x2 = 3 − 0 − 2(0) = 3;
x4 = 4 − 0 = 4.

This shows that (x1, x2, x3, x4, x5) = (−5, 3, 0, 4, 0) is a solution to the linear system. Now instead take x3 = −1 and x5 = 1. Then,
x1 = −5 + 2(−1) + 1 = −5 − 2 + 1 = −6;
x2 = 3 − (−1) − 2(1) = 4 − 2 = 2;
x4 = 4 − 1 = 3.
which again shows that (x1 , x2 , x3 , x4 , x5 ) = (−6, 2, −1, 3, 1) is a solution to the linear system.
Since we have already noticed that a linear system has either no solutions, one solution, or infinitely
many, it follows that this linear system must have infinitely many solutions because we have found more
than one. In fact, we get a different solution for every real value of x3 and x5 . The complete solution
to the linear system is written formally as follows: let x3 = s and x5 = t where s, t ∈ R. Then,
x1 = −5 + 2s + t;
x2 = 3 − s − 2t;
x4 = 4 − t.
Then, the complete solution to the linear system is (x1 , x2 , x3 , x4 , x5 ) = (−5+2s+t, 3−s−2t, s, 4−t, t)
where s, t ∈ R. In fact, if you substitute these values into the equations in the linear system, you'll see
all of the equations are satisfied. This means that regardless of the values you pick for s and t, you
always get a new solution to the linear system. ♦
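The same parametric description can be produced symbolically. Below is a minimal SymPy sketch (not from the notes) applied to the reduced system above; linsolve leaves the free unknowns x3 and x5 as parameters, playing the role of s and t.

from sympy import symbols, linsolve, Eq

x1, x2, x3, x4, x5 = symbols('x1 x2 x3 x4 x5')

# The reduced linear system from Example 1.3.3.
system = [Eq(x1 - 2*x3 - x5, -5),
          Eq(x2 + x3 + 2*x5, 3),
          Eq(x4 + x5, 4)]

print(linsolve(system, (x1, x2, x3, x4, x5)))
# One tuple, with x3 and x5 free:
#   x1 = 2*x3 + x5 - 5,  x2 = 3 - x3 - 2*x5,  x4 = 4 - x5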
Exercise
The linear system in Example 1.3.3 has infinitely many solutions. When writing the solution to the system,
we chose two variables and replaced them with new variables s and t where s and t can take any real value.
Moreover, the other variables in the solution were dependent on the values of s and t and couldn’t be picked
arbitrarily. We differentiate these two types of variables as follows: in a solution with infinitely many possibilities, the variables that may be assigned any real value are called free variables, while the variables whose values are then determined by those choices are called basic variables.
Example 1.3.4
x3 and x5 are free variables in the solution to the linear system in Example 1.3.3; x1, x2, and x4 are basic variables. ♦
Exercise
If there are free variables in the solution to a linear system, they are not necessarily unique. Generally,
you can rewrite the solution to change basic variables into free variables. Give this a try with the
solution in Example 1.3.3: make x1 and x2 free variables in the solution and make x3 and x5 basic
variables. This is merely a different way of writing the solution to the linear system down. Keep in
mind that it probably will look nothing like the one we obtained!
When faced with a linear system, there are two basic questions to answer:

1. Is the linear system consistent or inconsistent? I.e. does it have at least one solution or not?

2. If it is consistent, does it have exactly one solution or infinitely many, and what are they?
The answers to these questions can be deduced from the reduced row echelon form of the corresponding
augmented matrix. The following big theorem explains how it is all tied together.
Theorem: The Solutions Theorem

Let A be the augmented matrix of a linear system and let B be the reduced row echelon form of A. Then, exactly one of the following three cases holds.

i) The rightmost column of B (the one to the right of the vertical bar) is a pivot column. In this
case, the linear system A represents is inconsistent.
ii) The rightmost column of B is not a pivot column, and every column to the left of the vertical
bar is a pivot column. In this case, the linear system A represents is consistent and has exactly
one solution.
iii) The rightmost column of B is not a pivot column, and at least one column to the left of the
vertical bar is not a pivot column. In this case, the linear system A represents is consistent
and has infinitely many solutions.
Moreover, since each column either contains a pivot or doesn’t, this list is exhaustive. Thus, we
deduce that there are only three possibilities for solution sets to linear systems:
1) No solutions;

2) Exactly one solution; or

3) Infinitely many solutions;
and the structure of the pivot columns in B described above determines the nature of the solutions.
Proof
First suppose the rightmost column of B is a pivot column. Then, this column contains a pivot, which
is a leading entry. This means that the first non-zero entry of some row of B is contained in this column.
This necessarily means that B contains a row of the form,
0 0 0 . . . 0 | b

where b ≠ 0, since b is a pivot. This row represents the equation 0x1 + 0x2 + . . . + 0xk = b, which is clearly impossible as b ≠ 0. This means the linear system B represents is inconsistent and, so,
by Theorem 1.3.1, the linear system A represents is also inconsistent.
Now suppose that the rightmost column of B is not a pivot column. Additionally, suppose every column
to the left of the vertical bar is a pivot column. Since B is in reduced row echelon form, it follows that
B must have the following form:
B =
[ 1  0  0  . . .  0 | b1 ]
[ 0  1  0  . . .  0 | b2 ]
[ 0  0  1  . . .  0 | b3 ]
[ .  .  .         . |  . ]
[ 0  0  0  . . .  1 | bk ]
[ 0  0  0  . . .  0 | 0  ]
[ .  .  .         . |  . ]
[ 0  0  0  . . .  0 | 0  ]
where b1 , b2 , . . . , bk are real numbers. Note, if n = k, then there are no rows of zeroes at the bottom of
B. Moreover, since each of the k columns to the left of the bar has a pivot, it must be the case that
there are at least k rows since each row can have at most one pivot; that is n ≥ k.
The linear system B represents is then

x1 = b1, x2 = b2, . . . , xk = bk,

together, possibly, with some equations of the form 0 = 0, which clearly has only one solution: (x1, x2, x3, . . . , xk) = (b1, b2, . . . , bk). Therefore, by Theorem 1.3.1,
the linear system represented by A has exactly one solution.
Finally, suppose that the rightmost column is not a pivot column and that at least one column to the
left of the vertical bar is a non-pivot column. Relabelling the variables in the linear system if necessary,
we can assume that the first m columns of B are pivot columns where 1 ≤ m < k. In this case, B has
the form,
B =
[ 1  0  0  . . .  0   b1,m+1   b1,m+2   . . .   b1,k | b1 ]
[ 0  1  0  . . .  0   b2,m+1   b2,m+2   . . .   b2,k | b2 ]
[ 0  0  1  . . .  0   b3,m+1   b3,m+2   . . .   b3,k | b3 ]
[ .  .  .         .   .                         .    |  . ]
[ 0  0  0  . . .  1   bm,m+1   bm,m+2   . . .   bm,k | bm ]
[ 0  0  0  . . .  0   0        0        . . .   0    | 0  ]
[ .  .  .         .   .                         .    |  . ]
[ 0  0  0  . . .  0   0        0        . . .   0    | 0  ]
where b1 , b2 , . . . , bm and the bij ’s all represent arbitrary real numbers (it is possible they are all zero).
Note that if there if k > n, that is, if there are more variables than equations, then there are no rows
of zeroes at the bottom of this matrix. Also, in this case, m < n.
Rewriting this augmented matrix as a linear system and rearranging as we did in Example 1.3.3, we
get the following:

x1 = b1 − b1,m+1 xm+1 − . . . − b1,k xk
x2 = b2 − b2,m+1 xm+1 − . . . − b2,k xk
. . .
xm = bm − bm,m+1 xm+1 − . . . − bm,k xk
From here, we can see that the variables xm+1 , xm+2 , . . . , xk are all free variables, meaning they can
take any real value and a new solution is obtained. This means the linear system represented by B has
infinitely many solutions and, so, the linear system represented by A also has infinitely many solutions
by Theorem 1.3.1.
The only case left out of this argument is when B consists entirely of zeroes. The deduction of what
happens in this case is left as an exercise.
Finally, a column of a matrix either is a pivot column or it is not, there is no in between. This means
the above 3 cases exhaust all possibilities for the pivot column structure of B. Therefore, the reduced
row echelon form of the augmented matrix for any linear system must fall into one of these three cases.
In each case, we have deduced the solution set. This means that there are only three possibilities for
solutions to linear systems:
1. No solutions;
which suffices for proof of Fact Number of Solutions of Linear Systems and the last statement of the
theorem.
Exercise
Suppose that the augmented matrix of a linear system is given by,
[ 0  0  . . .  0 | 0 ]
[ 0  0  . . .  0 | 0 ]
[ .  .         . | . ]
[ 0  0  . . .  0 | 0 ]
Determine the number of solutions this linear system has. To do this, first try writing out the linear
system the augmented matrix represents.
It is important to recognize that The Solutions Theorem only tells you the nature of the solution set for a
linear system. It does not tell you what the solution(s) is/are.
Example 1.3.5
The reduced row echelon form of the augmented matrix for the linear system in Example 1.3.3 is,
1 0 −2 0 −1 −5
0 1 1 0 2 3
0 0 0 1 1 4
0 0 0 0 0 0
Here, the pivot columns are the first, second, and fourth. The rightmost column is not a pivot
column, and there are two columns to the left of the bar (third and fifth) that are non-pivot columns.
Therefore, The Solutions Theorem immediately implies that the linear system has infinitely many
solutions, though you still need to write everything out to determine what the solution is.
Similarly, the reduced row echelon form of the augmented matrix for the linear system in Example
1.3.2 is,
1 0 1
0 1 1 .
0 0 −2
Here, we see that the rightmost column is a pivot column. Therefore, the corresponding linear system
has no solution by The Solutions Theorem. ♦
The Solutions Theorem can be generalized slightly. In particular, one only needs to look at any echelon form
of an augmented matrix to determine the nature of the solutions to a linear system. In particular, if B is an
echelon form of an augmented matrix A, then:
i) If the rightmost column of B is a pivot column, the corresponding linear system is inconsistent.
ii) If the rightmost column of B is not a pivot column and every column to the left of the vertical bar is
a pivot column, then the corresponding linear system is consistent and has exactly one solution.
iii) If the rightmost column of B is not a pivot column and at least one column to the left of the vertical
bar is not a pivot column, then the corresponding linear system is consistent and has infinitely many
solutions.
The proof of this is a straightforward application of Theorems 1.2.1 and 1.3.1 to The Solutions Theorem. I
leave the deduction of this as an exercise.
Exercise
Fill in the details for the argument needed to prove the statement above.
Solving a Linear System Using a Matrix
In order to determine the solution set of a linear system, do the following steps.
Step 1. Write down the augmented matrix A for the linear system.
Step 2. Use Gauss-Jordan Elimination to put A into its reduced row echelon form B.
Step 3. Determine which of the following cases applies to B.
i) The rightmost column of B is a pivot column. In this case, the linear system is inconsistent by
The Solutions Theorem. Stop here, you are done.
ii) The rightmost column of B is not a pivot column and every column to the left of the bar is a
pivot column. Then, the linear system has exactly one solution by The Solutions Theorem and
the solution can be easily read from the matrix.
iii) The rightmost column of B is not a pivot column and at least one of the columns to the left
of the vertical bar is a non-pivot column. In this case, the linear system has infinitely many
solutions. If you are in this case, move to Step 4.
Step 4. This step is only applicable if you have infinitely many solutions to the linear system. To
write down the solution, first note that the pivot columns correspond to basic variables in the solution
and the non-pivot columns to the left of the bar correspond to free variables in the solution. Write
out the linear system that the reduced row echelon form represents and solve for the basic variables
in terms of the free variables. The resulting values provide a solution to the linear system.
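If you want to check this classification on a computer, the following is a minimal sketch in Python using the sympy library (sympy is an assumption here, not part of these notes); it applies Steps 2 and 3 to the augmented matrix of the system in Example 1.3.6 below.

from sympy import Matrix

# Augmented matrix [coefficients | right-hand side] of the system in Example 1.3.6.
A = Matrix([[4, 1, -7, 23],
            [1, 0, -3,  6],
            [2, 5, 19,  7]])

B, pivot_cols = A.rref()    # reduced row echelon form and the indices of its pivot columns
rightmost = A.cols - 1      # index of the augmented column

if rightmost in pivot_cols:
    print("no solutions (inconsistent)")
elif len(pivot_cols) == A.cols - 1:
    print("exactly one solution")
else:
    print("infinitely many solutions")   # this case is reported for the matrix above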
Note
Example 1.3.6
Determine the solution set of the following linear system.
4x1 + x2 − 7x3 = 23
x1 − 3x3 = 6
2x1 + 5x2 + 19x3 = 7
Solution
Step 1. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 4 & 1 & -7 & 23 \\ 1 & 0 & -3 & 6 \\ 2 & 5 & 19 & 7 \end{array}\right]. \]
Step 2. Proceed with Gauss-Jordan Elimination to put A into reduced row echelon form. We start by
making a 1 in the (1,1)-position and then getting zeroes below:
\[ \begin{array}{l} R_1 \Leftrightarrow R_2 \\ R_2 \Rightarrow R_2 - 4R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; A \sim \left[\begin{array}{ccc|c} 1 & 0 & -3 & 6 \\ 0 & 1 & 5 & -1 \\ 0 & 5 & 25 & -5 \end{array}\right] = A_1. \]
Now start at the 1 in the (2,2)-position. We need a zero underneath this:
\[ R_3 \Rightarrow R_3 - 5R_2 : \; A_1 \sim \left[\begin{array}{ccc|c} 1 & 0 & -3 & 6 \\ 0 & 1 & 5 & -1 \\ 0 & 0 & 0 & 0 \end{array}\right] = A_2. \]
As one can check, the matrix is now in reduced row echelon form.
Step 3. The pivots are the 1’s in the (1, 1) and (2, 2)-positions. This means the rightmost column is
not a pivot column. Moreover, there is a non-pivot column to the left of the bar (column 3). By The
Solutions Theorem, the linear system has infinitely many solutions.
Step 4. We only do this step because the system has infinitely many solutions. Since the third column
is non-pivot, we see that x3 is the free variable whereas x1 and x2 are basic. Rewriting the matrix A2
as a linear system yields,
x1 − 3x3 = 6
x2 + 5x3 = −1
x3 is a free variable, so write x3 = s where s ∈ R. Substituting this into the above and solving for the
basic variables x1 and x2 in terms of x3 = s yields,
x1 − 3s = 6 =⇒ x1 = 6 + 3s
x2 + 5s = −1 =⇒ x2 = −1 − 5s
Therefore, the solution to the linear system is (x1, x2, x3) = (6 + 3s, -1 - 5s, s), s ∈ R. ♦
Note
The notation s ∈ R is important. Whenever there is a solution involving a free variable, you must write
this because it tells the reader that s can be any real value. If this is not present, the solution means
nothing.
Example 1.3.7
Determine the solution set of the following linear system.
x1 − (3/2)x2 + x3 = 2
5x1 − 7x2 + 5x3 = 10
2x1 − 3x2 + 2x3 = 6
Solution
Step 1. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 1 & -3/2 & 1 & 2 \\ 5 & -7 & 5 & 10 \\ 2 & -3 & 2 & 6 \end{array}\right]. \]
Step 2. We do Gauss-Jordan Elimination to put A into reduced row echelon form. The first step is
\[ \begin{array}{l} R_2 \Rightarrow R_2 - 5R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; A \sim \left[\begin{array}{ccc|c} 1 & -3/2 & 1 & 2 \\ 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right] = A_1. \]
Note here that we are now in echelon form and the rightmost column is a pivot column. This is enough
to conclude the linear system is inconsistent. If you want to proceed to reduced row echelon form, we
do the following:
\[ \begin{array}{l} R_1 \Rightarrow R_1 + 3R_2 \\ R_2 \Rightarrow 2R_2 \\ R_3 \Rightarrow (1/2)R_3 \end{array} : \; A_1 \sim \left[\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] = A_2. \]
Step 3. The rightmost column is a pivot column. Therefore, the linear system is inconsistent by The
Solutions Theorem. ♦
Example 1.3.8
Solution
\[ \begin{array}{l} R_2 \Rightarrow -R_2 \\ R_3 \Rightarrow R_3 + 3R_2 \\ R_4 \Rightarrow R_4 - 18R_2 \end{array} : \; A_2 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & -5 & 12 \\ 0 & 1 & 0 & 1 & -2 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & -5 & -17 & -4 \end{array}\right] = A_3. \]
The next pivot is the 1 in the (3, 3)-position. We need a zero below it.
\[ R_4 \Rightarrow R_4 + 5R_3 : \; A_3 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & -5 & 12 \\ 0 & 1 & 0 & 1 & -2 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & -2 & 1 \end{array}\right] = A_4. \]
This matrix is now in echelon form. To proceed to reduced row echelon form, start at the pivot in the
(4, 4)-position. It is a -2. Turn it into a 1 and make zeroes above it:
\[ \begin{array}{l} R_4 \Rightarrow (-1/2)R_4 \\ R_3 \Rightarrow R_3 - 3R_4 \\ R_2 \Rightarrow R_2 - R_4 \\ R_1 \Rightarrow R_1 + 5R_4 \end{array} : \; A_4 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & 0 & 19/2 \\ 0 & 1 & 0 & 0 & -3/2 \\ 0 & 0 & 1 & 0 & 5/2 \\ 0 & 0 & 0 & 1 & -1/2 \end{array}\right] = A_5. \]
There is one final move to get to reduced row echelon form: zero out the (1, 2)-entry:
\[ R_1 \Rightarrow R_1 + 5R_2 : \; A_5 \sim \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & -3/2 \\ 0 & 0 & 1 & 0 & 5/2 \\ 0 & 0 & 0 & 1 & -1/2 \end{array}\right] = A_6. \]
This is reduced row echelon form.
Step 3. We see that all of the columns to the left of the bar are pivot columns and the rightmost one is
not. By The Solutions Theorem, the linear system has exactly one solution and it can be readily seen
by looking at the matrix: (x1 , x2 , x3 , x4 ) = (2, −3/2, 5/2, −1/2). ♦
Example 1.3.9
Solution
Move to the next pivot to the left in the (2, 2)-position. It is not a one, but the row operation is easier
to do if you leave it as a 3 and don’t introduce fractions:
\[ R_1 \Rightarrow R_1 - 5R_2 : \; A_4 \sim \left[\begin{array}{ccccc|c} -6 & 0 & -3 & 0 & -6 & 6 \\ 0 & 3 & -9 & 0 & 1 & -4 \\ 0 & 0 & 0 & 1 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] = A_5. \]
Finally, to get to reduced row echelon form, we need to make the pivots equal to 1. We do this via two
row scalings:
\[ \begin{array}{l} R_1 \Rightarrow -(1/6)R_1 \\ R_2 \Rightarrow (1/3)R_2 \end{array} : \; A_5 \sim \left[\begin{array}{ccccc|c} 1 & 0 & 1/2 & 0 & 1 & -1 \\ 0 & 1 & -3 & 0 & 1/3 & -4/3 \\ 0 & 0 & 0 & 1 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] = A_6. \]
We are now in reduced row echelon form.
Step 3. The rightmost column of the reduced row echelon form is not a pivot column and there
are columns to the left of the vertical bar that are non-pivot columns (columns 3 and 5). There-
fore, the linear system has infinitely many solutions by The Solutions Theorem and we proceed to step 4.
Step 4. In this step, we determine the solution to the linear system. Start by writing the reduced row
echelon form as a linear system:
x1 + (1/2)x3 + x5 = −1
x2 − 3x3 + (1/3)x5 = −4/3
x4 − 3x5 = 3
Since the non-pivot columns correspond to x3 and x5 , we take these as the free variables, so x1 , x2 , and
x4 are the basic variables. Letting x3 = s and x5 = t where s and t are real parameters, we rewrite the
above equations as follows:
x1 + (1/2)s + t = -1 =⇒ x1 = -1 - (1/2)s - t
x2 - 3s + (1/3)t = -4/3 =⇒ x2 = -4/3 + 3s - (1/3)t
x4 - 3t = 3 =⇒ x4 = 3 + 3t.
Therefore, the solution to the linear system is
(x1, x2, x3, x4, x5) = (-1 - (1/2)s - t, -4/3 + 3s - (1/3)t, s, 3 + 3t, t), s, t ∈ R.
There are more ways we could write this. Suppose we didn’t like the fractional coefficients on the free
variables. Since the parameters s and t can take any real value, we can write them as s = 2s0 and
t = 3t0 where s0 , t0 ∈ R with the coefficients chosen to clear denominators. Then, an equally valid way
to write the solution is,
(x1, x2, x3, x4, x5) = (-1 - s0 - 3t0, -4/3 + 6s0 - t0, 2s0, 3 + 9t0, 3t0), s0, t0 ∈ R.
Both ways of writing the solution are completely valid; you can use whichever you like. ♦
Note that a linear system must be consistent to have free variables - free variables occur only when a linear
system has infinitely many solutions.
Example 1.3.10
0 0 0 1
There is one non-pivot column to the left of the vertical bar, which typically would indicate the
existence of a free variable in the corresponding linear system. However, because the rightmost
column of this matrix is a pivot column, the linear system is inconsistent by The Solutions Theorem.
Hence, it has no free variables. ♦
Chapter 2
Vectors in Rn
Linear algebra is the study of vector spaces; these are abstract sets where vectors live. In this chapter, we
introduce vectors and some of their elementary properties.
That being said, just because I am using the word “elementary” does not mean that the forthcoming material
is easy. In fact, it is at this point where the material becomes more abstract. As you read on, I want
you to keep in mind that all of what is coming can be linked back to solving linear systems. In fact, a lot
of the challenge of the problems we solve isn't calculation; it is determining what the question is asking and
how to interpret the corresponding calculation.
We all know the standard Cartesian plane: it is the one you work in to draw graphs of functions with the x
and y axes that has coordinates (x, y) where x and y are real numbers. Another way to refer to the Cartesian
plane is R2 , pronounced “R two.”
Let A and B be two points in the Cartesian plane R2. The displacement vector from a point A to
a point B is an arrow with its tail at A and its head at B. This is denoted by $\vec{v} = \overrightarrow{AB}$. We have the
following terminology for vectors:
i) The point A is called the initial point of ~v and the point B is called the terminal point of ~v.
ii) The length of ~v is the distance between the points A and B and is represented by the length
of the arrow. This is denoted by k~v k.
iii) The direction of ~v is the direction the arrow is pointing in. Note that two vectors have the
same direction if they are parallel and pointing in the same direction.
Example 2.1.1
Determine the initial and terminal points of the following vectors. Also determine their lengths.
[Figure: the vector ~v drawn from (1, -1) to (4, 3) and the vector ~w drawn from (3, 4) to (-3, -1) in the x1x2-plane.]
Solution
The initial point is where the vector starts. We can see that the vector ~v starts at the point (1, −1)
and ~w starts at (3, 4). The terminal points are the points where the arrow heads touch. In this case,
the terminal point of ~v is (4, 3) and the terminal point of ~w is (-3, -1).
To find the lengths, we can use the Pythagorean theorem. Taking ~v for example, we can form the
following right angle triangle:
[Figure: the right triangle formed by ~v, with a horizontal leg of length 3 and a vertical leg of length 4.]
We can see from here that the base of the right angle triangle has length 3 and it has height 4. Then,
by the Pythagorean Theorem, the length of ~v is
\[ \|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5. \]
Note, the positive square root is always used because we are talking about a length.
[Figure: the right triangle formed by ~w, with a horizontal leg of length 6 and a vertical leg of length 5.]
We see the base of the triangle is 6 and the height of the triangle is 5. Therefore,
\[ \|\vec{w}\| = \sqrt{6^2 + 5^2} = \sqrt{61}. \]
Note these vectors do not share the same direction because they are not parallel. ♦
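For readers who like to check arithmetic on a computer, here is a small Python sketch (Python itself is an assumption, not part of the notes) that computes the two lengths above from the initial and terminal points.

import math

def length(tail, head):
    # Length of the displacement vector from tail to head, via the Pythagorean theorem.
    return math.hypot(head[0] - tail[0], head[1] - tail[1])

print(length((1, -1), (4, 3)))   # vector v: sqrt(3^2 + 4^2) = 5.0
print(length((3, 4), (-3, -1)))  # vector w: sqrt(6^2 + 5^2) = sqrt(61), about 7.81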
Example 2.1.2
The following two vectors have the same direction, but not the same length:
[Figure: two parallel vectors ~v and ~w that point in the same direction but have different lengths.]
The directions are the same because they are pointing in the same direction and are parallel. Note
that if we reverse the initial and terminal points of one of these vectors, they are still parallel, but no
longer point in the same direction, so the vectors themselves do not have the same direction. This is
pictured below:
[Figure: the same two vectors with the initial and terminal points of ~v reversed; ~v and ~w are still parallel but now point in opposite directions.]
The only quantities that matter when it comes to vectors are the length of the vector, also referred to as
magnitude, and direction. Indeed, two vectors ~v and w~ are equal when their magnitude and direction are
the same. We don’t care where they are positioned in R2 . For example, the following two vectors in R2 are
equal even though they have different initial and terminal points.
[Figure: two equal vectors ~v and ~w drawn with different initial and terminal points.]
It may be obvious from this discussion, but it is worth mentioning that vectors are not numbers. Therefore,
if we want to perform operations like addition and subtraction on vectors, then we need to define these operations.
Let’s look at addition first.
Suppose A, B, and C are three points in R2 and let $\vec{v} = \overrightarrow{AB}$ and $\vec{w} = \overrightarrow{BC}$. We give the following picture
below as an example to visualize the situation:
[Figure: points A, B, and C with $\vec{v} = \overrightarrow{AB}$ drawn from A to B and $\vec{w} = \overrightarrow{BC}$ drawn from B to C.]
Vectors represent quantities that have direction and magnitude. With this in mind, how should we define
the “sum” of two vectors? Think about adding two numbers as combining them together into a number that
represents the “total” of the two. If we think about how to extend this to vectors, we could reasonably think
that the sum of two vectors should be a vector that represents the total “effect” of the two being added.
To see how one might define this “total” vector, think about a truck starting at A, driving along ~v until it
hits the point B, then driving along ~w until it hits the point C. The total “effect” is a path from the point
A to the point C. This suggests we should define the sum of ~v and ~w as $\overrightarrow{AC}$: the vector whose initial point
is A and whose terminal point is C; this is depicted below:
[Figure: the triangle formed by $\vec{v} = \overrightarrow{AB}$, $\vec{w} = \overrightarrow{BC}$, and their sum $\vec{v} + \vec{w} = \overrightarrow{AC}$ drawn from A to C.]
Now, we present a way to construct the vector ~v + ~w. This construction will be quite convenient because,
after this section, all vectors will be assumed to have their initial point at the origin (0, 0).
First translate ~w so its initial point is the same as that of ~v:
[Figure: ~w translated so that its initial point coincides with the initial point of ~v.]
Remember that the actual position of the vectors doesn't matter, only magnitude and direction do, so the
~w above is the same as the one before, just in a different position.
Next, starting at the terminal point of ~v, draw a vector that is parallel to ~w and has the same length.
It should look like the following:
[Figure: a copy of ~w drawn starting at the terminal point of ~v; together with ~v and the original ~w it forms a parallelogram.]
If you have done this correctly, you will always create the above parallelogram. Then, the vector ~v + ~w is
the vector whose initial point is the common initial point of ~v and ~w and whose terminal point is the far corner of
this parallelogram:
[Figure: the parallelogram formed by ~v and ~w with the sum ~v + ~w drawn along its diagonal.]
It is evident that ~v + ~w constructed in this way is the same as the one constructed above. This is because
the vector starting at the terminal point of ~v is actually just ~w: it has the same length and points in the
same direction.
The above shows how to add vectors. Can we multiply them? The answer to that question is not clear at the
moment. There are notions of multiplication of vectors, but they all have some issues. Instead of defining
multiplication of vectors, we define a related notion called scalar multiplication.
A real number r ∈ R is called a scalar. If ~v is a displacement vector and r is a scalar, the scalar
multiple r~v is the displacement vector whose length is |r| times the length of ~v, and whose direction is the
same as that of ~v if r > 0 and opposite to that of ~v if r < 0.
Example 2.1.3
[Figure: the vector ~v together with the scalar multiples 2~v, (1/2)~v, -~v, and -4~v drawn in the x1x2-plane.]
Note that −~v is merely ~v pointing in the opposite direction. We call this the negative of ~v . ♦
If you think about the arithmetic of numbers, the operation of subtraction is the same as adding negative
numbers. We define subtraction of vectors in the same way:
\[ \vec{v} - \vec{w} = \vec{v} + (-\vec{w}). \]
We can calculate differences using the parallelogram law for addition. An example of this is shown below.
[Figure: ~v, ~w, and -~w drawn in the plane, with the difference ~v - ~w = ~v + (-~w) constructed using the parallelogram law.]
Like most things in mathematics there is a shortcut. Notice that the vector we have drawn as ~v - ~w is
exactly parallel to the vector we get if we start at the tip of ~w and join this to the tip of ~v. See below:
[Figure: the same picture with an additional vector drawn from the tip of ~w to the tip of ~v; it is parallel to, and has the same length as, ~v - ~w.]
This vector has the same magnitude and direction as ~v - ~w, thus it is equal to ~v - ~w. Therefore, we can
draw the difference ~v - ~w by drawing a vector from the terminal point of ~w to the terminal point of ~v. The
direction is reversed for ~w - ~v, which should make sense because ~w - ~v is the negative of ~v - ~w!
The previous examples show we can represent vectors in the Cartesian plane using matrices. The beautiful
thing about this observation is that we can use matrices to generalize vectors. Indeed, we could do the same
thing in the previous section for vectors in 3-dimensional space. And then we can generalize to 4-dimensional
space or 5-dimensional space, or any n-dimensional space we can think of. Therefore, we define vectors using
matrices as follows.
An n-dimensional vector ~v is an n × 1 matrix
\[ \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}. \]
The entries v1, . . . , vn are called the components of the vector ~v. The number of components n is
called the dimension of the vector. The set of all n-dimensional vectors with real components is
denoted by Rn.
In the above, the vertical-dots notation "⋮" is a placeholder that tells you to repeat the pattern up to the given last value.
We use this because, to work in total generality, we can not assign a specific positive integer value to n.
Indeed, if you are supposed to prove that a specific property holds for vectors in Rn , then you must work with
general vectors written down just like above. Proving the property holds by only considering vectors with a
fixed number of components, say 3, only proves that property holds for vectors in R3 , and not necessarily in
R4 , R5 , etc.
Example 2.2.1
R2 is 2-dimensional real space, which we know as the Cartesian plane. This is the space you work in
in Calculus 1 and 2. R3 is three dimensional real space. You’ve worked in R3 if you’ve done Calculus
3. R3 is also what we live in.
If n ≥ 4, then we can’t visualize Rn . This is why defining vectors in terms of matrices is so important.
In the previous section, we defined a vector as a 2-dimensional arrow with length and direction. We then
used our geometric intuition to derive how to add, subtract, and do scalar multiplication. We can do the
same thing in R3 , though addition and subtraction is a little more complicated to figure out geometrically.
But since we can’t visualize Rn for any n ≥ 4, we can’t repeat this same geometric intuition process for
higher dimensional spaces. What do length and direction even mean in a space like R4 ? Defining vectors as
matrices allows us to skirt around these problems. We can then define operations on n-dimensional vectors
via their matrix representations from a purely algebraic standpoint which is motivated by what we know
happens in the 2 and 3 dimensional cases.
3. For any scalar r ∈ R and any vector ~w ∈ Rn, the scalar multiplication of r and ~w, denoted r~w,
is defined as
\[ r\vec{w} = r\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} rw_1 \\ rw_2 \\ \vdots \\ rw_n \end{bmatrix}. \]
1) If you want to add or subtract two vectors, they must have the same dimension. If two vectors have
different dimension, then they can not be added or subtracted. Therefore, we can not add/subtract a
vector in R2 from a vector in R5 .
2) We do not define multiplication of two vectors. We have scalar-vector multiplication, but that is it.
This is a little different than what we are used to as we now have two different objects to deal with
(scalars and vectors).
Note
Under the operations of addition and scalar multiplication, the set Rn becomes a mathematical
structure called a vector space. We won't delve into the abstract notion of a vector space too much
in this course, but these are studied frequently in lots of areas of mathematics.
Example 2.2.2
Let $\vec{u} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$. Then ~u ± ~v is not defined.
Example 2.2.3
Let
\[ \vec{u} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix}. \]
Calculate 3~u, 2~v - ~w, and 2~u + 3~v + ~w.
Solution. Applying the definitions of vector addition and scalar multiplication yields
\[ 3\vec{u} = \begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix}, \]
\[ 2\vec{v} - \vec{w} = \begin{bmatrix} -4 \\ 2 \\ 6 \end{bmatrix} - \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix} = \begin{bmatrix} -4 - (-3) \\ 2 - 8 \\ 6 - (-1) \end{bmatrix} = \begin{bmatrix} -1 \\ -6 \\ 7 \end{bmatrix}, \]
\[ 2\vec{u} + 3\vec{v} + \vec{w} = \begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix} + \begin{bmatrix} -6 \\ 3 \\ 9 \end{bmatrix} + \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix} = \begin{bmatrix} -7 \\ 11 \\ 12 \end{bmatrix}. \; ♦ \]
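As a quick numerical check of Example 2.2.3, here is a short sketch using Python's numpy library (an assumption on my part; any component-wise arithmetic would do).

import numpy as np

u = np.array([1, 0, 2])
v = np.array([-2, 1, 3])
w = np.array([-3, 8, -1])

print(3 * u)              # [3 0 6]
print(2 * v - w)          # [-1 -6  7]
print(2 * u + 3 * v + w)  # [-7 11 12]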
There is a special vector that plays the same role that zero does for numbers. It is aptly called the zero
vector.
The zero vector in Rn, denoted ~0n, is the vector that contains only zeroes:
\[ \vec{0}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \]
When the dimension of the zero vector is clear from context, we drop the n and simply write ~0.
The following theorem gives some properties of vector addition and scalar multiplication. Many of these are
similar to those of real numbers. We need these in order to do algebra on vectors.
Properties of Vectors
Let ~u, ~v, ~w ∈ Rn be vectors and let r, s ∈ R be scalars. Then:
1. ~u + ~v = ~v + ~u (commutativity of addition);
2. r(~u + ~v) = r~u + r~v (distributivity over vector addition);
3. (r + s)~u = r~u + s~u (distributivity over scalar addition);
4. (~u + ~v) + ~w = ~u + (~v + ~w) (associativity of addition);
5. r(s~u) = (rs)~u;
6. ~u + (-~u) = ~u - ~u = ~0;
7. ~u + ~0 = ~0 + ~u = ~u;
8. 1~u = ~u.
Proof. To prove each property in this theorem, we need to show each vector on the left is equal to the vector
on the right. To do this, you pick arbitrary vectors, write them out, compute the expression on the left and
show it is equal to the expression on the right. I’ll do two of these proofs and leave the rest as exercises as
they all follow in a similar fashion.
Proof of Property 1. Let
\[ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}. \]
Then,
\[ \vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} \quad \text{by definition of vector addition,} \]
\[ = \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} \quad \text{by properties of real numbers,} \]
\[ = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{by definition of vector addition,} \]
= ~v + ~u.
Proof of Property 5. Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \in \mathbb{R}^n$ and r, s ∈ R. Then,
\[ s\vec{u} = \begin{bmatrix} su_1 \\ su_2 \\ \vdots \\ su_n \end{bmatrix}. \]
Therefore,
\[ r(s\vec{u}) = \begin{bmatrix} r(su_1) \\ r(su_2) \\ \vdots \\ r(su_n) \end{bmatrix} = \begin{bmatrix} (rs)u_1 \\ (rs)u_2 \\ \vdots \\ (rs)u_n \end{bmatrix} \quad \text{by properties of real numbers,} \]
\[ = (rs)\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{by definition of scalar multiplication,} \]
= (rs)~u.
Exercise
Prove the rest of Properties of Vectors.
It should be clear why all of these properties are true for 2-dimensional vectors given the geometric definitions
of addition and subtraction we derived in the last section. For example, ~u + ~v = ~v + ~u makes sense because
the parallelogram defined by ~u and ~v is the same as the parallelogram defined by ~v and ~u.
The idea is straightforward. Once we have a solution to a linear system, instead of putting the solution in
round brackets as an ordered tuple, put the solution into a vector. If a linear system has only one solution,
the difference between the two forms of the solution is negligible. For example, the vector form of the solution
to the linear system in Example 1.1.1 is the following,
" # " #
x1 152
= .
x2 78
Vector forms of solutions to linear systems with infinitely many solutions are a little bit different. In this
case, we split the vector apart using vector addition and scalar multiplication so that each free variable sits
inside of its own vector. This process becomes clear after seeing a few examples.
Example 2.2.4
Solution. Writing out the components explicitly, the solution to the linear system is,
\[ x_1 = -\frac{7}{25} - \frac{1}{25}s, \quad x_2 = \frac{58}{25} - \frac{31}{25}s, \quad x_3 = s. \]
Put this solution into a vector as follows,
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -(7/25) - (1/25)s \\ 58/25 - (31/25)s \\ s \end{bmatrix}. \]
Now pull apart the vector on the left using vector addition and scalar multiplication:
\[ \begin{bmatrix} -(7/25) - (1/25)s \\ 58/25 - (31/25)s \\ s \end{bmatrix} = \begin{bmatrix} -7/25 \\ 58/25 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1/25 \\ -31/25 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \]
The expression on the right hand side of the above equation is the vector form of the solution to the linear
system. ♦
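Parametric solutions like the one above can also be produced by software. The sketch below (assuming Python with the sympy library, which is not part of these notes) does this for the system of Example 1.3.6, whose solution was x1 = 6 + 3s, x2 = -1 - 5s, x3 = s; sympy reports the same solution with x3 playing the role of the parameter.

from sympy import Matrix, symbols, linsolve

x1, x2, x3 = symbols('x1 x2 x3')
A = Matrix([[4, 1, -7], [1, 0, -3], [2, 5, 19]])   # coefficient matrix
b = Matrix([23, 6, 7])                             # right-hand side

sol, = linsolve((A, b), x1, x2, x3)
print(sol)   # (3*x3 + 6, -5*x3 - 1, x3): x3 is free, matching the solution above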
Example 2.2.5
Express the solutions to the linear systems of Examples 1.3.8 and 1.3.9 in vector form.
Solution. The solution to the linear system in Example 1.3.8 was (x1 , x2 , x3 ) = (7/2, 1, 5/2). The vector
form of the solution is,
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7/2 \\ 1 \\ 5/2 \end{bmatrix}. \]
The solution to the linear system in Example 1.3.9 is
\[ (x_1, x_2, x_3, x_4) = \left( \tfrac{26}{11} - \tfrac{15}{11}s + \tfrac{1}{11}t, \; -\tfrac{1}{11} + \tfrac{1}{11}s - \tfrac{25}{11}t, \; s, \; t \right), \quad s, t \in \mathbb{R}. \]
To get the vector form, make x1, x2, x3, and x4 components of a vector and split the vector apart into the
fixed part of the solution and one vector for each free variable. Doing so yields the following.
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 26/11 - (15/11)s + (1/11)t \\ -1/11 + (1/11)s - (25/11)t \\ s \\ t \end{bmatrix} = \begin{bmatrix} 26/11 \\ -1/11 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -15/11 \\ 1/11 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1/11 \\ -25/11 \\ 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
The expression on the far right is the vector form of the solution. ♦
In general, if the solution to a linear system has m free variables, then there are m + 1 vectors in the vector
form of the solution. There is one for each free variable and one for the fixed part. If the fixed part of the
solution is the zero vector, there are still m + 1 vectors in the vector form of the solution, but we usually only
write m of them down because the fixed part is the zero vector. If you like, you can write the zero vector in
the vector form but it isn’t necessary.
Example 2.2.6
Suppose the solution to a linear system is (x1 , x2 , x3 , x4 ) = (2s + t, 3s, t, s), s, t ∈ R. The vector
form of the solution can be written as
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad s, t \in \mathbb{R}, \]
or as
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
Either way is acceptable, though most references opt for the latter.
Example 2.2.7
Solution. There are three free variables so the vector form of the solution has 4 vectors. The vector form
of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} = \begin{bmatrix} \sqrt{2} \\ 3 \\ 62 \\ 0 \\ -8 \\ 0 \\ 0 \\ 0 \end{bmatrix} + r\begin{bmatrix} -14 \\ 0 \\ -1/11 \\ 1 \\ 82 \\ 19 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 2/3 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -13 \\ -\pi \\ -10 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad r, s, t \in \mathbb{R}. \; ♦ \]
Let ~v1, ~v2, . . . , ~vk ∈ Rn. A linear combination of ~v1, ~v2, . . . , ~vk is any vector ~b of the form
\[ \vec{b} = c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k, \]
where c1, c2, . . . , ck ∈ R are scalars. c1, c2, . . . , ck are called the coefficients of the linear combination.
Example 2.3.1
Let ~v1 , ~v2 ∈ Rn be any two vectors. The following are examples of linear combinations of ~v1 and ~v2 :
1) ~v1 + ~v2 ,
2) ~v1 − ~v2 ,
3) π~v1 - (345/23)~v2,
4) 0~v1 + 0~v2 = ~0.
Note
Notice that the vector ~b above is in Rn . This implies that any linear combination of vectors in Rn is
also a vector in Rn .
Let ~v1 , ~v2 , . . . , ~vk , ~b ∈ Rn be vectors and let x1 , x2 , . . . , xk be one-dimensional real variables. A vector
equation is an equation of the form
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{b}. \tag{2.2} \]
A solution to a vector equation is a k-tuple of real numbers (c1, c2, . . . , ck) such that when we make
the variable substitutions x1 = c1, x2 = c2, . . . , xk = ck, then the equation
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{b} \]
is true.
The difference between a linear combination of vectors and a vector equation is that a vector equation
involves variables and a linear combination is a fixed vector in Rn . Moreover, it follows immediately from
the definitions of linear combinations and solutions of vector equations that the vector equation (2.2) has a
solution if and only if ~b can be written as a linear combination of ~v1, ~v2, . . . , ~vk.
Every vector equation represents a linear system and, conversely, every linear system can be expressed using
a vector equation. Consider the linear system from Example 1.1.1,
x1 + x2 = 230
350x1 + 600x2 = 100,000
This system holds exactly when
\[ \begin{bmatrix} x_1 + x_2 \\ 350x_1 + 600x_2 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Since x1 and x2 are one-dimensional real variables they function like scalars. This allows us to split the
vector on the left apart using vector addition and scalar multiplication:
\[ \begin{bmatrix} x_1 + x_2 \\ 350x_1 + 600x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 350x_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ 600x_2 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix}. \]
Therefore, the linear system can be expressed as the vector equation
\[ x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Example 2.3.2
Express the vector equation
\[ x_1\begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix} + x_2\begin{bmatrix} 10 \\ 7 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 2 \\ 1 \end{bmatrix} \]
as a linear system.
Solution. Combining the vectors on the left into one vector gives
\[ \begin{bmatrix} -x_1 + 10x_2 \\ 2x_1 + 7x_2 \\ 0x_1 - x_2 \\ x_1 - x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 2 \\ 1 \end{bmatrix}. \]
Since the vectors are equal, the components of each must be equal. Writing out each equality yields the
following linear system.
−x1 + 10x2 = −2
2x1 + 7x2 = 1
−x2 = 2
x1 − x2 = 1 ♦
Example 2.3.3
2x1 + 3x2 − x3 = 6
−x1 + x2 = 0
4x1 + x2 = 5
Solve the linear system and show that the solution is also a solution to the corresponding vector
equation.
Solution. Following the above, the linear system can be expressed as the following vector equation,
\[ x_1\begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} + x_2\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 5 \end{bmatrix}. \tag{2.1} \]
We use Solving a Linear System Using a Matrix to solve the linear system. From now on, I will skip many
of the steps and leave it to the reader to fill in the blanks. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 2 & 3 & -1 & 6 \\ -1 & 1 & 0 & 0 \\ 4 & 1 & 0 & 5 \end{array}\right]. \]
The RREF of A is
\[ A \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right]. \]
The solution to the linear system is (x1 , x2 , x3 ) = (1, 1, −1). This solution is exactly a solution to the vector
equation in Equation (2.1) because, when we make the corresponding substitutions, we get,
\[ 1\begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} + 1\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} + (-1)\begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} (1)2 + (1)3 + (-1)(-1) \\ (1)(-1) + (1)1 + (-1)(0) \\ (1)4 + (1)1 + (-1)(0) \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 5 \end{bmatrix}. \; ♦ \]
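The substitution at the end of Example 2.3.3 can also be verified numerically with a few lines of Python/numpy (assumed available, not part of the notes):

import numpy as np

v1 = np.array([2, -1, 4])
v2 = np.array([3, 1, 1])
v3 = np.array([-1, 0, 0])
b = np.array([6, 0, 5])

x1, x2, x3 = 1, 1, -1                                    # the solution found above
print(x1 * v1 + x2 * v2 + x3 * v3)                       # [6 0 5]
print(np.array_equal(x1 * v1 + x2 * v2 + x3 * v3, b))    # True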
Let ~v1 , ~v2 . . . , ~vk ∈ Rn . The n × k matrix whose columns are the vectors ~v1 , ~v2 . . . , ~vk in that order is
denoted as [ ~v1 ~v2 . . . ~vk ] . This is shorthand notation to make things slightly more compact. For example,
if $\vec{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}$, then
\[ [\, \vec{v}_1 \; \vec{v}_2 \,] = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 4 \end{bmatrix}. \]
Example 2.3.3 shows the solution to a linear system is also a solution to the vector equation representing
the linear system. The converse to this is also true: The solution to a vector equation is a solution to the
linear system it represents. The next theorem summarizes this.
Theorem 2.3.1
Let ~v1, ~v2, . . . , ~vk, ~b ∈ Rn and consider the vector equation
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{b}. \]
The solution set of this vector equation is the same as the solution set of the linear system whose
augmented matrix is
\[ A = \left[\, \vec{v}_1 \;\; \vec{v}_2 \;\; \ldots \;\; \vec{v}_k \;\middle|\; \vec{b} \,\right]. \]
In particular, ~b can be written as a linear combination of ~v1 , ~v2 , . . . , ~vk if and only if the matrix A
corresponds to a consistent linear system.
Proof. Write
\[ \vec{v}_i = \begin{bmatrix} v_{1i} \\ \vdots \\ v_{ni} \end{bmatrix}, \; i = 1, 2, \ldots, k, \qquad \vec{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \]
so that,
\[ A = \left[\, \vec{v}_1 \;\; \vec{v}_2 \;\; \ldots \;\; \vec{v}_k \;\middle|\; \vec{b} \,\right] = \left[\begin{array}{cccc|c} v_{11} & v_{12} & \ldots & v_{1k} & b_1 \\ v_{21} & v_{22} & \ldots & v_{2k} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v_{n1} & v_{n2} & \ldots & v_{nk} & b_n \end{array}\right]. \]
(c1, c2, . . . , ck) is a solution to the vector equation (2.2) if and only if
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{b}. \]
Equating components in this linear combination yields the following series of equations,
\[ v_{11}c_1 + v_{12}c_2 + \cdots + v_{1k}c_k = b_1, \quad v_{21}c_1 + v_{22}c_2 + \cdots + v_{2k}c_k = b_2, \quad \ldots, \quad v_{n1}c_1 + v_{n2}c_2 + \cdots + v_{nk}c_k = b_n, \]
which shows that (c1 , c2 , . . . , ck ) is a solution to the linear system whose augmented matrix is A. Reversing
all of these steps shows that any solution to the linear system that has augmented matrix A is also a solution
to the vector equation. Therefore, the solution sets are equal. The last statement of the theorem is immediate
from the definitions of linear combinations and consistent linear systems.
In this section, we introduce another equivalent way of representing linear systems. This one uses matrices.
First, we need to define how we multiply a matrix and a vector.
Let A = [ ~a1 ~a2 . . . ~ak ] be an n × k matrix with columns ~a1, ~a2, . . . , ~ak ∈ Rn and let ~v ∈ Rk have components v1, v2, . . . , vk. The matrix vector product A~v is defined to be the linear combination
\[ A\vec{v} = v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_k\vec{a}_k. \]
Note
We only define matrix vector multiplication if ~v has as many components as A has columns. Therefore,
if A is n × k, then A~v is defined only if ~v ∈ Rk .
Example 2.4.1
" # 6
1 2 3
Let A = and ~v = 1 . Then,
2 10 3
2
" # 6 " # " # " # " # " #
1 2 3 1 2 3 6+2+6 14
A~v = 1 =6 +1 +2 = = .
2 10 3 2 10 3 12 + 10 + 6 28
2
Example 2.4.2
Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 8 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Then,
\[ A\vec{v} = 0\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \\ 8 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 8 \end{bmatrix}. \]
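The product in Example 2.4.2 can be computed two ways in Python/numpy (assumed available): directly, and as the linear combination of the columns of A that the definition above describes. Both give the same vector.

import numpy as np

A = np.array([[1, 2],
              [2, 1],
              [1, 8]])
v = np.array([0, 1])

print(A @ v)                            # [2 1 8]
print(v[0] * A[:, 0] + v[1] * A[:, 1])  # the same linear combination of the columns: [2 1 8]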
Example 2.4.3
" # 1
0 1
Let A = and ~v = 1 . Since ~v ∈ R3 , but A only has 2 columns, the matrix vector
1 1
1
multiplication A~v is not defined.
Matrix Vector Multiplication Properties
Let A be an n × k matrix, let ~u, ~v ∈ Rk, and let r ∈ R. Then:
1. A(~u + ~v) = A~u + A~v;
2. A(r~v) = r(A~v).
Example 2.4.4
Calculate,
i) A(~u + ~v )
ii) A(5~v )
Solution.
Proof. Write $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_k \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix}$, so that $\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_k + v_k \end{bmatrix}$. Let ~a1, ~a2, . . . , ~ak denote the
columns of A, so A = [ ~a1 ~a2 . . . ~ak ]. Therefore,
\[ A(\vec{u} + \vec{v}) = [\, \vec{a}_1 \; \vec{a}_2 \; \ldots \; \vec{a}_k \,]\begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_k + v_k \end{bmatrix} \]
= (u1 + v1)~a1 + (u2 + v2)~a2 + . . . + (uk + vk)~ak by definition of matrix vector multiplication
= (u1~a1 + u2~a2 + . . . + uk~ak) + (v1~a1 + v2~a2 + . . . + vk~ak) by parts 1 and 4 of Properties of Vectors
= A~u + A~v.
Exercise
Prove part 2 of Matrix Vector Multiplication Properties.
x1 + x2 = 230
350x1 + 600x2 = 100, 000
As a vector equation,
\[ x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Let $A = \begin{bmatrix} 1 & 1 \\ 350 & 600 \end{bmatrix}$. Define the vector of one-dimensional variables $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and let
$\vec{b} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}$. Then, by definition of matrix vector multiplication, we can write the vector equation that
represents this linear system as an equation involving matrix vector multiplication:
\[ A\vec{x} = \begin{bmatrix} 1 & 1 \\ 350 & 600 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix} = \vec{b}. \]
This is an example of a matrix equation and it gives us another way of representing linear systems.
Let A be an n × k matrix, let ~b ∈ Rn, and let ~x be a vector of k one-dimensional variables. A matrix equation is an equation of the form
\[ A\vec{x} = \vec{b}. \]
A solution to a matrix equation is a vector $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix} \in \mathbb{R}^k$ such that when we make the variable
substitution ~x = ~v, the equation A~v = ~b is true.
Matrix equations are useful because they provide a very compact way of writing down linear systems. More-
over, every matrix equation can be written as a linear system.
Example 2.4.5
Express the following linear system as a matrix equation.
\[ \begin{aligned} x_1 + 4x_2 - ex_3 &= 1 \\ (2/3)x_1 + \sqrt{6}\,x_3 &= 2 \\ -x_2 + \pi x_3 &= 1 \end{aligned} \]
Solution. First write the linear system as a vector equation:
\[ x_1\begin{bmatrix} 1 \\ 2/3 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 4 \\ 0 \\ -1 \end{bmatrix} + x_3\begin{bmatrix} -e \\ \sqrt{6} \\ \pi \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}. \]
Now use the definition of matrix vector multiplication to translate this into a matrix equation.
\[ \underbrace{\begin{bmatrix} 1 & 4 & -e \\ 2/3 & 0 & \sqrt{6} \\ 0 & -1 & \pi \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vec{x}} = \underbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}}_{\vec{b}}. \; ♦ \]
Example 2.4.6
Write the matrix equation A~x = ~b as a linear system, where
\[ A = \begin{bmatrix} 2 & 1 & -6 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}, \quad \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]
Solution. We have
\[ A\vec{x} = \vec{b} \iff \begin{bmatrix} 2 & 1 & -6 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}. \]
Write this as a vector equation using the matrix vector product.
\[ x_1\begin{bmatrix} 2 \\ 0 \\ 0 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -6 \\ 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}. \]
Equating components yields the linear system:
2x1 + x2 − 6x3 = 50
x2 + 2x3 = 2
x3 = 50
2x1 + x2 + x3 = −1 ♦
Note
Given a linear system written as a matrix equation A~x = ~b, the matrix A is always the coefficient
matrix for the corresponding linear system.
Theorem 2.3.1 says the solution set of a linear system and the solution set of its corresponding vector equa-
tion are the same. It should come as no surprise that the solution set to the matrix equation representing a
linear system also coincides with the solution set to the linear system and conversely. The following theorem
ties all of this together.
Theorem 2.4.2
Let A be an n × k matrix with columns ~a1, ~a2, . . . , ~ak ∈ Rn and let ~b ∈ Rn be fixed. Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$.
Then, the solution set of the matrix equation A~x = ~b is the same as the solution set of the vector
equation
\[ x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_k\vec{a}_k = \vec{b}, \]
which is the same as the solution set of the linear system whose augmented matrix is
\[ \left[\, \vec{a}_1 \;\; \vec{a}_2 \;\; \ldots \;\; \vec{a}_k \;\middle|\; \vec{b} \,\right]. \]
Proof. We need only verify that the solution set of A~x = ~b is the same as the solution set of
x1~a1 + . . . + xk~ak = ~b
and the rest follows from Theorem 2.3.1. This follows immediately from the definition of matrix vector
multiplication.
Exercise
If the proof for Theorem 2.4.2 is not obvious, give a proof of it in order to convince yourself it is true.
We now have three different ways of representing a linear system:
1. As a linear system with an augmented matrix;
2. As a vector equation;
3. As a matrix equation.
All three of these representations have the exact same solution set and, the beautiful part is that, regardless
of what representation we pick, we always find these solutions using 1. This also allows us to answer three
seemingly unrelated questions with the exact same method!
We summarize the equivalence of the solution sets of these three types of equations as follows.
Equivalence of Solutions
Let A = [ ~a1 ~a2 . . . ~ak ] be an n × k matrix and let ~b ∈ Rn. The solution set of the matrix equation A~x = ~b, the solution set of the vector equation x1~a1 + x2~a2 + · · · + xk~ak = ~b, and the solution set of the linear system whose augmented matrix is [ ~a1 ~a2 . . . ~ak | ~b ] are all the same.
Example 2.5.1
Let
\[ \vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}. \]
Can ~b be expressed as a linear combination of ~v1 and ~v2? If it can, write one down.
Solution. By the Equivalence of Solutions, asking if ~b is a linear combination of ~v1 and ~v2 is the same as
asking if the linear system whose augmented matrix is [ ~v1 ~v2 | ~b ] has a solution. We know how to solve
this! Use Solving a Linear System Using a Matrix! Row reducing
\[ \left[\begin{array}{cc|c} -1 & 4 & 2 \\ 4 & 9 & 1 \\ 3 & -3 & 3 \end{array}\right] \]
to echelon form produces a matrix whose bottom row is [ 0 0 | -16 ].
The rightmost column of this echelon form is a pivot column. Therefore, the corresponding linear system
has no solution by The Solutions Theorem. This shows ~b can not be written as a linear combination of ~v1
and ~v2 . ♦
Note
Example 2.5.1 could be reworded as follows:
“Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$. Let $A = [\, \vec{v}_1 \; \vec{v}_2 \,] = \begin{bmatrix} -1 & 4 \\ 4 & 9 \\ 3 & -3 \end{bmatrix}$. Does A~x = ~b have
a solution?”
Example 2.5.2
Redo Example 2.5.1 with $\vec{b} = \begin{bmatrix} 9 \\ 14 \\ -9 \end{bmatrix}$.
Solution. For this example, the augmented matrix we work with is,
\[ A = \left[\begin{array}{cc|c} -1 & 4 & 9 \\ 4 & 9 & 14 \\ 3 & -3 & -9 \end{array}\right]. \]
The RREF of A is
\[ \left[\begin{array}{cc|c} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right]. \]
The rightmost column of the RREF of A is not a pivot column so the corresponding linear system is
consistent. Therefore, ~b is a linear combination of ~v1 and ~v2 . We now determine how to write ~b as a linear
combination of ~v1 and ~v2. This means we must find scalars c1, c2 ∈ R such that
\[ c_1\vec{v}_1 + c_2\vec{v}_2 = \vec{b}, \]
and we know the solution to this vector equation is the same as the solution to the corresponding linear
system. Reading from the RREF of A, we see that the solution to this linear system is (x1, x2) = (-1, 2).
Therefore, we write ~b as a linear combination of ~v1 and ~v2 as follows:
\[ \vec{b} = -\vec{v}_1 + 2\vec{v}_2. \]
Moreover, all of the columns to the left of the bar of the RREF are pivot columns. Therefore, there is only
one solution to the corresponding linear system. This means that the above is the only way we can write ~b
as a linear combination of ~v1 and ~v2 . ♦
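Finding the coefficients c1 and c2 in Example 2.5.2 amounts to solving a linear system, so it can also be done with sympy (an assumed tool, not part of these notes):

from sympy import Matrix, symbols, linsolve

c1, c2 = symbols('c1 c2')
A = Matrix([[-1, 4], [4, 9], [3, -3]])   # columns are v1 and v2
b = Matrix([9, 14, -9])

print(linsolve((A, b), c1, c2))   # {(-1, 2)}, so b = -v1 + 2*v2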
Determining which vectors can be written as linear combinations of others and how to do it is one of the
main themes of this course. This motivates the following definition.
Let ~v1, ~v2, . . . , ~vk ∈ Rn. The set of all linear combinations of the vectors ~v1, ~v2, . . . , ~vk is called the
span of ~v1, ~v2, . . . , ~vk and is denoted by span{~v1, ~v2, . . . , ~vk}.
In set notation,
\[ \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} = \{\, c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k : c_1, c_2, \ldots, c_k \in \mathbb{R} \,\}. \]
If span{~v1, ~v2, . . . , ~vk} = Rn,
then we say the set {~v1, ~v2, . . . , ~vk} spans Rn. This means every vector in Rn is a linear combination
of the vectors ~v1, ~v2, . . . , ~vk.
Warning!
It is very important to remember that the span of a set of vectors is itself a set and not a single vector.
This is a common mistake.
This probably looks complicated. Remember: these definitions are only fancy language that allows us to
ask the same question in a different way, and nothing more. For example, we could have reworded Example
2.5.1 as follows.
“Determine if the vector $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ is in the span of the two vectors $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$.”
Using more compact notation, we could also reword Example 2.5.1 as the following.
“Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$, and $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$. Is $\vec{b} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2\}$?”
Perhaps this seems like overkill but we use all of this language throughout the course. Hence, it is important
to get familiar with all of these different phrasings and know exactly what they mean.
In this section, we determine what a spanning set looks like in R2 . Of course, we can’t do this in Rn for
n ≥ 4 because we can’t visualize these Euclidean spaces.
First consider span{~0}. If ~v ∈ span{~0}, then ~v = c~0 for some c ∈ R, from which it follows that ~v = ~0 itself.
Therefore, span{~0} is always equal to the singleton set {~0}. That is, span{~0} is a single vector: the zero
vector, which is simply the origin.
Now suppose ~v ∈ R2 and ~v 6= ~0. Let ~u ∈ span {~v }. Then, by definition, there exists c ∈ R such that ~u = c~v .
Conversely, every scalar multiple of ~v is contained in span {~v }. Therefore, span {~v } is the set of all scalar
multiples of ~v . The following graph shows various scalar multiples of a single vector in R2 . Recall that all
vectors have the origin as their initial point.
[Figure: the scalar multiples 2~v, ~v, (1/2)~v, and -2~v all lying along the line through the origin parallel to ~v.]
If we could continue indefinitely, we’d see that every scalar multiple of ~v lies on the line through the origin
that is parallel to ~v . Since each of these vectors can be identified with a point in R2 , we interpret span {~v }
as the line through the origin parallel to ~v .
It should be clear from the above that there are always vectors in R2 that are not in span {~v }. That is, there
is no vector ~v ∈ R2 such that span {~v } = R2 . In other words, a single vector ~v never spans R2 .
Now suppose ~u, ~v ∈ R2 and ~u is not a scalar multiple of ~v. Then, given any other vector ~w in R2, we can
multiply ~u and ~v by appropriate scalars, say c1, c2 ∈ R, such that ~w is the fourth vertex of the parallelogram
whose three other vertices are the origin, c1~u, and c2~v. Thus, by the parallelogram law, ~w = c1~u + c2~v. This
means span{~u, ~v} = R2 so that ~u and ~v span R2. We'll prove this formally in the next section. In the
meantime, an example of this is shown below.
[Figure: a vector ~w = c1~u + c2~v realized as the fourth vertex of the parallelogram with vertices at the origin, c1~u, and c2~v.]
In summary, spans of vectors in R2 are either a single point, which is the span of the zero vector, a line
through the origin, or all of R2 itself.
In R3 , the situation is similar. If ~v ∈ R3 is non-zero, then span {~v } is interpreted as a line in R3 through the
origin that is parallel to ~v . If ~u, ~v ∈ R3 are not scalar multiples of one another, then span {~u, ~v } is not equal
to R3 , but it is interpreted as a plane through the origin that contains both of ~u and ~v . This shouldn’t be
completely unexpected because the span of two vectors in R2 that are not scalar multiples of one another is
all of R2, and we can interpret R2 as a plane in R3. Finally, if ~u, ~v, and ~w are vectors in R3, none are scalar
multiples of one another, and the 3 vectors don't lie in the same plane, then span{~u, ~v, ~w} = R3. All of these
assertions will be proved in the next section.
Example 2.5.3
Let $A = \begin{bmatrix} 2 & 4 & 4 \\ 3 & 0 & 3 \\ 1 & 2 & 2 \end{bmatrix}$. Does the equation A~x = ~b have a solution for any ~b ∈ R3?
Note
This question could have been asked in either of the following two ways.
“Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix}$. Does {~v1, ~v2, ~v3} span R3?”
“Let $A = \begin{bmatrix} 2 & 4 & 4 \\ 3 & 0 & 3 \\ 1 & 2 & 2 \end{bmatrix}$. Do the columns of A span R3?”
Solution. Let $\vec{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$ be an arbitrary vector in R3. The solution to A~x = ~b is the same as the solution
to the linear system whose augmented matrix is
\[ B = \left[\begin{array}{ccc|c} 2 & 4 & 4 & b_1 \\ 3 & 0 & 3 & b_2 \\ 1 & 2 & 2 & b_3 \end{array}\right]. \]
Row reduce B to echelon form:
\[ \begin{array}{l} R_1 \Leftrightarrow R_3 \\ R_2 \Rightarrow R_2 - 3R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; B \sim \left[\begin{array}{ccc|c} 1 & 2 & 2 & b_3 \\ 0 & -6 & -3 & b_2 - 3b_3 \\ 0 & 0 & 0 & b_1 - 2b_3 \end{array}\right]. \]
By The Solutions Theorem, A~x = ~b is consistent if and only if the rightmost column of this matrix is not a
pivot column. This happens exactly when b1 - 2b3 = 0, that is, when b1 = 2b3. However, vectors in R3 do not
necessarily satisfy this condition. There are many vectors in R3 whose first component is not twice the third,
for example $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. For all such vectors, the rightmost
column is a pivot column and, consequently, the matrix equation doesn’t have a solution. Therefore, the
answer is no, A~x = ~b does not have a solution for all ~b ∈ R3 . ♦
Let’s look a little closer at the echelon form matrix in Example 2.5.3:
\[ \left[\begin{array}{ccc|c} 1 & 2 & 2 & b_3 \\ 0 & -6 & -3 & b_2 - 3b_3 \\ 0 & 0 & 0 & b_1 - 2b_3 \end{array}\right]. \]
The only thing preventing A~x = ~b from being solvable for any value of ~b is the bottom row of zeroes to the
left of the bar. If the third row contains a pivot to the left of the bar, then we can find a solution to A~x = ~b
for any value of ~b ∈ R3. Moreover, the converse also holds. That is, if A~x = ~b has a solution for every ~b ∈ R3, then every echelon form of A has a
pivot in each row. This is summarized in the following important theorem.
The Span Theorem
Let ~v1, ~v2, . . . , ~vk ∈ Rn, let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1, ~v2, . . . , ~vk, and let B be an echelon form of A. The following are equivalent (this means that if one of the statements is true, then
they are all true, and if one of the statements is false then they are all false):
1. The matrix equation A~x = ~b has a solution for every ~b ∈ Rn.
2. The columns of A span Rn. This means every vector ~b ∈ Rn is a linear combination of the
columns of A. Equivalently, {~v1, ~v2, . . . , ~vk} spans Rn.
3. B has a pivot in every row.
This is a most excellent result. It gives an easy test to see if a set of vectors in Rn spans Rn . Simply make
the vectors the columns of a matrix, row reduce the matrix to echelon form, and count pivots! We do some
examples before we give a proof.
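Here is a minimal sketch of that pivot-counting test in Python with sympy (an assumption, not part of these notes), applied to the matrix from Example 2.5.3 above.

from sympy import Matrix

A = Matrix([[2, 4, 4],
            [3, 0, 3],
            [1, 2, 2]])

_, pivot_cols = A.rref()          # pivot column indices; there is one pivot per pivot row
print(len(pivot_cols))            # 2 pivots but 3 rows
print(len(pivot_cols) == A.rows)  # False: the columns do not span R^3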
Example 2.5.4
Determine whether the following sets of vectors span R4.
\[ \text{i)} \; S_1 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ -1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 0 \\ 3 \end{bmatrix} \right\} \]
\[ \text{ii)} \; S_2 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 4 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ -1 \\ 1 \end{bmatrix} \right\} \]
\[ \text{iii)} \; S_3 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -3 \\ 1 \\ 2 \\ 3 \end{bmatrix} \right\} \]
Solution. All three questions can be answered by forming the matrix whose columns are the vectors in each
set, row reducing to echelon form, and counting pivot rows.
i) We have
\[ A_1 = \begin{bmatrix} -2 & -2 & 3 \\ 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} -2 & -2 & 3 \\ 0 & -2 & 5/2 \\ 0 & 0 & 5/4 \\ 0 & 0 & 0 \end{bmatrix}. \]
The bottom row contains no pivots. Therefore, by The Span Theorem, the vectors in S1 do not span
R4 . ♦
ii) We have
\[ A_2 = \begin{bmatrix} -2 & 0 & 1 & -2 \\ 1 & 4 & 3 & 1 \\ 0 & 0 & 0 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \sim \begin{bmatrix} -2 & 0 & 1 & -2 \\ 0 & 4 & 7/2 & 0 \\ 0 & 0 & 5/8 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}. \]
There is a pivot in every row. Therefore, by The Span Theorem, the vectors in S2 do span R4 . ♦
iii) We have
\[ A_3 = \begin{bmatrix} -2 & 1 & -2 & -3 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} -2 & 1 & -2 & -3 \\ 0 & 1/2 & -1 & -1/2 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
The bottom row does not have a pivot. Therefore, by The Span Theorem, the vectors in S3 do not
span R4 . ♦
Proof. To prove a chain of equivalences like this, first prove that 1 implies 2, then that 2 implies 3, and
finally that 3 implies 1. The Equivalence of Solutions already shows 1 implies 2. Therefore, we only need to
prove that 2 implies 3 and that 3 implies 1.
2 =⇒ 3: We prove this using contraposition. This means we assume the negation of 3 and show that it
implies the negation of 2.
To this end, suppose B does not have a pivot in every row. We must show there is a vector in Rn that is
not a linear combination of the columns of A. Since B doesn't have a pivot in every row, the bottom row of
B contains only zeroes by definition of echelon form. Augment B with the column vector ~u = (0, 0, . . . , 0, 1) ∈ Rn, where
there are n - 1 zeroes above the 1, and denote this matrix by B′ = [ B | ~u ].
Since A ∼ B, there is a sequence of elementary row operations that transform A into B. If we apply the
opposite of each of these row operations in the reverse order to B, then we transform B into A. Applying
this procedure to B′, we get a matrix of the form A′ = [ A | ~w ] where ~w is some vector in Rn. Since B′ is
an echelon form of A′, and B′ contains a row of the form [ 0 . . . 0 | 1 ], The Solutions Theorem implies the
linear system with augmented matrix [ A | ~w ] is inconsistent. By Equivalence of Solutions, ~w is not a linear
combination of ~v1, ~v2, . . . , ~vk.
3 =⇒ 1: Suppose B has a pivot in each row. Pick any ~b ∈ Rn and form the augmented matrix A′ = [ A | ~b ].
Apply the sequence of elementary row operations required to turn A into B to get a matrix of the form
B′ = [ B | ~c ] where ~c ∈ Rn. This matrix is an echelon form of A′ and, since B has a pivot in every row,
B′ contains no row of the form [ 0 . . . 0 | m ] where m ≠ 0. By The Solutions Theorem, the linear system
whose augmented matrix is A′ is consistent, and so the Equivalence of Solutions implies the matrix equation
A~x = ~b has a solution. Since ~b ∈ Rn was arbitrary, we have shown that A~x = ~b has a solution for every
~b ∈ Rn.
Corollary 2.5.1
Let ~v1, ~v2, . . . , ~vk ∈ Rn be vectors. If k < n, then span{~v1, ~v2, . . . , ~vk} ≠ Rn; that is, if there are fewer
vectors in a set than components in each vector, that set of vectors does not span Rn.
Example 2.5.5
Solution. We have a set of two vectors, each of which has three components. Since there are fewer vectors
than components in each vector, Corollary 2.5.1 implies that S does not span R3. ♦
Proof. Form the n × k matrix A = [ ~v1 ~v2 . . . ~vk ]. Let B be an echelon form of A. If k < n, then B has
more rows than columns. Since B is in echelon form, it can have at most one pivot in each column. As there
are more rows than columns, B can not have a pivot in every row. Therefore, by The Span Theorem, the
columns of A, {~v1, ~v2, . . . , ~vk}, do not span Rn.
Unfortunately, if you have a set with at least as many vectors as there are components in each vector, there
is typically no way to tell whether or not that set spans Rn just by looking at the vectors (unless you are
really really good at mental calculation). There is one exception to this rule. Suppose you have some vectors
~v1 , ~v2 , . . . , ~vk ∈ Rn and assume they all have a zero in the same component. Then, the set doesn’t span Rn .
This is because any vector that has a non-zero entry in that component will never be a linear combination
of ~v1 , ~v2 , . . . , ~vk .
Example 2.5.6
Solution. S does not span R3 because every vector in S has a zero in the second component. Therefore,
any linear combination of these vectors has a zero in the second component. This means all vectors in R3
with a non-zero second component can not be written as a linear combination of the vectors in S, hence is
not in the span of S. An example of such a vector is $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. Therefore, S does not span R3. ♦
Let A be an n × k matrix. The homogeneous equation is the matrix equation A~x = ~0.
The homogeneous equation is always consistent. Certainly, if A = [ ~a1 ~a2 . . . ~ak ] and we substitute ~x = ~0,
then
\[ A\vec{0} = 0\vec{a}_1 + 0\vec{a}_2 + \cdots + 0\vec{a}_k = \vec{0}. \]
This shows ~0 is always a solution to the homogeneous equation, which means that the homogeneous equation
is always consistent. Therefore, there are only two possibilities for solution sets of the homogeneous equation:
1. The homogeneous equation has exactly one solution ~x = ~0. This solution is called the trivial solution.
2. The homogeneous equation has infinitely many solutions. A non-zero solution is referred to as a
non-trivial solution. Note that in this case, the solution to the matrix equation will contain a free
variable.
h i
To solve A~x = ~b, we form the augmented matrix A | ~b and follow Solving a Linear System Using a Ma-
h i
trix. For the homogeneous equation, ~b = ~0, so we form the augmented matrix A | ~0 . The elementary row
operations never change the last column because it is full of zeroes. Therefore, when we solve homogeneous
equations, we don’t usually augment A with ~0 and instead, we apply Solving a Linear System Using a Matrix
to the coefficient matrix A itself. If you really want to, you can still augment with the column of zeroes, but
going forward I will not.
Example 2.6.1
Solution. Row reducing A, the RREF is
\[ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1/2 \\ 0 & 0 & 0 \end{bmatrix}. \]
Since column 3 is a non-pivot column, x3 is a free variable by The Solutions Theorem. Thus, the linear
system has infinitely many solutions and, hence, a non-trivial solution. To get the vector form, we look at
the RREF of the matrix. The RREF represents the following linear system.
x1 + x3 = 0
x2 + (1/2)x3 = 0.
Rearranging, we get
x1 = −x3
x2 = −(1/2)x3 .
Letting x3 = s, s ∈ R, the vector form of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -s \\ -(1/2)s \\ s \end{bmatrix} = s\begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \; ♦ \]
Example 2.6.2
Let
\[ A = \begin{bmatrix} 2 & 0 & 1 & 1 \\ -1 & 2 & 3 & -1 \\ 0 & 10 & 1 & 5 \\ 1 & 1 & 2 & 1 \end{bmatrix}. \]
Determine if A~x = ~0 has a non-trivial solution. Write the solution to
A~x = ~0 in vector form.
Solution. The RREF of A is the 4 × 4 identity matrix, so every column in the RREF of A is a pivot column. Therefore, A~x = ~0 has exactly one solution, the trivial
solution. This shows A~x = ~0 has no non-trivial solution. The vector form of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \vec{0}. \; ♦ \]
Let A be n × k and ~b ∈ Rn . Suppose A~x = ~b is consistent. The number of solutions to A~x = ~b is related to
the number of solutions to A~x = ~0. In fact, all you need is one particular solution to A~x = ~b and the whole
solution to A~x = ~0 in order to get every solution to A~x = ~b.
Theorem 2.6.1
Let A be an n × k matrix and let ~b ∈ Rn be a fixed vector. Suppose A~x = ~b is consistent. Let ~vp be
a fixed solution to A~x = ~b. Then, every solution of A~x = ~b can be written in the form w
~ = ~vp + ~vh
where ~vh is a solution to the homogeneous equation A~x = ~0.
Proof. Since A~x = ~b is consistent, it has at least one solution. Denote this solution by ~vp. Let ~w be any
solution to A~x = ~b (this could be ~vp itself). Define ~vh = ~w - ~vp. Then,
\[ A\vec{v}_h = A(\vec{w} - \vec{v}_p) = A\vec{w} - A\vec{v}_p = \vec{b} - \vec{b} = \vec{0}. \]
Therefore, ~vh is a solution to the homogeneous equation A~x = ~0 and ~w = ~vp + ~vh, which is the required form.
Side Note
A similar theorem holds for solving first order partial differential equations, and the proof in that
setting is the same as it is here! Cool!
The solutions ~vp and ~vh of Theorem 2.6.1 are called particular and homogeneous solutions respectively.
This idea of writing all solutions to a system as a sum of a fixed particular solution and a solution to the
corresponding homogeneous system appears in lots of areas of mathematics. It is a useful technique because,
in general, solving a homogeneous system tends to be easier than solving a non-homogeneous one.
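Here is a small numerical illustration of Theorem 2.6.1 in Python/numpy (assumed available); the matrix below encodes the reduced system from Example 1.3.6, namely x1 - 3x3 = 6 and x2 + 5x3 = -1.

import numpy as np

A = np.array([[1, 0, -3],
              [0, 1,  5]])
b = np.array([6, -1])

vp = np.array([6, -1, 0])   # a particular solution of A x = b
vh = np.array([3, -5, 1])   # a solution of the homogeneous equation A x = 0

print(A @ vp)               # [ 6 -1]
print(A @ vh)               # [0 0]
print(A @ (vp + 7 * vh))    # [ 6 -1]: vp plus any solution of A x = 0 still solves A x = b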
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and define the n × k matrix A = [ ~v1 ~v2 . . . ~vk ]. The set of vectors
{~v1 , ~v2 , . . . , ~vk } is called linearly independent if the homogeneous equation A~x = ~0 has only the
trivial solution. If the homogeneous equation has infinitely many solutions, then {~v1 , ~v2 , . . . , ~vk } is
called linearly dependent.
Solutions to matrix equations and solutions to the corresponding vector equations are the same by Equiva-
lence of Solutions. Therefore, an equivalent characterization of linear independence is that the vector equation
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{0} \]
has only the solution x1 = x2 = . . . = xk = 0; the set {~v1, ~v2, . . . , ~vk} is linearly dependent otherwise. Most
references define linear independence in terms of vector equations. This is because linear independence gen-
eralizes to a lot of different settings beyond matrices.
Example 2.6.3
The set of vectors $\vec{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 10 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 8 \\ 1 \\ 2 \end{bmatrix}$ in R3 is linearly dependent because
\[ \begin{bmatrix} 3 & 10 & 8 \\ 1 & 0 & 1 \\ 1 & 2 & 2 \end{bmatrix}\begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix} = -\vec{v}_1 - \tfrac{1}{2}\vec{v}_2 + \vec{v}_3 = \begin{bmatrix} -3 - 5 + 8 \\ -1 - 0 + 1 \\ -1 - 1 + 2 \end{bmatrix} = \vec{0}. \]
In the previous example, we wrote the zero vector as a non-trivial linear combination of non-zero vectors.
This is an example of a linear dependence relationship.
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be a set of non-zero vectors. A linear dependence relationship for
{~v1, ~v2, . . . , ~vk} is a linear combination of the form
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}, \]
where at least one of the scalars c1, c2, . . . , ck is non-zero.
Linear dependence relationships can be found by calculating a non-trivial solution to the homogeneous equa-
tion.
Example 2.6.4
Determine if the set of vectors $\vec{v}_1 = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}$ is linearly independent or linearly dependent.
If it is linearly dependent, write down a linear dependence relationship for the vectors.
Solution. Let $A = [\, \vec{v}_1 \; \vec{v}_2 \; \vec{v}_3 \,] = \begin{bmatrix} 0 & 3 & 5 \\ 4 & 0 & 2 \\ 1 & 1 & 1 \end{bmatrix}$. The RREF of A is
\[ A \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
This shows x1 = x2 = x3 = 0 is the only solution to the homogeneous equation A~x = ~0. Therefore, the set
{~v1 , ~v2 , ~v3 } is linearly independent. ♦
Example 2.6.5
Determine if the set of vectors $\vec{v}_1 = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} -3 \\ 4 \\ 0 \end{bmatrix}$ is linearly independent or linearly dependent.
If it is linearly dependent, write down a linear dependence relationship for the vectors.
Solution. Form the matrix $A = \begin{bmatrix} 0 & 3 & -3 \\ 4 & 0 & 4 \\ 1 & 1 & 0 \end{bmatrix}$. We must determine the number of solutions to the
homogeneous equation A~x = ~0. The RREF of A is
\[ A \sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}. \]
Since there is a non-pivot column, The Solutions Theorem implies the homogeneous equation A~x = ~0
has infinitely many solutions. Therefore, the set of vectors {~v1 , ~v2 , ~v3 } is linearly dependent.
To find a linear dependence relationship, we need to find a non-trivial solution to A~x = ~0. Start by writing
out the vector form of the solution to A~x = ~0. The linear system corresponding to the RREF is
x1 + x3 = 0 ⇒ x1 = −x3
x2 − x3 = 0 ⇒ x2 = x3
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = s\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \]
Picking any non-zero value of s yields a non-trivial solution to the homogeneous equation. Taking s = 1 say,
we have
\[ A\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \vec{0} \implies -\vec{v}_1 + \vec{v}_2 + \vec{v}_3 = \vec{0}. \]
The equation
\[ -\vec{v}_1 + \vec{v}_2 + \vec{v}_3 = \vec{0} \]
is a linear dependence relationship for {~v1, ~v2, ~v3}. We could pick any non-zero value of s we want to get a
linear dependence relationship. For example, if s = 2,
\[ -2\vec{v}_1 + 2\vec{v}_2 + 2\vec{v}_3 = \vec{0}. \; ♦ \]
Note
There exist infinitely many linear dependence relationships for a set of linearly dependent vectors.
The Span Theorem provides an easy way of checking if the columns of an n × k matrix A span Rn : count
pivot rows in the echelon form of a matrix. There is a similar way to check if the columns of a matrix are
linearly independent.
The Linear Independence Theorem
Let ~v1, ~v2, . . . , ~vk ∈ Rn, let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1, ~v2, . . . , ~vk, and let B be an echelon form of A. The following are equivalent.
1. The matrix equation A~x = ~0 has only the trivial solution.
2. The columns of A are linearly independent; that is, {~v1, ~v2, . . . , ~vk} is a linearly independent set.
3. B has a pivot in every column.
Example 2.6.6
Determine if the set of vectors
\[ \vec{v}_1 = \begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 7 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} -10 \\ -2 \\ -10 \\ 1 \end{bmatrix} \]
is linearly independent or linearly dependent.
Solution. Form the matrix
\[ A = \begin{bmatrix} -1 & 7 & -10 \\ 2 & -2 & -2 \\ 0 & 1 & -10 \\ 1 & 0 & 1 \end{bmatrix}. \]
An echelon form of A is
\[ A \sim \begin{bmatrix} -1 & 7 & -10 \\ 0 & 12 & -22 \\ 0 & 0 & -49/6 \\ 0 & 0 & 0 \end{bmatrix}. \]
Each column in this matrix contains a pivot. Therefore, the vectors {~v1 , ~v2 , ~v3 } are linearly independent by
The Linear Independence Theorem. ♦
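The check in Example 2.6.6 can be automated the same way as the span test. With sympy (an assumed tool, not part of these notes), the columns are linearly independent exactly when every column index shows up as a pivot.

from sympy import Matrix

A = Matrix([[-1,  7, -10],
            [ 2, -2,  -2],
            [ 0,  1, -10],
            [ 1,  0,   1]])

_, pivot_cols = A.rref()
print(len(pivot_cols) == A.cols)   # True: every column is a pivot column, so the set is independent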
2 =⇒ 3: Suppose the columns of A are linearly independent. By definition, the matrix equation A~x = ~0
has only the trivial solution. Therefore, the solution to the homogeneous equation does not contain a free
variable. Thus, The Solutions Theorem implies every echelon form of A has a pivot in every column.
3 =⇒ 1: Suppose B has a pivot in every column. Form the augmented matrix A′ = [ A | ~0 ]. An echelon
form of A′ is B′ = [ B | ~0 ]. Since B has a pivot in every column, every column to the left of the bar of
B′ is a pivot column. Therefore, the matrix equation A~x = ~0 has only the trivial solution by The Solutions
Theorem.
Corollary 2.6.1
A set {~v} ⊆ Rn containing a single vector is linearly independent if and only if ~v ≠ ~0.
Proof. Suppose ~v ≠ ~0 and consider the vector equation x1~v = ~0, that is,
\[ x_1\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} x_1 v_1 \\ \vdots \\ x_1 v_n \end{bmatrix} = \vec{0}. \]
Since ~v ≠ ~0, at least one of v1, v2, . . . , vn is not zero. Thus, the only way all the components of the vector
on the left can be zero is if x1 = 0. This shows the vector equation x1~v = ~0 has only the trivial solution, so
{~v} is linearly independent. On the other hand, if ~v = ~0, it is clear that the vector equation x1~v = ~0 has
infinitely many solutions, so that {~0} is linearly dependent.
Corollary 2.6.2
A set of two non-zero vectors {~v1 , ~v2 } ⊆ Rn is linearly dependent if and only if ~v1 is a scalar multiple
of ~v2. If one of ~v1 or ~v2 is ~0, then the set is linearly dependent.
Example 2.6.7
Proof. Define the matrix A = [ ~v1 ~v2 ] and let B be an echelon form of A. If one of ~v1 or ~v2 is the zero
vector, then it is clear that the corresponding column of B will also be zero, and hence B will not have a
pivot in every column. Therefore, if one of the vectors is the zero vector, then The Linear Independence
Theorem implies that ~v1 and ~v2 are linearly dependent.
Now assume both ~v1 and ~v2 are non-zero and that ~v1 and ~v2 are linearly dependent. Then, the matrix equation
A~x = ~0 has a non-trivial solution. Since A is n × 2, the solution is a vector in R2, call it
\[ \vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Then,
\[ A\vec{w} = \vec{0}. \]
Rewriting this as a vector equation, we get
\[ w_1\vec{v}_1 + w_2\vec{v}_2 = \vec{0}. \]
Since ~w ≠ ~0, at least one of w1 or w2 is non-zero. If w1 = 0, then it must be the case that w2 ≠ 0, and the
above equation only holds if ~v2 = ~0, which is not possible. Therefore, w1 ≠ 0, and similarly w2 ≠ 0. Hence,
\[ \vec{v}_1 = -\frac{w_2}{w_1}\vec{v}_2, \]
so ~v1 is a scalar multiple of ~v2.
Conversely, suppose ~v1 is a scalar multiple of ~v2 . Say ~v1 = c~v2 for some non-zero c ∈ R. Rearranging this
vector equation yields
\[ \vec{v}_1 - c\vec{v}_2 = \vec{0}, \]
which shows that $\vec{u} = \begin{bmatrix} 1 \\ -c \end{bmatrix}$ is a non-trivial solution to A~x = ~0. Therefore, ~v1 and ~v2 are linearly depen-
dent.
In view of the previous two results, it is easy to tell when sets of 2 or fewer vectors are linearly independent.
For sets of three or more vectors, we generally have to use row reduction and count pivot columns to de-
termine linear independence. However, there are some special cases where we can quickly determine linear
dependence.
Corollary 2.6.3
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn . If one of the ~vi ’s is ~0, then the set is linearly dependent.
Exercise
Prove Corollary 2.6.3.
Corollary 2.6.4
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn . If k > n, then {~v1 , ~v2 , . . . , ~vk } is linearly dependent. That is, if you have
a set with more vectors than there are components in the vectors, then that set of vectors is linearly
dependent.
Example 2.6.8
Proof. Let A = [ ~v1 ~v2 . . . ~vk ]. Then A has more columns than rows. Since there can be at most one pivot in each row, an echelon form of A can have at most n pivots. Since k > n, it is impossible for A to have a pivot in each column, and so the set of vectors {~v1 , ~v2 , . . . , ~vk } is linearly dependent by The Linear Independence
Theorem.
Warning!
If the set of vectors contains fewer vectors than there are components in each vector, this does not
mean the set is linearly independent. All you can conclude is that if there are more vectors than
there are components, then the set is linearly dependent.
Theorem 2.6.3
Let S = {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be a set of vectors with k ≥ 2. Then, S is a linearly dependent set
if and only if at least one of the vectors in S is a linear combination of the others. In fact, if S is
linearly dependent and ~v1 6= ~0, then some ~vj with j > 1 is a linear combination of the preceding
vectors ~v1 , . . . , ~vj−1 . In other words, if the vectors in S are linearly dependent, then one of the vectors in S can be written as a linear combination of the vectors preceding it.
Proof. First suppose ~vj is a linear combination of the other vectors for some j = 1, . . . , k. If ~vj = ~0, then Corollary
2.6.3 implies S is linearly dependent. If ~vj 6= 0, then there exist scalars c1 , c2 , . . . , cj−1 , cj+1 , . . . , ck , not all
zero, such that
c1~v1 + c2~v2 + . . . + cj−1~vj−1 + cj+1~vj+1 + . . . + ck~vk = ~vj .
Subtracting ~vj from both sides gives
c1~v1 + c2~v2 + . . . + cj−1~vj−1 − ~vj + cj+1~vj+1 + . . . + ck~vk = ~0.    (2.3)
Since at least one of the scalars in Equation (2.3) is non-zero (the coefficient of ~vj is −1), this implies that
~u = (c1 , . . . , cj−1 , −1, cj+1 , . . . , ck ) ∈ Rk
is a non-trivial solution to the homogeneous equation A~x = ~0, where A = [ ~v1 ~v2 . . . ~vk ]. Therefore, {~v1 , ~v2 , . . . , ~vk } is linearly dependent.
Conversely, suppose that S is linearly dependent. If ~v1 = ~0, then S is linearly dependent by Corollary 2.6.3.
Hence, suppose ~v1 6= ~0. Since S is linearly dependent, there exist scalars d1 , d2 , . . . , dk ∈ R, not all zero,
such that
d1~v1 + d2~v2 + . . . + dk~vk = ~0.
Let j be the largest index such that dj ≠ 0. Note that j > 1: if j = 1, the equation reduces to d1~v1 = ~0 with d1 ≠ 0, which forces ~v1 = ~0, a contradiction. Since di = 0 for every i > j, the equation becomes
d1~v1 + d2~v2 + . . . + dj ~vj = ~0 =⇒ ~vj = −(d1 /dj )~v1 − (d2 /dj )~v2 − . . . − (dj−1 /dj )~vj−1 ,
so ~vj is a linear combination of the preceding vectors ~v1 , . . . , ~vj−1 .
This theorem tells us that if we have a set of vectors {~v1 , ~v2 , . . . , ~vk } in Rn , and one of the vectors is a linear
combination of the others, then the set is necessarily linearly dependent and vice versa.
Warning!
Theorem 2.6.3 does not guarantee that all of the vectors in the set are linear combinations of the
others. It only guarantees that at least one is.
The following provides some examples of using everything we’ve seen in this section so far.
Example 2.6.9
Determine whether the following sets of vectors are linearly independent or linearly dependent.
i) S1 = { (1, 2), (3, 1) } ⊆ R2
ii) S2 = { (2, 3, 1), (1, 0, 0), (0, 0, 0) } ⊆ R3
iii) S3 = { (3, 2, 1, 1/2), (2, 2, 2, 1), (1/2, π, 2, 7), (5, 12, 324/5, 1), (0, 1, 0, 0) } ⊆ R4
iv) S4 = { (2, 4, 1), (1, 2, 1/2), (5, 2, 1) } ⊆ R3
Solution. S1 is a set of two vectors and they are not scalar multiples of one another. Thus, by Corollary
2.6.2, S1 is linearly independent. S2 contains the zero vector, so is linearly dependent by Corollary 2.6.3.
There are more vectors in S3 than there are components in each vector. Therefore, S3 is linearly dependent
by Corollary 2.6.4. The first two vectors in S4 are scalar multiples of one another, hence one is a linear
combination of the other two. Therefore, Theorem 2.6.3 implies S4 is linearly dependent. ♦
Example 2.6.10
Let {~v1 , ~v2 , ~v3 } ⊆ Rn be linearly dependent. Show that {~v1 , ~v2 , ~v3 , ~v4 } is also linearly dependent for
any ~v4 ∈ Rn .
Solution. Let A = [ ~v1 ~v2 ~v3 ~v4 ]. Since the set {~v1 , ~v2 , ~v3 } is linearly dependent, by definition, there exist
scalars c1 , c2 , c3 , not all zero, such that
c1~v1 + c2~v2 + c3~v3 = ~0.
Now add 0 · ~v4 to both sides. Since 0 · ~v4 = ~0, this doesn't change the equation, so we get
c1~v1 + c2~v2 + c3~v3 + 0 · ~v4 = ~0 =⇒ A (c1 , c2 , c3 , 0) = ~0.
This shows that ~x = (c1 , c2 , c3 , 0) is a non-trivial solution to the homogeneous equation A~x = ~0, which means
{~v1 , ~v2 , ~v3 , ~v4 } is a linearly dependent set. ♦
2.7 Linear Transformations
2.7.1 Transformations
We start with an example.
Example 2.7.1
Let A = [ 1 3 5 ; 2 4 6 ] and let ~v = (v1 , v2 , v3 ) ∈ R3 . Then,
A~v = v1 (1, 2) + v2 (3, 4) + v3 (5, 6) = (v1 + 3v2 + 5v3 , 2v1 + 4v2 + 6v3 ).
Some specific examples:
A (1, 0, 0) = (1 · 1 + 3 · 0 + 5 · 0, 2 · 1 + 4 · 0 + 6 · 0) = (1, 2),
A (4, 1, 2) = (1 · 4 + 3 · 1 + 5 · 2, 2 · 4 + 4 · 1 + 6 · 2) = (17, 24).
This shows that multiplication by A assigns to each vector in R3 a vector in R2 , hence it is like a function with domain R3 and codomain R2 !
The previous example shows that we can think of matrix multiplication as some sort of function that takes
vectors as input and spits vectors out. Such functions are called transformations.
A transformation is merely a special type of function. It takes vectors from its domain as input and returns
vectors in its codomain. Consequently, the same language for functions carries over for transformations as
well.
Let F : Rk → Rn be a transformation. For any ~v ∈ Rk , the vector F (~v ) ∈ Rn is called the image of
~v under F . The set of all images F (~v ) is called the range of F .
Example 2.7.2
Given any n × k matrix A, and any ~v ∈ Rk , the matrix vector product A~v is a vector in Rn . Thus, matrix
vector multiplication defines a transformation FA : Rk → Rn given by FA (~v ) = A~v for all ~v ∈ Rk . These are
called matrix transformations.
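A matrix transformation is easy to implement on a computer. The following short Python sketch is not part of the original notes; it assumes the numpy library and uses the matrix from Example 2.7.1.

    import numpy as np

    A = np.array([[1.0, 3.0, 5.0],
                  [2.0, 4.0, 6.0]])

    def F_A(v):
        """Return the image A v of a vector v in R^3 under the matrix transformation F_A."""
        return A @ v

    print(F_A(np.array([1.0, 0.0, 0.0])))  # [1. 2.]
    print(F_A(np.array([4.0, 1.0, 2.0])))  # [17. 24.]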
Example 2.7.3
Let In denote the n × n matrix whose ith column has a 1 in the ith component and zeroes everywhere else, and define FI : Rn → Rn by FI (~v ) = In~v . Show that FI (~v ) = ~v for every ~v ∈ Rn .
Solution. Write ~v = (v1 , v2 , . . . , vn ) ∈ Rn . Then,
FI (~v ) = In~v = v1 (1, 0, . . . , 0) + v2 (0, 1, 0, . . . , 0) + . . . + vn (0, . . . , 0, 1) = (v1 , v2 , . . . , vn ) = ~v .
Note
The matrix In defined above is called the n × n identity matrix. The transformation FI is called
the identity transformation. The identity matrix plays a very important role in linear algebra.
We will see it more frequently later in the course.
Example 2.7.4
Let A = [ 1 3 5 ; 2 4 6 ] and let FA : R3 → R2 be the matrix transformation FA (~v ) = A~v .
i) Write down the domain and codomain of FA .
ii) Let ~u = (3, 2, 1). Compute FA (~u).
iii) Let ~b = (7, 8). Find ~v ∈ R3 such that FA (~v ) = ~b.
iv) Is there more than one vector ~v ∈ R3 such that FA (~v ) = ~b?
v) Is ~y = (0, 1) in the range of FA ?
Solution.
i) The matrix transformation takes vectors from R3 as input and outputs vectors in R2 . Therefore, the
domain of FA is R3 and the codomain of FA is R2 . ♦
ii)
FA (~u) = A (3, 2, 1) = (1 · 3 + 3 · 2 + 5 · 1, 2 · 3 + 4 · 2 + 6 · 1) = (14, 20). ♦
iii) The question asks for a vector ~v ∈ R3 such that FA (~v ) = ~b. Since FA (~v ) = A~v , this means we need to find a vector ~v ∈ R3 such that A~v = ~b; i.e. we need to find a solution to the matrix equation A~x = ~b. We know how to do this! Augment A with ~b and row reduce!
[ A | ~b ] = [ 1 3 5 | 7 ; 2 4 6 | 8 ] ∼ [ 1 0 −1 | −2 ; 0 1 2 | 3 ].
The vector form of the solution is evident from the RREF:
~x = (x1 , x2 , x3 ) = (−2, 3, 0) + s (1, −2, 1),   s ∈ R.
Therefore, any vector ~v of the above form is a solution to A~x = ~b; hence satisfies FA (~v ) = ~b. A specific
vector is found by picking any fixed value of s. If we take s = 0, then,
FA ((−2, 3, 0)) = (7, 8). ♦
iv) Yes because there are infinitely many solutions to the matrix equation A~x = ~b and each such solution
yields a vector ~v such that FA (~v ) = ~b. ♦
v) If ~y = (0, 1) is in the range of FA , then by definition there is a vector ~w ∈ R3 such that FA (~w) = ~y .
Such a vector is necessarily a solution to the matrix equation A~x = ~y . Therefore, we can answer the
question by determining whether or not this matrix equation has a solution. We know how to do this!
Form the augmented matrix and row reduce!
[ A | ~y ] = [ 1 3 5 | 0 ; 2 4 6 | 1 ] ∼ [ 1 0 −1 | 3/2 ; 0 1 2 | −1/2 ].
The vector form of the solution is
~x = (3/2, −1/2, 0) + s (1, −2, 1),   s ∈ R.
Since A~x = ~y has a solution, we conclude that yes, ~y is in the range of FA . For example, letting s = 1, a specific vector ~p such that FA (~p) = ~y is ~p = (5/2, −5/2, 1). ♦
Parts iii) and v) of Example 2.7.4 ask whether or not a specific vector is in the range of a matrix transfor-
mation. The question is answered by finding the solution to a matrix equation. Moreover, we know this is
equivalent to asking whether or not a given vector is in the span of some others. Therefore, we can now
phrase questions that we’ve seen in the last two sections in terms of transformations. For example, part iii)
of Example 2.7.4 can be stated in terms of spanning sets as follows.
Restatement
“Let ~v1 = (1, 2), ~v2 = (3, 4), ~v3 = (5, 6). Is ~b = (7, 8) in span{~v1 , ~v2 , ~v3 }?”
There are many types of transformations. Those of specific interest are the linear transformations.
A transformation F : Rk → Rn is called a linear transformation if the following two properties hold:
1. F (~u + ~v ) = F (~u) + F (~v ) for all ~u, ~v ∈ Rk ;
2. F (r~w) = r(F (~w)) for all ~w ∈ Rk and all scalars r ∈ R.
Note on Language
There are transformations that are not linear, as the following example shows.
Example 2.7.5
Solution. To show F is not linear, it suffices to find vectors that violate one of the two properties in the definition. To this end, let ~v1 = (1, 0, 0) and ~v2 = (0, 0, 1). Then, ~v1 + ~v2 = (1, 0, 1), so that
F (~v1 + ~v2 ) = (1, 1, 1).
On the other hand,
F (~v1 ) = (1, 1, 0)   and   F (~v2 ) = (0, 1, 1),
so that
F (~v1 ) + F (~v2 ) = (1, 2, 1).
This shows F (~v1 + ~v2 ) 6= F (~v1 ) + F (~v2 ) which means F is not a linear transformation. ♦
The solution to the previous example exhibits a violation of the first condition in the definition of linearity to show F is not linear. It is equally valid to exhibit a violation of the second condition to show a transformation is not linear.
Example 2.7.6
Solution. This time, we find a counterexample to the second condition in the definition of linear transfor-
mations. To this end, let
~v = (1, 0, 0).
Then,
F (2~v ) = F ((2, 0, 0)) = 2² = 4,
and
2F (~v ) = 2 · 1² = 2.
Therefore, F (2~v ) 6= 2F (~v ). This violates property 2 in the definition of linear transformations and, conse-
quently, F is not a linear transformation. ♦
The following examples show how to prove a transformation is linear, even if it is not given explicitly.
Example 2.7.7
Let c ∈ R be a fixed scalar and define F : Rn → Rn by F (~v ) = c~v for all ~v ∈ Rn . Show that F is a linear transformation.
Solution. To prove that F is linear, we must show that F satisfies the two conditions in the definition.
First, let ~u, ~v ∈ Rn . Then,
F (~u + ~v ) = c(~u + ~v ) = c~u + c~v = F (~u) + F (~v ),
where the middle equality follows from part 2 of Properties of Vectors. Therefore, the first condition for linear transformations is verified. For the second condition, let r ∈ R and ~w ∈ Rn . Then,
F (r~w) = c(r~w) = (cr)~w = (rc)~w = r(c~w) = r(F (~w)),
where the middle equality follows from part 5 of Properties of Vectors. Therefore, the second condition for
linear transformations is verified and so, F is a linear transformation.
Note
If 0 ≤ c < 1, then the linear transformation in Example 2.7.7 is called a contraction. If c ≥ 1, then
the linear transformation in Example 2.7.7 is called a dilation.
The following theorem is important. It shows that every matrix transformation is a linear transformation.
Theorem 2.7.1
Every matrix transformation is a linear transformation.
Proof. Let F : Rk → Rn be a matrix transformation; say F (~v ) = A~v for some n × k matrix A. Let ~u, ~v ∈ Rk . Then,
F (~u + ~v ) = A(~u + ~v ) = A~u + A~v = F (~u) + F (~v ),
where the middle equality follows from part 1 of Theorem 2.4.1. Thus, the first property in the definition of linear transformations holds. For the second property, let r ∈ R be any scalar and let ~w ∈ Rk . Then,
F (r~w) = A(r~w) = r(A~w) = r(F (~w)),
where the middle equality follows from part 2 of Theorem 2.4.1. Thus, the second condition in the definition
of linear transformation holds and, hence, F is a linear transformation.
An amazing fact is that the converse of Theorem 2.7.1 holds as well; that is, every linear transforma-
tion is a matrix transformation. We prove this formally in Section 2.7.4.
The following theorem gives two properties that every linear transformation satisfies. The first one is par-
ticularly useful for showing transformations are not linear.
Theorem 2.7.2
Example 2.7.8
Since F (~0) 6= ~0, part 1 of Theorem 2.7.2 implies that F is not linear. ♦
Proof. Note that ~0k + ~0k = ~0k . Since F is linear, property 1 in the definition implies
F (~0k ) = F (~0k + ~0k ) = F (~0k ) + F (~0k ).
Subtracting F (~0k ) from both sides gives F (~0k ) = ~0n .
Warning!
If asked to determine if a given transformation F is linear, the first thing you should always check is
F (~0) = ~0. But be careful. Theorem 2.7.2 can only be used to show that something is not linear. If
a transformation F satisfies F (~0) = ~0, this is not enough information to conclude that F is a linear
transformation.
Linear transformations possess the following property that allows you to “pull” it through a sum.
Theorem 2.7.3
Let F : Rk → Rn be a linear transformation. Let ~v1 , ~v2 , . . . , ~vm ∈ Rk and let c1 , c2 , . . . , cm be scalars.
Then,
F (c1~v1 + c2~v2 + . . . + cm~vm ) = c1 F (~v1 ) + c2 F (~v2 ) + . . . + cm F (~vm ).
Proof. We only prove this for 2 vectors. I leave the proof of the general case to the reader.
Let ~u, ~v ∈ Rk and r, s ∈ R. Then, r~u, s~v , r~u + s~v ∈ Rk and, since F is assumed linear, the first condition in
the definition implies
F (r~u + s~v ) = F (r~u) + F (s~v ),
and the second condition implies F (r~u) = rF (~u) and F (s~v ) = sF (~v ). Therefore,
F (r~u + s~v ) = rF (~u) + sF (~v ).
Example 2.7.9
Let A = [ 1 0 ; 0 −1 ] and define the linear transformation FA : R2 → R2 by FA (~v ) = A~v . Describe how FA acts on vectors in R2 geometrically.
Solution. Let ~v = (v1 , v2 ) ∈ R2 . Then,
FA (~v ) = A~v = (v1 , −v2 ).
If we consider the vector (v1 , v2 ) as a point in R2 , then (v1 , −v2 ) is the reflection of (v1 , v2 ) over the x-axis.
This means that FA is reflecting the given vector ~v over the x-axis. Therefore, FA acts on vectors in R2
by reflecting them over the x-axis. The following picture shows examples of FA applied to some vectors in R2 .
[Figure: two vectors ~v and ~w in the x1x2-plane together with their images FA (~v ) and FA (~w), reflected over the x1-axis.]
We can also apply FA to a set of vectors. Here is FA applied to the square S with vertices at (±1, 1) and
(±1, 3). In this case, the vectors are interpreted as points in R2 .
[Figure: the square S and its image FA (S), reflected over the x1-axis.]
Exercise
Reflections over the y-axis and reflections over the line x = y are matrix transformations as well. See if you can figure out the matrix that defines each of them.
Example 2.7.10
Let A = (1/√2) [ 1 −1 ; 1 1 ] and define the linear transformation FA : R2 → R2 by FA (~v ) = A~v . Describe how FA acts on vectors in R2 geometrically.
Solution. This one is a little trickier. To get an idea of what FA is doing, apply FA to ~v = (1, 0) a few times and plot the results.
[Figure: the vector ~v = (1, 0) together with FA (~v ) and FA (FA (~v )), each rotated a further 45 degrees counter-clockwise about the origin.]
We see that the linear transformation is rotating ~v counter-clockwise about the origin through an angle of
45 degrees. In fact, any counter-clockwise rotation of a vector about the origin through an angle ϕ, is a
matrix transformation. The matrix that defines the rotation is called a rotation matrix, denoted Rϕ , and
is given by,
Rϕ = [ cos(ϕ) −sin(ϕ) ; sin(ϕ) cos(ϕ) ]. ♦
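Here is a small Python check of the rotation matrix, not part of the original notes (it assumes numpy): rotating (1, 0) by 45 degrees twice lands, up to rounding, on (0, 1).

    import numpy as np

    phi = np.pi / 4
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])

    v = np.array([1.0, 0.0])
    print(R @ v)        # [0.7071..., 0.7071...]: v rotated 45 degrees counter-clockwise
    print(R @ (R @ v))  # approximately [0, 1]: two 45-degree rotations give 90 degrees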
Two more families of linear transformations on R2 are given by the matrices [ 1 m ; 0 1 ] and [ 1 0 ; m 1 ], where m is some non-zero number. These are called shear transformations; the former matrix corresponds to a horizontal shear and the latter corresponds to a vertical shear. Here are a few examples of what shears look like after applying them to the square S with vertices (0, 0), (1, 0), (1, 1), and (0, 1).
Horizontal Shear 1, with matrix [ 1 2 ; 0 1 ]:
[Figure: the unit square S and its image F (S) under the horizontal shear with m = 2.]
Horizontal Shear 2, with matrix [ 1 −1/2 ; 0 1 ]:
[Figure: the unit square S and its image F (S) under the horizontal shear with m = −1/2.]
Vertical Shear 1, with matrix [ 1 0 ; −4 1 ]:
[Figure: the unit square S and its image F (S) under the vertical shear with m = −4.]
Vertical Shear 2, with matrix [ 1 0 ; 3 1 ]:
[Figure: the unit square S and its image F (S) under the vertical shear with m = 3.]
Here are the above shears applied to the square with vertices (1, −1), (1, 1), (−1, 1), and (−1, −1).
Horizontal Shear 1, with matrix [ 1 2 ; 0 1 ]:
[Figure: the larger square S and its image F (S) under the horizontal shear with m = 2.]
Horizontal Shear 2, with matrix [ 1 −1/2 ; 0 1 ]:
[Figure: the larger square S and its image F (S) under the horizontal shear with m = −1/2.]
Vertical Shear 1, with matrix [ 1 0 ; −4 1 ]:
[Figure: the larger square S and its image F (S) under the vertical shear with m = −4.]
Vertical Shear 2, with matrix [ 1 0 ; 3 1 ]:
[Figure: the larger square S and its image F (S) under the vertical shear with m = 3.]
Another useful family of linear transformations are the projections. For example, the matrix transformation P : R2 → R2 with matrix [ 1 0 ; 0 0 ] sends (v1 , v2 ) to (v1 , 0); it projects each vector onto the horizontal axis.
[Figure: vectors ~u, ~v , and ~w together with their projections P (~u), P (~v ), and P (~w) onto the x1-axis.]
Projections to the vertical axis are also possible. I leave it to the reader to explore such transformations.
Similarly, in R3 the matrix [ 1 0 0 ; 0 1 0 ; 0 0 0 ] sends ~v = (v1 , v2 , v3 ) to (v1 , v2 , 0), projecting ~v onto the xy-plane.
There are some other types of linear transformations as well, such as scaling in the horizontal and vertical
directions and squeezing. See if you can find matrices that represent these operations.
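A quick way to experiment with these geometric transformations is to apply them to a set of points on a computer. The sketch below is not from the notes; it assumes numpy and applies the horizontal shear with m = 2 to the corners of the unit square pictured above.

    import numpy as np

    S = np.array([[0.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])   # each column is one vertex of the unit square
    H = np.array([[1.0, 2.0],
                  [0.0, 1.0]])             # horizontal shear matrix with m = 2

    print(H @ S)  # columns are the sheared vertices: (0,0), (1,0), (3,1), (2,1)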
Theorem 2.7.1 shows that every matrix transformation is a linear transformation. In this section, we prove the converse to
this theorem. That is, every linear transformation is a matrix transformation. Even more amazingly, this
matrix is unique. First, we need a definition.
The standard basis for Rn is the set of vectors {~e1 , ~e2 , . . . , ~en } ⊆ Rn , where ~ei is the vector with a 1 in the ith component and zeroes everywhere else. That is, each ~ei has the form
~ei = (0, . . . , 0, 1, 0, . . . , 0),
with the 1 appearing in the ith position.
Example 2.7.11
Let ~v = (v1 , v2 ) ∈ R2 . Then,
~v = (v1 , v2 ) = v1 (1, 0) + v2 (0, 1) = v1~e1 + v2~e2 .
This shows every vector ~v ∈ R2 is a linear combination of ~e1 and ~e2 . It is not hard to generalize this argument to Rn for any arbitrary n ≥ 1. That is, if ~v ∈ Rn and {~e1 , ~e2 , . . . , ~en } is the standard basis of Rn , then
~v = v1~e1 + v2~e2 + . . . + vn~en ,
where v1 , v2 , . . . , vn are the components of ~v . Using this, we prove the converse to Theorem 2.7.1.
Theorem 2.7.4
Let F : Rk → Rn be a linear transformation. Then, there exists a unique n × k matrix A such that
F (~v ) = A~v for all ~v ∈ Rk .
Proof. Let ~v ∈ Rk . Using the observation above, write ~v = v1~e1 + v2~e2 + . . . + vk~ek , where v1 , v2 , . . . , vk are the components of ~v . Since F is linear, part 2 of Theorem 2.7.3 implies
F (~v ) = v1 F (~e1 ) + v2 F (~e2 ) + . . . + vk F (~ek ).    (2.4)
Since F (~ei ) ∈ Rn for each i = 1, . . . , k, the matrix A = [ F (~e1 ) F (~e2 ) . . . F (~ek ) ] is n × k. By definition of matrix vector multiplication, Equation (2.4) implies
F (~v ) = A~v .
Thus, we have found an n × k matrix A such that F (~v ) = A~v for all ~v ∈ Rk .
We now show A is unique. Suppose B is another matrix such that F (~v ) = B~v for all ~v ∈ Rk . Then A~v = B~v for all ~v ∈ Rk . In particular, A~ei = B~ei for each i = 1, . . . , k. Write A = [ ~a1 ~a2 . . . ~ak ] and B = [ ~b1 ~b2 . . . ~bk ]. Then,
A~ei = 0~a1 + 0~a2 + . . . + 0~ai−1 + 1~ai + 0~ai+1 + . . . + 0~ak = ~ai , for each i = 1, . . . , k
and similarly
B~ei = ~bi for each i = 1, . . . , k.
Therefore, ~ai = A~ei = B~ei = ~bi for each i = 1, . . . , k, and so A = B. This shows the matrix A is unique.
The unique matrix A = [ F (~e1 ) F (~e2 ) . . . F (~ek ) ] in Theorem 2.7.4 is called the standard matrix of F .
Example 2.7.12
Let F : R3 → R4 be the linear transformation given by
F ((v1 , v2 , v3 )) = (v1 , v1 + v2 , v1 + v2 + v3 , v1 − v3 ).
Find the standard matrix for F .
Solution. Apply F to the standard basis {~e1 , ~e2 , ~e3 } for R3 to get
F (~e1 ) = (1, 1 + 0, 1 + 0 + 0, 1 − 0) = (1, 1, 1, 1),
F (~e2 ) = (0, 0 + 1, 0 + 1 + 0, 0 − 0) = (0, 1, 1, 0),
F (~e3 ) = (0, 0 + 0, 0 + 0 + 1, 0 − 1) = (0, 0, 1, −1).
Therefore, the standard matrix of F is
A = [ F (~e1 ) F (~e2 ) F (~e3 ) ] = [ 1 0 0 ; 1 1 0 ; 1 1 1 ; 1 0 −1 ]. ♦
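The recipe "apply F to the standard basis vectors and use the results as columns" is easy to carry out by computer. The sketch below is not part of the notes; it assumes numpy and uses the map F from Example 2.7.12.

    import numpy as np

    def F(v):
        v1, v2, v3 = v
        return np.array([v1, v1 + v2, v1 + v2 + v3, v1 - v3])

    basis = np.eye(3)  # its columns are e1, e2, e3
    A = np.column_stack([F(basis[:, i]) for i in range(3)])
    print(A)
    # [[ 1.  0.  0.]
    #  [ 1.  1.  0.]
    #  [ 1.  1.  1.]
    #  [ 1.  0. -1.]]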
Example 2.7.1. Let f, g : R → R be differentiable functions and let r ∈ R. Recall from calculus that derivatives obey the following properties:
d/dx (f + g) = df/dx + dg/dx,   and   d/dx (rf ) = r df/dx.
This means that differentiation satisfies the two defining properties of a linear transformation (acting on differentiable functions rather than on vectors in Rn ). Can you find the standard matrix for d/dx?
A transformation F : Rk → Rn is called onto if every vector in the codomain is the image of at least one vector in the domain; that is, if for every ~b ∈ Rn there is some ~v ∈ Rk with F (~v ) = ~b.
Here is a way to visualize an onto transformation. Imagine you are at a school dance and that everyone
present is split into two groups; call the first group Rk and the second group Rn . Think of a transformation
F : Rk → Rn as a way of assigning to a person in Rk a dance partner in Rn . Then, F is onto if every person
in Rn has a dance partner in Rk under F .
Suppose F : Rk → Rn is an onto linear transformation with standard matrix A. Then, for every ~b ∈ Rn ,
there is a vector ~v ∈ Rk such that F (~v ) = A~v = ~b. Thus, the matrix equation A~x = ~b has a solution for every
vector ~b ∈ Rn and so the columns of A span Rn . This shows that if F is an onto linear transformation, then
the columns of its standard matrix span Rn . The converse is also true. This lets us add a new condition to The Span Theorem.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1 , ~v2 , . . . , ~vk . Let FA : Rk → Rn be the linear transformation whose standard matrix is A. The
following are equivalent.
1. The matrix equation A~x = ~b has at least one solution for every ~b ∈ Rn .
2. The columns of A span Rn . This means every vector ~b ∈ Rn is a linear combination of the columns of A. Equivalently, {~v1 , ~v2 , . . . , ~vk } spans Rn .
3. Every echelon form of A has a pivot in every row.
4. FA is onto.
Example 2.7.13
Is F onto?
This matrix has a pivot in every row. Therefore, by The Span Theorem, the linear transformation F is
onto.
Proof. We’ve already proved the first three statements are equivalent. Therefore, it suffices to show the
fourth is equivalent to any of the first three. We show it is equivalent to 1.
First suppose A~x = ~b has a solution for every ~b ∈ Rn . Pick any ~b ∈ Rn ; then there is ~v ∈ Rk such that A~v = ~b. As A is the standard matrix for FA , it follows that FA (~v ) = ~b. This shows every ~b ∈ Rn is in the range of FA , so that
FA is onto.
Conversely, suppose FA is onto. Pick any ~c ∈ Rn . Then, there is w ~ ∈ Rk such that FA (w)
~ = ~c. As A is the
standard matrix of FA , it follows that Aw ~ = ~c, which shows w
~ is a solution to the matrix equation A~x = ~c.
Therefore, A~x = ~c has a solution for all ~c ∈ Rn .
This is wonderful! We now have an easy way to test if a linear transformation is onto: calculate its standard
matrix, put it into echelon form, and see if there is a pivot in every row!
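A computer can do the pivot count for you: an echelon form of A has a pivot in every row exactly when the rank of A equals the number of rows. The sketch below is not from the notes and assumes numpy; it uses the matrix from Example 2.7.4.

    import numpy as np

    A = np.array([[1.0, 3.0, 5.0],
                  [2.0, 4.0, 6.0]])

    print(np.linalg.matrix_rank(A) == A.shape[0])  # True: F_A : R^3 -> R^2 is onto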
We now have four distinct ways of asking questions about spanning sets in Rn . These are summarized as
follows.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn . Let A = [ ~v1 ~v2 . . . ~vk ]. The following questions are all different ways to
ask the same thing.
1. Does the matrix equation A~x = ~b have at least one solution for every ~b ∈ Rn ?
2. Is every vector ~b ∈ Rn a linear combination of the columns of A?
3. Does {~v1 , ~v2 , . . . , ~vk } span Rn ? or does span {~v1 , ~v2 , . . . , ~vk } = Rn ?
4. Is FA onto?
All four of these question are solved the same way: row reduce the appropriate matrix and count
pivot rows.
Onto is a property of linear transformations that is related to spanning. One-to-one is related to linear independence. A transformation F : Rk → Rn is called one-to-one if distinct vectors in Rk have distinct images under F ; that is, if F (~u) = F (~v ) implies ~u = ~v .
Note
The definitions of one-to-one and onto hold true for any function, not just linear transformations. For
example, the function f : R → R given by f (x) = x² is neither one-to-one nor onto, but the function g : R → R given by g(x) = x³ is both one-to-one and onto. See if you can prove these claims!
The Span Theorem provides an easy criterion for checking if a linear transformation is onto. We prove a
similar criterion for one-to-one.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1 , ~v2 , . . . , ~vk . Let FA : Rk → Rn be the linear transformation whose standard matrix is A. The
following are equivalent.
1. The matrix equation A~x = ~0 has only the trivial solution.
2. The columns of A are linearly independent; that is, {~v1 , ~v2 , . . . , ~vk } is a linearly independent set.
3. Every echelon form of A has a pivot in every column.
4. FA is one-to-one.
Example 2.7.14
Since this matrix does not have a pivot in every column, the linear transformation F is not one-to-one. ♦
Proof. We’ve already proved the first three are equivalent. Therefore, it suffices to show 4 is equivalent to 1.
Suppose A~x = ~0 has only the trivial solution. Pick any ~u, ~v ∈ Rk that satisfy FA (~u) = FA (~v ). It suffices to
show that ~u = ~v . Rearranging this equation and using linearity of FA yields
FA (~u − ~v ) = FA (~u) − FA (~v ) = ~0.
Since A is the standard matrix of FA , this means
A(~u − ~v ) = ~0.
Therefore, ~u − ~v is a solution to the homogeneous equation A~x = ~0. By assumption, this equation has only
the trivial solution ~0. Therefore, we conclude that ~u − ~v = ~0 =⇒ ~u = ~v . This shows FA is one-to-one.
Conversely, suppose FA is one-to-one. Then, the equation FA (~x) = ~0 has at most one solution. Since ~x = ~0
is always a solution to this equation, it must be the only one. This exactly means A~x = ~0 has only the trivial
solution.
Example 2.7.15
The echelon form has a pivot in every row and in every column. Therefore, F is both one-to-one and onto. ♦
Example 2.7.16
We now have four distinct ways to ask questions regarding linear independence. These are summarized as
follows.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn . Let A = [ ~v1 ~v2 . . . ~vk ]. The following questions are all different ways to
ask the same thing.
1. Are the vectors ~v1 , ~v2 , . . . , ~vk linearly independent?
2. Does the matrix equation A~x = ~0 have only the trivial solution?
3. Does every echelon form of A have a pivot in every column?
4. Is FA one-to-one?
All four of these questions are solved in the same way: row reduce the appropriate matrix and count
pivot columns.
Sometimes you can tell if a linear transformation is one-to-one or onto just by looking at the domain and
codomain.
Corollary 2.7.1
Let F : Rk → Rn be a linear transformation.
1. If n > k, then F is not onto.
2. If k > n, then F is not one-to-one.
3. If n = k, then F is either both one-to-one and onto or it is neither.
Proof. Let A be the standard matrix for F . Then, A is n × k. There are three cases.
Case 1. If n > k, then A has more rows than columns. By Corollary 2.5.1, the columns of A do not span
Rn . Therefore, F is not onto by The Span Theorem.
Case 2. If k > n, then A has more columns than rows. By Corollary 2.6.4, the columns of A are linearly
dependent. Therefore, F is not one-to-one by The Linear Independence Theorem Version.
Case 3. If n = k, then A has the same number of rows as columns. Therefore, an echelon form of A either
has a pivot in every row AND column, or there is a column and a row without a pivot. Thus, by The Span
Theorem and The Linear Independence Theorem Version, F is either both one-to-one and onto or it is
neither.
There is a practical (but not at all rigorous) way to think about and remember Corollary 2.7.1. Suppose
F : Rk → Rn is a linear transformation and that n > k. Then, the vectors in Rn have more components than
the vectors in Rk . Intuitively, it seems like there should be no way to “cover” all of Rn with Rk . Likewise, if
k > n, the vectors in the domain have more components than the vectors in the codomain, so the domain is
“too big” to cover each vector in the codomain uniquely. This is only a practical way to think about this as
the reasoning is unsound. Indeed, both Rk and Rn have the same number of vectors in them for any positive integers k and n, surprising as that may seem. This fact, however, is beyond the scope of this document.
Warning!
One-to-one and onto are not opposites. Just because a linear transformation is not one-to-one doesn't mean it's onto, and vice versa. The only case in which the two notions constrain each other is when the dimension of the domain is equal to the dimension of the codomain. Otherwise, they are generally not related.
The following example ties all that we’ve been doing so far in this section together.
Example 2.7.17
For each of the following matrices:
i) Write down the domain and codomain of the corresponding linear transformation.
ii) Determine if the corresponding linear transformation is one-to-one, onto, both, or neither.
a) A = [ −1 0 ; 0 2 ; 3 9 ]
b) B = [ 8 1 5 7 ; 1 −2 4 −1 ]
c) C = [ 2 −2 −8 ; 3 1 0 ; 1 0 −1 ]
d) D = [ 1 −1 ; 0 1 ]
e) E = [ −1 3 −1 ; 0 0 0 ]
Solution.
a) The domain of FA is R2 and the codomain is R3 . An echelon form of A is [ −1 0 ; 0 2 ; 0 0 ]. This echelon form has a pivot in every column. Therefore, FA is one-to-one by The Linear Independence Theorem Version. ♦
b) The domain of FB is R4 and the codomain is R2 . An echelon form of B has a pivot in each row. Therefore, FB is onto by The Span Theorem. ♦
c) The domain of FC is R3 and the codomain is R3 . An echelon form of C is [ 1 0 −1 ; 0 1 3 ; 0 0 0 ]. This echelon form does not have a pivot in every row nor in every column. Therefore, FC is neither one-to-one nor onto by The Span Theorem and The Linear Independence Theorem Version.
Algebra of Matrices
In this chapter, we develop the algebra of matrices. Some of the algebraic properties of matrices are similar
to those of real numbers, but there are some unfamiliar properties as well.
Recall that an n × k matrix A is a rectangular array of numbers with n rows and k columns; the entry in row i and column j is denoted aij . To simplify notation, we write A = [aij ], 1 ≤ i ≤ n, 1 ≤ j ≤ k, or simply A = [aij ] when the size of the matrix is clear from context.
1. A matrix is called square if it has the same number of rows as it does columns.
2. Let A = [aij ], 1 ≤ i ≤ n, 1 ≤ j ≤ k be a n × k matrix. The aii terms are called the main
diagonal of A.
3. The n × k matrix whose entries are all zero is called the n × k zero matrix. This is denoted
by [0]nk , or 0nk , or merely [0] if the context is clear.
4. The n × n identity matrix, denoted In , is the matrix with ones on its main diagonal and
zeroes everywhere else.
Example 3.1.1
The main diagonal of each of the following three matrices is listed after it.
A1 = [ 1 2 1 ; 0 0 1 ; −1 1 2 ]   (main diagonal 1, 0, 2),
A2 = [ 1 2 ; 0 1 ; 0 −1 ; 10 1 ]   (main diagonal 1, 1),
A3 = [ −1 2 1 −1 ; 2 −1 1 1 ]   (main diagonal −1, −1).
Example 3.1.2
The identity matrices of sizes 1, 2, and 3 are
I1 = [ 1 ],   I2 = [ 1 0 ; 0 1 ],   I3 = [ 1 0 0 ; 0 1 0 ; 0 0 1 ],
and so on.
Identity matrices play the same role for square matrices that 1 does for real numbers. We’ll see this when
we define matrix multiplication. Moreover, notice the columns of In form the standard basis for Rn . Thus,
we can write,
In = [ ~e1 ~e2 . . . ~en ] .
Before we can perform algebra on matrices, we need operations: addition, subtraction, multiplication,
etcetera. Since matrices are new mathematical objects, we need to define these operations from scratch.
Equality, addition, and subtraction are all defined in the obvious way. Multiplication is a little bit different.
Let A = [aij ] and B = [bij ] be two n × k matrices.
1. A and B are equal if aij = bij for each i ∈ {1, 2, . . . , n} and each j ∈ {1, 2, . . . , k}. That is, A and B are equal componentwise. If two matrices are equal, we write A = B.
2. The sum A + B and the difference A − B are the n × k matrices defined by A + B = [aij + bij ] and A − B = [aij − bij ]. That is, the sum/difference of matrices is computed by adding/subtracting the terms componentwise.
Note
Like with vectors, we do not define the sum/difference of two matrices if they have different sizes.
Example 3.1.3
Let A = [ 1 7 2 ; 1 2 −1 ] and B = [ −2 0 1 ; 1 1 1 ]. Then,
A + B = [ 1 + (−2)  7 + 0  2 + 1 ; 1 + 1  2 + 1  (−1) + 1 ] = [ −1 7 3 ; 2 3 0 ],
A − B = [ 1 − (−2)  7 − 0  2 − 1 ; 1 − 1  2 − 1  (−1) − 1 ] = [ 3 7 1 ; 0 1 −2 ],
B − A = [ −2 − 1  0 − 7  1 − 2 ; 1 − 1  1 − 2  1 − (−1) ] = [ −3 −7 −1 ; 0 −1 2 ].
Let A = [aij ] be an n × k matrix and let r ∈ R be any scalar. The scalar multiplication of r and A
is the n × k matrix
rA = [r aij ] = [ ra11 ra12 . . . ra1k ; ra21 ra22 . . . ra2k ; . . . ; ran1 ran2 . . . rank ].
Example 3.1.4
Let A = [ 1 3 2 ; 2 4 7 ]. Then,
(1/2)A = [ 1/2 3/2 1 ; 1 2 7/2 ],   3A = [ 3 9 6 ; 6 12 21 ],   πA = [ π 3π 2π ; 2π 4π 7π ].
Example 3.1.5
Let A = [ −1 2 ; 0 1 ] and B = [ −1 −1 ; 2 3 ]. Then,
3A − 2B = [ −3 6 ; 0 3 ] − [ −2 −2 ; 4 6 ] = [ −1 8 ; −4 −3 ].
Vectors are matrices with a single column so it should come as no surprise that the definitions of addition,
subtraction, and scalar multiplication are the same for general matrices as they are for vectors. The following
analogue of Properties of Vectors should come as no surprise either.
Let A, B, C be three matrices of the same size. Let r, s ∈ R be any scalars. Then,
1. A + B = B + A (commutativity of addition);
2. A + (B + C) = (A + B) + C (associativity of addition);
3. A + [0] = [0] + A = A;
4. r(A + B) = rA + rB;
5. (r + s)A = rA + sA;
6. (rs)A = r(sA);
7. A − A = [0] .
Proof. The proofs of these properties all follow immediately from the definitions of addition, subtraction, and scalar multiplication of matrices and from properties of real numbers. Here is an example of two of them; the rest are similar and are left as exercises. For property 1, write A = [aij ] and B = [bij ]; then A + B = [aij + bij ] = [bij + aij ] = B + A, since addition of real numbers is commutative. For property 7, A − A = [aij − aij ] = [0].
Exercise
Prove the rest of Matrix Addition and Scalar Multiplication Properties.
Addition and scalar multiplication for matrices are defined in the obvious way. Matrix multiplication, on the other hand, is defined in a bit of a different way.
Let A be an n × k matrix and let B = [ ~b1 ~b2 . . . ~bm ] be a k × m matrix with columns ~b1 , ~b2 , . . . , ~bm ∈ Rk . The product AB is the n × m matrix
AB = [ (A~b1 ) (A~b2 ) . . . (A~bm ) ].
By definition, matrix multiplication is not defined for all matrices A and B. For AB to be defined, B must have the same number of rows as A has columns. In this case, the size of AB is easy to calculate. If A is n × k and B is k × m, then AB is n × m. A mnemonic for remembering this is to write
(n × k)(k × m) = n × m,
where the size of AB is determined by “cancelling off” the two inner k’s.
Example 3.1.6
Let A = [ 2 1 ; −1 0 ; 3 −2 ] and B = [ 2 8 1 −2 ; 0 1 7 −1 ]. Compute AB.
Solution. By definition,
AB = [ (A~b1 ) (A~b2 ) (A~b3 ) (A~b4 ) ].
We compute each column in turn:
A~b1 = 2 (2, −1, 3) + 0 (1, 0, −2) = (4, −2, 6),
A~b2 = 8 (2, −1, 3) + 1 (1, 0, −2) = (17, −8, 22),
A~b3 = 1 (2, −1, 3) + 7 (1, 0, −2) = (9, −1, −11),
A~b4 = −2 (2, −1, 3) − 1 (1, 0, −2) = (−5, 2, −4).
Therefore,
AB = [ 4 17 9 −5 ; −2 −8 −1 2 ; 6 22 −11 −4 ]. ♦
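The column-by-column definition is easy to mirror in code. The sketch below is not part of the notes; it assumes numpy and checks that assembling the columns A~b_j reproduces the built-in product for the matrices of Example 3.1.6.

    import numpy as np

    A = np.array([[ 2,  1],
                  [-1,  0],
                  [ 3, -2]])
    B = np.array([[2, 8, 1, -2],
                  [0, 1, 7, -1]])

    AB_by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
    print(AB_by_columns)
    print(np.array_equal(AB_by_columns, A @ B))  # True: the two computations agree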
There is a shortcut for calculating AB that is less work than calculating from the definition. It is easiest
seen with an example.
Example 3.1.7
Let A = [ −1 8 ; 2 −3 ; 4 1 ] and B = [ 7 8 −3 ; 7 2 1 ]. Calculate the product AB.
4 1
Solution. The shortcut requires a little bit of imagination. First, AB is 3 × 3 because A is 3 × 2 and B is
2 × 3. Write the following.
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ _ _ _ ; _ _ _ ; _ _ _ ],
with the nine entries of the product left blank for now.
Start in the (1, 1)-entry of AB. To calculate this, take the first row of A and, in your mind, rotate it 90 degrees clockwise so it looks like the vector (−1, 8). Lay this vector on top of the first column of B. You should now be imagining the vector (−1, 8) sitting on top of the vector (7, 7). To calculate the (1, 1)-entry,
multiply the numbers sitting on top of each other, and add down the vector. In this example, we calculate
(−1) · 7 + 8 · 7 = −7 + 56 = 49.
Therefore, we have
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 _ _ ; _ _ _ ; _ _ _ ].
Now move to the (1, 2)-entry. To calculate this, take the first row of A, rotate it 90 degrees clockwise, and lay it on top of the second column of B. You should be imagining (−1, 8) laying on top of (8, 2). Now
8 2
proceed the same way as before: multiply the numbers sitting on top of one another and add down the
vector. You should be calculating
(−1)(8) + 8(2) = −8 + 16 = 8.
This yields,
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 _ ; _ _ _ ; _ _ _ ].
Repeat this process to get the (1, 3)-entry. Doing so, you’ll find
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; _ _ _ ; _ _ _ ].
Now move to the (2, 1)-entry. To calculate this, take the second row of A, rotate it 90 degrees clockwise, and lay it on top of the first column of B. You should be imagining (2, −3) laying on top of (7, 7). You
know what to do now: multiply the numbers sitting on top of one another and add down the vector. This
yields,
2(7) + (−3)(7) = 14 − 21 = −7,
and so
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; −7 _ _ ; _ _ _ ].
Now iterate this process to fill in the rest of AB. Each time you move down a row in AB, you move down
a row in A. In general, the (i, j)-entry of AB is calculated by taking the ith row of A, rotating it 90 degrees clockwise,
laying it on top of the jth column of B, multiplying the numbers sitting on top of one another, and adding
down the vector. Finish this process yourself to calculate the final matrix product:
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; −7 10 −9 ; 35 34 −11 ]. ♦
Example 3.1.8
Let A = [ −1 2 1 3 ; 4 0 −8 −9 ; −9 10 5 6 ] and B = [ 2 1 1 ; −8 7 0 ; 0 −2 −5 ; 1 1 1 ]. The product BA is a 4 × 4 matrix. Use the shortcut described in the previous example to calculate the (1, 2), (2, 1), (4, 3), and (3, 2) entries of BA.
Solution. We calculate the (1, 2)-entry by taking the first row of B, rotating it 90 degrees clockwise, laying it on top of the second column of A, multiplying the numbers sitting on top of one another, and adding down the vector. Therefore, you should imagine (2, 1, 1) sitting on top of (2, 0, 10) and then calculate
(2)(2) + (1)(0) + (1)(10) = 14.
This is the (1, 2)-entry. For the (2, 1)-entry, lay the second row of B on the first column of A and repeat. This yields,
(−8)(−1) + (7)(4) + (0)(−9) = 36.
The (4, 3)-entry is calculated by laying the fourth row of B on top of the third column of A to get,
(1)(1) + (1)(−8) + (1)(5) = −2.
Finally, the (3, 2)-entry is calculated by laying the third row of B on the second column of A to get,
(0)(2) + (−2)(0) + (−5)(10) = −50. ♦
The formula for calculating entries in a matrix product as specified by the shortcut above is easy to prove.
I leave this as an exercise to the reader.
Theorem 3.1.2
Let A be n × k and suppose that B is a matrix with k rows, so that AB is defined. Let (AB)ij denote
the (i, j)-entry of the matrix AB. Then
(AB)ij = ai1 b1j + ai2 b2j + . . . + aik bkj = Σ_{ℓ=1}^{k} aiℓ bℓj .
Exercise
Prove Theorem 3.1.2.
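As a quick sanity check of the entry formula (not part of the notes; it assumes numpy), the function below computes a single entry of AB by the sum in Theorem 3.1.2 and compares it against numpy's built-in product, using the matrices from Example 3.1.7.

    import numpy as np

    def entry_of_product(A, B, i, j):
        # (AB)_ij = sum over l of a_il * b_lj (0-based indices here)
        return sum(A[i, l] * B[l, j] for l in range(A.shape[1]))

    A = np.array([[-1, 8],
                  [ 2, -3],
                  [ 4,  1]])
    B = np.array([[7, 8, -3],
                  [7, 2,  1]])

    print(entry_of_product(A, B, 0, 0))  # 49, the (1,1)-entry in the notes' 1-based numbering
    print(all(entry_of_product(A, B, i, j) == (A @ B)[i, j]
              for i in range(3) for j in range(3)))  # True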
Matrix Multiplication Properties
Let A be n × k, let B and C be matrices whose sizes make all of the following sums and products defined, and let r ∈ R be any scalar. Then,
1. A(BC) = (AB)C; in particular, (AB)~v = A(B~v ) for any vector ~v ;
2. A(B + C) = AB + AC;
3. (B + C)A = BA + CA;
4. r(AB) = (rA)B = A(rB);
5. In A = A Ik = A.
Of particular note, pay attention to property 5. If A is n × n, this says that AIn = In A = A. This means
that the identity matrix functions the same way for square matrices that 1 does for real numbers.
Proof. We only prove a couple of these and the rest are left as exercises for the reader.
Proof of 1. First let A be n × k, B be k × m, and let ~v ∈ Rm . We first show that (AB)~v = A(B~v ). Start by writing
B = [ ~b1 ~b2 . . . ~bm ],   ~v = (v1 , v2 , . . . , vm ).
By definition of matrix multiplication, AB = [ (A~b1 ) (A~b2 ) . . . (A~bm ) ]. So,
(AB)~v = v1 (A~b1 ) + v2 (A~b2 ) + . . . + vm (A~bm )   by definition of matrix vector multiplication,
= A(v1~b1 + v2~b2 + . . . + vm~bm )   by Matrix Vector Multiplication Properties,
= A(B~v )   by definition of matrix vector multiplication.
The general statement A(BC) = (AB)C now follows by applying this identity to each column of C.
Proof of 2. Let B and C both be k × m and write B = [ ~b1 ~b2 . . . ~bm ] and C = [ ~c1 ~c2 . . . ~cm ]. Then,
A(B + C) = [ A(~b1 + ~c1 ) A(~b2 + ~c2 ) . . . A(~bm + ~cm ) ]   by definition of matrix multiplication,
= [ (A~b1 + A~c1 ) (A~b2 + A~c2 ) . . . (A~bm + A~cm ) ]   by Matrix Vector Multiplication Properties,
= [ A~b1 A~b2 . . . A~bm ] + [ A~c1 A~c2 . . . A~cm ]   by definition of matrix addition,
= AB + AC.
Warning!
Here are 3 properties that hold for multiplication of real numbers but do not hold for matrix multi-
plication.
1) AB is NOT generally equal to BA. In fact, if AB is defined, BA generally isn’t.
2) AB = AC does NOT imply that B = C; there is no cancellation law for matrix multiplication.
3) If AB = 0, then it is NOT always the case that one of A or B is the zero matrix.
Do not fall into the trap of thinking these hold for matrices. This is a very common mistake.
Example 3.1.9
Let A = [ 1 0 ; 0 0 ] and B = [ 0 0 ; 1 0 ]. Then, AB = [0] but neither A nor B is a zero matrix.
Example 3.1.10
Let A = [ 1 2 ; 0 2 ] and B = [ 2 1 ; 1 0 ]. Then,
AB = [ 4 1 ; 2 0 ]   and   BA = [ 2 6 ; 1 2 ],
so AB ≠ BA.
Example 3.1.11
Let A = [ 1 1 ; 0 0 ], B = [ 2 0 ; 3 1 ], C = [ 5 1 ; 0 0 ]. Then,
AB = [ 5 1 ; 0 0 ] = AC,
but B ≠ C.
Matrix multiplication is defined in a bit of a strange way. We give some justification for why this definition
is chosen. Let F : Rk → Rn and G : Rm → Rk be linear transformations with standard matrices A and B respectively; note that A is n × k and B is k × m. Consider the composition transformation
F ◦ G : Rm → Rn ,   (F ◦ G)(~v ) = F (G(~v )).
It is a fact that F ◦ G is a linear transformation (try to prove this!). Therefore, there exists an n × m matrix C such that
(F ◦ G)(~v ) = F (G(~v )) = C~v for all ~v ∈ Rm .
Then,
C~v = (F ◦ G)(~v ) = F (G(~v )) = F (B~v ) = A(B~v ) = (AB)~v ,
where the last equality follows by part 1 of Matrix Multiplication Properties. Since this holds for all vectors
~v ∈ Rm , this shows that C = AB; that is, AB is the standard matrix for F ◦ G, which shows that the
standard matrix of a composition of functions is calculated by taking the product of the standard matrices.
This is one of the reasons why matrix multiplication is defined in the way it is.
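Here is a quick numerical illustration of this fact, not part of the notes (it assumes numpy; the matrices A and B are arbitrary stand-ins for standard matrices of F and G).

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [0.0, 3.0]])      # standard matrix of some F : R^2 -> R^3
    B = np.array([[1.0, -1.0, 0.0],
                  [4.0,  0.0, 2.0]])  # standard matrix of some G : R^3 -> R^2

    v = np.array([1.0, 2.0, 3.0])
    print(np.allclose(A @ (B @ v), (A @ B) @ v))  # True: F(G(v)) = (AB) v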
Exercise
Note
There is a definition of matrix multiplication wherein matrix products are calculated using componentwise multiplication. This is called the Hadamard product. Unfortunately, it’s not a great definition
for matrix multiplication. For example, we lose the ability to calculate standard matrices of compo-
sitions by multiplication if we use this product. You can read more about this on Wikipedia if you
like, the link is here: Hadamard Product.
3.2 Matrix Inverses
Let A be an n × n matrix. An n × n matrix B is called an inverse of A if
AB = BA = In .
If such a matrix B exists, we say that A is invertible.
This definition of inverses of matrices lines up with how real number inverses are defined. If r ∈ R is non-zero,
then r−1 = 1/r is defined as the unique number satisfying r r−1 = 1. Since In is the matrix analogue of 1, we see that
the two definitions line up with one another.
Note
Matrix inverses are only defined for square matrices. If a matrix is not square, it does not have an
inverse in the way I just defined them. It may have a left inverse, a right inverse, or none at all, but
these are beyond the scope of this course.
Example 3.2.1
Let A = [ 3 2 ; 1 −2 ]. Then, an inverse of A is B = [ 1/4 1/4 ; 1/8 −3/8 ] because
AB = [ 3 2 ; 1 −2 ] [ 1/4 1/4 ; 1/8 −3/8 ] = [ 3/4 + 1/4   3/4 − 6/8 ; 1/4 − 2/8   1/4 + 6/8 ] = [ 1 0 ; 0 1 ] = I2
and
BA = [ 1/4 1/4 ; 1/8 −3/8 ] [ 3 2 ; 1 −2 ] = [ 3/4 + 1/4   2/4 − 2/4 ; 3/8 − 3/8   2/8 + 6/8 ] = [ 1 0 ; 0 1 ] = I2 .
Theorem 3.2.1
An invertible n × n matrix A has exactly one inverse.
Proof. Suppose B and C are both inverses of A. By definition,
AB = BA = In and AC = CA = In .
Then,
B = B In = B(AC) = (BA)C = In C = C.
Therefore, A has exactly one inverse.
Since matrix inverses are unique, we denote the inverse of an invertible A by A−1 . Some matrices do not have inverses. Such matrices are called non-invertible.
Matrix Inverse Properties
Let A and B be invertible n × n matrices. Then,
1. (A−1 )−1 = A
2. In−1 = In
3. (AB)−1 = B −1 A−1
Proof.
1. By definition of inverses, AA−1 = A−1 A = In . Thus, the definition of matrix inverses with C = A and
A = A−1 implies that A is the inverse of A−1 . That is, (A−1 )−1 = A.
3. Since A and B are both invertible, their inverses B −1 and A−1 exist and both are n × n. Thus, the
product B −1 A−1 is defined. We then have,
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = A In A−1 = AA−1 = In ,
and similarly (B −1 A−1 )(AB) = In . Therefore, AB is invertible and (AB)−1 = B −1 A−1 .
Property 3 of Matrix Inverse Properties shows that a product of invertible matrices is invertible. This
property generalizes to any finite product of matrices. Suppose A1 , A2 , . . . , Am are invertible n × n matrices.
Then, the product A1 A2 . . . Am−1 Am is an invertible n × n matrix and its inverse is
(A1 A2 . . . Am−1 Am )^{−1} = Am^{−1} Am−1^{−1} . . . A2^{−1} A1^{−1} .
Inverse of 2 × 2 Matrix
Let A = [ a b ; c d ]. If ad − bc ≠ 0, then A is invertible and
A−1 = (1/(ad − bc)) [ d −b ; −c a ].
If ad − bc = 0, then A is not invertible.
Example 3.2.2
Let A = [ 3 2 ; 1 −2 ]. Use the formula to calculate the inverse of A.
Solution. Since
ad − bc = 3(−2) − 2(1) = −8 ≠ 0,
A is invertible and
A−1 = (1/−8) [ −2 −2 ; −1 3 ] = [ 1/4 1/4 ; 1/8 −3/8 ].
This is exactly the matrix we showed was the inverse for A in Example 3.2.1. ♦
Example 3.2.3
Let A = [ 2 6 ; −1 7 ]. Find the inverse for A and check that it satisfies the inverse relations AB = BA = I2 .
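A numerical check of the 2 × 2 formula for this matrix (not the notes' worked solution; it assumes numpy) looks like this:

    import numpy as np

    A = np.array([[ 2.0, 6.0],
                  [-1.0, 7.0]])
    a, b = A[0]
    c, d = A[1]

    det = a * d - b * c                     # 2*7 - 6*(-1) = 20, non-zero, so A is invertible
    B = (1.0 / det) * np.array([[ d, -b],
                                [-c,  a]])

    print(B)                                # [[ 0.35 -0.3 ], [ 0.05  0.1 ]]
    print(np.allclose(A @ B, np.eye(2)))    # True
    print(np.allclose(B @ A, np.eye(2)))    # True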
Proof (of Inverse of 2 × 2 Matrix, sketch). Write A = [ a b ; c d ] and suppose B = [ p r ; s t ] satisfies AB = I2 . Comparing entries gives the system
ap + bs = 1,   ar + bt = 0,   cp + ds = 0,   cr + dt = 1.
Case 1. c ≠ 0. Subtracting a times the third equation from c times the first yields (cb − ad)s = c, and one can solve the system from here.
Case 2. c = 0. Then the third and fourth equations of the system become ds = 0 and dt = 1. Since dt = 1, d ≠ 0. It then follows from the third equation that s = 0. Thus, c = s = 0 and the system reduces to
ap = 1,   ar + bt = 0,   dt = 1.
Exercise
Fill in the details of the proof of Inverse of 2 × 2 Matrix.
If A is n×n and n ≥ 3, there are no nice formulas for the inverse of a matrix. There is, however, an algorithm
that determines if a given matrix is invertible and, if it is, outputs its inverse.
The Matrix Inverse Algorithm
Let A be an n × n matrix.
Step 1. Form the augmented matrix [ A | In ].
Step 2. Row reduce [ A | In ] until the block to the left of the bar is in RREF.
Step 3. If the RREF of A is In , then [ A | In ] ∼ [ In | A−1 ]; A is invertible and the block to the right of the bar is A−1 . If the RREF of A is not In , then A is not invertible.
We’ll see the proof of this algorithm in the section on elementary matrices. For now, we do some examples.
Example 3.2.4
Find the inverse for A = [ 2 1 −1 ; 0 2 −1 ; 1 0 0 ] if it exists.
Solution.
Step 1. Form the augmented matrix A′ = [ A | I3 ].
Step 2. Perform the same row operations on [ A | I3 ] that would transform A into RREF. We use Gauss-
Jordan Elimination to do this.
A′ = [ 2 1 −1 | 1 0 0 ; 0 2 −1 | 0 1 0 ; 1 0 0 | 0 0 1 ]
∼ [ 1 0 0 | 0 0 1 ; 0 2 −1 | 0 1 0 ; 2 1 −1 | 1 0 0 ]   (R1 ⇐⇒ R3)
∼ [ 1 0 0 | 0 0 1 ; 0 2 −1 | 0 1 0 ; 0 1 −1 | 1 0 −2 ]   (R3 ⇒ R3 − 2R1)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 1 −1 | 1 0 −2 ]   (R2 ⇒ (1/2)R2)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 0 −1/2 | 1 −1/2 −2 ]   (R3 ⇒ R3 − R2)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 0 1 | −2 1 4 ]   (R3 ⇒ −2R3)
∼ [ 1 0 0 | 0 0 1 ; 0 1 0 | −1 1 2 ; 0 0 1 | −2 1 4 ]   (R2 ⇒ R2 + (1/2)R3).
Step 3. Since A ∼ I3 , The Matrix Inverse Algorithm implies [ A | I3 ] ∼ [ I3 | A−1 ]. Thus,
A−1 = [ 0 0 1 ; −1 1 2 ; −2 1 4 ]. ♦
Note
If you’re not sure that you’ve done this correctly, multiply AA−1 out to check that you get In .
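That check is easy to automate. The following sketch is not from the notes (it assumes numpy) and verifies the inverse found in Example 3.2.4.

    import numpy as np

    A = np.array([[2.0, 1.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 0.0,  0.0]])
    A_inv = np.array([[ 0.0, 0.0, 1.0],
                      [-1.0, 1.0, 2.0],
                      [-2.0, 1.0, 4.0]])

    print(np.allclose(A @ A_inv, np.eye(3)))  # True, so the computed inverse is correct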
Example 3.2.5
Find the inverse of the matrix A = [ −1 0 2 ; 1 5 3 ; 1 −3 −5 ] if it exists.
Solution.
Step 1. Form the augmented matrix A′ = [ A | I3 ].
Step 2. Apply the row operations on A′ that would put A into RREF. The first three in order are
R1 ⇒ −R1 ,   R2 ⇒ R2 − R1 ,   R3 ⇒ R3 − R1 .
Performing these three operations yields
[ 1 0 −2 | −1 0 0 ; 0 5 5 | 1 1 0 ; 0 −3 −3 | 1 0 1 ].
The second row of the block to the left of the bar is a scalar multiple of the third. Thus, if we do the next row operation R3 ⇒ R3 + (3/5)R2 , we get a row of zeroes to the left of the bar. Thus, A is not row equivalent to I3 and so The Matrix Inverse Algorithm implies A is not invertible. ♦
Example 3.2.6
Find the inverse of A = [ 2 0 1 4 ; 1 7 1 0 ; −1 0 −2 3 ; 8 6 5 8 ] if it exists.
Step 2. Apply the row operations on [ A | I4 ] that would put A in RREF. There are 16 row operations to
perform. They are,
R2 ⇐⇒ R1 ; R2 ⇒ R2 − 2R1 ; R3 ⇒ R3 + R1 ; R4 ⇒ R4 − 8R1 ; R2 ⇐⇒ R3 ;
R1 ⇒ R1 − 7R2
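The hand computation here is long, so a numerical check is convenient. The sketch below is not from the notes (it assumes numpy); it confirms that this A is in fact invertible and that the numerically computed inverse satisfies A A−1 = I4 .

    import numpy as np

    A = np.array([[ 2.0, 0.0,  1.0, 4.0],
                  [ 1.0, 7.0,  1.0, 0.0],
                  [-1.0, 0.0, -2.0, 3.0],
                  [ 8.0, 6.0,  5.0, 8.0]])

    A_inv = np.linalg.inv(A)                  # raises LinAlgError if A were not invertible
    print(np.allclose(A @ A_inv, np.eye(4)))  # True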
Theorem 3.2.4
Let A be an n × n invertible matrix. Then, for each ~b ∈ Rn , the equation A~x = ~b has exactly one
solution.
Theorem 3.2.4 provides another method for finding a solution to a linear system. Determine if the coefficient
matrix A is invertible. If it is, then A−1~b is the only solution to the linear system.
Example 3.2.7
Use a matrix inverse to solve the linear system
3x1 − x2 = 5
−4x1 + x2 = 8.
Solution. Let A = [ 3 −1 ; −4 1 ] and ~b = (5, 8). The solution to the linear system is the same as the solution to the matrix equation A~x = ~b. Since 3(1) − (−4)(−1) = −1 ≠ 0, Inverse of 2 × 2 Matrix implies A is invertible. Its inverse is
A−1 = (1/−1) [ 1 1 ; 4 3 ] = [ −1 −1 ; −4 −3 ].
Therefore,
~x = A−1~b = [ −1 −1 ; −4 −3 ] (5, 8) = (−13, −44);
that is, (x1 , x2 ) = (−13, −44) is the solution to the linear system. ♦
Proof. Since A is invertible, A−1 exists, and the product A−1~b ∈ Rn . Substitute ~x = A−1~b into the matrix
equation A~x = ~b. Then,
A~x = A(A−1~b) = (AA−1 )~b = In~b = ~b.
Thus, A~x = ~b has solution ~x = A−1~b.
We now show A−1~b is the only solution. Let ~c ∈ Rn be any other solution to the matrix equation. Then A~c = ~b, and multiplying both sides by A−1 yields
A−1 (A~c) = A−1~b =⇒ ~c = A−1~b.
Therefore the solution is unique.
We now give the first version of the Invertible Matrix Theorem. This is one of the most important results
in this course.
The Invertible Matrix Theorem
Let A be an n × n matrix and let FA : Rn → Rn be the matrix transformation FA (~v ) = A~v . The following are equivalent.
1. A is invertible.
2. The RREF of A is In .
3. A has n pivot positions.
4. The homogeneous equation A~x = ~0 has only the trivial solution.
5. The columns of A are linearly independent.
6. FA is one-to-one.
7. The matrix equation A~x = ~b has exactly one solution for every ~b ∈ Rn .
8. The columns of A span Rn .
9. FA is onto.
10. There is an n × n matrix C such that AC = In .
11. There is an n × n matrix D such that DA = In .
Note
Conditions 4 - 6 are The Linear Independence Theorem Version and Conditions 7 - 9 are The Span Theorem.
Example 3.2.8
Let A = [ 7 −1 8 ; 0 1 1 ; 1 −1 0 ] and B = [ −1 2 −6 5 ; 0 1 −1 2 ; 0 2 −2 4 ; 2 3 5 4 ]. Determine if A or B are invertible.
Solution. Row reducing each matrix to echelon form, we find that
B does not have a pivot in every column but A does. By The Invertible Matrix Theorem, B is not invertible
and A is. ♦
Proof. Many of these are straight forward and/or we have already seen their proofs. We will go through
this in whole for completeness and refer to other theorems if we have already proved the given statement.
1 =⇒ 2: Suppose A is invertible. By Theorem 3.2.4, the homogeneous equation A~x = ~0 has only the trivial
solution. By The Linear Independence Theorem Version, A has a pivot in every column. Since A is square
it must have a pivot in every row as well. Therefore, the only possible RREF is In .
2 =⇒ 3: Suppose A ∼ In . Then In is the RREF of A and has n pivot positions. Since pivot positions are
invariant among echelon forms, it follows that any echelon form B of A has n pivot positions.
3 =⇒ 4: Suppose B has n pivot positions. Since A is n × n, this implies B has a pivot in every column. By
The Linear Independence Theorem Version, the homogeneous equation A~x = ~0 has only the trivial solution.
4 =⇒ 5: Suppose the homogeneous equation A~x = ~0 has only the trivial solution. By The Linear Indepen-
dence Theorem Version, the columns of A are linearly independent.
5 =⇒ 6: Suppose the columns of A are linearly independent. Then, by The Linear Independence Theorem
Version, the linear transformation FA is one-to-one.
6 =⇒ 7: Suppose FA is one-to-one. Then, since the domain and codomain of F have the same dimension,
Corollary 2.7.1 implies FA is onto. Therefore, A~x = ~b has at least one solution for all ~b ∈ Rn by The Span
Theorem. Since F is assumed one-to-one, the matrix equation has exactly one solution for all ~b ∈ Rn .
7 =⇒ 8 : Suppose A~x = ~b has exactly one solution for all ~b ∈ Rn . Then the columns of A span Rn by the
definition of spanning.
8 =⇒ 9: Suppose the columns of A span Rn . By The Span Theorem, FA is onto.
9 =⇒ 10: Suppose FA is onto. Let {~e1 , ~e2 , . . . , ~en } be the standard basis for Rn . Since FA is onto, there
exist vectors ~c1 , ~c2 , . . . , ~cn such that FA (~ci ) = ~ei for each i = 1, 2, . . . , n. Therefore, A~ci = ~ei for each
i = 1, 2, . . . , n. Let C = [ ~c1 ~c2 . . . ~cn ] . Then C is n × n and, by definition of matrix multiplication, we
have
AC = A [ ~c1 ~c2 . . . ~cn ] = [ A~c1 A~c2 . . . A~cn ] = [ ~e1 ~e2 . . . ~en ] = In .
Now suppose C~x = ~0. Multiplying both sides on the left by A gives ~x = In ~x = (AC)~x = A(C~x) = A~0 = ~0. This shows C~x = ~0 has only the trivial solution. By The Linear Independence Theorem Version, the lin-
ear transformation FC : Rn → Rn with standard matrix C is one-to-one, and since C is square Corollary
2.7.1 implies FC is onto as well. Therefore, there exists ~vi ∈ Rn such that FC (~vi ) = C~vi = ~ei , for each
i = 1, 2, . . . , n.
Now consider the product CAC. Since AC = In ,
CAC = C(AC) = C In = C.
Subtracting C from both sides and using distributivity of matrix multiplication yields
(CA − In )C = [0].
Write D = CA − In , so that DC = [0]. Then,
D~ei = D(C~vi ) = (DC)~vi = [0]~vi = ~0
for each i = 1, 2, . . . , n. Thus, every column of D is equal to the zero vector, which implies D = [0]. Hence,
D = [0] =⇒ CA − In = [0] =⇒ CA = In ,
so there is an n × n matrix (namely C) whose product with A on the left is In ; this proves 11.
11 =⇒ 1: Suppose there is an n × n matrix D such that DA = In . With the roles of A and D reversed, the
same proof as in 10 =⇒ 11 shows that AD = In . Therefore, there exists an n × n matrix D such that
AD = DA = In . By definition, A is invertible with inverse D.
Warning!
The Invertible Matrix Theorem only works for square matrices. Be careful not to apply this to
matrices that are not square.
This theorem ties everything together for n × n matrices. It is a really beautiful theorem. It states that The Span
Theorem and The Linear Independence Theorem Version are equivalent in the context of n × n matrices.
This is really cool because, taken at face value, the two concepts these theorems deal with have nothing to
do with one another. Moreover, condition 10 of The Invertible Matrix Theorem implies we need only check
that there exists an n × n matrix B such that AB = In to conclude invertibility of A and, in this case,
A−1 = B.
Every matrix defines a linear transformation. Invertible matrices define really useful linear transformations.
A linear transformation F : Rk → Rn is called invertible if there is a transformation G : Rn → Rk such that
(F ◦ G)(~v ) = F (G(~v )) = ~v for all ~v ∈ Rn
and
(G ◦ F )(~u) = G(F (~u)) = ~u for all ~u ∈ Rk .
The transformation G is called an inverse of F .
Exercise
Suppose F is an invertible linear transformation with inverse G. Prove that G is an invertible linear
transformation.
It is not true that every linear transformation is invertible. In fact, a linear transformation F is invertible
if and only if it is one-to-one and onto. This is a fact that is true for any function, not just linear transfor-
mations. Therefore, by The Invertible Matrix Theorem, F is invertible if and only if its standard matrix is
invertible. From this, we might suspect the standard matrix of the inverse of F is the inverse of the standard
matrix of F . This is exactly right.
Theorem 3.2.6
A linear transformation F : Rk → Rn is invertible if and only if it is one-to-one and onto. In this case, k = n, the standard matrix A of F is invertible, and F has a unique inverse G : Rn → Rn , namely the linear transformation whose standard matrix is A−1 .
Proof. First we prove that F : Rk → Rn is invertible if and only if F is one-to-one and onto.
First suppose F is invertible with inverse G. To show F is one-to-one, suppose F (~w1 ) = F (~w2 ) for some ~w1 , ~w2 ∈ Rk . Applying G to both sides gives
G(F (~w1 )) = G(F (~w2 )) =⇒ ~w1 = ~w2 .
To show F is onto, let ~w3 ∈ Rn . Then, G(~w3 ) ∈ Rk and
F (G(~w3 )) = ~w3 ,
so ~w3 is in the range of F . Therefore, F is onto.
Now suppose F is both one-to-one and onto. Then, for every ~u ∈ Rn , there exists exactly one ~vu ∈ Rk such
that F (~vu ) = ~u. Define a transformation G : Rn → Rk by G(~u) = ~vu for each ~u ∈ Rn . Since F is one-to-one,
this function is well defined. Let ~v ∈ Rk . Then, F (~v ) ∈ Rn , and by definition of G, G(F (~v )) = ~v . Thus,
(G ◦ F )(~v ) = ~v for all ~v ∈ Rk . Secondly, for any ~u ∈ Rn , G(~u) = ~vu . By definition,
F (G(~u)) = F (~vu ) = ~u,
so (F ◦ G)(~u) = ~u for all ~u ∈ Rn . Therefore, F is invertible with inverse G.
We’ve shown that F : Rk → Rn is an invertible linear transformation if and only if it is one-to-one and onto.
The only instance in which a linear transformation can be both one-to-one and onto is when the dimensions of its domain and codomain are the same (Corollary 2.7.1). Therefore, if F is invertible, then k = n, and so its standard matrix A is square. Furthermore, by The Invertible Matrix Theorem, A is invertible. Let G : Rn → Rn be
the linear transformation with standard matrix A−1 . Then,
(G ◦ F )(~u) = (A−1 A)~u = ~u, and (F ◦ G)(~u) = (AA−1 )~u = ~u for all ~u ∈ Rn .
Finally, to show G is unique, suppose H : Rn → Rn is another inverse for F with standard matrix B. Then,
for any ~v ∈ Rn ,
(H ◦ F )(~v ) = (BA)~v = ~v .
Therefore, applying H ◦ F to every element in the standard basis for Rn yields BA = In . Hence, by The
Invertible Matrix Theorem, B = A−1 and consequently, H = G.
Note
This theorem implies that any linear transformation F : Rn → Rn that is either one-to-one or onto
is automatically invertible.
An elementary matrix E is an n×n matrix that results from performing one elementary row operation
on In .
Example 3.2.9
The following are examples of 3 × 3 elementary matrices with the corresponding row operation per-
formed on the identity matrix.
E1 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]   (R2 ⇐⇒ R3),
E2 = [ 1 0 0 ; k 1 0 ; 0 0 1 ]   (R2 ⇒ R2 + kR1),
E3 = [ m 0 0 ; 0 1 0 ; 0 0 1 ]   (R1 ⇒ mR1), m ≠ 0.
Let A = [ a b c ; d e f ; g h i ]. Describe the multiplications E1 A, E2 A, E3 A, E1 E2 A, E3 E2 A, E1 E3 E2 A.
Solution. We calculate
E1 A = [ a b c ; g h i ; d e f ],
E2 A = [ a b c ; ka + d  kb + e  kc + f ; g h i ],
E3 A = [ ma mb mc ; d e f ; g h i ],
E1 E2 A = [ a b c ; g h i ; ka + d  kb + e  kc + f ],
E3 E2 A = [ ma mb mc ; ka + d  kb + e  kc + f ; g h i ],
E1 E3 E2 A = [ ma mb mc ; g h i ; ka + d  kb + e  kc + f ].
Notice that multiplication by the elementary matrices E1 , E2 , E3 is equivalent to performing the correspond-
ing row operations on A. ♦
Example 3.2.9 gives reason to believe that multiplying a matrix A on the left by an elementary matrix E
produces a matrix that results from performing the row operation on A that was used to create E. This is
true in general.
Fact 3.2.1
Let A be an n × k matrix. Then, performing an elementary row operation on A is equivalent to
performing that same row operation on In to get an elementary matrix E, and then performing the matrix multiplication EA.
Exercise
Prove Fact 3.2.1.
Theorem 3.2.7
Every elementary matrix E is invertible. Its inverse is the elementary matrix corresponding to the row operation that reverses the one used to create E.
Proof. The RREF of E is In because E differs from In by a single row operation. Therefore, Part 2 of
The Invertible Matrix Theorem implies E is invertible. The row operation used to create E can be reversed
by performing another row operation. Let E 0 be the elementary matrix corresponding to the row operation
that reverses the one used to create E. Then, E 0 E = In , so that E 0 = E −1 by part 11 of The Invertible
Matrix Theorem.
Example 3.2.10
Find the inverses of the elementary matrices E1 , E2 , and E3 from Example 3.2.9.
Solution. The row operation performed on I3 to get E1 is swapping rows 2 and 3. To get this back to the
identity matrix, we perform the operation again. This means
E1−1 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ] = E1 .
E2 is obtained by adding k times row 1 to row 2. To reverse this, subtract k times row 1 from row 2.
Therefore,
E2−1 = [ 1 0 0 ; −k 1 0 ; 0 0 1 ].
Finally, E3 is obtained from I3 by multiplying row 1 by m ≠ 0. To reverse this, multiply row 1 by 1/m to get
E3−1 = [ 1/m 0 0 ; 0 1 0 ; 0 0 1 ]. ♦
Proof of The Matrix Inverse Algorithm. Let A be an n × n matrix. If the RREF of A is not In , then A
is not invertible by The Invertible Matrix Theorem. If A is invertible, then A ∼ In by The Invertible Matrix
Theorem. Let E1 , . . . , Em be a sequence of elementary matrices that represent row operations needed to
transform A into In , where we start with the row operation represented by E1 and proceed in order until
we get to Em . Then, Fact 3.2.1 implies
Em Em−1 . . . E2 E1 A = In .
By part 11 of The Invertible Matrix Theorem, this means Em Em−1 . . . E2 E1 = A−1 ; that is,
Em Em−1 . . . E1 In = A−1 .
It now follows from Fact 3.2.1 that performing the elementary row operations needed to transform A to In
on In itself exactly results in A−1 . This is exactly The Matrix Inverse Algorithm.
3.3 Transpose
In this section, we introduce a new matrix operation called the transpose. This is different from anything we have for real numbers.
The transpose of an n × k matrix A = [aij ], denoted AT , is the k × n matrix whose (i, j)-entry is aji . In other words, the rows of AT are the columns of A.
Example 3.3.1
Let A = [ 2 1 ; 3 −1 ; 1 4 ] and B = [ 8 −1 7 1 ; 0 0 8 1 ; 1 2 1 −4 ; 0 −4 −8 0 ]. Calculate AT and B T .
Solution. The rows of the transpose are the columns of the original matrix:
AT = [ 2 3 1 ; 1 −1 4 ],
B T = [ 8 0 1 0 ; −1 0 2 −4 ; 7 8 1 −8 ; 1 1 −4 0 ]. ♦
Let A and B denote matrices that have appropriate sizes so that the following expressions are defined.
Then,
1. (AT )T = A
2. InT = In
3. (A + B)T = AT + B T
4. (rA)T = r AT for any scalar r ∈ R
5. (AB)T = B T AT
6. A is invertible if and only if AT is invertible; in this case, (AT )−1 = (A−1 )T .
Proof. The first four properties are straightforward. We only prove one of them and leave the other three
as exercises for the reader. The last two are more involved so we prove them.
Proof of 3. Write A = [aij ] and B = [bij ]. Then,
(A + B)T = ([aij ] + [bij ])T = ([aij + bij ])T = [aji + bji ] = [aji ] + [bji ] = AT + B T .
Proof of 5. In order to prove this, we show that each entry of (AB)T is the same as that of B T AT . Let A be n × k and B be k × m, so that AB is n × m. From Theorem 3.1.2,
(AB)ij = Σ_{ℓ=1}^{k} aiℓ bℓj .
Write B T = [b̂ij ] and AT = [âij ], where b̂ij = bji and âij = aji . Therefore,
(B T AT )ij = Σ_{ℓ=1}^{k} b̂iℓ âℓj = Σ_{ℓ=1}^{k} bℓi ajℓ = Σ_{ℓ=1}^{k} ajℓ bℓi = (AB)ji = ((AB)T )ij .
Thus, the (i, j)-entry of (AB)T is the same as the (i, j)-entry of B T AT for each 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Proof of 6. Suppose A is invertible. Apply the transpose on both sides of AA−1 = In and use parts 2 and 5
of this theorem to get
(AA−1 )T = InT =⇒ (A−1 )T AT = In .
Therefore, AT is invertible and (A−1 )T = (AT )−1 by parts 11 and 12 of The Invertible Matrix Theorem.
Conversely, if AT is invertible, then AT (AT )−1 = In . Taking transposes of both sides again gives
((AT )−1 )T A = In ,
so A is invertible by part 11 of The Invertible Matrix Theorem.
Part 6 of Transpose Properties implies that A being invertible is equivalent to AT being invertible. We
update the Invertible Matrix Theorem with this condition
Let A be an n × n matrix and let FA : Rn → Rn be the matrix transformation FA (~v ) = A~v . The following are equivalent.
1. A is invertible.
2. The RREF of A is In .
3. A has n pivot positions.
4. The homogeneous equation A~x = ~0 has only the trivial solution.
5. The columns of A are linearly independent.
6. FA is one-to-one.
7. The matrix equation A~x = ~b has exactly one solution for every ~b ∈ Rn .
8. The columns of A span Rn .
9. FA is onto.
10. There is an n × n matrix C such that AC = In .
11. There is an n × n matrix D such that DA = In .
12. AT is invertible.
Similar to products of inverses, part 5 of Transpose Properties generalizes to any finite product of matrices
(as long as the product is defined). Indeed, let A1 , A2 , . . . , Am−1 , Am be matrices of appropriate size so that
the product A1 A2 . . . Am−1 Am is defined. Then,
(A1 A2 . . . Am−1 Am )^T = Am^T Am−1^T . . . A2^T A1^T .
Example 3.4.1
Let A, B, C, and X be n × n matrices with A and X invertible. Solve the following equation for X
BX = A + CX. (3.2)
Solution. If we were given this question with real numbers instead of matrices, you would solve for X by
isolating for the variable. The process is the same in this case, except there are two key differences we have
to be careful of:
1. We don’t know that all of the matrices involved are invertible. Therefore, if we need to invert a matrix
we must ensure it is invertible first.
2. Matrix multiplication is not commutative. This means that we have to keep track of which side
multiplication is occurring. In particular, if we wish to take out a common matrix from an expression,
we have to make sure that term is on the same side of all quantities we are factoring it out of.
The first step we take in isolating for X is to move all the X’s to one side of the equation. We can do this
by subtracting CX from both sides of Equation (3.2):
BX = A + CX =⇒ BX − CX = A. (3.3)
Both B and C are being multiplied by X on the right. Therefore, we can factor it out.
BX − CX = A =⇒ (B − C)X = A. (3.4)
If this were real numbers, we would divide by B − C to solve for X. However, we are working with matrices,
so we can’t divide. We can, however, multiply by the inverse of B − C if it exists. Therefore, we must show
that B − C is invertible.
We are given that B and C are invertible but, in general, sums of invertible matrices are not invertible, so
we can’t immediately conclude B − C is invertible. To show B − C is invertible, recall that X is invertible.
Therefore, we can multiply both sides of Equation (3.4) on the right by $X^{-1}$. This yields
$$(B - C)XX^{-1} = AX^{-1} \implies (B - C)(XX^{-1}) = AX^{-1} \implies B - C = AX^{-1}. \qquad (3.5)$$
Since $A$ and $X^{-1}$ are invertible, their product $AX^{-1}$ is invertible, and Equation (3.5) shows that $B - C$ equals this product. Now that we've shown $B - C$ is invertible, we are free to multiply both sides of Equation (3.4) on the left by $(B - C)^{-1}$ to get
$$X = (B - C)^{-1}A. \qquad ♦$$
Note
Pay attention to how the multiplication and factoring was done in Example 3.4.1. In particular, notice how $X$ needed to be factored out on the right in Equation (3.4). The only reason this is possible is because both $B$ and $C$ are multiplied by $X$ on the right. If one of them had been multiplied by $X$ on the left (say the equation contained $BX - XC$ instead), then factoring out $X$ in this way would not be possible.
Moreover, notice how whenever we multiplied by a matrix in Example 3.4.1, we were always
consistent with what side we multiplied on. Even though it seems like it shouldn’t be that big of a
deal, it is absolutely necessary because matrix multiplication is not commutative.
Another thing to make note of is how we needed to show B − C is invertible before we could
multiply by its inverse. This is very important. You must always argue why a matrix is invertible
before multiplying by its inverse.
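As a numerical sanity check (not part of the notes; it assumes numpy and uses random matrices for which $B - C$ happens to be invertible), the solution $X = (B - C)^{-1}A$ from Example 3.4.1 can be verified directly:

```python
# Verify that X = (B - C)^{-1} A satisfies BX = A + CX for sample matrices.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

X = np.linalg.inv(B - C) @ A          # the candidate solution
print(np.allclose(B @ X, A + C @ X))  # True
```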
Example 3.4.2
Let A, B, C, and X be n × n matrices with A, B, and X invertible. Solve the following equation for
X.
AX + B T = AXC. (3.6)
Solution. First, subtract $AX$ from both sides of Equation (3.6) to get
$$B^T = AXC - AX. \qquad (3.7)$$
Since $A$ is invertible, we can multiply both sides of Equation (3.7) on the left by $A^{-1}$:
$$A^{-1}(AXC - AX) = A^{-1}B^T \implies A^{-1}(AXC) - A^{-1}(AX) = A^{-1}B^T \implies XC - X = A^{-1}B^T. \qquad (3.8)$$
Both terms on the left have $X$ multiplied on the left, so we can factor it out:
$$X(C - I_n) = A^{-1}B^T. \qquad (3.9)$$
We need $C - I_n$ to be invertible in order to solve for $X$. Since $X$ is assumed invertible, we can multiply both sides of Equation (3.9) by $X^{-1}$ on the left to get
$$C - I_n = X^{-1}A^{-1}B^T.$$
$B^T$ is invertible because $B$ is. Therefore, $C - I_n$ is a product of invertible matrices and, therefore, it too is invertible. This means we can multiply both sides of Equation (3.9) by $(C - I_n)^{-1}$ on the right to get
$$X = A^{-1}B^T(C - I_n)^{-1}. \qquad ♦$$
Subspaces
In this chapter, we study special subsets of Rn called subspaces. These are one of the primary areas of
studies in linear algebra.
A subset $S \subseteq \mathbb{R}^n$ of vectors is called a subspace of $\mathbb{R}^n$ if the following three properties hold for $S$:
1. $\vec{0} \in S$,
2. For any two $\vec{u}, \vec{v} \in S$, the sum $\vec{u} + \vec{v}$ is contained in $S$ (closure under addition),
3. For any $\vec{v} \in S$ and any scalar $r \in \mathbb{R}$, the product $r\vec{v}$ is contained in $S$ (closure under scalar multiplication).
Example 4.1.1
There are two obvious subspaces of $\mathbb{R}^n$: $\mathbb{R}^n$ itself and $\{\vec{0}\}$. These are referred to as "improper" subspaces of $\mathbb{R}^n$.
Exercise
Prove that $\{\vec{0}\}$ and $\mathbb{R}^n$ are subspaces of $\mathbb{R}^n$.
Showing that subsets of Rn are subspaces is generally not too hard. Exactly like showing something is linear,
all we need to do is verify the conditions in the definition.
Example 4.1.2
Let
$$S = \left\{ \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix} : a, b, c \in \mathbb{R} \right\} \subseteq \mathbb{R}^3.$$
Show that $S$ is a subspace of $\mathbb{R}^3$.

Solution. We verify the three conditions in the definition of a subspace.

1. Taking $a = b = c = 0$ gives the zero vector, so $\vec{0} \in S$.

2. For the second condition, pick any two vectors $\vec{v}_1, \vec{v}_2 \in S$. Write
$$\vec{v}_1 = \begin{bmatrix} 2a_1 - b_1 \\ 0 \\ a_1 + b_1 + c_1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 2a_2 - b_2 \\ 0 \\ a_2 + b_2 + c_2 \end{bmatrix}, \quad a_1, a_2, b_1, b_2, c_1, c_2 \in \mathbb{R}.$$
We must show the sum of these two vectors is in $S$. Calculating their sum yields
$$\vec{v}_1 + \vec{v}_2 = \begin{bmatrix} 2a_1 - b_1 + 2a_2 - b_2 \\ 0 \\ a_1 + b_1 + c_1 + a_2 + b_2 + c_2 \end{bmatrix} = \begin{bmatrix} 2(a_1 + a_2) - (b_1 + b_2) \\ 0 \\ (a_1 + a_2) + (b_1 + b_2) + (c_1 + c_2) \end{bmatrix} = \begin{bmatrix} 2a_3 - b_3 \\ 0 \\ a_3 + b_3 + c_3 \end{bmatrix},$$
where $a_3 = a_1 + a_2 \in \mathbb{R}$, $b_3 = b_1 + b_2 \in \mathbb{R}$, and $c_3 = c_1 + c_2 \in \mathbb{R}$. Since $\vec{v}_1 + \vec{v}_2$ satisfies the criteria for a vector to be in $S$, the sum $\vec{v}_1 + \vec{v}_2$ is in $S$. Therefore, the sum of any two vectors in $S$ is also in $S$, which verifies the second condition in the definition of subspaces.

3. For the third condition, pick any vector $\vec{v} \in S$ and any scalar $r$. Write
$$\vec{v} = \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix}, \quad a, b, c \in \mathbb{R}.$$
We must show $r\vec{v} \in S$. Calculating this product yields
$$r\vec{v} = \begin{bmatrix} r(2a - b) \\ 0 \\ r(a + b + c) \end{bmatrix} = \begin{bmatrix} 2(ra) - (rb) \\ 0 \\ (ra) + (rb) + (rc) \end{bmatrix} = \begin{bmatrix} 2d - e \\ 0 \\ d + e + f \end{bmatrix},$$
where $d = ra$, $e = rb$, and $f = rc$ are real numbers, so $r\vec{v} \in S$.

All three conditions have now been verified, therefore we conclude that $S$ is a subspace of $\mathbb{R}^3$. ♦
Theorem 4.1.1
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be any subset of vectors in Rn . Then, S = span {~v1 , ~v2 , . . . , ~vk } is a subspace
of Rn .
Proof. We check the three conditions in the definition of a subspace.
1. $\vec{0} \in S$ because
$$\vec{0} = 0\vec{v}_1 + 0\vec{v}_2 + \ldots + 0\vec{v}_k \in S.$$
2. Let $\vec{u}, \vec{v} \in S$. Then there exist scalars so that $\vec{u} = a_1\vec{v}_1 + \ldots + a_k\vec{v}_k$ and $\vec{v} = b_1\vec{v}_1 + \ldots + b_k\vec{v}_k$. Calculating $\vec{u} + \vec{v}$ yields
$$\vec{u} + \vec{v} = (a_1 + b_1)\vec{v}_1 + (a_2 + b_2)\vec{v}_2 + \ldots + (a_k + b_k)\vec{v}_k.$$
Since $a_i + b_i \in \mathbb{R}$ for each $i \in \{1, 2, \ldots, k\}$, this shows $\vec{u} + \vec{v}$ is a linear combination of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$. Therefore, $\vec{u} + \vec{v} \in S$.
3. Let $\vec{w} \in S$ and let $r$ be any scalar. Write
$$\vec{w} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_k\vec{v}_k.$$
Then
$$r\vec{w} = r(c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_k\vec{v}_k) = (rc_1)\vec{v}_1 + (rc_2)\vec{v}_2 + \ldots + (rc_k)\vec{v}_k,$$
which is a linear combination of $\vec{v}_1, \ldots, \vec{v}_k$, so $r\vec{w} \in S$.
All three conditions in the definition of subspace are verified. Therefore, $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is a subspace of $\mathbb{R}^n$.
Theorem 4.1.1 gives another way of showing a subset of Rn is a subspace: write it as the span of a set of
vectors in Rn .
Example 4.1.3
Let $S$ be the subspace from Example 4.1.2. Any vector $\vec{v} \in S$ can be written
$$\vec{v} = \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix} = a\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + b\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad a, b, c \in \mathbb{R}.$$
This shows any vector $\vec{v} \in S$ is in $\operatorname{span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$. Conversely, it is obvious that any vector in this span is also in $S$. Therefore,
$$S = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},$$
so $S$ is a subspace of $\mathbb{R}^3$ by Theorem 4.1.1.
Example 4.1.4
The span of a single, non-zero vector in R2 or R3 is interpreted as a line through the origin. The span
of two non-zero, linearly independent vectors in R3 is interpreted as a plane through the origin. These
are subspaces by Theorem 4.1.1. However, arbitrary lines in R2 or R3 are not generally subspaces
because they may not pass through the origin. A similar argument is true for arbitrary planes in R3 .
Let $A$ be an $n \times k$ matrix and write $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_k\,]$. The column space of $A$, denoted Col(A), is the span of the columns of $A$. That is,
$$\operatorname{Col}(A) = \operatorname{span}\{\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_k\}.$$
Exercise
Let A be an n × k matrix and let F : Rk → Rn be the linear transformation with standard matrix
A. Prove that Col(A) is equal to the range of F . (Note that this implies the range of a linear
transformation is always a subspace of its codomain).
Example 4.2.1
Let $A = \begin{bmatrix} -2 & 3 & -20 \\ 1 & 3 & -17 \\ -7 & -1 & -1 \end{bmatrix}$. Determine whether the vectors $\vec{b} = \begin{bmatrix} 8 \\ 5 \\ 5 \end{bmatrix}$, $\vec{c} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$ are in the column space of $A$. If they are, write the vector as a linear combination of the columns of $A$.
Solution. By definition, Col(A) is equal to the span of the columns of $A$. Therefore, asking if a vector $\vec{b}$ is in Col(A) is the same as asking if $\vec{b}$ can be written as a linear combination of the columns of $A$. We know how to answer this! Form the augmented matrix $[\,A \mid \vec{b}\,]$ and row reduce!
$$[\,A \mid \vec{b}\,] = \begin{bmatrix} -2 & 3 & -20 & 8 \\ 1 & 3 & -17 & 5 \\ -7 & -1 & -1 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & -6 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The corresponding linear system is consistent, hence the equation $A\vec{x} = \vec{b}$ has a solution. This shows that $\vec{b} \in \operatorname{Col}(A)$. The vector form of the solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1 \\ 6 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}.$$
Taking $s = 0$, for example, gives $\vec{b} = -\vec{a}_1 + 2\vec{a}_2$, which writes $\vec{b}$ as a linear combination of the columns of $A$.
Next, we check $\vec{c}$. Form the augmented matrix $[\,A \mid \vec{c}\,]$ and row reduce:
$$[\,A \mid \vec{c}\,] = \begin{bmatrix} -2 & 3 & -20 & 0 \\ 1 & 3 & -17 & 2 \\ -7 & -1 & -1 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Since the rightmost column is a pivot column, this matrix corresponds to an inconsistent linear system. Thus, $A\vec{x} = \vec{c}$ has no solution and, hence, $\vec{c}$ is not a linear combination of the columns of $A$. This shows $\vec{c} \notin \operatorname{Col}(A)$. ♦
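For readers who want to check membership in a column space by computer, here is a small sketch (not part of the notes, and assuming sympy is available). It uses the fact that $\vec{v} \in \operatorname{Col}(A)$ exactly when appending $\vec{v}$ to $A$ does not increase the rank, which is equivalent to the consistency test above:

```python
# Column space membership test for the matrices of Example 4.2.1.
import sympy as sp

A = sp.Matrix([[-2, 3, -20], [1, 3, -17], [-7, -1, -1]])
b = sp.Matrix([8, 5, 5])
c = sp.Matrix([0, 2, 1])

def in_column_space(A, v):
    # v is in Col(A) iff augmenting A with v does not raise the rank
    return A.row_join(v).rank() == A.rank()

print(in_column_space(A, b))  # True
print(in_column_space(A, c))  # False
```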
The second important subspace attached to any matrix is called the null space.
Let $A$ be an $n \times k$ matrix. The null space of $A$, denoted Null(A), is the set of all solutions to the homogeneous equation $A\vec{x} = \vec{0}$. In set notation,
$$\operatorname{Null}(A) = \{\vec{v} \in \mathbb{R}^k : A\vec{v} = \vec{0}\}.$$
Determining if a given vector ~v is in the null space of a matrix A is easy. All you need to do is calculate A~v
and see if it is zero. If it’s zero, then ~v is in Null(A). If it’s not zero, then ~v is not in Null(A).
Example 4.2.2
" # 0 2
2 1 3
Let A = . Determine if the vectors ~u = 1 , ~v = −1 are in Null(A).
1 2 0
−1 −1
Solution. We calculate A~u and A~v and see if either of these products is zero. For the first,
" # 0 " # " #
2 1 3 2(0) + 1(1) + 3(−1) −2
A~u = 1 = = 6 ~0.
=
1 2 0 1(0) + 2(1) + 0(−1) 2
−1
The definition of Null(A) does not immediately imply it is a subspace. Therefore, we need to prove this.
Theorem 4.2.1
Let $A$ be an $n \times k$ matrix. Then, Null(A) is a subspace of $\mathbb{R}^k$.
Proof. We need to check the three conditions in the definition of a subspace. The first condition is trivial
because A~0 = ~0 always, so ~0 ∈ Null(A).
For the second, let $\vec{u}, \vec{v} \in \operatorname{Null}(A)$. Then $A\vec{u} = \vec{0}$ and $A\vec{v} = \vec{0}$. Then, using part 1 of Matrix Vector Multiplication Properties yields
$$A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v} = \vec{0} + \vec{0} = \vec{0},$$
so $\vec{u} + \vec{v} \in \operatorname{Null}(A)$. For the third, let $\vec{w} \in \operatorname{Null}(A)$ and let $r$ be any scalar. Then, by Matrix Vector Multiplication Properties,
$$A(r\vec{w}) = rA\vec{w} = r\vec{0} = \vec{0},$$
so $r\vec{w} \in \operatorname{Null}(A)$.
All three conditions in the definition of a subspace have been verified. Therefore, Null(A) is a subspace of
Rk .
Warning!
For an n × k matrix A, Col(A) is subspace of Rn and Null(A) is a subspace of Rk . Try not to confuse
these.
4.3 Bases for Subspaces
Example 4.3.1
Theorem 4.3.1
Let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m \in \mathbb{R}^n$. Let $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ and suppose $\vec{v} \in S$. Then,
$$\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\} = S.$$
Proof. Clearly $S \subseteq \operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_m, \vec{v}\}$, since any linear combination of $\vec{v}_1, \ldots, \vec{v}_m$ is also a linear combination of $\vec{v}_1, \ldots, \vec{v}_m, \vec{v}$ (with the coefficient of $\vec{v}$ equal to zero). For the other containment, since $\vec{v} \in S$ there exist scalars $c_1, \ldots, c_m$ such that $\vec{v} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m$. Now suppose $\vec{w} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\}$. Then, there exist scalars $e_1, e_2, \ldots, e_m, e$ such that
$$\vec{w} = e_1\vec{v}_1 + e_2\vec{v}_2 + \ldots + e_m\vec{v}_m + e\vec{v}.$$
Substituting the expression for $\vec{v}$ gives
$$\vec{w} = e_1\vec{v}_1 + \ldots + e_m\vec{v}_m + e(c_1\vec{v}_1 + \ldots + c_m\vec{v}_m) = (e_1 + ec_1)\vec{v}_1 + \ldots + (e_m + ec_m)\vec{v}_m \in S.$$
Therefore, $\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\} \subseteq S$. This shows the two sets are equal.
1. If we know a basis for a subspace S, then we can reconstruct all of S from the vectors in the basis.
Thus, if we know a basis, we know everything about the subspace.
2. The linear independence condition in the definition tells us that any representation of a vector ~v ∈ S
as a linear combination of the basis vectors is unique. To see this, let S be a subspace of Rn and let
$\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a basis for $S$. Let $\vec{v} \in S$ and suppose there are two representations of $\vec{v}$ as a linear combination of the basis vectors:
$$\vec{v} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m = d_1\vec{v}_1 + d_2\vec{v}_2 + \ldots + d_m\vec{v}_m.$$
Subtracting one representation from the other gives
$$(c_1 - d_1)\vec{v}_1 + (c_2 - d_2)\vec{v}_2 + \ldots + (c_m - d_m)\vec{v}_m = \vec{0}.$$
Since $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m$ are linearly independent, all the scalars in the above equation must be zero. This implies
$$c_1 = d_1, \quad c_2 = d_2, \quad \ldots, \quad c_m = d_m,$$
so that the two representations are the same. Therefore, any $\vec{v} \in S$ has a unique representation as a linear combination of the basis vectors.
Does every subspace have a basis? If the subspace is not the zero subspace, then the answer is yes.
Theorem 4.3.2
Every non-zero subspace S ⊆ Rn has a basis.
Proof. Start by picking a set of linearly independent vectors B = {~v1 , ~v2 , . . . , ~vm } in S that is maximal in
the following sense: if we add any other vector from S into B, then B becomes linearly dependent. Such a set
necessarily exists because S ⊆ Rn and, therefore, any linearly independent subset of vectors in S contains
at most n vectors by Corollary 2.6.4.
Pick a vector $\vec{v} \in S$ not in $B$. Then, the set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\}$ is linearly dependent. Therefore, there exist scalars $c_1, c_2, \ldots, c_m, c$, not all zero, such that
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m + c\vec{v} = \vec{0}.$$
If $c = 0$, this equation would exhibit a non-trivial dependence among $\vec{v}_1, \ldots, \vec{v}_m$, contradicting the linear independence of $B$. Therefore $c \neq 0$ and we may solve for $\vec{v}$:
$$\vec{v} = -\frac{c_1}{c}\vec{v}_1 - \frac{c_2}{c}\vec{v}_2 - \ldots - \frac{c_m}{c}\vec{v}_m.$$
This shows $\vec{v} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$. Since $\vec{v} \in S$ is arbitrary, this shows $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ and, therefore, $B$ is a basis for $S$.
Exercise
Use Theorem 4.3.2 to prove every subspace can be written as the span of a finite set of vectors. This
is the converse to Theorem 4.1.1.
What about the zero subspace $\{\vec{0}\}$? Does this subspace have a basis? I leave the determination of this question as an exercise to the reader.
Exercise
Determine if the zero subspace has a basis.
The rest of this section is dedicated to methods for calculating bases of Col(A) and Null(A). We begin with
calculating bases for null spaces. This is easiest shown using an example.
Example 4.3.2
" #
1 3 0 −1
Let A = . Find a basis for Null(A).
3 2 1 −4
Solution. The first step is to determine the vector form of the solution to the homogeneous equation. The RREF of $A$ is
$$A \sim \begin{bmatrix} 1 & 0 & 3/7 & -10/7 \\ 0 & 1 & -1/7 & 1/7 \end{bmatrix}.$$
The vector form of the solution to $A\vec{x} = \vec{0}$ is
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} -3/7 \\ 1/7 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 10/7 \\ -1/7 \\ 0 \\ 1 \end{bmatrix} = s\vec{v}_1 + t\vec{v}_2, \quad s, t \in \mathbb{R}.$$
This shows every solution to the homogeneous equation is a linear combination of $\vec{v}_1$ and $\vec{v}_2$ above. Furthermore, the vectors $\vec{v}_1$ and $\vec{v}_2$ are linearly independent by construction; $s = t = 0$ can be the only solution to $s\vec{v}_1 + t\vec{v}_2 = \vec{0}$ due to the third and fourth components. Hence, a basis for Null(A) is
$$\left\{ \begin{bmatrix} -3/7 \\ 1/7 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 10/7 \\ -1/7 \\ 0 \\ 1 \end{bmatrix} \right\}. \quad ♦$$
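The same basis can be reproduced by computer. The following sketch (not part of the notes; it assumes sympy is installed) uses sympy's nullspace routine, which carries out exactly the procedure above:

```python
# Reproducing the Null(A) basis from Example 4.3.2.
import sympy as sp

A = sp.Matrix([[1, 3, 0, -1], [3, 2, 1, -4]])
basis = A.nullspace()   # list of column vectors spanning Null(A)
for v in basis:
    print(v.T)          # Matrix([[-3/7, 1/7, 1, 0]]) and Matrix([[10/7, -1/7, 0, 1]])
```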
This method to find a basis for the null space of a matrix always works. All you need to do is write the vector form of the solution to the homogeneous equation $A\vec{x} = \vec{0}$. The vectors that appear in the vector form of the solution then form a basis for Null(A).
Theorem 4.3.3
Let A be an n × k matrix. Let B be an echelon form of A. Then, the columns of A that correspond
to the pivot columns in B form a basis for the column space of A.
Note
Be careful with this theorem. When you pick the basis for Col(A), make sure you are taking the
columns of A that correspond to the pivot columns in B. Do not take the pivot columns of B.
This will give you the wrong answer in general.
Example 4.3.3
Let $A = \begin{bmatrix} 2 & 8 & -2 \\ -1 & -4 & 1 \\ 3 & -6 & 3 \\ 5 & 5 & 0 \\ 1 & -2 & 1 \end{bmatrix}$. Find a basis for Col(A).

Solution. Row reducing $A$ gives
$$A \sim \begin{bmatrix} 1 & 0 & 1/3 \\ 0 & 1 & -1/3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
The first two columns of the echelon form are pivot columns. Therefore, by Theorem 4.3.3, a basis for the
column space of A consists of the first two columns of A. That is, a basis for Col(A) is
$$\left\{ \begin{bmatrix} 2 \\ -1 \\ 3 \\ 5 \\ 1 \end{bmatrix}, \begin{bmatrix} 8 \\ -4 \\ -6 \\ 5 \\ -2 \end{bmatrix} \right\}. \quad ♦$$
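Theorem 4.3.3 is easy to automate. The sketch below (not part of the notes, assuming sympy) asks for the pivot columns of the RREF and then selects the corresponding columns of the original matrix $A$:

```python
# Basis for Col(A) via pivot columns, as in Theorem 4.3.3 and Example 4.3.3.
import sympy as sp

A = sp.Matrix([[2, 8, -2], [-1, -4, 1], [3, -6, 3], [5, 5, 0], [1, -2, 1]])
_, pivot_cols = A.rref()            # pivot_cols == (0, 1)
basis = [A[:, j] for j in pivot_cols]   # take columns of the ORIGINAL A
for v in basis:
    print(v.T)
```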
Proof. Switching the order of the columns of A if necessary, we lose no generality in assuming that the
first m columns of B are pivot columns, and the last n − m columns of B are non-pivot columns. Consider
the matrix A0 = [ ~a1 ~a2 . . . ~am ] . Then, the same row operations that transform A into B will put A0 into
echelon form. By construction, this echelon form of $A'$ has a pivot in every column. Therefore, $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_m$ are linearly independent.
Now consider A0j = [ ~a1 ~a2 . . . ~am | ~am+j ] for any j ∈ {1, 2, . . . , n − m} . Once again, the same row op-
erations that transform A into B will put A0j into echelon form. By construction, the first m columns of
this echelon form are pivot columns, so the last column is a non-pivot column. Therefore, ~am+j is a linear
combination of ~a1 , ~a2 , . . . , ~am for every j ∈ {1, 2, . . . , n − m} . Repeated applications of Theorem 4.3.1 then
yields
Col(A) = span {~a1 , ~a2 , . . . , ~am , ~am+1 , . . . , ~an } = span {~a1 , ~a2 , . . . , ~am } .
This shows the columns of A that correspond to the pivot columns of B form a basis for Col(A).
Suppose you are given a subspace S and a set of vectors that spans that subspace. You can always reduce
the spanning set to a basis of S using a column space argument.
Example 4.3.4
Let
$$\vec{v}_1 = \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} -6 \\ 3 \\ 1 \\ 5 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 10 \\ -5 \\ 1 \\ -9 \end{bmatrix}, \quad \vec{v}_4 = \begin{bmatrix} -2 \\ 1 \\ -3 \\ 6 \end{bmatrix},$$
and let $S$ be the subspace of $\mathbb{R}^4$ spanned by these vectors. In other words, $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$. Determine a basis for $S$.
Solution. Form the matrix A = [~v1 ~v2 ~v3 ~v4 ] . Then, Col(A) = span {~v1 , ~v2 , ~v3 , ~v4 } = S. Therefore,
determining a basis for S is the same as determining a basis for Col(A)! The RREF of A is,
$$A = \begin{bmatrix} -2 & -6 & 10 & -2 \\ 1 & 3 & -5 & 1 \\ 3 & 1 & 1 & -3 \\ 1 & 5 & -9 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The first, second, and fourth columns are pivot columns. Therefore, by Theorem 4.3.3, a basis for Col(A) = S
is
$$\{\vec{v}_1, \vec{v}_2, \vec{v}_4\} = \left\{ \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} -6 \\ 3 \\ 1 \\ 5 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ -3 \\ 6 \end{bmatrix} \right\}. \quad ♦$$
We end this section by summarizing the steps for finding bases of column and null space.
Let $A$ be an $n \times k$ matrix.

To find a basis for Col(A):
Step 1: Row reduce $A$ to an echelon form $B$.
Step 2: Identify the pivot columns of $B$.
Step 3: A basis for Col(A) consists of the columns of the original matrix $A$ that correspond to the pivot columns of $B$.

To find a basis for Null(A):
Step 1: Row reduce $A$ to its RREF.
Step 2: Use the RREF of $A$ to write down the vector form of the solution to the homogeneous equation $A\vec{x} = \vec{0}$.
Step 3: The vectors that appear in the vector form of the solution calculated in Step 2 are a basis for Null(A).
4.4 Dimension of Subspaces
Theorem 4.4.1
Let S ⊆ Rn be a non-zero subspace. Then, any two bases for S contain the same number of vectors.
Proof. Let B1 = {~u1 , ~u2 , . . . , ~ur } and B2 = {~v1 , ~v2 , . . . , ~vs } be two bases for S. First, suppose r < s. Since
the vectors in B2 are contained in S, and B1 spans S, there exist scalars cij for each i ∈ {1, 2, . . . , s} and
j ∈ {1, 2, . . . , r} such that
~v1 = c11 ~u1 + c12 ~u2 + . . . + c1r ~ur ,
~v2 = c21 ~u1 + c22 ~u2 + . . . + c2r ~ur ,
..
.
~vs = cs1 ~u1 + cs2 ~u2 + . . . + csr ~ur .
Let $x_1, x_2, \ldots, x_s$ be variables and consider the vector equation
$$x_1\vec{v}_1 + x_2\vec{v}_2 + \ldots + x_s\vec{v}_s = \vec{0}. \qquad (4.2)$$
The ~vi ’s form a basis for S, hence are a linearly independent set. Therefore, the only solution to this vector
equation is x1 = x2 = . . . = xs = 0. Now substitute the expressions for the ~vi ’s in terms of the ~ui ’s into the
vector equation (4.2)
x1 (c11 ~u1 + c12 ~u2 + . . . + c1r ~ur ) + x2 (c21 ~u1 + c22 ~u2 + . . . + c2r ~ur ) + . . . + xs (cs1 ~u1 + cs2 ~u2 . . . + csr ~ur ) = ~0.
Rearranging gives
(c11 x1 +c21 x2 +. . .+cs1 xs )~u1 +(c12 x1 +c22 x2 +. . .+cs2 xs )~u2 +. . .+(c1r x2 +c2r x2 +. . .+csr xs )~ur = ~0 (4.3)
Since the ~ui ’s also form a linear independent set, the coefficients on each must be zero. Therefore, we get
the following linear system from Equation (4.3).
This linear system has r equation and s variables. Since r < s, the system necessarily has infinitely many
solutions. Each such non-trivial solution gives a non-trivial solution to the vector equation in (4.2), which is
not possible. Therefore, it must be the case that r ≥ s. In this case, if r > s the exact same argument with
the roles of the two bases reversed gives the same contradiction we derived above. The only option left is
that r = s which means B1 and B2 have the same number of elements. Since these were any two arbitrary
bases of S, the result follows.
Theorem 4.4.1 states that all bases of a non-zero subspace $S$ have the same number of elements. Therefore, we are permitted to make the following definition.

Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace. The dimension of $S$, denoted $\dim(S)$, is the number of vectors in any basis for $S$.
Example 4.4.1
The standard basis {~e1 , ~e2 , . . . , ~en } is a basis for Rn for each positive integer n. This means the
dimension of Rn is n (whence the name n-dimensional Euclidean space).
Example 4.4.2
The dimension of the column space in Example 4.3.3 is 2. The dimension of the null space in Example
4.3.2 is 2. The dimension of the subspace in Example 4.3.4 is 3.
Exercise
What is the dimension of the zero subspace $\{\vec{0}\}$?
4.4.1 The Basis Theorem and Three Useful Corollaries for Calculating Bases
and Dimensions
In this section, we introduce a theoretical result called The Basis Theorem. This is a theorem that tells
you how to construct a basis of a subspace S from any spanning set of S and how to turn any linearly
independent set of S into a basis of S.
The Basis Theorem
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace and let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a set of vectors in $S$.
1. If B is linearly independent, then either B is a basis for S, or at most finitely many vectors from
S can be added to B to make it a basis.
2. If the vectors in B span S, then either B is a basis for S, or vectors can be removed from B to
turn it into a basis for S.
Proof.
1. If B is already a basis there is nothing to prove. Thus, suppose B is a linearly independent set that
is not a basis. Pick any w~ 1 ∈ S that does not belong to span {B} = span {~v1 , ~v2 , . . . , ~vm } . Then, the
set $B_1 = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{w}_1\}$ is linearly independent. The proof of this is left as an exercise. If $B_1$ is a basis, then we are done. If not, pick another vector $\vec{w}_2 \in S$ that is not in $\operatorname{span}\{B_1\}$. Then, $B_2 = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{w}_1, \vec{w}_2\}$ is a linearly independent set. Iterate this process; it must terminate
eventually because all of these vectors are in Rn and, in Rn , a set with more than n vectors is not
linearly independent by Corollary 2.6.4. Once the process terminates, we are left with a set of linearly
independent vectors whose span is equal to S, i.e. a basis for S. Therefore, we have added finitely
many vectors to B to get a basis for S.
2. Suppose B spans S. Form the matrix A = [ ~v1 ~v2 . . . ~vm ]. Then, S = Col(A). Row reduce A to
an echelon form B. Switching the order of the columns if necessary, we lose no generality in assum-
ing that the first ` columns of B are pivot columns. By Theorem 4.3.3, a basis for Col(A) = S is
{~v1 , ~v2 , . . . , ~v` } ⊆ B where ` ≤ m. Therefore, we have found a basis for S by removing finitely many
vectors from B.
Exercise
Let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k \in \mathbb{R}^n$ be a linearly independent set of vectors and suppose that $\vec{v} \in \mathbb{R}^n$ is such that
$$\vec{v} \notin \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}.$$
Prove that $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k, \vec{v}\}$ is a linearly independent set.
The Basis Theorem is a useful result, particularly from a theoretical standpoint. Our main use of it is that
it leads to the following three results that allow us to easily answer a wide variety of questions about bases
and dimension of subspaces.
The Basis Corollary
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace of dimension $m$. Let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a set of $m$ vectors in $S$.
1. If B is linearly independent, then B is a basis for S.
2. If B spans S (so $\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\} = S$), then B is a basis for S.
Example 4.4.3
Is $B = \left\{ \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 9 \\ 1 \end{bmatrix} \right\}$ a basis for $\mathbb{R}^3$?
Solution. Since $\mathbb{R}^3$ has dimension 3, by The Basis Corollary, we need only check that $B$ spans $\mathbb{R}^3$. Row reduce $A = [\,\vec{v}_1\ \vec{v}_2\ \vec{v}_3\,]$ to an echelon form:
$$A = \begin{bmatrix} -1 & 2 & 0 \\ 2 & -1 & 9 \\ 1 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 9 \\ 0 & 0 & -5 \end{bmatrix}.$$
This echelon form of A has a pivot in every row. Therefore, its columns span R3 by The Span Theorem.
Hence, by The Basis Corollary, B is a basis for R3 . ♦
Example 4.4.4
Let $S \subseteq \mathbb{R}^4$ be a subspace with $\dim(S) = 3$, and suppose the vectors
$$\vec{v}_1 = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 5 \end{bmatrix}$$
are all contained in $S$. Show that $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $S$.
Solution. Form the matrix A = [ ~v1 ~v2 ~v3 ] and row reduce to an echelon form.
$$A = \begin{bmatrix} -1 & 0 & 1 \\ 2 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 2 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$
The echelon form has a pivot in every column. By The Linear Independence Theorem Version, {~v1 , ~v2 , ~v3 }
is a linearly independent set. Therefore, since dim(S) = 3, The Basis Corollary implies they form a basis for
S. ♦
Proof. I prove part 1 and leave part 2 as an exercise for the reader because it is similar.
Suppose B is a linearly independent set. By way of contradiction, suppose B does not span S. Then, part
1 of The Basis Theorem implies we can add a finite number of vectors in S to B in order to get a basis for
S. But then, a basis for S would consist of more than m vectors. This is impossible as S has dimension m.
Therefore, B must span S, so it is a basis.
Exercise
Prove the second part of The Basis Corollary.
Corollary 4.4.1
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace of dimension $m$. Let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_\ell\}$ be a set of vectors in $S$.
1. If $\ell > m$, then $B$ is linearly dependent. In other words, any linearly independent subset of $S$ contains at most $m$ vectors.
2. If $\ell < m$, then $B$ does not span $S$.
Example 4.4.5
By The Linear Independence Theorem Version, {~v1 , ~v2 , ~v3 } is linearly independent set. Since dim(S) = 2,
part 1 of Corollary implies any linearly independent subset of S contains at most 2 vectors. Therefore,
~v1 , ~v2 , ~v3 can not all simultaneously be in S. ♦
Proof. I only prove the first part and leave the second as an exercise for the reader.
Suppose B is a linearly independent set. Part 1 of The Basis Theorem implies either B is a basis for S or
that we may add vectors from S to B in order to get a basis for S. B clearly can’t be a basis because it has
more vectors in it than the dimension of S. Moreover, we can’t add vectors from S to B to get a basis for S
because B already has too many vectors in it. Thus, B can not be linearly independent.
Exercise
Prove the second part of Corollary 4.4.1.
The final corollary tells us how subspaces of different dimensions “fit inside” of one another.
The Nested Dimensions Corollary
Let $S_1 \subseteq S_2 \subseteq \mathbb{R}^n$ be subspaces. Then, $\dim(S_1) \le \dim(S_2)$. Moreover, $\dim(S_1) = \dim(S_2)$ if and only if $S_1 = S_2$.
The Nested Dimensions Corollary allows us to answer a number of questions about subspaces even in cases
where it seems like we don’t have enough information.
Example 4.4.6
Let S ⊆ R4 be a subspace with S 6= R4 . Suppose ~v ∈ S and ~v 6= ~0. What are the possible dimensions
of S.
Solution. Since S 6= R4 , The Nested Dimensions Corollary implies that dim(S) < dim(R4 ) = 4. Since
~0 6= ~v ∈ S, S is not the zero subspace. Therefore, dim(S) ≥ 1. Hence, 1 ≤ dim(S) < 4 so the possible values
for dim(S) are 1, 2, and 3. ♦
Proof. Let dim(S1 ) = m1 and dim(S2 ) = m2 . Let B1 = {~v1 , ~v2 , . . . , ~vm1 } be a basis for S1 . Since S1 is a
subset of S2 , it follows that B1 ⊆ S2 . As B1 is a linearly independent set, either B1 is a basis for S2 , in which
case m1 = m2 , or, by part 1 of The Basis Theorem, we can add vectors from S2 to B1 to get a basis for S2 ,
in which case m1 < m2 . In either case, m1 ≤ m2 .
For the last statement, first suppose dim(S1 ) = dim(S2 ). Let B1 be as above. Then, B1 ⊆ S1 ⊆ S2 , so that
B1 is a linearly independent set of vectors in S2 , and the number of vectors in B1 is equal to the dimension
of S2 . Therefore, by The Basis Corollary, B1 is a basis for S2 , so span {B1 } = S2 . But span {B1 } = S1 as B1
is a basis for S1 . Therefore, S1 = S2 .
Conversely, suppose S1 = S2 . Then, since B1 is a basis for S1 , B1 is also a basis for S2 (since the two
subspaces are equal). Thus, dim(S2 ) is equal to the number of vectors in B1 , which is exactly dim(S1 ).
The following exercise is useful for getting used to the The Nested Dimensions Corollary.
Exercise
Prove that the only subspace S of Rn that has dimension n is Rn itself.
Here are some more examples of the types of questions we can solve using all three of the corollaries from
this section.
Example 4.4.7
Let $S \subseteq \mathbb{R}^3$ be a non-zero subspace of $\mathbb{R}^3$. Suppose $\vec{v}_1, \vec{v}_2 \in S$ with $\vec{v}_1 \neq c\vec{v}_2$ for any scalar $c \in \mathbb{R}$. Furthermore, suppose $S \neq \mathbb{R}^3$. Is $\{\vec{v}_1, \vec{v}_2\}$ necessarily a basis for $S$? Explain your answer.
Solution. Yes it is. Since ~v1 6= c~v2 for any c ∈ R, {~v1 , ~v2 } is a linearly independent set of vectors by
Corollary 2.6.2. Therefore, by part 1 of The Basis Theorem, dim(S) ≥ 2. Since S ⊆ R3 , but S 6= R3 , The
Nested Dimensions Corollary implies dim(S) < 3. Therefore, 2 ≤ dim(S) < 3 which implies dim(S) = 2.
Therefore, {~v1 , ~v2 } is a set of 2 vectors in a subspace of dimension 2, hence it is a basis for S by part 1 of
The Basis Corollary. ♦
Example 4.4.8
Let $S \subseteq \mathbb{R}^5$ be a subspace that contains 4 linearly independent vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4$. Suppose the vector $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ is not in $S$. Is $\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$ necessarily a basis for $S$?
Solution. Yes. Since S contains 4 linearly independent vectors, part 1 of The Basis Theorem implies
4 ≤ dim(S). Since S ⊆ R5 , The Nested Dimensions Corollary implies dim(S) ≤ 5. If dim(S) = 5, then
S = R5 by The Nested Dimensions Corollary. This is impossible because ~v is in R5 but is not in S. There-
fore, dim(S) 6= 5 so that dim(S) = 4. Therefore, since {~v1 , ~v2 , ~v3 , ~v4 } is a linearly independent set, part 1 of
The Basis Corollary implies these vectors are a basis for S. ♦
4.5 The Rank-Nullity Theorem
Let A be an n × k matrix. The rank of A, denoted rank(A), is the dimension of the column space
of A. The nullity of A, denoted nul(A), is the dimension of the null space of A.
Example 4.5.1
The rank of the matrix in Example 4.3.3 is 2. The nullity of the matrix in Example 4.3.2 is 2.
Let A be an n × k matrix and let B be an echelon form of A. rank(A) is equal to the number of pivot
columns of B. What about nul(A)? A basis for Null(A) consists of the vectors in the vector form of the
solution to A~x = ~0. Each of these vectors corresponds to a non-pivot column of B. Therefore, nul(A) equals
the number of non-pivot columns of B. With this observation, the following important theorem is evident.
The Rank-Nullity Theorem
Let $A$ be an $n \times k$ matrix. Then,
$$\operatorname{rank}(A) + \operatorname{nul}(A) = k.$$

Example 4.5.2
Example 4.5.3
Let A be 8 × 6. Can rank(A) = 7? Justify your answer using The Rank-Nullity Theorem.
Solution. No it can't. By The Rank-Nullity Theorem, rank(A) + nul(A) = 6. Thus, if rank(A) = 7, then nul(A) = 6 − 7 = −1, which is impossible because dimensions cannot be negative. Therefore, rank(A) can not be 7. In fact, the maximum that rank(A) could be is 6. ♦
Proof. Let B be an echelon form of A. Every column of B is either a pivot column or a non-pivot column.
rank(A) is equal to the number of pivot columns of B and nul(A) is the number of non-pivot columns of B.
Therefore,
rank(A) + nul(A) = total number of columns of B = k.
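As a quick computational check of the theorem (not part of the notes; it assumes sympy), the rank and nullity of the matrix from Example 4.3.2 do add up to its number of columns:

```python
# Rank-Nullity check for the matrix of Example 4.3.2.
import sympy as sp

A = sp.Matrix([[1, 3, 0, -1], [3, 2, 1, -4]])
rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity, A.shape[1])       # 2 2 4
print(rank + nullity == A.shape[1])    # True
```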
In the context of this course, The Rank-Nullity Theorem might seem a bit trivial. This is because we've reduced many of the problems to row reducing a matrix and counting pivot columns and rows. However, this theorem is very important: it holds in much more general settings. In our case, it is easy because we have matrix representations of linear transformations. In many situations, however, matrices are a luxury that is not available.
4.6 Row Space
Let A be an n × k matrix. The row space of A, denoted Row(A), is the span of the rows of A when
they’re considered as vectors in Rk .
Note
Example 4.6.1
Let $A = \begin{bmatrix} 2 & 7 & 0 \\ 9 & -1 & 7 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$. The rows of $A$, considered as vectors in $\mathbb{R}^3$, are
$$\begin{bmatrix} 2 \\ 7 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 9 \\ -1 \\ 7 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Therefore,
$$\operatorname{Row}(A) = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 7 \\ 0 \end{bmatrix}, \begin{bmatrix} 9 \\ -1 \\ 7 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
The main result of this section is dim(Row(A)) = rank(A). This is a very interesting result. Here is why:
Let $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_k\,]$ be $n \times k$ and let $m = \operatorname{rank}(A)$. Then, any set of linearly independent columns of $A$ contains at most $m$ vectors. The result $\dim(\operatorname{Row}(A)) = \operatorname{rank}(A)$ means that the same relation holds for the
rows of A as well. This means, regardless of how we construct A, the maximal number of linearly indepen-
dent columns of A is always the same as the maximal number of linearly independent rows! Here, a subset
S of the columns/rows of A is maximally linearly independent if S is linearly independent, but if any other
column/row is added to S, then the subset becomes linearly dependent. How cool is this!?
We need a number of preliminary results to build to the main result of this section. The aim of these results
is to show how we can find a basis of the row space of a matrix using Gauss-Jordan Elimination. The first
result shows that the non-zero rows of an echelon form of a matrix are linearly independent (when considered
as column vectors).
Theorem 4.6.1
Let A be an n×k matrix and let B be an echelon form of A. Then, the non-zero rows of B, considered
as vectors in Rk , are a linearly independent set.
Proof. Let ~r1 , ~r2 , . . . , ~rm denote the non-zero rows of B, in the correct order, considered as column vectors
in Rk , 1 ≤ m ≤ n. Then, ~r1 is the first non-zero row of B considered as a column vector, ~r2 is the second
non-zero row of B considered as a column vector, and so on. Because B is in echelon form, every non-zero
row has a pivot position, and every pivot position is in a column to the right of the pivot position above it.
We exploit this fact to show these vectors are linearly independent.
Suppose the first non-zero entry in ~r1 occurs in the i1 th component, where 1 ≤ i1 ≤ k. Denote this entry by
a1 . Then a1 is the left most pivot in B. Because B is in echelon form, every entry in B in the same column
as $a_1$ below $a_1$ is zero. Therefore, the $i_1$th component in all of $\vec{r}_2, \ldots, \vec{r}_m$ is zero. Consider the vector equation
$$c_1\vec{r}_1 + c_2\vec{r}_2 + \ldots + c_m\vec{r}_m = \vec{0}. \qquad (4.5)$$
Looking at the $i_1$th component of both sides, only $\vec{r}_1$ contributes, so $c_1 a_1 = 0$ and, since $a_1 \neq 0$, we conclude $c_1 = 0$.
Now let $a_2$ be the first non-zero entry in $\vec{r}_2$, and suppose it occurs in the $i_2$th component, $i_1 < i_2 \le k$. As before, the $i_2$th component must be equal to zero in $\vec{r}_3, \ldots, \vec{r}_m$ because $a_2$ corresponds to a pivot in $B$. Thus, if we look at the $i_2$th component of the vector equation in Equation (4.5), we have $c_2 a_2 = 0$ (using $c_1 = 0$), so $c_2 = 0$.
Repeating this process for each ~ri yields c1 = c2 = . . . = cm = 0. Therefore, the vector equation
has only the trivial solution. This shows {~r1 , . . . , ~rm } is linearly independent. ♦
Theorem 4.6.2
Let A be an n × k matrix and let B be an echelon form of A. Then, the non-zero rows of B, when
considered as vectors in Rk , form a basis for Row(A).
Example 4.6.2
Let $A = \begin{bmatrix} -2 & 2 & 0 & -5 \\ 3 & 1 & -1 & 2 \\ 4 & 2 & -4 & 1 \end{bmatrix}$. Calculate a basis for Row(A).

Solution. Row reducing $A$ to an echelon form gives
$$A \sim \begin{bmatrix} -2 & 2 & 0 & -5 \\ 0 & 4 & -1 & -11/2 \\ 0 & 0 & -5/2 & -3/4 \end{bmatrix}.$$
By Theorem 4.6.2, the non-zero rows of this echelon form, considered as vectors in $\mathbb{R}^4$, form a basis for Row(A):
$$\left\{ \begin{bmatrix} -2 \\ 2 \\ 0 \\ -5 \end{bmatrix}, \begin{bmatrix} 0 \\ 4 \\ -1 \\ -11/2 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -5/2 \\ -3/4 \end{bmatrix} \right\}. \quad ♦$$
Warning!
The row space basis consists of the rows of the echelon form of A, not the rows in the original matrix.
This is different from the column space basis. Be careful not to confuse the two!
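To see the row space procedure in code, here is a sketch (not part of the notes; it assumes sympy). It uses the RREF, which is one valid choice of echelon form, and keeps the non-zero rows:

```python
# Basis for Row(A) from the non-zero rows of an echelon form (Theorem 4.6.2).
import sympy as sp

A = sp.Matrix([[-2, 2, 0, -5], [3, 1, -1, 2], [4, 2, -4, 1]])
rref, _ = A.rref()
basis = [rref.row(i).T for i in range(rref.rows) if any(rref.row(i))]
for v in basis:
    print(v.T)   # three non-zero rows, since rank(A) = 3
```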
Proof. Let ~a1 , ~a2 , . . . , ~an denote the rows of A considered as column vectors in Rk and, let ~r1 , ~r2 , . . . , ~rm
denote the non-zero rows of $B$ considered as column vectors in $\mathbb{R}^k$, $1 \le m \le n$. We start with a claim.

Claim: $\operatorname{Row}(A) = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m\}$.
Proof. The rows of B are obtained from the rows of A using elementary row operations. It suffices to show
the span of the rows of A is unchanged under these elementary row operations.
Swapping row. If we interchange two rows ~ai and ~aj , this clearly does not change span {~a1 , ~a2 , . . . , ~an } .
Scaling rows. Suppose we scale the row ~ai by a non-zero scalar c. Then, it is a fact that
span {~a1 , ~a2 , . . . , ~ai , . . . , ~an } = span {~a1 , ~a2 , . . . , c~ai , . . . , ~an } .
Row replacement. Suppose we perform the row operation ai + raj for some non-zero scalar r. We show
S1 = span {~a1 , ~a2 , . . . , ~ai , . . . , ~an } = span {~a1 , . . . , ~ai + r~aj , . . . , ~an } = S2 .
First, let ~v ∈ S1 . Without loss of generality, suppose i < j. Then, there exist scalars c1 , c2 , . . . , ci , . . . , cj , . . . cn
such that
~v = c1~a1 + c2~a2 . . . + ci~ai + . . . + cj~aj + . . . + cn~an .
Adding and subtracting $rc_i\vec{a}_j$ on the right hand side yields
$$\vec{v} = c_1\vec{a}_1 + c_2\vec{a}_2 + \ldots + c_i(\vec{a}_i + r\vec{a}_j) + \ldots + (c_j - rc_i)\vec{a}_j + \ldots + c_n\vec{a}_n,$$
which is a linear combination of the vectors $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_i + r\vec{a}_j, \ldots, \vec{a}_n$. Thus, $\vec{v} \in S_2$. The other direction
follows similarly (convince yourself!)
This shows the elementary row operations do not change the span of the rows of A. The rows of B are
constructed from the rows of A using elementary row operations. Therefore,
$$\operatorname{span}\{\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n\} = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m, \vec{0}, \ldots, \vec{0}\} = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m\},$$
which proves the claim. By the claim,
$$\operatorname{Row}(A) = \operatorname{span}\{\vec{r}_1, \ldots, \vec{r}_m\}.$$
By Theorem 4.6.1, {~r1 , . . . , ~rm } is a linearly independent set. Thus, {~r1 , . . . , ~rm } is a basis for Row(A).
Exercise
Prove the 3 things left as exercises in the proof of Theorem 4.6.2.
Example 4.6.3
$$A_1 \sim \begin{bmatrix} 1 & 2 & 1 \\ 0 & -4 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
By Theorem 4.6.2, the non-zero rows of this matrix, when considered as column vectors, form a basis for Row(A₁). Since Row(A₁) = S₁, a basis for S₁ is
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -4 \\ 0 \end{bmatrix} \right\}.$$
Note
When using row space to calculate bases for subspaces, we can row reduce to whatever echelon form
we want. This will give different bases, of course, but they are all correct.
Theorem 4.6.3
Let $A$ be an $n \times k$ matrix. Then, $\dim(\operatorname{Row}(A)) = \operatorname{rank}(A)$.
Proof. Let B be an echelon form of A. Theorem 4.6.2 implies that dim(Row(A)) is the number of non-zero
rows in B. By Theorem 4.3.3, rank(A) is the number of columns of B with a pivot in them. Each non-zero
row in B has exactly one pivot, and each of these different pivots have to occur in different columns by
definition of the echelon form. Thus, the number of columns with a pivot is exactly equal to the number of
non-zero rows of B; i.e. dim(Row(A)) = rank(A).
The following relation between the rank of A and its transpose is easily proved using Theorem 4.6.3.
Exercise
Prove that $\operatorname{rank}(A^T) = \operatorname{rank}(A)$ for any matrix $A$.

In this section, we give an alternative method for calculating a basis for Col(A). This method calculates one
particular basis, called the canonical basis for Col(A). This is the basis most computer software will output
if you ask it to calculate a basis for the column space of a matrix.
Let A be n × k. By definition, the rows of A are the columns of AT , and vice versa. Therefore, the span
of the rows of A, when considered as vectors in Rk , is the same as the span of the columns of AT . That
is, Row(A) = Col(AT ) and, similarly, Row(AT ) = Col(A). With this observation in mind, we define the
following.
Let A be an n × k matrix. The canonical basis for Col(A) is the basis obtained from the non-zero
rows of the reduced row echelon form of AT .
It is clear that the non-zero rows of the RREF of AT form a basis for Col(A) because Row(AT ) = Col(A).
We give some examples of calculating the canonical basis.
Example 4.6.4
Let
$$A = \begin{bmatrix} 3 & 2 \\ 4 & 3 \\ 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -1 & -2 & 3 \\ 2 & 0 & 3 & -1 \end{bmatrix}, \quad C = \begin{bmatrix} -2 & -4 & -1 & 3 & -7 \\ 6 & 12 & -1 & 11 & 17 \\ 2 & 4 & -1 & 7 & 5 \\ 0 & 0 & 2 & -10 & 2 \\ 5 & 10 & 1 & 0 & 16 \end{bmatrix}.$$
Calculate the canonical bases for $\operatorname{Col}(A)$, $\operatorname{Col}(B)$, and $\operatorname{Col}(C)$.

For $B$,
$$B^T = \begin{bmatrix} 1 & 2 \\ -1 & 0 \\ -2 & 3 \\ 3 & -1 \end{bmatrix},$$
and its RREF is
$$B^T \sim \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Thus, the canonical basis for Col(B) is
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
For $C$,
$$C^T = \begin{bmatrix} -2 & 6 & 2 & 0 & 5 \\ -4 & 12 & 4 & 0 & 10 \\ -1 & -1 & -1 & 2 & 1 \\ 3 & 11 & 7 & -10 & 0 \\ -7 & 17 & 5 & 2 & 16 \end{bmatrix},$$
and the canonical basis for Col(C) is obtained from the non-zero rows of the RREF of $C^T$.
Why do we bother with the canonical basis? There are a couple reasons. First, it is unique. Secondly, if $A$ is $n \times k$ and $\operatorname{Col}(A) = \mathbb{R}^n$, then the canonical basis is always the standard basis for $\mathbb{R}^n$. The algorithm for column space bases we developed prior will not return this in general.
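The canonical basis computation is easy to script. The sketch below (not part of the notes; it assumes sympy) row reduces $A^T$ and keeps the non-zero rows, here for the matrix $B$ of Example 4.6.4:

```python
# Canonical basis for Col(B): non-zero rows of the RREF of B^T.
import sympy as sp

B = sp.Matrix([[1, -1, -2, 3], [2, 0, 3, -1]])
rref_BT, _ = B.T.rref()
canonical = [rref_BT.row(i).T for i in range(rref_BT.rows) if any(rref_BT.row(i))]
for v in canonical:
    print(v.T)   # [1, 0] and [0, 1], matching Example 4.6.4
```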
4.7 The Invertible Matrix Theorem
The Invertible Matrix Theorem
Let $A$ be an $n \times n$ matrix. The following statements are equivalent.
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. $A^T$ is invertible.
13. The columns of $A$ form a basis for $\mathbb{R}^n$.
14. $\operatorname{Col}(A) = \mathbb{R}^n$.
15. rank(A) = n.
16. nul(A) = 0.
17. $\operatorname{Null}(A) = \{\vec{0}\}$.
18. dim(Row(A)) = n.
19. Row(A) = Rn
Proof. We have already shown the first 12 conditions are equivalent. Therefore, it doesn’t matter which
one of these we pick to prove the chain of equivalences. For convenience, we show the new conditions are
equivalent to 5, the columns of A being linearly independent.
5 =⇒ 13: Suppose the columns of A are linearly independent. Then, the columns of A are a set of n
linearly independent vectors in Rn . Since dim(Rn ) = n, part 1 of The Basis Corollary implies the columns
of A are a basis for Rn .
Determinants
In this chapter, we introduce determinants. The idea of a determinant goes back some 2000 years. Ancient
mathematicians had some idea of what a determinant was in the context of solutions to linear systems, but
didn’t have the language of matrices to make it precise. In fact, Leibniz knew that if certain coefficient iden-
tities were satisfied, then homogeneous linear systems with three equations in three variables had non-trivial
solutions. These identities are essentially what the determinant of a 3 × 3 matrix is.
Determinants have many uses which we will see in this document. For example, they can be used to deter-
mine whether or not a square matrix is invertible and they generalize volume to higher dimensional Euclidean
spaces. Another important property of determinants is that they are continuous. This is useful for theoretical purposes that, unfortunately, extend beyond the scope of this document.
Note
From now on, unless otherwise stated, all matrices are square.
Defining determinants is a little bit tricky. The definition I’ll use in this document is based off of a recursion.
There are other ways to define determinants, but for us, this one will suffice.
First we introduce some notation. Let A be an n × n matrix. Denote by Aij the (n − 1) × (n − 1) sub-matrix
of A that results from deleting the ith row jth column of A.
Example 5.1.1
Let $A = \begin{bmatrix} 2 & 7 & 1 & 0 \\ 0 & 8 & -2 & 6 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & -7 & 0 \end{bmatrix}$. Determine $A_{32}$, $A_{14}$, $A_{23}$, $A_{11}$, and $A_{21}$.

Solution. $A_{32}$ is the $3 \times 3$ sub-matrix of $A$ that results from deleting the third row and second column. Thus,
$$A_{32} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & -2 & 6 \\ 1 & -7 & 0 \end{bmatrix}.$$
Similarly,
$$A_{14} = \begin{bmatrix} 0 & 8 & -2 \\ 1 & 1 & 1 \\ 1 & 2 & -7 \end{bmatrix}, \quad A_{23} = \begin{bmatrix} 2 & 7 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}, \quad A_{11} = \begin{bmatrix} 8 & -2 & 6 \\ 1 & 1 & 1 \\ 2 & -7 & 0 \end{bmatrix}, \quad A_{21} = \begin{bmatrix} 7 & 1 & 0 \\ 1 & 1 & 1 \\ 2 & -7 & 0 \end{bmatrix}. \quad ♦$$
Let $A = [a_{ij}]$ be an $n \times n$ matrix with $n \ge 2$. The $(i,j)$-cofactor of $A$ is the number
$$C^A_{ij} = (-1)^{i+j}\det(A_{ij}).$$
For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ we define $\det(A) = ad - bc$, and for $n \ge 3$ we define
$$\det(A) = a_{11}C^A_{11} + a_{12}C^A_{12} + \ldots + a_{1n}C^A_{1n}.$$

This definition probably looks a bit dubious. How can we define cofactors based off of determinants, and then define determinants based off of cofactors? Here is how it works: cofactors of $3 \times 3$ matrices can be calculated because they are determinants of $2 \times 2$ matrices. Therefore, we can calculate $3 \times 3$ determinants. To calculate the determinant of a $4 \times 4$ matrix, we need to calculate the cofactors, which are determinants of $3 \times 3$ matrices. We know we can calculate these, therefore we can calculate determinants of $4 \times 4$ matrices. The same goes for $5 \times 5$ matrices, $6 \times 6$ matrices, and so on.
For a general $n \times n$ matrix $A$, calculation of the determinant is based on calculating $n$ cofactors, each of which is a determinant of an $(n-1) \times (n-1)$ matrix. To calculate each of these cofactors, we need to calculate determinants of $(n-2) \times (n-2)$ matrices, and so on, until we reach $2 \times 2$ determinants.
Theorem (Cofactor Expansion)
Let $A$ be an $n \times n$ matrix. For any fixed row index $i_0$,
$$\det(A) = \sum_{j=1}^{n} a_{i_0 j}C^A_{i_0 j}.$$
This is called cofactor expansion across the $i_0$th row. Similarly, for any fixed column index $j_0$,
$$\det(A) = \sum_{i=1}^{n} a_{i j_0}C^A_{i j_0}.$$
This is called cofactor expansion down the $j_0$th column.
Cofactor Expansion implies we can expand across/down which ever row/column we like to calculate det(A)
so we aren’t restricted to doing it along the first row. Generally, Cofactor Expansion greatly reduces the
amount of work needed to calculate det(A). The best strategy is to do Cofactor Expansion across/down the
row/column with the most zeroes.
Example 5.1.2
Let $A = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & -7 & 10 & 0 \\ 0 & 2 & 0 & -3 \\ 4 & 1 & -1 & 0 \end{bmatrix}$. Calculate $\det(A)$ using Cofactor Expansion.

Solution. The fourth column contains three zeroes, which is the most of any row or column in the matrix. Therefore, we perform cofactor expansion down the fourth column:
$$\det(A) = \sum_{i=1}^{4} a_{i4}C^A_{i4} = a_{14}C^A_{14} + a_{24}C^A_{24} + a_{34}C^A_{34} + a_{44}C^A_{44} = 0 \cdot C^A_{14} + 0 \cdot C^A_{24} - 3 \cdot C^A_{34} + 0 \cdot C^A_{44} = -3 \cdot C^A_{34}.$$
Now we need to calculate the $(3,4)$-cofactor for the matrix $A$. From the definition, it is
$$C^A_{34} = (-1)^{3+4}\det\underbrace{\begin{bmatrix} 1 & 0 & 2 \\ 0 & -7 & 10 \\ 4 & 1 & -1 \end{bmatrix}}_{=B} = -\det(B).$$
To calculate $\det(B)$, we do cofactor expansion again. This time, there is at most one zero in any row or column. We can pick any of these to do cofactor expansion across/down. Let's do cofactor expansion across the first row:
$$\det(B) = \sum_{j=1}^{3} b_{1j}C^B_{1j} = b_{11}C^B_{11} + b_{12}C^B_{12} + b_{13}C^B_{13} = 1 \cdot C^B_{11} + 0 \cdot C^B_{12} + 2 \cdot C^B_{13} = C^B_{11} + 2 \cdot C^B_{13}.$$
We need to calculate the two above cofactors for $B$. We get
$$C^B_{11} = (-1)^{1+1}\det\begin{bmatrix} -7 & 10 \\ 1 & -1 \end{bmatrix} = (7 - 10) = -3, \qquad C^B_{13} = (-1)^{1+3}\det\begin{bmatrix} 0 & -7 \\ 4 & 1 \end{bmatrix} = (0 - (-28)) = 28.$$
Therefore,
$$\det(B) = -3 + 2(28) = 53.$$
Putting it all together with what we calculated above, we have
$$\det(A) = -3 \cdot C^A_{34} = -3 \cdot (-\det(B)) = -3 \cdot (-53) = 159. \quad ♦$$
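The recursive definition translates directly into code. The following sketch (not part of the notes; plain Python, deliberately naive and far slower than row reduction for large matrices) expands along the first row exactly as in the definition:

```python
# Determinant by cofactor expansion along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(n):
        # A_1j: delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        cofactor = (-1) ** j * det(minor)   # (-1)^(1 + (j+1)) = (-1)^j
        total += A[0][j] * cofactor
    return total

A = [[1, 0, 2, 0],
     [0, -7, 10, 0],
     [0, 2, 0, -3],
     [4, 1, -1, 0]]
print(det(A))   # 159, as in Examples 5.1.2 and 5.1.3
```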
Next, we calculate the determinant of A from Example 5.1.2 using the definition of determinants to ensure
that Cofactor Expansion gives the correct answer. To ease the calculation, we give a formula for determi-
nants of 3 × 3 matrices. This formula can be derived using the definition of the determinant. The proof is
left as an exercise.
Let $A = [a_{ij}]$ be a $3 \times 3$ matrix. Then,
$$\det(A) = (a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}) - (a_{13}a_{22}a_{31} + a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32}).$$
Example 5.1.3
Calculate the determinant of the matrix A from Example 5.1.2 using the definition of a determinant.
Solution. By definition, we expand across the first row:
$$\det(A) = \sum_{j=1}^{4} a_{1j}C^A_{1j} = 1 \cdot C^A_{11} + 0 \cdot C^A_{12} + 2 \cdot C^A_{13} + 0 \cdot C^A_{14} = C^A_{11} + 2C^A_{13}.$$
We need to calculate the two cofactors. Using the formula for a $3 \times 3$ determinant,
$$C^A_{11} = (-1)^{1+1}\det\begin{bmatrix} -7 & 10 & 0 \\ 2 & 0 & -3 \\ 1 & -1 & 0 \end{bmatrix} = 1 \cdot (-9) = -9, \qquad C^A_{13} = (-1)^{1+3}\det\begin{bmatrix} 0 & -7 & 0 \\ 0 & 2 & -3 \\ 4 & 1 & 0 \end{bmatrix} = 1 \cdot 84 = 84.$$
Therefore, $\det(A) = -9 + 2 \cdot 84 = 168 - 9 = 159$, which is the answer we arrived at in Example 5.1.2. ♦
Example 5.1.4
Find the determinant of $A = \begin{bmatrix} 3 & 2 & 0 & 1 \\ 4 & 0 & 1 & 2 \\ 3 & 0 & 2 & 1 \\ 9 & 2 & 3 & 1 \end{bmatrix}$.

Solution. The second column has the most zeroes. Therefore, we do Cofactor Expansion down the second column:
$$\det(A) = -2\det\begin{bmatrix} 4 & 1 & 2 \\ 3 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix} + 2\det\begin{bmatrix} 3 & 0 & 1 \\ 4 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix} = -2(-16) + 2(-4) = 32 - 8 = 24. \quad ♦$$
We end this section with the following result that follows immediately from Cofactor Expansion.
Corollary 5.1.1
Let A be an n × n matrix that has either a row or a column of all zeroes. Then det(A) = 0.
Proof. If A has a row/column of all zeroes, do Cofactor Expansion across/down this row/column. It follows
immediately from the formulas in Cofactor Expansion that det(A) = 0.
5.2 Properties of Determinants
Let A be an n × n matrix. A is called upper triangular if every entry below the main diagonal is
zero. A is called lower triangular if every entry above the main diagonal is zero. A triangular
matrix is a one that is either upper triangular or lower triangular.
Example 5.2.1
Note
The definition of triangular matrices places no restriction on what the entries on the main diagonal
or above/below the main diagonal can be. For example, the zero matrix is both upper and lower
triangular. So is the identity matrix In . Every elementary matrix is either upper or lower triangular.
Triangular matrices are useful because they’re easy to work with. Many of the quantities associated to them
are easy to calculate. For example, determinants of triangular matrices are very easy.
Theorem 5.2.1
Let A be an n × n triangular matrix. Then, the determinant of A is the product of the entries on the
main diagonal.
Example 5.2.2
Let $A = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 1 & -10 & 8 & 0 & 0 \\ -6 & -6 & 2 & 3 & 0 \\ 0 & -2 & -3 & 8 & 10 \end{bmatrix}$. Calculate $\det(A)$.

Solution. $A$ is lower triangular, so by Theorem 5.2.1, $\det(A) = 2 \cdot 2 \cdot 8 \cdot 3 \cdot 10 = 960$. ♦
Example 5.2.3
In is an n × n matrix with all ones on the diagonal and zeroes elsewhere. Therefore In is triangular.
Then, by Theorem 5.2.1, det(In ) = 1, and
Proof. The proof for upper triangular matrices is similar for lower triangular matrices. Therefore, I only
give it for lower triangular matrices and leave the proof for upper triangular matrices to the reader.
For the base case, let $A = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix}$ be a $2 \times 2$ lower triangular matrix. By definition of determinants, $\det(A) = ad - 0 \cdot c = ad$, the product of the entries on the main diagonal.
Now assume the determinant of any n × n lower triangular matrix A is equal to the product of the en-
tries on the main diagonal. This is the induction hypothesis. We must show that the determinant of any
(n + 1) × (n + 1) lower triangular matrix is the product of the entries along the main diagonal.
Let $A = [a_{ij}]$ be an $(n+1) \times (n+1)$ lower triangular matrix. Since $A$ is lower triangular, the only entry in its first row that can be non-zero is $a_{11}$. Doing cofactor expansion across the first row gives
$$\det(A) = a_{11}(-1)^{1+1}\det(A_{11}) + 0\cdot(-1)^{1+2}\det(A_{12}) + \ldots + 0\cdot(-1)^{1+n+1}\det(A_{1,n+1}) = a_{11}\det(A_{11}). \qquad (5.1)$$
$A_{11}$ is equal to
$$A_{11} = \begin{bmatrix} a_{22} & 0 & 0 & \ldots & 0 \\ a_{32} & a_{33} & 0 & \ldots & 0 \\ a_{42} & a_{43} & a_{44} & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n+1,2} & a_{n+1,3} & a_{n+1,4} & \ldots & a_{n+1,n+1} \end{bmatrix}.$$
This is an n×n lower triangular matrix. Therefore, by the induction hypothesis, det(A11 ) = a22 a33 . . . ann an+1,n+1 .
Thus,
det(A) = a11 det(A11 ) = a11 a22 a33 . . . ann an+1,n+1
which completes the proof.
Exercise
Prove Theorem 5.2.1 for upper triangular matrices.
Theorem 5.2.2
Let $A$ be an $n \times n$ matrix. Then, $\det(A^T) = \det(A)$.

Proof. We use induction on $n$. For the base case, let
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{so that} \quad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}.$$
Thus,
$$\det(A^T) = ad - cb = ad - bc = \det(A),$$
which verifies the base case.
Assume that $\det(A) = \det(A^T)$ for all $n \times n$ matrices $A$. This is the induction hypothesis.
Let $A$ be $(n+1) \times (n+1)$. Performing Cofactor Expansion along the first row of $A$ yields
$$\det(A) = a_{11}C^A_{11} + a_{12}C^A_{12} + \ldots + a_{1,n+1}C^A_{1,n+1}.$$
Then, $C^A_{1j} = (-1)^{1+j}\det(A_{1j}) = (-1)^{1+j}\det\big((A_{1j})^T\big)$ by the induction hypothesis, because $A_{1j}$ is $n \times n$ for each $j$. Therefore,
$$\det(A) = a_{11}(-1)^{1+1}\det\big((A_{11})^T\big) + a_{12}(-1)^{1+2}\det\big((A_{12})^T\big) + \ldots + a_{1,n+1}(-1)^{1+n+1}\det\big((A_{1,n+1})^T\big). \qquad (5.2)$$
Now, deleting the first row and $j$th column of $A$ and then transposing is the same as transposing $A$ and then deleting the $j$th row and first column; that is, $(A_{1j})^T = (A^T)_{j1}$. Substituting this into Equation (5.2) gives
$$\det(A) = a_{11}(-1)^{1+1}\det\big((A^T)_{11}\big) + a_{12}(-1)^{1+2}\det\big((A^T)_{21}\big) + \ldots + a_{1,n+1}(-1)^{1+n+1}\det\big((A^T)_{n+1,1}\big).$$
Since the $a_{1j}$ are the entries in the first column of $A^T$, the above expression is just the cofactor expansion for $\det(A^T)$ down the first column of $A^T$. Hence, $\det(A) = \det(A^T)$, which completes the induction.
Theorem 5.2.3
Let A be an n × n matrix. Let B be a matrix produced by swapping two rows of A. Then det(B) =
− det(A).
Note
Proof. It suffices to show the result after swapping row 1 with the kth row for some k > 1. This is
because swapping any 2 rows, say the kth row with the `th row is equivalent to doing three row swaps
R1 ⇐⇒ Rk , R1 ⇐⇒ R` , R1 ⇐⇒ Rk .
We use induction on n. If n = 1, this is trivial because no row swaps can be made. If n = 2, write
" #
a b
A= .
c d
Then, det(A) = ad − bc. Swapping the only two rows we can yields
" #
c d
B= .
a b
Thus, det(B) = bc − ad = −(ad − bc) = − det(A). This verifies the base case.
Now assume if A is n × n, then making the row swap R1 ⇐⇒ Rk produces a matrix B with det(B) =
− det(A). This is the induction hypothesis.
If i 6= 1, k, then it is obvious Bi1 differs from Ai1 by one row interchange. Therefore, det(Ai1 ) = − det(Bi1 )
by the induction hypothesis since Ai1 and Bi1 are n × n matrices.
The result will follow given we can show
−ak1 (−1)k+1 det(Ak1 ) = b11 (−1)1+1 det(B11 ), mboxand − a11 (−1)1+1 det(A11 ) = bk1 (−1)k+1 det(Bk1 ).
For the first equation, notice $b_{11} = a_{k1}$ by construction of $B$. Consider $A_{k1}$. We can transform $A_{k1}$ into $B_{11}$ by swapping its first row with the row directly below it, then swapping that row with the next one, and iterating until the original first row sits in the $(k-1)$th position. Tracking the first column of $A_{k1}$, the entry $a_{12}$ moves down one position with each swap, past $a_{22}, a_{32}, \ldots, a_{k-1,2}$, until the column reads
$$\begin{bmatrix} a_{22} \\ a_{32} \\ \vdots \\ a_{k-1,2} \\ a_{12} \\ a_{k+1,2} \\ \vdots \\ a_{n,2} \end{bmatrix}.$$
This is precisely the first column of B11 . To move the first row down to the (k − 1)th position, we performed
$k - 2$ different row swaps. Since $A_{k1}$ is $n \times n$, repeated application of the induction hypothesis implies that
$$\det(B_{11}) = (-1)^{k-2}\det(A_{k1}) = (-1)^{k}\det(A_{k1}).$$
Thus,
$$b_{11}(-1)^{2}\det(B_{11}) = a_{k1}(-1)^{k}\det(A_{k1}) = -\big(a_{k1}(-1)^{k+1}\det(A_{k1})\big),$$
which is what we wanted to prove. The second equation we need to prove is true by the same argument as
above replacing Ak1 with Bk1 and B11 with A11 . This proves the theorem.
Corollary 5.2.1
Let $A$ be an $n \times n$ matrix. If $A$ has two identical rows or two identical columns, then $\det(A) = 0$.
Proof. Suppose A has two identical rows. Let B be the matrix that results from interchanging these two
rows. Then, det(A) = − det(B). But the two rows are identical, thus B = A and so, det(A) = − det(A).
This implies det(A) = 0. The proof for two identical columns follows from the fact that det(A) = det(AT ).
I leave the details to the reader.
Exercise
Finish the proof of Corollary 5.2.1.
Showing how the other two row operations change the determinant is much easier. First, we need a lemma.
Lemma 5.2.1
Let $A = [a_{ij}]$ be an $n \times n$ matrix and consider the sum
$$a_{i1}C^A_{j1} + a_{i2}C^A_{j2} + \ldots + a_{in}C^A_{jn},$$
where $C^A_{ji}$ denotes the $(j,i)$-cofactor of $A$. If $i \neq j$, then this sum is zero.
Proof. Fix $i_0 \neq j_0$ and let $B$ be the matrix obtained from $A$ by replacing the $j_0$th row of $A$ with a copy of the $i_0$th row of $A$. Then, the $i_0$th row of $B$ is the same as the $j_0$th. Therefore, by Corollary 5.2.1, $\det(B) = 0$. Since $A$ differs from $B$ only in the $j_0$th row, and we are deleting the $j_0$th row in the calculation of $C^A_{j_0 k}$ for any $k$ between 1 and $n$, it follows that the $(j_0, k)$-cofactors for $B$ are the same as they are for $A$. Hence, taking the cofactor expansion for $B$ along the $j_0$th row gives
$$0 = \det(B) = a_{i_0 1}C^A_{j_0 1} + a_{i_0 2}C^A_{j_0 2} + \ldots + a_{i_0 n}C^A_{j_0 n},$$
which is exactly the sum in the statement of the lemma with $i = i_0$ and $j = j_0$.
Theorem 5.2.4
Let $A$ be an $n \times n$ matrix.
1. Let $B$ be the matrix produced by adding a multiple of one row of $A$ to another row ($R_i \Rightarrow R_i + cR_j$ with $i \neq j$). Then, $\det(B) = \det(A)$.
2. Let $B$ be the matrix produced by multiplying one row of $A$ by a scalar $c$. Then, $\det(B) = c\det(A)$.
Proof.
1. Suppose $B$ is produced from $A$ by the row operation $R_i \Rightarrow R_i + cR_j$ with $i \neq j$. Doing cofactor expansion along the $i$th row of $B$ gives
$$\det(B) = (ca_{j1} + a_{i1})C^B_{i1} + (ca_{j2} + a_{i2})C^B_{i2} + \ldots + (ca_{jn} + a_{in})C^B_{in}.$$
Rearranging,
$$\det(B) = (a_{i1}C^B_{i1} + a_{i2}C^B_{i2} + \ldots + a_{in}C^B_{in}) + c(a_{j1}C^B_{i1} + a_{j2}C^B_{i2} + \ldots + a_{jn}C^B_{in}). \qquad (5.3)$$
Since we are doing cofactor expansion along the $i$th row of $B$, and $A$ differs from $B$ only in this row, the cofactors in Equation (5.3) for $B$ are the same as they are for $A$. Therefore, $(a_{i1}C^B_{i1} + \ldots + a_{in}C^B_{in}) = \det(A)$, as this is just the cofactor expansion of $A$ along the $i$th row, and $a_{j1}C^B_{i1} + \ldots + a_{jn}C^B_{in} = 0$ by Lemma 5.2.1 since $i \neq j$. Thus, Equation (5.3) implies $\det(B) = \det(A)$.
2. Suppose $B$ is produced from $A$ by scaling the $i$th row by $c$. Doing cofactor expansion along the $i$th row of $B$, whose cofactors are the same as those of $A$, gives
$$\det(B) = b_{i1}C^B_{i1} + \ldots + b_{in}C^B_{in} = ca_{i1}C^A_{i1} + \ldots + ca_{in}C^A_{in} = c(a_{i1}C^A_{i1} + \ldots + a_{in}C^A_{in}) = c\det(A).$$
Echelon forms of matrices are triangular and we know how elementary row operations affect determinants.
Therefore, we can calculate the determinant of a matrix by row reducing it to echelon form, making the
appropriate alterations along the way, and then using Theorem 5.2.1.
Example 5.2.4
Let $A = \begin{bmatrix} -2 & 2 & -1 \\ 3 & 9/2 & -1 \\ 1 & -11/2 & 2 \end{bmatrix}$. Calculate $\det(A)$.

Solution. Row reduce $A$ towards an echelon form, keeping track of the row operations used:
$$A \underset{R_1 \Leftrightarrow R_3}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 3 & 9/2 & -1 \\ -2 & 2 & -1 \end{bmatrix} = A_1, \qquad A_1 \underset{R_2 \Rightarrow R_2 - 3R_1,\ R_3 \Rightarrow R_3 + 2R_1}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & 21 & -7 \\ 0 & -9 & 3 \end{bmatrix} = A_2,$$
$$A_2 \underset{R_2 \Rightarrow -(1/7)R_2}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & -3 & 1 \\ 0 & -9 & 3 \end{bmatrix} = A_3, \qquad A_3 \underset{R_3 \Rightarrow (1/3)R_3}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & -3 & 1 \\ 0 & -3 & 1 \end{bmatrix} = A_4.$$
Since $A_4$ has two identical rows, $\det(A_4) = 0$ by Corollary 5.2.1. From Theorems 5.2.3 and 5.2.4, the determinant of $A_4$ is related to the determinant of $A$ as follows:
$$\det(A_4) = \tfrac{1}{3}\det(A_3) \ \text{(part 2 of Theorem 5.2.4)}, \quad \det(A_3) = -\tfrac{1}{7}\det(A_2) \ \text{(part 2 of Theorem 5.2.4)},$$
$$\det(A_2) = \det(A_1) \ \text{(part 1 of Theorem 5.2.4)}, \quad \det(A_1) = -\det(A) \ \text{(Theorem 5.2.3)}.$$
Putting these together, $0 = \det(A_4) = \tfrac{1}{3}\big(-\tfrac{1}{7}\big)(-1)\det(A) = \tfrac{1}{21}\det(A)$, so $\det(A) = 0$. ♦
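This row-reduction strategy is how determinants are computed in practice. Here is a sketch (not part of the notes; plain Python with exact fractions) that performs forward elimination, flips the sign for each swap, never scales rows, and multiplies the diagonal of the resulting triangular matrix:

```python
# Determinant by row reduction, in the spirit of Example 5.2.4.
from fractions import Fraction

def det_by_elimination(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    n, sign = len(A), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # no pivot available: determinant is 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                    # a row swap changes the sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]  # replacement: det unchanged
    result = Fraction(sign)
    for i in range(n):
        result *= A[i][i]                   # triangular determinant = product of diagonal
    return result

print(det_by_elimination([[-2, 2, -1],
                          [3, Fraction(9, 2), -1],
                          [1, Fraction(-11, 2), 2]]))   # 0, as in Example 5.2.4
```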
Example 5.2.5
Use elementary row operations to calculate the determinant of $A = \begin{bmatrix} 2 & 7 & 6 & 1 \\ 1 & 2 & 9 & -1 \\ 8 & 6 & 2 & -1 \\ 2 & 3 & 3 & 0 \end{bmatrix}$.
We can use this technique to calculate determinants of matrices whose entries aren’t specified.
Example 5.2.6
Let $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ and $B = \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ -a & -b & -c \\ d & e & f \end{bmatrix}$. If $\det(A) = 2$, calculate $\det(B)$.

Solution. We have
$$A \underset{R_1 \Leftrightarrow R_2}{\sim} \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix} = A_1, \qquad A_1 \underset{R_1 \Leftrightarrow R_3}{\sim} \begin{bmatrix} g & h & i \\ a & b & c \\ d & e & f \end{bmatrix} = A_2, \qquad A_2 \underset{R_1 \Rightarrow R_1 + R_2}{\sim} \begin{bmatrix} g+a & h+b & i+c \\ a & b & c \\ d & e & f \end{bmatrix} = A_3,$$
$$A_3 \underset{R_1 \Rightarrow 2R_1}{\sim} \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ a & b & c \\ d & e & f \end{bmatrix} = A_4, \qquad A_4 \underset{R_2 \Rightarrow -R_2}{\sim} \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ -a & -b & -c \\ d & e & f \end{bmatrix} = B.$$
Using Theorems 5.2.3 and 5.2.4 yields
$$\det(A_1) = -\det(A), \quad \det(A_2) = -\det(A_1) = \det(A), \quad \det(A_3) = \det(A_2), \quad \det(A_4) = 2\det(A_3), \quad \det(B) = -\det(A_4).$$
Therefore, $\det(B) = -2\det(A) = -4$. ♦
The following are immediate from the results in this section. The proofs of these are left as exercises to the reader.
Corollary 5.2.2
Let A be an n × n matrix.
1. Let B be a matrix row equivalent to A. Then, det(B) = C det(A) for some non-zero C ∈ R.
Exercise
Prove Corollary 5.2.2.
Example 5.2.7
Let A be n × n. Suppose that we perform row operations on A to get B and that det(B) = 0. Then,
applying part 1 of Corollary 5.2.2, we immediately conclude that det(A) = 0, so we don’t have to
trace back through each row operation.
Theorem 5.2.5
Let $A$ be an $n \times n$ matrix. Then, $A$ is invertible if and only if $\det(A) \neq 0$.
Proof. Suppose A is invertible. Then, the RREF of A is equal to In by part 2 of The Invertible Matrix
Theorem. By part 1 of Corollary 5.2.2, det(A) = C det(In ) = C where C is some non-zero scalar. Thus,
det(A) 6= 0.
We prove the converse with contraposition. Assume A is not invertible. Let B be the RREF of A. By part 3
of The Invertible Matrix Theorem, B does not have a pivot in every row and, therefore, must contain a row
of zeroes. By Corollary 5.2.1, det(B) = 0. By part 1 of Corollary 5.2.2, det(A) = C det(B) for a non-zero
scalar C. Hence, det(A) = 0.
In view of Theorem 5.2.5, we can add another condition to The Invertible Matrix Theorem.
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. AT is invertible.
14. Col(A) = Rn .
15. rank(A) = n.
16. nul(A) = 0.
17. $\operatorname{Null}(A) = \{\vec{0}\}$.
18. dim(Row(A)) = n.
19. Row(A) = Rn .
20. det(A) 6= 0.
Lemma 5.2.2
Let A be an n × n matrix and let E be any n × n elementary matrix. Then, det(EA) = det(E) det(A).
Proof. First suppose E is an elementary matrix produced from In by swapping the ith and jth rows. Then
det(E) = − det(In ) = −1 by Theorem 5.2.3. The product EA is the same matrix as A after Ri ⇐⇒ Rj .
Thus, $\det(EA) = -\det(A)$ by Theorem 5.2.3 and so $\det(EA) = \det(E)\det(A)$.
Now suppose E is an elementary matrix obtained from In by multiplying the ith row of In by a non-zero scalar
c. Then, det(E) = c det(In ) = c by Theorem 5.2.4. Moreover, EA is the matrix produced by multiplying
the $i$th row of $A$ by $c$. Thus, $\det(EA) = c\det(A)$ by Theorem 5.2.4. Hence, $\det(EA) = \det(E)\det(A)$.
Finally, suppose $E$ is an elementary matrix produced from $I_n$ by adding $c$ times row $j$ to row $i$. Then $\det(E) = \det(I_n) = 1$ by Theorem 5.2.4. Moreover, $EA$ is the matrix produced by performing $R_i \Rightarrow R_i + cR_j$ on $A$, so $\det(EA) = \det(A)$ by Theorem 5.2.4, and again $\det(EA) = \det(E)\det(A)$.
Lemma 5.2.2 generalizes to any finite set of elementary matrices using induction. That is, if $E_1, \ldots, E_k$ is any sequence of elementary matrices, then
$$\det(E_1 E_2 \cdots E_k A) = \det(E_1)\det(E_2)\cdots\det(E_k)\det(A).$$
Theorem 5.2.7 (The Multiplicative Property of Determinants)
Let $A$ and $B$ be $n \times n$ matrices. Then, $\det(AB) = \det(A)\det(B)$.

Proof. If $A$ and $B$ are $n \times n$ matrices with at least one of $A$ or $B$ not invertible, then $AB$ is not invertible.
We leave the proof of this as an exercise. In this case, it follows that det(AB) = 0 = det(A) det(B) by
Theorem 5.2.5.
Now assume both A and B are invertible. By part 2 of The Invertible Matrix Theorem, the RREF of A is In .
Therefore, there exists a sequence of elementary matrices $E_1, E_2, \ldots, E_{p-1}, E_p$ such that $E_p E_{p-1}\cdots E_2 E_1 A = I_n$. Thus, $A = E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}$ and, therefore, $AB = E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}B$.
By Theorem 3.2.7, Ei−1 is an elementary matrix for each i = 1, 2, . . . , p. Taking determinants of both sides
gives
$$\det(AB) = \det\big(E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}B\big) = \det(E_1^{-1})\det(E_2^{-1})\cdots\det(E_p^{-1})\det(B) = \det\big(E_1^{-1}E_2^{-1}\cdots E_p^{-1}\big)\det(B) = \det(A)\det(B),$$
where the middle equalities follow from the generalization of Lemma 5.2.2 stated above.
Exercise
Prove that if A and B are n × n matrices with one of A or B not invertible, then the product AB is
not invertible.
Warning!
In general, the property det(A + B) = det(A) + det(B) does not hold. Do not confuse this with the
multiplicative property.
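A quick numerical contrast of the two properties (not part of the notes; it assumes numpy and uses arbitrary random matrices):

```python
# det(AB) = det(A)det(B) holds; det(A + B) = det(A) + det(B) generally does not.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A + B), np.linalg.det(A) + np.linalg.det(B)))  # False (almost surely)
```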
Corollary 5.2.3
Let $A$ and $B$ be $n \times n$ matrices.
1. $\det(AB) = \det(BA)$.
2. If $A$ is invertible, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
Proof. We prove 1 and leave the proof of the second as an exercise for the reader. By Theorem 5.2.7, we have
$$\det(AB) = \det(A)\det(B).$$
But $\det(A)$ and $\det(B)$ are real numbers, so we can swap their order of multiplication. Therefore,
$$\det(AB) = \det(B)\det(A) = \det(BA),$$
where the last equality follows from another application of Theorem 5.2.7.
Exercise
Prove the second part of Corollary 5.2.3.
5.3 Cramer's Rule

Let $A$ be an $n \times n$ matrix and let $\vec{b} \in \mathbb{R}^n$. For each $i \in \{1, 2, \ldots, n\}$, denote by $A_i(\vec{b})$ the matrix obtained from $A$ by replacing the $i$th column of $A$ with $\vec{b}$.
Example 5.3.1
Let $A = \begin{bmatrix} -2 & 4 & 6 & 9 \\ 9 & 0 & 1 & 1 \\ -1 & 2 & 3 & -2 \\ -8 & -7 & 6 & 6 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} -9 \\ 0 \\ 0 \\ 2 \end{bmatrix}$. Then,
$$A_1(\vec{b}) = \begin{bmatrix} -9 & 4 & 6 & 9 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 3 & -2 \\ 2 & -7 & 6 & 6 \end{bmatrix}, \quad A_2(\vec{b}) = \begin{bmatrix} -2 & -9 & 6 & 9 \\ 9 & 0 & 1 & 1 \\ -1 & 0 & 3 & -2 \\ -8 & 2 & 6 & 6 \end{bmatrix},$$
$$A_3(\vec{b}) = \begin{bmatrix} -2 & 4 & -9 & 9 \\ 9 & 0 & 0 & 1 \\ -1 & 2 & 0 & -2 \\ -8 & -7 & 2 & 6 \end{bmatrix}, \quad A_4(\vec{b}) = \begin{bmatrix} -2 & 4 & 6 & -9 \\ 9 & 0 & 1 & 0 \\ -1 & 2 & 3 & 0 \\ -8 & -7 & 6 & 2 \end{bmatrix}.$$
Cramer's Rule
Let $A$ be an invertible $n \times n$ matrix and let $\vec{b} \in \mathbb{R}^n$. Let $\vec{v} \in \mathbb{R}^n$ denote the unique solution to $A\vec{x} = \vec{b}$, and write
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$v_i = \frac{\det(A_i(\vec{b}))}{\det(A)}, \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
Example 5.3.2
Thus,
$$\det A_1(\vec{b}) = 40 + 1 = 41, \quad \text{and} \quad \det A_2(\vec{b}) = 2 - 15 = -13,$$
and so,
$$v_1 = \frac{\det(A_1(\vec{b}))}{\det(A)} = -\frac{41}{19}, \quad \text{and} \quad v_2 = \frac{\det(A_2(\vec{b}))}{\det(A)} = \frac{13}{19}. \quad ♦$$
Example 5.3.3
Use Cramer's Rule to solve $A\vec{x} = \vec{b}$, where
$$A = \begin{bmatrix} -2 & -11 & 2 \\ 1 & 1 & 1 \\ 4 & 1 & 1 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} 0 \\ 3 \\ 1 \end{bmatrix}.$$

Solution. We have
$$A_1(\vec{b}) = \begin{bmatrix} 0 & -11 & 2 \\ 3 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad A_2(\vec{b}) = \begin{bmatrix} -2 & 0 & 2 \\ 1 & 3 & 1 \\ 4 & 1 & 1 \end{bmatrix}, \quad \text{and} \quad A_3(\vec{b}) = \begin{bmatrix} -2 & -11 & 0 \\ 1 & 1 & 3 \\ 4 & 1 & 1 \end{bmatrix}.$$
Thus,
$$\det(A) = -39, \quad \det A_1(\vec{b}) = 26, \quad \det A_2(\vec{b}) = -26, \quad \text{and} \quad \det A_3(\vec{b}) = -117,$$
and so,
$$v_1 = \frac{\det(A_1(\vec{b}))}{\det(A)} = -\frac{26}{39} = -\frac{2}{3}, \quad v_2 = \frac{\det(A_2(\vec{b}))}{\det(A)} = \frac{26}{39} = \frac{2}{3}, \quad \text{and} \quad v_3 = \frac{\det(A_3(\vec{b}))}{\det(A)} = \frac{117}{39} = 3. \quad ♦$$
Proof. Write $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_n\,]$ and let $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ denote the standard basis of $\mathbb{R}^n$. Then, $I_n = [\,\vec{e}_1\ \vec{e}_2\ \ldots\ \vec{e}_n\,]$ and
$$A \cdot (I_n)_i(\vec{v}) = [\,A\vec{e}_1\ \ldots\ A\vec{v}\ \ldots\ A\vec{e}_n\,] = [\,\vec{a}_1\ \ldots\ \vec{b}\ \ldots\ \vec{a}_n\,] = A_i(\vec{b}).$$
Therefore,
$$\det\big(A \cdot (I_n)_i(\vec{v})\big) = \det(A_i(\vec{b})) \implies \det(A)\det\big((I_n)_i(\vec{v})\big) = \det(A_i(\vec{b})) \implies \det\big((I_n)_i(\vec{v})\big) = \frac{\det(A_i(\vec{b}))}{\det(A)}.$$
Doing Cofactor Expansion along the $i$th row of $(I_n)_i(\vec{v})$ yields
$$\det\big((I_n)_i(\vec{v})\big) = v_i\det(I_{n-1}) = v_i \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
Hence,
$$v_i = \frac{\det(A_i(\vec{b}))}{\det(A)} \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
This method for computing solutions to A~x = ~b isn’t too bad for small matrices but, if the matrix is large,
then there are too many determinant calculations to justify using Cramer’s Rule in practice. That said,
Cramer’s Rule is used in theory a lot. In particular, there are a number of proofs in algebraic number theory
that make use of it.
Cramer's Rule also leads to a formula for the inverse of a matrix. Let $A$ be an invertible $n \times n$ matrix and write $A^{-1} = [\,\vec{b}_1\ \vec{b}_2\ \ldots\ \vec{b}_n\,]$. Then,
$$AA^{-1} = I_n \implies [\,A\vec{b}_1\ A\vec{b}_2\ \ldots\ A\vec{b}_n\,] = [\,\vec{e}_1\ \vec{e}_2\ \ldots\ \vec{e}_n\,],$$
so that $\vec{b}_j$ is the unique solution to $A\vec{x} = \vec{e}_j$ for each $j \in \{1, 2, \ldots, n\}$. Therefore, by Cramer's Rule, the $(i,j)$-entry of $A^{-1}$, $b_{ij}$, is given by
$$b_{ij} = \frac{\det(A_i(\vec{e}_j))}{\det(A)}, \quad \text{for all } i, j \in \{1, 2, \ldots, n\}.$$
Compute det (Ai (e~j )) using Cofactor Expansion down the ith column of Ai (~ej ). Because the only non-zero
entry in this column is a 1 in the jth position, it follows that
Therefore,
A
Cji
bij = for all i, j ∈ {1, 2, . . . , n} ,
det(A)
and so, a formula for A−1 is
A A A
C11 C21 ... Cn1
A A A
C12 C22 ... Cn2
−1 1
A = .. .. .. .. .
det(A)
. . . .
A A A
C1n C2n ... Cnn
Example 5.3.4
Let
1 −1 2
A= 0 2 1 .
2 0 4
i) Calculate adj(A).
5.3. CRAMER’S RULE 219
Solution.
" #! " #!
A 2 1 A 0 1
C11 = det(A11 ) = det = 8, C12 = − det(A12 ) = − det = 2,
0 4 2 4
" #! " #!
A 0 2 A −1 2
C13 = det(A13 ) = det = −4, C21 = − det(A21 ) = − det = 4,
2 0 0 4
" #! " #!
A 1 2 A 1 −1
C22 = det(A22 ) = det = 0, C23 = − det(A23 ) = − det − 2,
2 4 2 0
" #! " #!
A −1 2 A 1 2
C31 = det(A31 ) = det = −5, C32 = − det(A32 ) = − det − 1,
2 1 0 1
" #!
A 1 −1
C33 = det(A33 ) = det = 2.
0 2
Therefore,
A A A
C11 C21 C31 8 4 −5
A A A =
adj(A) = C12 C22 C32 2 0 −1 .
A A A
C13 C23 C33 −4 −2 2
det(A) = (8 − 2 + 0) − (8 + 0 + 0) = −2.
Therefore,
8 4 −5
1 1
A−1 = adj(A) = − 2 0 −1 . ♦
det(A) 2
−4 −2 2
220 CHAPTER 5. DETERMINANTS
Theorem 5.4.1
Let ~v1 , ~v2 ∈ R2 and let A = [ ~v1 ~v2 ] . Let P(~v1 , ~v2 ) be the parallelogram in R2 determined by ~v1 and
~v2 . Then,
area(P(~v1 , ~v2 )) = |det(A)| .
Example 5.4.1
" # " #
−2 3
Calculate the area of the parallelogram defined by the vectors ~v1 = and ~v2 = .
1 1
Proof. If {~v1 , ~v2 } is linearly dependent, then P(~v1 , ~v2 ) is a line, hence has zero area, and det(A) = 0, so the
theorem is verified. Therefore, we assume {~v1 , ~v2 } is a linearly independent set. Write
" # " #
a b
~v1 = , and ~v2 = .
c d
Let Fθ : R2 → R2 be a linear transformation that rotates vectors counterclockwise through an angle θ where
θ is the angle between ~v1 and the positive horizontal axis. Then,
" #! " # " #! " #
a a0 b b0
Fθ = = ~u1 , and Fθ = = ~u2 ,
c 0 d d0
Let B = [ ~u1 ~u2 ] . Then the base length of P(~u1 , ~u2 ) is a0 , and the vertical height is |d0 |, so
" #!
0 0 0 0 a0 b0
area(P(~u1 , ~u2 )) = a |d | = |a d | = det = |det(B)| ,
0 d0
Let Rθ be the standard matrix for Fθ . Since Fθ is a rotation, it follows that Rθ has the form,
" #
cos(θ) − sin(θ)
Rθ = .
sin(θ) cos(θ)
Now,
B = [ ~u1 ~u2 ] = [ Fθ (~v1 ) Fθ (~v2 ) ] = [ Rθ ~v1 Rθ ~v2 ] = Rθ A.
Hence,
|det(B)| = |det(Rθ A)| = |det(Rθ ) det(A)| = |det(Rθ )| · |det(A)| .
Finally,
det(Rθ ) = cos2 (θ) − (− sin2 (θ)) = cos2 (θ) + sin2 (θ) = 1.
Therefore,
area(P(~v1 , ~v2 )) = area(P(~u1 , ~u2 )) = |det(B)| = |det(Rθ )| · |det(A)| = |det(A)|
Similarly, given ~v1 , ~v2 , ~v3 ∈ R3 , we denote the parallelepiped having ~v1 , ~v2 , ~v3 as edges by P(~v1 , ~v2 , ~v3 ). The
volume of this parallelepiped is also related to determinants of matrices.
Theorem 5.4.2
Let ~v1 , ~v2 , ~v3 ∈ R3 and let P(~v1 , ~v2 , ~v3 ) be the parallelepiped defined above. Let A = [ ~v1 ~v2 ~v3 ] .
222 CHAPTER 5. DETERMINANTS
Then,
volume(P(~v1 , ~v2 , ~v3 )) = |det(A)| .
where · denotes the vector dot product, and × denotes the vector cross product.
Write
v11 v12 v13
A = [ ~v1 ~v2 ~v3 ] = v21 v22 v23 .
Then,
v22 v33 − v32 v23
~v2 × ~v3 = v32 v13 − v12 v33 ,
and
~v1 · (~v2 × ~v3 ) = v11 (v22 v33 − v32 v23 ) + v21 (v32 v13 − v12 v33 ) + v31 (v12 v23 − v22 v13 )
= (v11 v22 v33 + v21 v13 v32 + v31 v12 v23 ) − (v31 v22 v13 + v12 v21 v33 + v11 v23 v32 )
= det(A).
The set F (S) is called the image of S under the linear transformation F , or S under F for short.
A similar definition holds for subsets of R3 and linear transformations mapping R3 to itself. Let’s see what
some of these image sets look like.
5.4. DETERMINANTS AS AREAS AND VOLUMES 223
Example 5.4.2
where a, b > 0 are real numbers. Then, F scales vectors by a factor of a in the horizontal direction,
and scales vectors by a factor of b in the vertical direction. Therefore, F (S1 ) is an ellipse centred at
the origin whose horizontal radius is a and whose vertical radius is b.
Example 5.4.3
Let Fx1 ,θ : R3 → R3 be the linear transformation that rotates vectors in R3 about the x1 -axis
counter-clockwise through an angle of θ. Then, the standard matrix for Fx1 ,θ is
1 0 0
Ax1 ,θ = 0 cos(θ) − sin(θ)
0 sin(θ) cos(θ)
Linear transformations play nicely with parallelograms/parallelepipeds as the next result shows.
Theorem 5.4.3
and
volume(G(P)) = |det(B)| · volume(P).
224 CHAPTER 5. DETERMINANTS
Example 5.4.4
" # " #
−2 3
Let ~v1 = and ~v2 = . Let F be the linear transformation whose standard matrix is
1 1
" #
a 0
where a and b are non-zero real numbers. Calculate the area of F (P(~v1 , ~v2 )).
0 b
Solution. We’ve already calculated that P(~v1 , ~v2 ) has area 5. Therefore,
" #!
a 0
area(F (P(~v1 , ~v2 )) = det · 5 = 5 |ab| .
0 b
" #
3 0
Here is an example of the transformation applied to P(~v1 , ~v2 ).
0 −1
Here, the black parallelogram is P(~v1 , ~v2 ) and the blue parallelogram is the transformation. The transformed
parallelogram has area 15. ♦
Proof. Given any parallelogram P in R2 , there is a vector p~ ∈ R2 that translates P so its leftmost vertex
is at the origin. This new parallelogram P + p~ has the same area as P and, moreover, because F is linear,
we have F (P + p~) = F (P) + F (~
p). Therefore,
p + P)) = area(F (~
area(F (~ p) + F (P)) = area(F (P)),
Let ~v1 , ~v2 be the two vectors that define the parallelogram P + p~, and let M1 = [ ~v1 ~v2 ]. Then,
P + p~ = {s1~v1 + s2~v2 : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1} .
Therefore,
F (P + p~) = {s1 F (~v1 ) + s2 F (~v2 ) : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1}
is a parallelogram defined by the vectors F (~v1 ) and F (~v2 ). Let M2 = [ F (~v1 ) F (~v2 ) ], so that M2 = AM1 .
Then, by Theorem 5.4.1,
area(F (P)) = area(F (P + p~)) = |det(M2 )| = |det(A)| |det(M1 )| = |det(A)| · area(P + p~) = area(P),
where the last equality follows because translation does not change area. The proof for the 3-dimensional
case is the same.
Exercise
Let F : Rn → Rn be a linear transformation and let S ⊆ Rn . For any ~v ∈ Rn , prove that
F (S + ~v ) = F (S) + F (~v ).
Theorem 5.4.3 gives a formula for the area/volume of a parallelogram/parallelepiped under a linear trans-
formation. The question now is how do linear transformations effect the area of any region of bounded
area/volume? The answer lies in approximation of such areas by infinitesimally small squares/cubes. The
argument involves a limiting process similar to what you see in calculus when you prove formulas for inte-
grals. The formula is given in the following theorem which we state without proof.
Theorem 5.4.4
and
volume(G(R2 )) = |det(A)| · volume(R2 ).
Example 5.4.5
Let S2 denote the unit sphere in R3 and let Fx1 ,θ be the linear transformation in Example 5.4.3.
Calculate the volume of F (S2 ).
Solution. We have,
1 0 0 " #!
cos(θ) − sin(θ)
det(Ax1 ,θ ) = det 0 cos(θ) − sin(θ) = det = 1.
sin(θ) cos(θ)
0 sin(θ) cos(θ)
226 CHAPTER 5. DETERMINANTS
Example 5.4.6
We can use Theorem 5.4.4 to derive common formulas for areas and volumes of shapes. Use this
theorem to derive a formula for the volume of f an ellipse with horizontal radius a and vertical radius
b.
Solution. Denote such an ellipse by Eab . From Example 5.4.2, Eab is obtained by applying the linear
transformation " #
a 0
A=
0 b
to the unit circle S1 . Then, by Theorem 5.4.4,
" #!
a 0
area(Eab ) = det · area(S1 ) = abπ
0 b
In this chapter, we introduce eigenvalues and eigenvectors. Eigenvalues are special scalars associated with
matrices. Eigenvalues and eigenvectors are used a lot in mathematics. We start with an example.
Example 6.0.1
" # " √ #
2 3 3
Let A = and let ~v = . Describe the vector A~v .
1 2 1
The scalar in the previous example is an example of an eigenvalue and the corresponding vector is an example
of an eigenvector. We make these terms precise.
Let A be an n × n matrix. An eigenvector of A is a non-zero vector ~v such that A~v = λ~v for some
scalar λ ∈ R. The value λ is called an eigenvalue.
229
230 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
It is important that eigenvectors are defined to be non-zero. If ~0 were an eigenvector, then since A~0 = ~0 is
always true, the relation A~0 = r~0 is true for all scalars r ∈ R. This would imply all scalars are eigenvalues
of A which would make their definition trivial.
Note
Just because eigenvectors must be non-zero does not mean eigenvalues must be non-zero. It is
perfectly reasonable for a matrix to have zero as an eigenvalue.
Example 6.1.1
1 −3 3
Let A = 3 −5 3 . Determine which of the following vectors (if any) are eigenvectors of A and
6 −6 4
determine their corresponding eigenvalue:
2 2 3
~v1 = 7 , ~v2 = 3 , ~v3 = 3 .
1 1 6
6 −6 4 1 −26
As A~v1 is not a scalar multiple of ~v1 , it is not an eigenvector of A.
For ~v2 ,
1 −3 3 2 −4 2
A~v2 = 3 −5 3 3 = −6 = −2 3 = −2~v2 .
6 −6 4 1 −2 1
Since A~v2 = −2~v2 , ~v2 is an eigenvector of A with corresponding eigenvalue −2.
For ~v3 ,
1 −3 3 3 12 3
A~v3 = 3 −5 3 3 = 12 = 4 3 = 4~v3 .
6 −6 4 6 24 6
6.1. CALCULATING EIGENVECTORS AND EIGENSPACES 231
Finding eigenvectors of matrices is a little bit more involved than checking to see if a given vector is an
eigenvector. If you’re given an eigenvalue, it is not too bad as the next theorem shows.
Theorem 6.1.1
Let A be an n × n matrix with eigenvalue λ. Then, ~v is an eigenvector of A corresponding to λ if and
only if ~v ∈ Null(A − λIn ) and ~v 6= ~0.
Example 6.1.2
7 12 8
Let A = 14 29 20 . It is a fact that λ = 1 is an eigenvalue of A. Find an eigenvector
The RREF of A − I3 is
1 2 0
A − I3 ∼ 0 0 1 .
0 0 0
x3 0
Any non-zero value of s ∈ R produces a solution to (A − I3 )~x = ~0. Thus, if say s = 1, an eigenvector for A
corresponding to λ = 1 is
−2
1 ,
0
so a basis for Null(A − I3 ) is,
−2
1 .
0
232 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
−2
Therefore, 1 is an eigenvector of A corresponding to the eigenvalue λ = 1 and so is any scalar multiple
0
of this vector. ♦
Note
We can always check that we have a correct eigenvector corresponding to an eigenvalue λ by simply
evaluating A~v and making sure that it equals λ~v .
A~v = λ~v .
A~v − λ~v = ~0 =⇒ A~v − λ(In~v ) = ~0 =⇒ A~v − (λIn )~v = ~0 =⇒ (A − λIn )~v = ~0.
so that ~v ∈ Null(A − λIn ). The converse is proved by tracing through these steps in the reverse order and
is left as an exercise to the reader.
Exercise
Finish the proof of Theorem 6.1.1.
Let A be an n × n matrix with eigenvalue λ. The null space Null(A − λIn ) is called the eigenspace of
A corresponding to λ. This is denoted by EλA and it consists of all eigenvectors of A that correspond
to λ along with the zero vector.
6.1. CALCULATING EIGENVECTORS AND EIGENSPACES 233
Example 6.1.3
23 18 −36 36
−36 −31 36 −36
Let A = . Given λ = −13 is an eigenvalue for A, find a basis for the
12 6 −25 12
20 10 −20 7
A
eigenspace E−13 .
A
Solution. We need to find a basis for E−13 . Start by calculating,
36 18 −36 36
−36 −18 36 −36
A + 13I4 = .
12 6 −12 12
20 10 −20 20
A
Therefore, a basis for E−13 is,
−1/2 1 1
1 0
0
, , . ♦
0 1 0
0 0 1
Example 6.1.4
−14 14 −63 14
30 −30 135 −30
Let A = . Given that λ = 0 is an eigenvalue of A, find a basis for the
32 −32 144 −32
−60 60 −270 60
corresponding eigenspace E0A .
234 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Solution. For this one, we need to find a basis for the null space of A − 0I4 = A. The RREF of A is
1 −1 9/2 −1
0 0 0 0
A∼ .
0 0 0 0
0 0 0 0
In each of the examples we’ve done, the eigenspaces we’ve calculated have dimension at least 1. This is
necessarily the case: if λ is an eigenvalue of A, then the dimension of EλA is at least one. I leave it as an
exercise to determine why this is true.
Theorem 6.1.1 implies that the set of all eigenvectors of a matrix corresponding to a specific eigenvalue λ,
along with ~0, form a subspace of Rn . This can be shown via the definition of a subspace as well and is left
as an exercise for the reader.
Exercise
Let A be an n × n matrix with eigenvalue λ.
ii) Show that the set of all eigenvectors of A corresponding to λ, along with ~0, are a subspace of
Rn using the definition of a subspace.
6.2. CALCULATING EIGENVALUES 235
Theorem 6.2.1
Example 6.2.1
" #
2 −1
Find the eigenvalues of A = .
0 2
This is a polynomial in λ. Clearly, the only root of this polynomial is λ = 2. Therefore, λ = 2 is the only
eigenvalue of A. ♦
Proof. First suppose λ is an eigenvalue of A with eigenvector ~v . Then, ~v ∈ Null(A − λIn ) is non-zero so
that Null(A − λIn ) is not the zero subspace. Thus, by The Invertible Matrix Theorem, det(A − λIn ) = 0.
The converse is similar and its proof is left as an exercise to the reader.
Exercise
Finish the proof of Theorem 6.2.1.
Example 6.2.2
4 20 2
Find the eigenvalues of A = 0 −3 0 .
1 26 5
236 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
1 26 5−λ
= −60 + 7λ + 6λ2 − λ3 + 6 + 2λ
= −54 + 9λ + 6λ2 − λ3
−54 + 9λ + 6λ2 − λ3 = 0.
That is, we need to find the roots of the above polynomial. Factoring this polynomial yields,
Can we always use this method of factoring a polynomial to find eigenvalues? The answer is yes!
Theorem 6.2.2
Proof. We prove this using induction on n. If n = 1, then det(A − λI1 ) = a11 − λ which is a degree one
polynomial.
Now suppose det(A − λIn ) is a polynomial in λ of degree n for all n × n matrices A. This is the induction
hypothesis.
det(A − λIn+1 ) = (a11 − λ)(−1)1+1 det((A − λIn+1 )11 ) + . . . + a1n (−1)1+n det((A − λIn+1 )1n ).
Each of (A − λIn+1 )1j is an n × n matrix. Furthermore, regardless of which column of A − λIn+1 we delete, it
is easy to see that (A−λIn+1 )1j is a matrix of the form Bj −λIn where Bj is an n×n matrix for j = 1, . . . , n.
6.2. CALCULATING EIGENVALUES 237
Therefore, by the induction hypothesis, each of the above determinants is a polynomial in λ of degree n.
Letting det((A − λIn+1 )1j = pj (λ) for j = 1, . . . , n, we have
Every term in this sum is a polynomial of degree n, except for the first, which is a polynomial in λ of degree
n + 1 (because λp1 (λ) has degree n + 1). Thus, when we sum everything together, we get a polynomial in λ
of degree n + 1.
Let A be an n × n matrix. The degree n polynomial det(A − λIn ) is called the characteristic
polynomial for A. The equation
det(A − λIn ) = 0
It is clear from Theorem 6.2.1 that the eigenvalues of an n × n matrix A are exactly the roots of the char-
acteristic polynomial. Combining this with 6.2.2, the following is evident.
Corollary 6.2.1
Exercise
Prove Corollary 6.2.1.
We have seen that calculating determinants of triangular matrices is really easy. Finding their eigenvalues
is easy as well.
Theorem 6.2.3
The eigenvalues of a triangular matrix are the entries along the main diagonal.
238 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Example 6.2.3
1 0 0 0 0
0 −1 0 0 0
Find the eigenvalues of A =
2 .
3 4 0 0
1 −1 0 0 0
1 1 1 1 1
Solution. This is a 5 × 5 matrix, so the characteristic polynomial for A has degree 5. Typical, this is not
easy to factor. Luckily A is lower triangular so, by Theorem 6.2.3, the eigenvalues are the entries on the
main diagonal: 1, −1, 4, and 0. ♦
Proof. Let λ be an eigenvalue of A. Then, A − λIn differs from A only on the main diagonal, hence is also
triangular. By Theorem 5.2.1, its determinant is the product of the entries on the main diagonal. Thus,
Clearly, the roots of this polynomial are aii for i = 1, 2, . . . , n. Therefore, the eigenvalues of A, are the entries
on the main diagonal.
Warning!
Do not fall into the trap of row reducing a matrix A to echelon form, and then taking the elements
on the main diagonal as the eigenvalues of A. You will get the wrong answer.
The zero vector can not be an eigenvector, but an eigenvalue of 0 is perfectly fine. In fact, an eigenvalue of
0 tells you something about a matrix.
Theorem 6.2.4
Let A be an n × n matrix. Then, λ = 0 is an eigenvalue of A if and only if A is not invertible.
Proof. 0 is an eigenvalue of A if and only if Null(A − 0In ) = Null(A) contains a non-zero vector, which is
equivalent to A being non-invertible. You fill in the details.
With Theorem 6.2.4 in tow, we make one final amendment to the Invertible Matrix Theorem.
6.2. CALCULATING EIGENVALUES 239
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. AT is invertible.
14. Col(A) = Rn .
15. rank(A) = n.
16. nul(A) = 0.
n o
17. Null(A) = ~0 .
18. dim(Row(A)) = n.
19. Row(A) = Rn .
20. det(A) 6= 0.
Note
In this course, we will refer to the algebraic multiplicity of an eigenvalue simply as multiplicity.
Calculating multiplicities is easy as long as we can factor the characteristic polynomial. Suppose that
λ1 , λ2 , . . . , λk are the distinct eigenvalues of an n × n matrix A. Then, the characteristic polynomial for A
factors as
det(A − λIn ) = (λ1 − λ)r1 (λ2 − λ)r2 ) . . . (λk − λ)rk
Example 6.2.4
41 27 −18
Determine the multiplicity of the eigenvalue λ = 50 for the matrix A = 0 50 0 .
−8 24 34
Therefore, m(50) = 2. ♦
Example 6.2.5
Suppose that the characteristic polynomial of a matrix A is given by λ8 − 17λ7 + 80λ6 − 64λ5 .
Determine the eigenvalues of A and their corresponding multiplicities.
The roots of this polynomial are 0, 8, and 1. Therefore, the eigenvalues of A are 0, 8, and 1 and, their
corresponding multiplicities are
We end this section with the following fact that relates algebraic multiplicities of eigenvalues to the dimen-
sions of the corresponding eigenspaces.
Fact 6.2.1
The quantity dim(EλA ) is called the geometric multiplicity of λ. This is is a really nice result because it
relates something purely algebraic to something purely geometric. However, the proof of this is hard and is
well beyond the scope of the course.
242 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
6.3 Diagonalization
Analogous to how we can factor numbers and polynomials, matrices can be factored as well. A diagonaliza-
tion is a special type of matrix factorization. The goal of this section is to give criteria for determining when
a matrix is diagonalizable and to determine such a factorization.
An n × n matrix D is called diagonal if it has all zeroes both above and below the main diagonal.
Diagonal matrices have nice properties. In particular, it is really easy to perform many types of calculations
with them. Here are some examples of such properties. We leave their proofs as exercises for the reader.
is given by
dk1 0 ... 0
0 dk2 ... 0
k
D = .. .. .. .. .
.
. . .
0 0 ... dkn
If di 6= 0 for each i = 1, 2, . . . , n, this definition can be extended to any negative integer −m,
where m ≥ 1, and
1
dm 0 ... 0
1
1
0 ... 0
−m dm2
D = . .
.. .. .. ..
. . .
0 0 . . . d1m
n
5. Diagonal matrices are simultaneously upper and lower triangular, so the determinant of a di-
agonal matrix is the product of the entries on the diagonal and its eigenvalues are the entries
on its diagonal.
Functions that are familiar to us from caclulus can be defined on matrices as well as long as we are clever
√
about it. For example, one can define square roots of matrices as follows: if A is n × n, define A to be the
√ √
n × n matrix that satisfies A · A = A.
Similarly, one can define functions on matrices using Taylor series. For example, recall that the Taylor series
for ex is:
∞
X xn
ex = .
n=0
n!
244 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
In a similar fashion, one can define trig functions on matrices, logarithms, you name it!
The reason diagonal matrices are so useful is because they are so easy to calculate with. Indeed, suppose you
wanted to compute the first 100 terms of the power series for eA . If A is not diagonal, then this calculation
is really intensive because large powers of matrices require lots of computation to calculate. However, if we
calculate the exponential of a diagonal matrix,
d1 0 . . . 0
0 d2 . . . 0
D= .. .. . . . ,
. . . ..
0 0 . . . dn
then the terms in the power series of eD , are easy to calculate! In particular,
k
d1 0 . . . 0
∞
Dk
∞ 0 dk2 . . . 0
X X 1
eD = = . .. . . .
k! ..
k! . ..
k=0 k=0 .
0 0 . . . dkn
because then,
√ 2
d1 0 ... 0 d1 0 ... 0
√ 2
√ √ 0 d2 ... 0 0 d2 ... 0
D D= .. .. .. .. = .. .. .. .. = D.
. .
. . . . . .
√ 2
0 0 ... dn 0 0 ... dn
So, we can see that all of these things are easier to work with as long as we are working with diagonal
matrices. But certainly not all matrices are diagonal. This is where diagonalization comes in! If A is
diagonalizable, then it can be written in a special form using a diagonal matrix. Such matrices are not quite
as easy as diagonal matrices to calculate with, but it is the next best thing.
Example 6.3.1
1 −4 12 9 0 0
The matrix A = −4 7 18 is similar to B = 0 1 0 because A = P BP −1 where
0 0 1 0 0 −1
−1 9 2
P = 2 3 1 .
0 1 0
Similarity defines something called an equivalence relation on the set of n × n matrices. We leave the proof
of this as an exercise.
Exercise
Prove that similarity is an equivalence relation. To do this, you must prove the following three things
for all n × n matrices A, B, C
1. A ≡ A
2. If A ≡ B, then B ≡ A
3. If A ≡ B and B ≡ C, then A ≡ C.
If A and B are similar, then they share some of the same properties.
Theorem 6.3.1
Let A and B be two n × n similar matrices.
2. det(A) = det(B).
Proof. The proof of 1 is beyond the scope of this course. The proof of 2 is left as an exercise for the reader.
We prove part 3.
246 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Suppose A is similar to B. Then, there exists an invertible n × n matrix P such that A = P BP −1 . Noting
In = P P −1 , we have
Therefore, (A − λIn ) ≡ (B − λIn ). By part 2, det(A − λIn ) = det(B − λIn ). Therefore, the characteristic
polynomials for A and B are the same, so they have the same roots, which implies that A and B have the
same eigenvalues.
Exercise
Show that if A and B are similar then they have the same determinant.
The converse of this theorem is not true. That is, two matrices with the same rank, determinant, and
eigenvalues are not necessarily similar. Here is a counter example.
Example 6.3.2
Both of these matrices have one eigenvalue, 2, of multiplicity 2, they both have determinant equal to
4, and
" they both
# have rank 2, but they are not similar. To see this, suppose there exists a matrix
a b
P = such that
c d
" # " #
2 1 2 0
P = P.
0 2 0 2
Doing the matrix multiplication gives
" #" # " #" # " # " #
a b 2 1 2 0 a b 2a a + 2b 2a 2b
= =⇒ = .
c d 0 2 0 2 c d 2c c + 2d 2c 2d
which is not invertible. Therefore, the two matrices are not similar despite having the same eigenval-
ues.
Suppose that A and B are similar. Then, there exists an n × n matrix P such that A = P BP −1 . Suppose
6.3. DIAGONALIZATION 247
For A3 ,
A3 = A2 · A = (P BP 2 P −1 )(P BP −1 ) = P B 2 (P −1 P )BP −1 = P B 3 P −1 .
Repeating the pattern, we can see that for any positive integer k ≥ 1 that
Ak = P B k P −1 .
6.3.3 Diagonalization
In this section, we show how to determine if a matrix is diagonalizable and, if it is, how to find the diago-
nalization.
A = P DP −1 ⇐⇒ AP = P D.
A given n × n matrix A is not guaranteed to be diagonalizable. The next theorem gives an exact criterion
for determining when a matrix is diagonalizable and, even better, its proof gives a recipe for finding the
diagonalization!
Generally we would do an example before the proof but, because the proof of this theorem also gives a recipe
for the diagonalization, we’ll do the proof first and follow it with an example.
248 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Proof. Let λ1 , λ2 , . . . , λn be the eigenvalues of A counted with multiplicity and suppose that {~v1 , ~v2 , . . . , ~vn }
is a linearly independent set of eigenvectors where ~vi is an eigenvector of A corresponding to λi for each
i = 1, 2, . . . , n. Consider the matrix P = [ ~v1 ~v2 . . . ~vn ] and define,
λ1 0 ... 0
0 λ2 ... 0
D= .. .. .. .. .
.
. . .
0 0 ... λn
where ~e1 , ~e2 , . . . , ~en is the standard basis for Rn . Since λi is an eigenvalue of A corresponding to the
eigenvector ~vi , the relation A~vi = λi~vi is satisfied for each i = 1, 2, . . . , n. Therefore, we have
AP = A [ ~v1 ~v2 . . . ~vn ] = [ A~v1 A~v2 . . . A~vn ] = [ λ1~v1 λ2~v2 . . . λn~vn ] . (6.1)
P D = [ P (λ~e1 ) P (λ~e2 ) . . . P (λn~en ) ] = [ λ1 (P ~e1 ) λ2 (P ~e2 ) . . . λn (P ~en ) ] = [ λ1~v1 λ2~v2 . . . λn~vn ] . (6.2)
Combining Equations (6.1) and (6.2) gives AP = P D. Since the columns of P are linearly independent
by assumption, P is invertible by The Invertible Matrix Theorem. This shows that A ≡ D so that A is
diagonalizable.
Conversely, suppose A is diagonalizable. Then, there exists an n × n invertible matrix P and a diagonal
matrix D such that AP = P D. Write P = [ ~v1 ~v2 . . . ~vn ] and
d1 0 ... 0
0 d2 ... 0
D= .. .. .. .. .
.
. . .
0 0 ... dn
Then,
AP = [A~v1 A~v2 . . . A~vn ] = [ d1~v1 d2~v2 . . . dn~vn ] = P D.
Since these matrices are equal, their columns must be equal. This means,
Since P is invertible, none of the ~vi ’s are ~0, and so each di is an eigenvalue of A with corresponding eigen-
vector ~vi . Finally, since P is invertible, the ~vi ’s must be linearly independent by The Invertible Matrix
Theorem. Thus, there exists a linearly independent set of n eigenvectors of A.
6.3. DIAGONALIZATION 249
Example 6.3.3
" #
−2 3
Diagonalize the matrix A = .
−1 2
This shows the eigenvalues of A are ±1. Now, we find bases for the eigenspaces. For λ = 1,
" # " #
−3 3 1 −1
A − I2 = ∼ .
−1 1 0 0
(" #)
1
Thus, a basis for the eigenspace E1A is .
1
the diagaonlization is A = P DP −1 . ♦
This might seem a bit complicated, but it’s not too bad. Before we continue on with an algorithm for
diagonalizing matrices, we have a few results.
Theorem 6.3.3
Let A be an n × n matrix with distinct eigenvalues λ1 , λ2 , . . . , λk , 1 ≤ k ≤ n. Let ~v1 , ~v2 , . . . ~vk be
eigenvectors of A where ~vi is an eigenvector of A corresponding to the eigenvalue λi , i = 1, . . . , k.
Then, {~v1 , ~v2 , . . . , ~vk } is a linearly independent set.
Proof. By way of contradiction, suppose {~v1 , ~v2 , . . . , ~vk } is a linearly dependent set. Since ~v1 is non-zero,
Theorem 2.6.3 implies the following:
250 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
1. There is a smallest index p, 1 ≤ p ≤ k − 1, such that ~vp+1 is a linear combination of the previous
vectors ~v1 , ~v2 , . . . , ~vp ,
Multiplying both sides of Equation (6.3) by A and using the relation A~vi = λi~vi , for each i = 1, 2, . . . , p, we
get
c1 (A~v1 ) + c2 (A~v2 ) + . . . + cp (A~vp ) = A~vp+1 =⇒ (c1 λ1 )~v1 + (c2 λ2 )~v2 + . . . + (cp λp )~vp = λp+1~vp+1 . (6.4)
Multiplying Equation (6.3) by λp+1 and subtracting from the right hand side of Equation (6.4) gives
c1 (λ1 − λp+1 )~v1 + c2 (λ2 − λp+1 )~v2 + . . . + cp (λp − λp+1 )~vp = ~0.
Since {~v1 , . . . , ~vp } is linearly independent, this equation implies ci (λi − λp+1 ) = 0 for each i = 1, 2, . . . , p.
Since one of the ci ’s is non-zero by assumption, this implies that λi0 = λp+1 for some i0 ≤ p < p + 1. This
contradicts the assumption that the λi ’s are distinct for each i = 1, 2, . . . , k. Therefore, we conclude that
{~v1 , ~v2 , . . . , ~vk } is linearly independent.
Corollary 6.3.1
Proof. Suppose A has n distinct eigenvalues. Then, any set of eigenvectors corresponding to these eigenval-
ues is linearly independent by Theorem 6.3.3. Thus, there exists a set of n linearly independent eigenvectors
of A, so it is diagonalizable by The Diagonalization Theorem.
1. A is diagonalizable,
Let Bi be a basis for EλAi for each i = 1, 2, . . . , k. Then, each Bi is a linearly independent set containing
dim(EλAi ) eigenvectors of A. Furthermore, from the definition of basis, these are the largest sets of linearly
independent eigenvectors each eigenspace can provide. Take all of these bases and combine them into one set
S. Then, since eigenvectors corresponding to distinct eigenvalues are linearly independent, Theorem 6.3.3
implies that S is a linearly independent set, and there are
To see this, we first note that S is the largest set of linearly independent eigenvectors corresponding to A
that we can find. Any other set with more eigenvectors would imply the existence of an eigenspace, say
EλAi , with dim(EλAi ) + 1 linearly independent eigenvectors in it; a contradiction due to construction of S.
Therefore, S is the largest possible set of linearly independent eigenvectors of A. Since the existence of a set
of n linearly independent eigenvectors of A is assumed, it follows that
2 =⇒ 3 : We prove the contrapositive. By Fact 6.2.1, we may assume dim(EλAi ) < m(λi ) for some i. Then,
As before, let Bi be a basis of EλAi for each i = 1, 2, . . . , k. Then, each Bi is a set of dim(EλAi ) linearly
independent eigenvectors of A. Put all of these vectors into a set S. Such a set is necessarily linearly
independent by Theorem 6.3.3 and contains n elements by assumption. Therefore, there is a set of n lin-
early independent eigenvectors for A, and therefore A is diagonalizable by The Diagonalization Theorem.
Diagonalization and Dimension is a great theorem because it relates diagonalizability to a completely geo-
metric concept: dimensions of eigenspaces! It also allows us a bit of a shortcut in finding diagonalizations.
252 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Step 1. Calculate the distinct eigenvalues of A and their respective multiplicities. List them, taking
into account their multiplicities, as λ1 , λ2 , . . . , λn . This list will not, in general, be distinct.
Step 2. Calculate bases for eigenspaces corresponding to each distinct eigenvalue. Start with
eigenvalues that have algebraic multiplicity greater than 1. If any of the geometric multiplicities
are strictly less than the corresponding algebraic multiplicities, then stop. The matrix is not
diagonalizable by Diagonalization and Dimension. If the geometric multiplicity equals the algebraic
multiplicity fore each distinct eigenvalue, then A is diagonalizable by Diagonalization and Dimension,
so go to the next step.
Step 3. Let ~v1 , ~v2 , . . . , ~vn denote the basis vectors you found in Step 2 labelled so that A~vi = λi~vi
for each i = 1, 2, . . . , n. Form the matrix
Example 6.4.1
−λ 0
det(A − λI2 ) = = −λ(2 − λ) − (−1)(0) = λ(λ − 2).
−1 2−λ
6.4. AN ALGORITHM FOR DIAONGALIZATION & EXAMPLES 253
The roots of this polynomial are 0 and 2, hence the eigenvalues of A are 0 and 2. Since A has 2 distinct
eigenvalues, A is diagonalizable by Corollary 6.3.1.
Step 2. Now calculate bases for the eigenspaces corresponding to the two eigenvalues in step 1. For λ = 0,
" # " #
0 0 1 −2
A − 0I2 = ∼ .
−1 2 0 0
For λ = 2,
" # " #
−2 0 1 0
A − 2I2 = ∼ .
−1 0 0 0
Step 3. Let
" # " #
2 0 0 0
P = [ ~v1 ~v2 ] = , D= .
1 1 0 2
Then A = P DP −1 . ♦
Note
Example 6.4.2
73 25 67
A
Step 2. We calculate a basis for the eigenspace E57 .
−80 −125 −50 1 0 0
A − 57I3 = 22 70 28 ∼ 0 1 2/5 .
73 25 10 0 0 0
x3 1
A
Thus, dim(E57 ) = 1 < 3 = m(57), therefore A is not diagonalizable by 6.3.4. ♦
Example 6.4.3
4 0 −1
Thus, the eigenvalues of A are -1 and 3 with multiplicities m(−1) = 1 and m(3) = 2.
6.4. AN ALGORITHM FOR DIAONGALIZATION & EXAMPLES 255
Step 2. Calculate bases for the eigenspaces corresponding to each eigenvalue. For λ = 3,
0 0 0 1 0 −1
A − 3I3 = 0 0 0 ∼ 0 0 0 .
4 0 −4 0 0 0
x3 0 1
For λ = −1,
4 0 0 1 0 0
A + I3 = 0 4 0 ∼ 0 1 0 .
4 0 0 0 0 0
x3 1
A
Thus, a basis for E−1 = Null(A + I3 ) is
0
0 .
1
A
and we have dim(E−1 ) = 1 = m(−1). Therefore, the matrix is diagonalizable by Diagonalization and Di-
mension.
Step 3. Let
0 0 1 −1 0 0
P = 0 1 0 , D= 0 3 0 .
1 0 1 0 0 3
Then A = P DP −1 . Other valid diagonalizations for A are
1 0 0 3 0 0
P = 0 1 0 , D = 0 3 0 ,
1 0 1 0 0 −1
256 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
or
1 0 0 3 0 0
P = 0 0 1 , D= 0 −1 0 . ♦
1 1 0 0 0 3
Example 6.4.4
Therefore, the eigenvalues of A are −76 and −38 with multiplicities m(−38) = 3 and m(−76) = 2.
A
and dim(E−76 ) = 2 = m(−76).
Step 3. Let
0 7 −5 9 −3 −76 0 0 0 0
0 2 1 0 0 0 −76 0 0 0
P =
1/2 1 0 0 ,
0 D=
0 0 −38 0 0 .
1 0 0 1 0 0 0 0 −38 0
0 1 0 0 1 0 0 0 0 −38
Then A = P DP −1 . ♦
Exercise
Write down some other valid diagonalizations for the matrix in Example 6.4.4.
Example 6.4.5
Let A be a 5×5 matrix and suppose that A has eigenvalues 2 and 3. If the dimension of the eigenspace
corresponding to 2 is 3, is A diagonalizable? What if it has dimension 4?
258 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Solution. If the eigenspace corresponding to 2 has dimension 3, then this is not enough to conclude that A
is diagonalizable because it is possible the dimension of the eigenspace corresponding to 3 has dimension 1.
In this case, the sum of the dimensions of the eigenspaces would be 3 + 1 = 4, so that A is not diagonalizable
by Diagonalization and Dimension. However, if the eigenspace corresponding to 2 has dimension 4, then the
eigenspace corresponding to 3 necessarily has dimension exactly 1. Therefore, the sum of the dimensions of
the eigenspaces is 4 + 1 = 5 so that A is diagonalizable by Diagonalization and Dimension. ♦
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 259
Example 6.5.1
Solution. A solution to this differential equation is a function x(t) whose derivative is equal to a times
itself. To find such a function, divide both sides by x(t) to get
x0 (t)
= a.
x(t)
x0 (t)
By the chain rule, it is clear that is the derivative of ln(x(t)). Therefore, if we integrate both sides with
x(t)
respect to t, we get Z 0 Z
x (t)
dt = a dt =⇒ ln(x(t)) = at + C
x(t)
where C is an arbitrary constant. Solving for x(t) yields,
where A is an arbitrary real constant. We leave it to the reader to verify that this function does in fact
satisfy the above differential equation. ♦
In Example 6.5.1, we get an entire family of solutions to the differential equation x0 (t) = ax(t): one for each
real value of A. If we want a specific solution, we introduce a condition x(t) must satisfy at t = 0. Such a
condition is called an initial condition.
Example 6.5.2
Find a solution to the following differential equation subject to the given initial condition.
Solution. Example 6.5.1 shows that the solution to the differential equation is x(t) = Ae2t where A is an
arbitrary constant. If the initial condition is also to be satisfied, we must have
x(0) = 4 =⇒ Ae2·0 = 4 =⇒ A = 4.
260 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Therefore, a solution to the differential equation subject to the initial condition is x(t) = 4e2t . ♦
Note
A problem that asks to find a solution to a differential equation subject to an initial condition is
called an initial value problem.
The matrix A is called the coefficient matrix for the system. We introduce some notation that allows us
to further simplify this equation.
Let f1 (t), f2 (t), . . . , fn (t) be differentiable functions of a real variable t. Consider the vector of func-
tions
f1 (t)
f2 (t)
f~(t) =
.
..
.
fn (t)
The derivative of f~(t), denoted f~ 0 (t), is the vector obtained by differentiating each component of
f~(t) component-wise,
0
f1 (t)
0
f2 (t)
f~ 0 (t) =
.. .
.
fn0 (t)
The vector derivative satisfies the following two properties. Both are based off of properties of derivatives so
the proofs are left to the reader.
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 261
Let f1 (t), f2 (t), . . . , fn (t), g1 (t), g2 (t), . . . , gn (t) be differentiable functions of a real variable t. Define
f1 (t) g1 (t)
f2 (t) g2 (t)
~
f (t) =
.. ,
and ~g (t) =
.. .
. .
fn (t) gn (t)
Then,
Exercise
Prove Properties of Vector Derivatives.
and this matrix equation represents the linear system of differential equations in Equation (6.5).
Notice the similarity between Equation (6.7) and the differential equation in Example 6.5.1. The solution
to the differential equation suggests that a solution to Equation (6.7) may have the form
v1 eλt v1
v2 eλt v2
= eλt = eλt~v
~x(t) =
.. ..
. .
vn eλt vn
where λ ∈ R and ~v ∈ Rn . We show this is the case under the assumption that A is diagonalizable.
(e i v1i )0
λt
(λi eλi t )v1i
(e v2i )0 (λi eλi t )v2i
λi t
v~i 0 (t) =
..
= .. = λi eλi t~vi = eλi t (λi v~i ) = eλi t (A~vi ) = A(eλi t~vi ) = A~vi (t) i = 1, 2, . . . , n.
. .
λi t 0 λi t
(e vni ) (λi e )vni
262 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Therefore, ~x(t) = ~vi (t) is a solution to the matrix equation in (6.7) for each i = 1, 2, . . . , n. Furthermore, any
linear combination of the ~vi (t)’s is a solution to this equation. This is called the superposition principle.
To see this, define
~v (t) = c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t)
where c1 , c2 , . . . , cn ∈ R. Then,
~v 0 (t) = (c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t))0
= c1~v1 0 (t) + c2~v2 0 (t) + . . . + cn~vn 0 (t) by repeated application of Properties of Vector Derivatives
= A~v (t).
This shows ~x(t) = ~v (t) is a solution to the matrix differential equation in (6.7). In fact, if A is diagonalizable,
every solution to this equation has the form given (this isn’t proved in this course). Therefore, ~v (t) is called
the general solution to the matrix differential equation. We summarize all of this in the following theorem.
Theorem 6.5.1
Consider a system of first order, linear, differential equations
where
a11 a12 ... a1n x1 (t) b1
a21 a22 ... a2n x2 (t) b2
~b =
A= .. .. .. .. , ~x(t) = .. , .. .
.
. . . . .
an1 an2 . . . ann xn (t) bn
Assume A is diagonalizable. Let λ1 , λ2 , . . . , λn be the eigenvalues of A, counted with multiplicity, and
let {~v1 , ~v2 , . . . , ~vn } be a set of n linearly independent eigenvectors of A, where ~vi is an eigenvector
corresponding to λi for each i = 1, 2, . . . , n. Then, the general solution to the matrix differential
equation is
~v (t) = c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t)
where c1 , c2 , . . . , cn are arbitrary real constants. The ci ’s can be solved by applying the initial condi-
tion ~v (0) = ~b and solving the resulting linear system.
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 263
Example 6.5.3
Solution.
" #
1 4
i) The coefficient matrix for the system is A = . We leave it as an exercise to show that a
1 1
diagonalization of A is " # " #
−1 1 −1 0
P = , D= .
2 2 0 3
The columns of P are linearly independent eigenvectors, with the first corresponding to the eigenvalue
−1 and the second corresponding to the eigenvalue 3. Therefore, the general solution to the system is
" # " #
−1 1
~v (t) = c1 e−t + c2 e3t , c1 , c2 ∈ R.
2 2
Example 6.5.4
7 12 5
A diagonalization for A is
−6 −4 −1 −1 0 0
P = 1 1 0 , D= 0 1 0 .
5 4 1 0 0 −2
5 4 1
5 4 1 −1
5 4 1 −1 0 0 1 −4
5 4 1
The initial condition does not need to be at t = 0. It can be at t = a for any constant a.
Example 6.5.5
2
Repeat Example 6.5.4 with the initial condition ~x(1) = 1 .
−1
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 265
5 4 1
5 4 1 −1
5 4 1
266 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Bibliography
[1] Buss, S. Some proofs about determinants. Link here. Document pulled May 3, 2017.
[2] Holt, J. Linear Algebra with Applications, 2nd Edition. W.H Freeman and Company, New York NY,
2017.
[3] Lay, D.C. Linear Algebra and its Applications, 3rd Edition. Pearson Education Inc. Boston MA, 2006.
267