Math 221 Notes - Chad Davis (Last Update July 2022)
Contents

Foreword

2 Vectors in R^n
  2.1 2-Dimensional Vectors
  2.2 Vectors in Euclidean Space
    2.2.1 Vector Forms of Solutions to Linear Systems
  2.3 Linear Combinations and Vector Equations
  2.4 Matrix Equations
  2.5 Spanning Sets
    2.5.1 Geometry of Spanning Sets in R^2
    2.5.2 The Span Theorem
  2.6 Linear Independence
    2.6.1 The Homogeneous Equation
    2.6.2 Linear Independence of Vectors
    2.6.3 Special Cases of Linear Independence
  2.7 Linear Transformations
    2.7.1 Transformations
    2.7.2 Linear Transformations

4 Subspaces
  4.1 Introduction to Subspaces
  4.2 Column and Null Space
  4.3 Bases for Subspaces
  4.4 Dimension of Subspaces
    4.4.1 The Basis Theorem and Three Useful Corollaries for Calculating Bases and Dimensions
  4.5 The Rank-Nullity Theorem
  4.6 Row Space
    4.6.1 The Canonical Basis for Col(A)
  4.7 The Invertible Matrix Theorem

5 Determinants
  5.1 Calculation of Determinants
  5.2 Properties of Determinants
    5.2.1 Determinants of Triangular Matrices
    5.2.2 Determinants of Transposes
    5.2.3 Elementary Row Operations
    5.2.4 Determinants and Invertibility
    5.2.5 The Multiplicative Property of Determinants
  5.3 Cramer's Rule
    5.3.1 Cramer's Rule
    5.3.2 A New Formula for A⁻¹
  5.4 Determinants as Areas and Volumes
    5.4.1 Determinants as Areas and Volumes
    5.4.2 Linear Transformations
Foreword
Linear algebra is the study of linear transformations of vector spaces. A linear transformation is a function
that has sets of vectors as its domain and range. A linear transformation is an abstract mathematical con-
struction. The amazing thing about linear algebra is that, under the right conditions, these abstract objects
can always be represented by something concrete: an array of numbers known as a matrix.
The focus of this course is matrices and their various properties. Learning about matrices in a tangible
sense will allow us to learn about the abstract linear transformations. This is a remarkably wonderful thing.
We will learn various ways to manipulate matrices and learn how to perform algebraic operations on them,
such as addition and multiplication. We will see how all of these things relate to linear transformations in
a meaningful way. This is what makes linear algebra such a beautiful subject. We are able to study and
learn about complicated, abstract mathematics via objects that we can manipulate in a completely hands
on manner.
Chapter 1

Solving Linear Systems with Matrices
In this chapter, we introduce linear systems and matrices. You have been solving systems of linear equations
since high school. Matrices are a natural way to encode all the relevant data in a linear system and can
be manipulated to obtain solutions to such systems in a very efficient manner. There are many real world
problems that can be modelled using linear systems, so we see immediately that matrices find application in
many areas.
Before we jump into the content, we fix some notation. The symbol R stands for the set of all real numbers,
Q stands for the set of all rational numbers, Z stands for the set of all integers, and C stands for the set of
all complex numbers.
If S is a set, then the notation x ∈ S means that x is in the set S; the symbol “∈” is read “is an element
of.” For example, x ∈ R means that x is a real number.
1.1 Linear Systems

We begin with a type of problem you are likely to have seen before.
Example 1.1.1
Wentworth Music sells two types of guitars: Fender Stratocasters for $350 each and Gibson Les Pauls
for $600 each. Last year, they sold 230 guitars and made $100,000 from their total sales. How many
of each guitar did they sell?
Solution
Let x1 be the total number of Stratocasters sold and let x2 be the total number of Les Pauls sold. Since a total of 230 guitars were sold, we have

x1 + x2 = 230. (1.1)
We are also given that the total revenue Wentworth Music made off of guitars last year is $100,000.
Since each Stratocaster is sold for $350 and each Les Paul is sold for $600, we get another equation,

350x1 + 600x2 = 100,000. (1.2)
Presumably, you have seen how to simultaneously solve equations of this type before. You likely have
seen a number of different ways to do it. Some methods for solving these equations include graphing
both equations and determining the point where they intersect; doing substitution by solving for one
variable in one equation, substituting this into the other, and solving for the remaining variable; or you
might have seen how to solve such equations by subtracting multiples of one equation from the other
to eliminate variables. Even though substitution might be the method of preference to do this, I am
going to use the latter method.
Multiplying Equation 1.1 by 350 and subtracting it from Equation 1.2 gives

250x2 = 19,500 =⇒ x2 = 78.

To solve for x1, we subtract 600 times Equation 1.1 from Equation 1.2 to get

−250x1 = −38,000 =⇒ x1 = 152.

Therefore, Wentworth Music sold 152 Stratocasters and 78 Les Pauls last year. ♦
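As a quick check (not part of the original notes), the same two-equation system can be handed to a numerical solver. The code below is a minimal sketch using NumPy; the variable names are illustrative.

import numpy as np

# Coefficient matrix and right-hand side for the system
#     x1 +    x2 =     230
#  350x1 + 600x2 = 100,000
A = np.array([[1.0, 1.0],
              [350.0, 600.0]])
b = np.array([230.0, 100000.0])

x = np.linalg.solve(A, b)   # solves A @ x = b for a square, invertible A
print(x)                    # [152.  78.]

This agrees with the elimination above: 152 Stratocasters and 78 Les Pauls.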
The equations in the previous examples are special types of equations called linear equations.
Let k be a positive integer. A linear equation in k variables is one that can be written in the
form,
a1 x1 + a2 x2 + . . . + ak xk = b
where x1 , x2 , . . . , xk are variables and a1 , a2 , . . . , ak , b are real (or complex) numbers. The values
a1 , a2 , . . . , ak are called the coefficients of the linear equation and b is called the constant term.
The above form is called the standard form of a linear equation.
Identifying linear equations is straightforward: the variables may only be multiplied by coefficients and combined using addition and subtraction. No other functions of the variables, such as powers, products, quotients, or trigonometric functions, may be present.
Example 1.1.2
The equation x1 − 3 = 6 − 2x2 is a linear equation in k = 2 variables. Its standard form is,
x1 + 2x2 = 9.
Example 1.1.3
The equation x1^2 + x2 = 1 is not linear as it involves the quadratic term x1^2. Likewise, the following equations are not linear:

x1 + sin(x2) = 5,   x1 + ln(x2) + x3 = 1,   x1x2 = 1,   −(x1 + x2)/x3 = 1.
♦
Example 1.1.4

Show that the equation (−4x1 + 3x3)/5 − 2x2 − 1 = 3 is a linear equation by writing it in standard form, and identify its coefficients and constant term.

Solution

Don't be fooled by the division! We can write the given equation as follows,

(−4x1 + 3x3)/5 − 2x2 − 1 = 3 =⇒ −(4/5)x1 − 2x2 + (3/5)x3 = 4,

which is the standard form of a linear equation. Its coefficients are −4/5, −2, and 3/5, and the constant term is 4. ♦
The main goal of this chapter is to devise a method for solving not one linear equation, but for solving a
collection of linear equations simultaneously, just as we did in Example 1.1.1.
Let k and n be positive integers. A linear system (or a system of linear equations) in k variables
with n equations is a collection of n linear equations in the same k variables. Symbolically, a linear
system is a collection of equations of the form,

a11 x1 + a12 x2 + . . . + a1k xk = b1
a21 x1 + a22 x2 + . . . + a2k xk = b2
. . .
an1 x1 + an2 x2 + . . . + ank xk = bn
where x1 , . . . , xk are variables and aij , bi are real (or complex) numbers for each i ∈ {1, 2, . . . , n} and
j ∈ {1, 2, . . . , k}. The aij ’s are called the coefficients of the linear system and the bi ’s are called the
constant terms or the constant coefficients.
The coefficients aij are read “a eye jay.” For example, the a12 entry is read “a one two” and not “a twelve.” Moreover, the coefficients are always indexed first by the equation and then by the variable they multiply. That is, the i picks out the equation the coefficient appears in and the j picks out the variable that coefficient multiplies.
Example 1.1.5

Determine the coefficients and constant terms of the linear system from Example 1.1.1:

x1 + x2 = 230
350x1 + 600x2 = 100,000

The first equation is x1 + x2 = 230; the second is 350x1 + 600x2 = 100,000. Therefore, the coefficients of this linear system are,

a11 = 1, a12 = 1, a21 = 350, a22 = 600,

and the constant terms are b1 = 230 and b2 = 100,000. ♦
Obviously, the coefficients are dependent on how the linear system is written down. If you’re asked to write
down the coefficients for a linear system, then you do this based on how the linear system is given to you.
Example 1.1.6
Determine the coefficients and constant terms for the following linear system.
−7x1 − √2 x2 + (1/3)x3 = 5/7
3x2 − 4x4 = 2
Solution
The first thing we notice is that this is a linear system in 4 variables with 3 equations. The fourth
variable is missing in the first equation. This means the coefficient on x4 in the first equation is a zero.
Similarly for the coefficients on x1 and x3 in the second equation. Thus, the coefficients for this linear
system are,
a11 = −7, a12 = −√2, a13 = 1/3, a14 = 0,
A solution to a linear system in k variables is a k-tuple of real (or complex) numbers (s1, s2, . . . , sk) such that, when we make the substitutions
x1 = s1 , x2 = s2 , . . . , xk = sk ,
then each of the equations in the linear system is satisfied. That is, the left-hand side of each equation evaluates to the corresponding constant term on the right-hand side.

The solution set of a linear system is the set of all solutions to that linear system; that is, it is a set of k-tuples, each of which is a solution to the linear system.
Example 1.1.7
Consider once more the linear system from Example 1.1.1,

x1 + x2 = 230
350x1 + 600x2 = 100,000.
We saw that x1 = 152 and x2 = 78 is a solution to this system because, when we substitute these values into each equation in the system, we have,

152 + 78 = 230;
350(152) + 600(78) = 53,200 + 46,800 = 100,000.

This means (x1, x2) = (152, 78) is a solution to the given linear system. In fact, it is the only solution
to this linear system. This means that the solution set to this linear system is exactly {(152, 78)} .
♦
Example 1.1.8
Solution
To verify this is true, we substitute the values into each equation and see if all of them are satisfied.
For the first,
−(13) − 2(0) − 7(−4) = 28 − 13 = 15;
This shows that (x1 , x2 , x3 , x4 ) = (13, 0, −2, −4) is a solution to the linear system. To verify the other,
we calculate:
−21 − 2(−4) − 7(−4) = −21 + 8 + 28 = 36 − 21 = 15;
This shows (x1, x2, x3, x4) = (21, −4, 2, −4) is also a solution to the linear system. This means the solution set for the system contains at least the 4-tuples (13, 0, −2, −4) and (21, −4, 2, −4). We will see later that, in fact, there are infinitely many different solutions to this linear system. ♦
In the previous two examples, the linear systems in question had at least one solution. This need not always be the case.
Example 1.1.9

Show that the following linear system has no solution.

3x1 + x2 = 1
6x1 + 2x2 = 5
Solution
We can use the same method as in Example 1.1.1 to try and find a solution. If we try to subtract twice
the first equation from the second, we wind up with:
(6x1 + 2x2) − 2(3x1 + x2) = 5 − 2(1) =⇒ 0 = 3.
Clearly it is not possible that 3 = 0. This shows that the linear system has no solution.
If you don’t like this method, try substitution. First, solve for x2 in the first equation:
3x1 + x2 = 1 =⇒ x2 = 1 − 3x1 .
Substituting this into the second equation gives 6x1 + 2(1 − 3x1) = 5, which simplifies to 2 = 5. Once again, this is not possible, which implies that the system is not solvable. ♦
The previous examples show that linear systems may have solutions or they may not. This prompts the
following definition.
A linear system is called consistent if it has at least one solution. It is called inconsistent if it does
not have a solution.
For example, the linear systems in Examples 1.1.1 and 1.1.8 are consistent since they have exactly one and
at least two solutions respectively. The linear system in Example 1.1.9 is inconsistent.
1.1.2 The Geometry of Solution Sets: Linear Systems in Two Variables and
Beyond
Let’s consider the linear system in Example 1.1.1:
x1 + x2 = 230
350x1 + 600x2 = 100,000
We can graph these two equations in an x1 -x2 plane, which is the same as an x-y plane with the axes
relabelled. Doing so certainly yields two lines:
[Figure: the lines x1 + x2 = 230 and 350x1 + 600x2 = 100,000 graphed in the x1-x2 plane; they intersect at the single point (152, 78).]
We see that the intersection point of these two lines is (152, 78) which is exactly the solution to the linear
system. This should make sense if we think about what a solution is: If (x1 , x2 ) = (s1 , s2 ) satisfies a linear
equation then (s1 , s2 ) is a point on the line represented by that equation. Since a solution to a linear system
must satisfy every linear equation in the system, it must be a point that lies on each line concurrently. In
the case of this example, it should now be clear why there is exactly one solution to the linear system: There
is only one point of intersection between these two lines.
Let’s look at another two examples to see what else can happen.
Example 1.1.10
Graph the following linear system and, from the graph, determine the number of solutions the linear
system has. Moreover, state whether the system is consistent or inconsistent.
−2x1 + x2 = 4
−2x1 + x2 = 0
Solution
If we write these two equations in the usual y = mx + b form we typically see for linear equations, we
have
x2 = 2x1 + 4, x2 = 2x1 ,
which shows the lines have the same slope with different x2 -intercepts. Therefore, the lines are parallel.
Their graph is,
[Figure: the parallel lines −2x1 + x2 = 4 and −2x1 + x2 = 0 graphed in the x1-x2 plane; they never intersect.]
Since these lines are parallel, we know there is no intersection point. Since they never intersect, there
is no solution to this linear system. This is an example of an inconsistent linear system. ♦
Note
If you graph the linear system in Example 1.1.9, you will also get two parallel lines with different x2 -
intercepts, which is why the system is inconsistent. We will see a bit later on that this is not the only way a linear system can turn out to be inconsistent.
Example 1.1.11
Graph the following linear system and, from the graph, determine how many solutions the linear
system has. Moreover, state whether the system is consistent or inconsistent.
x1 + x2 = 5
−x1 − x2 = −5
Solution
Writing each equation in slope-intercept form, we have

x1 + x2 = 5 =⇒ x2 = 5 − x1,
−x1 − x2 = −5 =⇒ x2 = 5 − x1.
This means that these two equations represent the same line. Therefore, if we graph both of them, we
get the same two lines sitting on top of one another:
[Figure: the lines x1 + x2 = 5 and −x1 − x2 = −5 graphed in the x1-x2 plane; they coincide as a single line.]
I have graphed the first equation with a thicker line so you can see the two lines sitting on top of one
another.
What is the solution set in this case? Here, the two lines share every point. This means
every point on the line is a solution to the linear system. Since there are infinitely many points on a
line, this means that this linear system has infinitely many solutions. Since there is at least one if
there are infinitely many, it follows that this linear system is consistent. ♦
The previous three examples show linear systems that have either,

i) Exactly one solution;

ii) No solutions; or

iii) Infinitely many solutions.
If we do the thought experiment and keep in mind what straight lines look like, it is not much of a stretch
to see that these are the only three possibilities for solution sets. Even if more linear equations are added to
the linear system, these are still the only possibilities. For instance, here is a graph of a linear system with
four equations in two variables that has exactly one solution:
[Figure: the four lines −x1 + 2x2 = 1, −x1 + x2 = 0, x1 + x2 = 2, and 5x1 + x2 = 6 graphed in the x1-x2 plane; all four pass through the point (1, 1).]
Here, we see that the four lines simultaneously intersect at the point (1, 1). This is the solution to the linear
system.
Similarly, here is a graph of a linear system with three equations in two variables that has no solution:

[Figure: the three lines −x1 + 2x2 = 3, −x1 + x2 = 0, and x1 + x2 = 2 graphed in the x1-x2 plane; they intersect pairwise but have no common point.]
Note that even though there are common points of intersection between pairs of lines, there is not a single
point where they all meet simultaneously. This means the linear system has no solution.
What about linear systems in three variables? The first thing we need to ask ourselves is “What does a
linear equation in three variables look like?” If we graph such an equation in a 3 dimensional coordinate
axes, we get a plane, which looks like a flat sheet of paper. As in the case of two variables, a solution to a
linear system in three variables is a point where all the planes in the linear system intersect simultaneously.
Keeping in mind that, like lines, planes don't bend, it is somewhat easy to see that there are only a few ways
a set of planes can intersect:

i) The planes intersect in exactly one common point;

ii) The planes lie on top of one another and, hence, intersect in infinitely many different points;

iii) The planes intersect along a line and, hence, intersect in infinitely many different points;

iv) There is no point lying on all of the planes, so they do not intersect simultaneously.
In any event, we see that, as in the case of linear systems in two variables, there are once again only three possibilities for the number of solutions: one, none, or infinitely many.
What about linear equations in four variables? Unfortunately, such equations can not be graphed or visual-
ized since they lie in a space that is a dimension higher than we live in. If you want to try to do the thought
experiment, though, a linear equation in four variables describes a hyperplane: a flat, three-dimensional object sitting inside four-dimensional space. And, once again, if you start looking at common intersection points in four-dimensional space of such
objects, they can only intersect in either exactly one, none, or infinitely many points, corresponding to one,
none, or infinitely many solutions.
In fact, this same observation is true for all linear systems! Any collection of k linear equations in n variables
(called n-dimensional hyperplanes) intersects in either exactly one point in n-dimensional space, infinitely many points in n-dimensional space, or no points in n-dimensional space. This leads to the following, which we
state as a fact, but whose truth will become obvious once we introduce our methods for solving linear systems
with matrices.
Fact: Number of Solutions of Linear Systems

Every linear system has exactly one of the following:

i) Exactly one solution;

ii) Infinitely many solutions;

iii) No solutions.
A matrix A of size n × k is a rectangular array of numbers arranged in n rows and k columns. We write A = [aij], where the aij's are numbers called the entries of A. The quantity n × k is referred to as the dimension or size of A.
We refer to specific entries in a matrix as follows: the (i, j)-entry of A is the number in the ith row and jth
column of A. This means the (i, j)-entry of A is aij .
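If you follow along in software, note that most array libraries index from 0 rather than 1. The sketch below (not part of the notes; the matrix is just an illustration) shows how the (i, j) convention above maps onto NumPy indexing.

import numpy as np

# An illustrative 2 x 3 matrix, written row by row.
A = np.array([[5, -2, 0],
              [1,  7, 3]])

n, k = A.shape                # number of rows and columns: (2, 3)

def entry(A, i, j):
    """Return the (i, j)-entry of A using the 1-based convention of the notes."""
    return A[i - 1, j - 1]    # NumPy arrays are 0-indexed

print(n, k)                   # 2 3
print(entry(A, 2, 3))         # a23 = 3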
Example 1.1.12
−5 −5 0 8
The (2, 3)-entry of A is the number in the 2nd row and 3rd column. In this case, a23 = 2. Write
down and identify some of the other entries in this matrix!
Example 1.1.13
0 1 −2
Solution
This matrix has 3 rows and 3 columns, so its size is 3 × 3. For the entries:
i) a21 is the entry in the second row and first column. In this case, a21 = 4.
ii) a11 is the entry in the first row and first column, so a11 = −1.
iii) a32 is the entry in the third row, second column, so a32 = 1.
iv) a23 is the entry in the second row, third column, so a23 = −7
v) a43 would be the entry in the fourth row and third column of A. Since A does not have four rows,
there is no (4, 3)-entry, so a43 does not exist.
The coefficient matrix of a linear system in k variables with n equations is the n × k matrix whose (i, j)-entry is the coefficient aij. The augmented matrix of the system is the n × (k + 1) matrix obtained by adjoining the constant terms b1, b2, . . . , bn to the coefficient matrix as an extra column on the right, separated from the coefficients by a vertical bar.

In both cases, the first k columns of the matrix correspond to the coefficients of the variables x1, x2, . . . , xk.

We draw the vertical line in the augmented matrix so that we do not confuse the augmented matrix with the coefficient matrix. Think of the vertical bar as representing the equals signs in the linear system. This
is not standard notation so, if you are reading other references, make sure you are aware of their notational
convention.
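As a small illustration (mine, not the notes'), the coefficient and augmented matrices of the system from Example 1.1.1 can be assembled in NumPy by stacking the constant terms onto the coefficient matrix as an extra column.

import numpy as np

# Coefficients and constant terms of
#     x1 +    x2 =     230
#  350x1 + 600x2 = 100,000
coeff = np.array([[1, 1],
                  [350, 600]])
consts = np.array([[230],
                   [100000]])

augmented = np.hstack([coeff, consts])   # adjoin the constants as a final column
print(coeff.shape, augmented.shape)      # (2, 2) (2, 3)
print(augmented)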
Example 1.1.14
Write down the coefficient and augmented matrices for the linear systems in Examples 1.1.1 and 1.1.6
and state their sizes.
Solution

The coefficient matrix and augmented matrix for the linear system in Example 1.1.1,

x1 + x2 = 230
350x1 + 600x2 = 100,000,

are, respectively,

[ 1    1   ]          [ 1    1   |    230   ]
[ 350  600 ]          [ 350  600 | 100,000  ]

The coefficient matrix has size 2 × 2 and the augmented matrix has size 2 × 3.

For the linear system in Example 1.1.6, which has 3 equations in 4 variables, the coefficient matrix has size 3 × 4 and the augmented matrix has size 3 × 5. For instance, the equation 3x2 − 4x4 = 2 contributes the rows

0 3 0 −4          and          0 3 0 −4 | 2

to the coefficient and augmented matrices, respectively. ♦
Example 1.1.15

Consider the following augmented matrix:

[ 1    12    −1     3  |  13  ]
[ 1   −√2     9    12  | −√2  ]
[ π    −4    1/2    9  |  10  ]

State the size of this matrix and write down the corresponding linear system.
Solution
This matrix has 3 rows and 5 columns, so is 3 × 5. Since the matrix is augmented, the values in the
last column are the constant terms in the corresponding linear system. The rest of the values are
the coefficients on the variables in the linear system, of which there are four which we can denote by
x1 , x2 , x3 , and x4 . Thus, the corresponding linear system is,
x1 + 12x2 − x3 + 3x4 = 13
x1 − √2 x2 + 9x3 + 12x4 = −√2
πx1 − 4x2 + (1/2)x3 + 9x4 = 10
♦
1.2 Row Reducing Matrices: The Key to Solving Linear Systems
In Example 1.1.1, we solved the linear system by subtracting multiples of one equation from another to
eliminate variables. While it might be easier to do substitution in this example, the reason I chose to use
this elimination method is because it provides a concrete example of the aforementioned row operations. Let's
see how this works.
The linear system in Example 1.1.1 and its corresponding augmented matrix are given below

x1 + x2 = 230                          [ 1    1   |    230   ]
350x1 + 600x2 = 100,000      ⇐⇒        [ 350  600 | 100,000  ]
Denote the first row by R1 . R1 represents the equation x1 + x2 = 230. The second row, R2 , represents
the equation 350x1 + 600x2 = 100, 000. To find a solution to this linear system, the first thing we did was
subtract 350 times the first equation from the second. The corresponding operation on the matrix is to
replace R2 with 350 times R1 subtracted from R2 . This is denoted by R2 ⇒ R2 − 350R1 . The resulting
linear system and corresponding augmented matrix is,

R2 ⇒ R2 − 350R1 :    x1 + x2 = 230               [ 1   1   | 230    ]
                     250x2 = 19,500      ⇐⇒      [ 0   250 | 19,500 ]
The second row of the new matrix represents the equation 250x2 = 19500. We solve for x2 by dividing by
250. The corresponding operations performed on the matrix is to replace R2 with (1/250) times R2 . This is
written R2 ⇒ (1/250)R2. The resulting linear system and corresponding augmented matrix is,

R2 ⇒ (1/250)R2 :    x1 + x2 = 230          [ 1  1 | 230 ]
                    x2 = 78        ⇐⇒      [ 0  1 | 78  ]
We can now solve for x1 by subtracting the second equation from the first. This is written as R1 ⇒ R1 − R2
and the resulting linear system and corresponding augmented matrix is

R1 ⇒ R1 − R2 :    x1 = 152           [ 1  0 | 152 ]
                  x2 = 78      ⇐⇒    [ 0  1 | 78  ]
Now look! This new linear system clearly has solution (x1 , x2 ) = (152, 78), which is precisely the solution to
the original linear system!
Keeping this example in mind, it should be clear that there is more going on here than meets the eye. Indeed, by doing
matrix manipulations similar to what we did above, we can find the solution set to any linear system that
we’re given! This is a beautiful process and the rest of the chapter is dedicated to fleshing it out.
The elementary row operations that can be performed on an n × k matrix A are the following.
1. Row Replacement: Add a non-zero multiple of one row to another. If c is a non-zero number,
then replacing row i by row i plus c times row j is denoted by Ri ⇒ Ri + cRj . The arrow is
pronounced “is replaced by” so the expression Ri ⇒ Ri + cRj is read “row i is replaced by row
i plus c times row j.”
2. Scaling Rows: Multiply one row by a non-zero number. If c is a non-zero number, then row
i replaced by c times row i is denoted Ri ⇒ cRi . This is read “row i is replaced by c times row
i.”
3. Swapping Rows: Interchange two rows. Interchanging row i with row j is denoted Ri ⇔ Rj .
This is read “row i is replaced by row j and row j is replaced by row i.”
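The three operations are easy to express in code. The following is a minimal sketch (not part of the notes; the helper names are my own) that applies them to rows of a NumPy array, using the 1-based row numbering of the notes.

import numpy as np

def replace(A, i, j, c):
    """Ri => Ri + c*Rj."""
    A = A.astype(float).copy()
    A[i - 1] += c * A[j - 1]
    return A

def scale(A, i, c):
    """Ri => c*Ri for a non-zero constant c."""
    A = A.astype(float).copy()
    A[i - 1] *= c
    return A

def swap(A, i, j):
    """Ri <=> Rj."""
    A = A.astype(float).copy()
    A[[i - 1, j - 1]] = A[[j - 1, i - 1]]
    return A

# Re-doing the reduction of the augmented matrix from Example 1.1.1.
A = np.array([[1, 1, 230], [350, 600, 100000]])
A = replace(A, 2, 1, -350)    # R2 => R2 - 350*R1
A = scale(A, 2, 1 / 250)      # R2 => (1/250)*R2
A = replace(A, 1, 2, -1)      # R1 => R1 - R2
print(A)                      # [[  1.   0. 152.]
                              #  [  0.   1.  78.]]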
Example 1.2.1
Perform the following row operations on the matrix in the order given:
Solution
ii) In the new matrix, we need to replace the third row with itself plus 5 times the second row. This yields,

R3 ⇒ R3 + 5R2 :
[ −2/5   0    1   ]        [ −2/5   0     1    ]
[  1     5   1/2  ]   ∼    [  1     5    1/2   ]
[  8    −2  −1/3  ]        [  13    23   13/6  ]
[ −2     3    1   ]        [ −2     3     1    ]

since the new third row is [8 + 5(1), −2 + 5(5), −1/3 + 5(1/2)] = [13, 23, 13/6].
After all four row operations have been performed, we are left with the matrix,
−262/5 −92 −23/3
−5 −25 −5/2
13 23 13/6
−2 3 1
Exercise
Perform the row operations on A in Example 1.2.1 in the reverse order. Do you get the same matrix?
Some matrices can be transformed into others using elementary row operations and others can not. For
instance, consider the matrices,

A1 = [ 1  0 ]        A2 = [ 1  2 ]
     [ 0  1 ]             [ 0  1 ]
It is clear we can transform A1 into A2 by doing the row operation R1 ⇒ R1 + 2R2 . You can also do the
reverse: change A2 back into A1 by doing R1 ⇒ R1 − 2R2.
It is impossible to transform B1 into B2 using row operations alone. This means we can classify matrices by
whether or not they can be transformed into one another using row operations. This motivates the following
definition.
Two matrices that can be transformed into one another using a finite sequence of row operations are
called row equivalent. If A and B are row equivalent matrices, we write A ∼ B.
Exercise: Challenge
Prove that row equivalence is an equivalence relation on the set of all n × k matrices. (Ignore this
exercise if you don’t know what an equivalence relation is).
Example 1.2.2
Example 1.2.3

The matrices

A = [ −4  −2  −1   3 ]        B = [ 1  0  0  −1 ]
    [  2   1   1  −3 ]            [ 0  1  0   2 ]
    [  1   3   1   2 ]            [ 0  0  1  −3 ]

are row equivalent, that is, A ∼ B. Show this is true by finding the sequence of row operations that transform A into B.
Solution
There are many ways to proceed that will give the same answer. However, the way I’m going to perform
the operations is foreshadowing for the algorithm we see in the next section.
Start in the top left corner of both matrices. The (1,1)-entry of B is a 1. To transform A into B,
we must make the (1,1)-entry of A into 1. This can be done by either swapping the first and third
rows or by multiplying the first row by −1/4. However, when doing row operations, it is useful to keep
fractions out of the calculations unless it is absolutely necessary. Therefore, instead of multiplying row
1 by −1/4, we will swap the first and third rows.
1 3 1 2
R1 ⇔ R3 : A ∼ 2 1 1 −3 = A1 .
−4 −2 −1 3
We now need the two entries below the 1 in the (1,1)-position to be zero. We can obtain this by doing
two row replacements.
1 3 1 2
R2 ⇒ R2 − 2R1 : A1 ∼ 0 −5 −1 −7 = A2 ,
−4 −2 −1 3
1 3 1 2
R3 ⇒ R3 + 4R1 : A2 ∼ 0 −5 −1 −7 = A3 .
0 10 3 11
Now we need a 0 below the -5 in the (2,2)-position of A3 . We can obtain this by doing the following
row replacement.
1 3 1 2
R3 ⇒ R3 + 2R2 A3 ∼ 0 −5 −1 −7 = A4
0 0 1 −3
We now need zeroes in the (2, 3) and (1, 3) positions. This can be done with two row replacements.
1 3 1 2
R2 ⇒ R2 + R3 : A4 ∼ 0 −5 0 −10 = A5
0 0 1 −3
1 3 0 5
R1 ⇒ R1 − R3 : A5 ∼ 0 −5 0 −10 = A6 .
0 0 1 −3
Now we need a 0 above the −5 in the (2,2)-position. We can avoid fractions by first multiplying the
second row by −1/5, and then doing row replacement.
1 3 0 5
R2 ⇒ −(1/5)R2 : A6 ∼ 0 1 0 2 = A7
0 0 1 −3
1 0 0 −1
R1 ⇒ R1 − 3R2 : A7 ∼ 0 1 0 2 = B.
0 0 1 −3
We have now transformed A into B by performing a finite sequence of elementary row operations. This
shows that A is row equivalent to B; that is, A ∼ B. ♦
The matrix B in the previous example is in a special form called reduced row echelon form. The steps
followed in this example illustrate a standard algorithm used to transform a given matrix into its reduced
row echelon form.
The routine for solving linear systems using a matrix is contingent on using row operations to transform
the matrix into a special form called reduced row echelon form. Before we can define this, we need some
terminology.
Definition 1.2.3
Let A be an n × k matrix.

1. A zero row of A is a row whose entries are all 0; a non-zero row is a row containing at least one non-zero entry.

2. The leading entry of a non-zero row is the leftmost non-zero entry in that row.
Example 1.2.4
The leading entries of the following matrix are the 5 in the first row, the 7 in the third row, and the −1 in the fourth row (the second row is a zero row and so has no leading entry):

5 −2 0 1 0
0 0 0 0 0
0 0 7 0 0
0 −1 0 −1/2 2
A is in echelon form if it satisfies the following two conditions:

1. All zero rows are below all non-zero rows; that is, all zero rows are at the bottom of the matrix.

2. Each leading entry of a row is in a column to the right of the leading entry of the row above it.

A is in reduced row echelon form (abbreviated RREF) if it satisfies the additional two conditions:

3. The leading entry in every non-zero row is a 1.

4. Each leading 1 is the only non-zero entry in its column.
If A and B are row equivalent matrices and B is in echelon form, then B is called an echelon form
of A.
Reduced row echelon form is a special type of echelon form. This means every matrix in reduced row echelon
form is also in echelon form. The converse is not true: there are matrices in echelon form that are not in
reduced row echelon form.
Example 1.2.5
Determine which of the following six matrices are in reduced row echelon form, echelon form only, or
not in echelon form at all.
i) A1 =
[ 1 3 0 ]
[ 0 3 5 ]
[ 0 0 0 ]

ii) A2 =
[ 1 0 4 3 1  ]
[ 0 1 3 9 10 ]

iii) A3 =
[ 0 1 0 0 0 ]
[ 1 0 0 0 0 ]
[ 0 0 1 0 0 ]
[ 0 0 0 1 0 ]
[ 0 0 0 0 1 ]

iv) A4 =
[ 1 0 0 0 0 ]
[ 0 1 0 2 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 1 0 ]

v) A5 =
[ 1 1 0 0 2 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 0 0 ]

vi) A6 =
[ 1 3 0 0  ]
[ 0 4 0 0  ]
[ 0 0 0 17 ]
[ 0 0 0 0  ]
Solution
i) The leading entries in A1 are the (1, 1) and (2, 2)-entries. We can see that each leading entry is in a column to the right of the one above it and that the only zero row is at the bottom of the matrix. Therefore, A1 is in echelon form. It is not in reduced row echelon form, though. This is because the leading entry in the (2, 2)-position is a 3 and not a 1, so condition 3 is violated. Condition 4 is violated as well because the entry above the (2, 2)-position is not a zero.
ii) The leading entries of A2 are in the (1, 1)- and (2, 2)-positions. They obey condition 2, 3, and 4,
and there are no zero rows so condition 1 is vacuously satisfied. Therefore, A2 is in reduced row
echelon form.
iii) The leading entries of A3 are in the (2, 1), (1, 2), (3, 3), (4, 4), and (5, 5)-positions. We can see that the leading entry in the (2, 1)-position is in a column to the left of the leading entry above it in the (1, 2)-position. This means condition 2 is violated, so A3 is not in echelon form.
iv) Condition 1 is violated by the third row of A4, which is a zero row with a non-zero row below it. Therefore, this matrix is also not in echelon form.
v) The leading entries of A5 are in the (1, 1) and (2, 2)-positions. They obey condition 2 and they are both 1's; however, condition 4 is not satisfied because there is a non-zero entry in the (1, 2)-position.
Finally, all the zero rows are at the bottom of the matrix. Therefore, this matrix is in echelon
form, but not in reduced row echelon form.
vi) The leading entries of A6 are in the (1, 1), (2, 2), and (3, 4)-positions, so they satisfy condition 2,
and condition 1 is also satisfied as the only zero row is at the bottom of the matrix. However,
there are leading entries that are not 1, and there is a non-zero entry in the (1, 2)-position, so
both conditions 3 and 4 are violated. Therefore, this matrix is in echelon form, but not in reduced
row echelon form.
♦
A given matrix A is row equivalent to infinitely many different echelon forms. Indeed, this is because you
can scale a row by any non-zero real number and remain in echelon form. For example,

[ 1  0 ]       [ m  0 ]
[ 0  0 ]   ∼   [ 0  0 ]
for any real number m. The reduced row echelon form, on the other hand, is unique.
Fact: RREF Is Unique

Every n × k matrix A is row equivalent to exactly one n × k matrix B in reduced row echelon form.
This fact is certainly something that needs to be proved, but we will omit the proof at this time.
The fact permits us to talk about the reduced row echelon form of a matrix A. For language, we say “put
A into RREF” or “row reduce A to RREF” to mean “use elementary row operations on A to transform A
into RREF.”
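If you want to check a row reduction by machine, computer algebra systems can produce the RREF directly. Here is a minimal SymPy sketch (not part of the notes), applied to the matrix A of Example 1.2.3.

from sympy import Matrix

# The matrix A from Example 1.2.3.
A = Matrix([[-4, -2, -1, 3],
            [ 2,  1,  1, -3],
            [ 1,  3,  1, 2]])

# rref() returns the (unique) reduced row echelon form together with a
# tuple of the pivot column indices (0-based).
R, pivot_cols = A.rref()
print(R)            # Matrix([[1, 0, 0, -1], [0, 1, 0, 2], [0, 0, 1, -3]])
print(pivot_cols)   # (0, 1, 2), i.e. columns 1, 2, 3 in the numbering of the notes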
Example 1.2.6
In Example 1.2.3, B is the RREF of A and the example explicitly shows how to transform A into its
RREF using elementary row operations.
When determining if a matrix is in echelon form, we need to look at its leading entries as we did in Example
1.2.5. The leading entries of a matrix in echelon form and the positions they lie in are important.
Let A be a matrix in echelon form.

1. A pivot position of A is a position containing a leading entry.

2. A pivot of A is a leading entry of A; that is, the non-zero number occupying a pivot position.

3. A pivot column (resp. pivot row) of A is a column (resp. row) of A that contains a pivot.
Example 1.2.7
The matrices A1, A2, A5, A6 of Example 1.2.5 are all in echelon form. The pivots of each matrix are its leading entries:

A1 =
[ 1 3 0 ]
[ 0 3 5 ]
[ 0 0 0 ]

A2 =
[ 1 0 4 3 1  ]
[ 0 1 3 9 10 ]

A5 =
[ 1 1 0 0 2 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 0 ]
[ 0 0 0 0 0 ]

A6 =
[ 1 3 0 0  ]
[ 0 4 0 0  ]
[ 0 0 0 17 ]
[ 0 0 0 0  ]
The first and second columns of A1 , A2 , and A5 are pivot columns. The first, second, and fourth
columns of A6 are pivot columns. The pivot positions of A1 , A2 , and A5 are the (1, 1) and (2, 2)-
positions. The pivot positions of A6 are the (1, 1), (2, 2), and (3, 4)-positions. ♦
Given any n × k matrix A, whether it is in echelon form or not, we will still refer to columns/rows of A
as pivot columns/rows if, once we transform A to echelon form, the corresponding column/row contains a
pivot. We are allowed to do this due to the following result.
Theorem 1.2.1
Suppose that A is an n × k matrix. Then, the pivot positions in any two echelon forms of A are the
same.
Proof
The first thing we note is that row operations are reversible in the sense that if B is obtained from A
using a single row operation, then A can be obtained from B by applying a row operation that reverses
the first one. I leave it as an exercise to determine how to reverse the three different types of row
operations. What this means is that if A can be transformed into B using a sequence of row operations,
then you can transform B into A using a (generally different) sequence of row operations. In other
words, if A is row equivalent to B, then B is row equivalent to A.
Now, suppose that B and C are echelon forms of A and that B has at least one pivot in a different
position than C. Note that, since B and C are in echelon form, all values in a column below a pivot
must be zero. Both B and C can be transformed into reduced row echelon form using row operations
as follows:
1. For any pivot that is not 1, multiply the corresponding row by one over the pivot value;
2. Zero out all entries in a pivot column above the pivot using row replacement. Note that this will
not affect any of the other pivot positions because every entry directly to the left of the pivot
you’re working with, and all entries below, are necessarily zero.
Proceeding as described, transform B and C to reduced row echelon form with row operations and
denote these by B ∗ and C ∗ . It is clear that the row operations described above do not change any pivot
positions; that is, B and B ∗ have the same pivot positions, as do C and C ∗ . Therefore, since B and C
differ in at least one pivot position, B∗ and C∗ cannot be the same matrix; they differ in at least one
entry. But, A is row equivalent to both B ∗ and C ∗ , which implies A is row equivalent to two different
matrices in reduced row echelon form. This contradicts RREF Is Unique. Hence, it follows that the
pivot positions in different echelon forms of A are the same.
Note
Another way to state Theorem 1.2.1 is that the pivot positions are invariant between echelon forms of
a matrix.
This theorem implies that the pivot columns of a matrix don’t change as we pass between different row
equivalent echelon forms. This has a number of useful consequences going forward. In particular, this means
we can refer to the pivot positions/columns of a matrix A whether it is in echelon form or not.
Example 1.2.8
In Example 1.2.3, B is the RREF of A. The first three columns of B contain a pivot. Therefore, we
refer to the first three columns of A as pivot columns even though it is not in echelon form.
The reduced row echelon form of a matrix is the most important tool we have for solving linear systems and
also for solving many of the problems we encounter in this course. Therefore, it is important that we have an
efficient method for transforming a given matrix into its RREF. Gauss-Jordan Elimination is an algorithm
that gives us exactly this.
Step 1. Start in the left-most non-zero column and, if necessary, interchange rows so that the
top entry is non-zero. This is a pivot position. If desired, scale the first row to make this first entry a 1.
Step 2. Use row replacement to create zeros in each position below the top pivot position.
Step 3. Ignore the row that contains the pivot in step 1 and any rows above this row. Apply steps
1 and 2 to the matrix that results from ignoring these rows. Repeat this process until there are no
more non-zero columns to apply it to. The resulting matrix will be in echelon form.
To put the matrix into RREF, perform the following additional step:
Step 4. Begin at the right most pivot. Working upwards, use replacement to create zeros in every
entry above each pivot. Use scaling to make sure each pivot is equal to 1.
Note
The process of using Gauss-Jordan elimination to put a matrix into echelon form or RREF is referred
to as row reduction. We say “Row reduce A to echelon form” to mean “Use Gauss-Jordan elimination
to put A into echelon form”.
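For readers who like to see the algorithm as code, here is a minimal Python sketch of the steps above (mine, not the author's); it favours clarity over numerical robustness, and the function name is an invention for this illustration.

import numpy as np

def gauss_jordan(A, tol=1e-12):
    """Row reduce A to reduced row echelon form and report the pivot columns."""
    A = A.astype(float).copy()
    n, k = A.shape
    pivot_cols = []
    r = 0                                          # row where the next pivot should land
    for c in range(k):                             # Step 1: scan columns left to right
        rows = np.nonzero(np.abs(A[r:, c]) > tol)[0]
        if rows.size == 0:
            continue                               # no pivot in this column
        A[[r, r + rows[0]]] = A[[r + rows[0], r]]  # swap a non-zero entry to the top
        A[r] /= A[r, c]                            # scale so the pivot is 1
        for i in range(n):                         # Steps 2 and 4: clear the rest of the column
            if i != r:
                A[i] -= A[i, c] * A[r]
        pivot_cols.append(c)
        r += 1                                     # Step 3: move down and repeat
        if r == n:
            break
    return A, pivot_cols

# The matrix from Example 1.2.9 below.
A = np.array([[0, -4, -4, 1, -7, -8],
              [3, -1, -7, 1, -4, -14],
              [1,  1, -1, 3,  4,  10],
              [2,  2, -2, 0,  2,  -4]])
R, pivots = gauss_jordan(A)
print(R)        # matches the RREF A7 computed by hand in Example 1.2.9
print(pivots)   # [0, 1, 3], i.e. the first, second, and fourth columns

This version clears the entries above and below each pivot in a single pass, which combines steps 2 and 4; since the RREF is unique, the end result is the same.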
The steps followed in Example 1.2.3 are exactly those outlined in Gauss-Jordan Elimination. We give another
example outlining all of the steps explicitly.
Example 1.2.9
Use Gauss-Jordan Elimination to row reduce the following matrix A to echelon form. Once in echelon
form, identify the pivot columns. Then, continue with Gauss-Jordan Elimination to put the matrix
in reduced row echelon form.
A =
[ 0  −4  −4   1  −7   −8 ]
[ 3  −1  −7   1  −4  −14 ]
[ 1   1  −1   3   4   10 ]
[ 2   2  −2   0   2   −4 ]
Solution
Step 1. Starting in the left-most non-zero column, the first entry is a zero, so we need to swap two rows so that the top entry is non-zero. Theoretically, you can swap with any row you want so long as
this entry becomes non-zero. But, since you know this is a pivot position and that the pivot eventually
needs to be 1, you may as well swap with the row that makes this entry a 1. We see there is a 1 in the
(3, 1)-position, so swap row 1 with row 3.
R1 ⇔ R3 : A ∼
[ 1   1  −1   3   4   10 ]
[ 3  −1  −7   1  −4  −14 ]
[ 0  −4  −4   1  −7   −8 ]
[ 2   2  −2   0   2   −4 ]
= A1.
Step 2. We start at the pivot we just made. It is a 1 in the (1, 1)-position. We need to make everything
below this pivot a 0. Do two row replacements to do this:
R2 ⇒ R2 − 3R1
R4 ⇒ R4 − 2R1 : A1 ∼
[ 1   1  −1   3    4    10 ]
[ 0  −4  −4  −8  −16   −44 ]
[ 0  −4  −4   1   −7    −8 ]
[ 0   0   0  −6   −6   −24 ]
= A2.
Note the row operations are executed in order from the top down.
Step 3. Ignore the row that contains the pivot we were working from, and any rows above it. In this case, we only ignore the first row of A2 and look at the following submatrix:

B =
[ 0  −4  −4  −8  −16  −44 ]
[ 0  −4  −4   1   −7   −8 ]
[ 0   0   0  −6   −6  −24 ]
The second column of this sub-matrix is the left-most non-zero column, which corresponds to the (2,2)-
position in A2 . Now repeat steps 1 and 2 starting at this pivot position. If we divide row 2 by −4, we
get a 1 in this position and we start with this row operation.
R2 ⇒ (−1/4)R2 : A2 ∼
[ 1   1  −1   3   4   10 ]
[ 0   1   1   2   4   11 ]
[ 0  −4  −4   1  −7   −8 ]
[ 0   0   0  −6  −6  −24 ]
= A3.
This row operation isn’t necessary at this time, but all the pivots must become 1 eventually anyway, so
we may as well do it now. To get zeroes below this new pivot, we do one row replacement.
R3 ⇒ R3 + 4R2 : A3 ∼
[ 1  1  −1   3   4   10 ]
[ 0  1   1   2   4   11 ]
[ 0  0   0   9   9   36 ]
[ 0  0   0  −6  −6  −24 ]
= A4.
Now we're back at step 3. Ignoring the row containing the pivot in B, the new sub-matrix is

C =
[ 0  0  0   9   9   36 ]
[ 0  0  0  −6  −6  −24 ]
The fourth column is the left-most non-zero column. The 9 that is in the (3, 4)-position of A4 is the
new pivot position. Again, we can scale by 1/9 to turn it into 1 and then do a row replacement to get
a zero below:
R3 ⇒ (1/9)R3
R4 ⇒ R4 + 6R3 : A4 ∼
[ 1  1  −1  3  4  10 ]
[ 0  1   1  2  4  11 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
= A5.
Note again that the row operations are performed top down. If you do them in the reverse order, you
won’t get the same matrix.
Now, if we ignore the third row, and all rows above, we see that there are no more non-zero columns
to apply the algorithm to. This means step 3 is over and the resulting matrix is in echelon form, as
you can check against the definition. The pivot positions are the (1, 1), (2, 2), and (3, 4)-positions, and the first, second, and fourth columns are the pivot columns.

A5 =
[ 1  1  −1  3  4  10 ]
[ 0  1   1  2  4  11 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
Step 4. We start at the right-most pivot, which is the 1 in the (3, 4)-position. The pivot is already a 1
so we don’t need to do any scaling. Therefore, we only need to make zeroes above this pivot by doing
row replacement.
R2 ⇒ R2 − 2R3
R1 ⇒ R1 − 3R3 : A5 ∼
[ 1  1  −1  0  1  −2 ]
[ 0  1   1  0  2   3 ]
[ 0  0   0  1  1   4 ]
[ 0  0   0  0  0   0 ]
= A6.
Proceed to the next pivot to the left. This is the 1 in the (2,2)-position. We use row replacement to
make all entries above it zero.
R1 ⇒ R1 − R2 : A6 ∼
[ 1  0  −2  0  −1  −5 ]
[ 0  1   1  0   2   3 ]
[ 0  0   0  1   1   4 ]
[ 0  0   0  0   0   0 ]
= A7.
The next pivot to the left is the 1 in the (1, 1)-position. There are no entries above this pivot. Therefore
the algorithm terminates and we are left with the RREF of A. ♦
There is a bit of an art to getting good with this algorithm and there are certain places where you can
combine steps to get the answer quicker. There is nothing wrong with doing this but, when you’re first
learning, it is good practice to go through all the steps in full so you understand exactly how the algorithm
works. After this, you can start doing shortcuts.
1.3 Solving Linear Systems with Matrices
Theorem 1.3.1
Let A be an augmented matrix corresponding to a linear system. Let B be a matrix that is obtained
from A by performing a finite sequence of row operations on A. Then, the solution set to the linear
system represented by B is the same as the solution set to the linear system represented by A.
Proof
The argument I’ll present is more of a sketch of a proof as opposed to rigorous proof itself. That said,
I don’t think it should be too hard to see why this is true.
B is obtained from A using row operations, of which there are three types. Consider how these row operations affect the linear system A represents.
1. Interchanging rows changes the order in which we write the equations in the linear system down.
This doesn’t change the linear system itself, so the solution set doesn’t change.
2. Scaling a row is equivalent to multiplying an equation in the linear system by some non-zero
constant. Once again, this won’t change the solution set as you could always divide the constant
back out.

3. Row replacement corresponds to adding a non-zero multiple of one equation to another equation. Any solution of the original system satisfies both of the equations involved, so it also satisfies the new equation; and since the operation can be undone by subtracting the same multiple back off, no solutions are gained or lost.
The above explanations show that the elementary row operations are equivalent to operations we can
perform on equations in a linear system that do not change the solution set, whence the theorem
follows.
This theorem tells us that if we want to solve a linear system, we can perform row operations on the corre-
sponding augmented matrix and the solution set will remain unchanged. The strategy, therefore, is to use
row operations to transform the augmented matrix into one that represents a linear system whose solution
is evident. This form is reduced row echelon form! We give an example.
Example 1.3.1
Use a matrix to find the solution set to the following linear system,
−4x1 − 2x2 − x3 = 3
2x1 + x2 + x3 = −3
x1 + 3x2 + x3 = 2
Solution

The augmented matrix for this linear system is

A =
[ −4  −2  −1 |  3 ]
[  2   1   1 | −3 ]
[  1   3   1 |  2 ]

From Example 1.2.3, the RREF of A is
A′ =
[ 1  0  0 | −1 ]
[ 0  1  0 |  2 ]
[ 0  0  1 | −3 ]

The linear system A′ represents is
x1 = −1
x2 = 2
x3 = −3
It is fairly clear that the solution to this linear system is (x1 , x2 , x3 ) = (−1, 2, −3). By Theorem 1.3.1,
this is the solution to the original linear system. ♦
Recall that a linear system has either,

i) Exactly one solution;

ii) No solutions; or

iii) Infinitely many solutions.
The number of solutions to a linear system can be determined by looking at the reduced row echelon form
of the corresponding augmented matrix. We saw an example of what the reduced row echelon form looks
like when there is exactly one solution in Example 1.3.1. Let’s see what happens in the other two cases.
Example 1.3.2
Determine the solution set to the following linear system by putting its augmented matrix in reduced
row echelon form.
x1 + x2 = 2
x1 − x2 = 0
x1 − 2x2 = −3
Solution

The augmented matrix for this linear system is

A =
[ 1   1 |  2 ]
[ 1  −1 |  0 ]
[ 1  −2 | −3 ]

We use Gauss-Jordan Elimination to put this matrix into reduced row echelon form.
Start in the (1, 1)-position. This entry is already a 1 so we don’t need to do any scaling. We need
to create zeroes below this entry using row replacement. To do this, we do the following two row
operations.
1 1 2
R2 ⇒ R2 − R1
: A ∼ 0 −2 −2 = A1 .
R3 ⇒ R3 − R1
0 −3 −5
From here, we see the next pivot is in the (2, 2)-position. This entry is a -2. Since dividing row 2 by
−2 does not introduce fractions, let’s do that. After, do a row replacement to get a zero underneath.
The two row operations we perform are:
1 1 2
R2 ⇒ (−1/2)R2
: A1 ∼ 0 1 1 = A2 .
R3 ⇒ R3 + 3R2
0 0 −2
The matrix is now in echelon form. To go to reduced row echelon form, start at the pivot in the (2, 2)-
position and work to the right. One can see there is only one row operation necessary to go to reduced
row echelon form:
1 0 1
R1 ⇒ R1 − R2 : A2 ∼ 0 1 1 = A3 .
0 0 −2
A3 is the reduced row echelon form of A. The linear system this augmented matrix represents is:
x1 = 1
x2 = 1
0x1 + 0x2 = −2
The last equation simplifies to −2 = 0. Clearly, this equation is nonsense and is never satisfied. This
means it is impossible to find a solution to the linear system A3 represents; i.e. it has no solutions.
Since the solution set of this linear system is the same as the original, it follows that the original linear
system has no solution. ♦
The previous example is one in which the linear system has no solution. The reduced row echelon form of
the corresponding augmented matrix has a row of the form
0 0 0 ... 0 | b
where b is a non-zero number. We will see that this is always how it works.
Example 1.3.3
Use a matrix to find the solution set to the following linear system,
Solution

Row reducing the augmented matrix for this system (its reduced row echelon form is displayed in Example 1.3.5) gives the equivalent linear system,

x1 − 2x3 − x5 = −5
x2 + x3 + 2x5 = 3
x4 + x5 = 4
These three variables x1 , x2 , and x4 can be solved in terms of x3 and x5 :
x1 = −5 + 2x3 + x5 , x2 = 3 − x3 − 2x5 , x4 = 4 − x5 .
This means that if the variables x1, x2, and x4 have the forms specified above, then values for x3 and x5 can be chosen arbitrarily to obtain a solution to the linear system. For example, take x3 = x5 = 0.
Then,
x1 = −5 + 2(0) + 0 = −5;
x2 = 3 − 0 − 2(0) = 3;
x4 = 4 − 0 = 4.

This shows that (x1, x2, x3, x4, x5) = (−5, 3, 0, 4, 0) is a solution to the linear system. Now instead take x3 = −1 and x5 = 1. Then,
x1 = −5 + 2(−1) + 1 = −5 − 2 + 1 = −6;
x2 = 3 − (−1) − 2(1) = 4 − 2 = 2;
x4 = 4 − 1 = 3.
which again shows that (x1 , x2 , x3 , x4 , x5 ) = (−6, 2, −1, 3, 1) is a solution to the linear system.
Since we have already noticed that a linear system has either no solutions, one solution, or infinitely
many, it follows that this linear system must have infinitely many solutions because we have found more
than one. In fact, we get a different solution for every real value of x3 and x5 . The complete solution
to the linear system is written formally as follows: let x3 = s and x5 = t where s, t ∈ R. Then,
x1 = −5 + 2s + t;
x2 = 3 − s − 2t;
x4 = 4 − t.
Then, the complete solution to the linear system is (x1 , x2 , x3 , x4 , x5 ) = (−5+2s+t, 3−s−2t, s, 4−t, t)
where s, t ∈ R. In fact, if you substitute these values into the equations in the linear system, you'll see
all of the equations are satisfied. This means that regardless of the values you pick for s and t, you
always get a new solution to the linear system. ♦
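The same parametric description can be produced symbolically. Below is a minimal SymPy sketch (not from the notes) applied to the reduced system above; linsolve leaves the free unknowns x3 and x5 as parameters, playing the role of s and t.

from sympy import symbols, linsolve, Eq

x1, x2, x3, x4, x5 = symbols('x1 x2 x3 x4 x5')

# The reduced linear system from Example 1.3.3.
system = [Eq(x1 - 2*x3 - x5, -5),
          Eq(x2 + x3 + 2*x5, 3),
          Eq(x4 + x5, 4)]

print(linsolve(system, (x1, x2, x3, x4, x5)))
# One tuple, with x3 and x5 free:
#   x1 = 2*x3 + x5 - 5,  x2 = 3 - x3 - 2*x5,  x4 = 4 - x5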
Exercise
The linear system in Example 1.3.3 has infinitely many solutions. When writing the solution to the system,
we chose two variables and replaced them with new variables s and t where s and t can take any real value.
Moreover, the other variables in the solution were dependent on the values of s and t and couldn’t be picked
arbitrarily. We differentiate these two types of variables as follows: in a solution with infinitely many possibilities, the variables that may be assigned any real value are called free variables, while the variables whose values are then determined by those choices are called basic variables.
Example 1.3.4
x3 and x5 are free variables in the solution to the linear system in Example 1.3.3; x1, x2, and x4 are basic variables. ♦
Exercise
If there are free variables in the solution to a linear system, they are not necessarily unique. Generally,
you can rewrite the solution to change basic variables into free variables. Give this a try with the
solution in Example 1.3.3: make x1 and x2 free variables in the solution and make x3 and x5 basic
variables. This is merely a different way of writing the solution to the linear system down. Keep in
mind that it probably will look nothing like the one we obtained!
When faced with a linear system, there are two basic questions to answer:

1. Is the linear system consistent or inconsistent? I.e. does it have at least one solution or not?

2. If it is consistent, does it have exactly one solution or infinitely many, and what are they?
The answers to these questions can be deduced from the reduced row echelon form of the corresponding
augmented matrix. The following big theorem explains how it is all tied together.
Theorem: The Solutions Theorem

Let A be the augmented matrix of a linear system and let B be the reduced row echelon form of A. Then, exactly one of the following three cases holds.

i) The rightmost column of B (the one to the right of the vertical bar) is a pivot column. In this
case, the linear system A represents is inconsistent.
ii) The rightmost column of B is not a pivot column, and every column to the left of the vertical
bar is a pivot column. In this case, the linear system A represents is consistent and has exactly
one solution.
iii) The rightmost column of B is not a pivot column, and at least one column to the left of the
vertical bar is not a pivot column. In this case, the linear system A represents is consistent
and has infinitely many solutions.
Moreover, since each column either contains a pivot or doesn’t, this list is exhaustive. Thus, we
deduce that there are only three possibilities for solution sets to linear systems:
1) No solutions;

2) Exactly one solution; or

3) Infinitely many solutions;
and the structure of the pivot columns in B described above determines the nature of the solutions.
Proof
First suppose the rightmost column of B is a pivot column. Then, this column contains a pivot, which
is a leading entry. This means that the first non-zero entry of some row of B is contained in this column.
This necessarily means that B contains a row of the form,
0 0 0 . . . 0 | b

where b ≠ 0, since b is a pivot. This row represents the equation 0x1 + 0x2 + . . . + 0xk = b, which is clearly impossible as b ≠ 0. This means the linear system B represents is inconsistent and, so,
by Theorem 1.3.1, the linear system A represents is also inconsistent.
Now suppose that the rightmost column of B is not a pivot column. Additionally, suppose every column
to the left of the vertical bar is a pivot column. Since B is in reduced row echelon form, it follows that
B must have the following form:
B =
[ 1  0  0  . . .  0 | b1 ]
[ 0  1  0  . . .  0 | b2 ]
[ 0  0  1  . . .  0 | b3 ]
[ .  .  .         . |  . ]
[ 0  0  0  . . .  1 | bk ]
[ 0  0  0  . . .  0 | 0  ]
[ .  .  .         . |  . ]
[ 0  0  0  . . .  0 | 0  ]
where b1 , b2 , . . . , bk are real numbers. Note, if n = k, then there are no rows of zeroes at the bottom of
B. Moreover, since each of the k columns to the left of the bar has a pivot, it must be the case that
there are at least k rows since each row can have at most one pivot; that is n ≥ k.
The linear system B represents is then

x1 = b1, x2 = b2, . . . , xk = bk,

together, possibly, with some equations of the form 0 = 0, which clearly has only one solution: (x1, x2, x3, . . . , xk) = (b1, b2, . . . , bk). Therefore, by Theorem 1.3.1,
the linear system represented by A has exactly one solution.
Finally, suppose that the rightmost column is not a pivot column and that at least one column to the
left of the vertical bar is a non-pivot column. Relabelling the variables in the linear system if necessary,
we can assume that the first m columns of B are pivot columns where 1 ≤ m < k. In this case, B has
the form,
B =
[ 1  0  0  . . .  0   b1,m+1   b1,m+2   . . .   b1,k | b1 ]
[ 0  1  0  . . .  0   b2,m+1   b2,m+2   . . .   b2,k | b2 ]
[ 0  0  1  . . .  0   b3,m+1   b3,m+2   . . .   b3,k | b3 ]
[ .  .  .         .   .                         .    |  . ]
[ 0  0  0  . . .  1   bm,m+1   bm,m+2   . . .   bm,k | bm ]
[ 0  0  0  . . .  0   0        0        . . .   0    | 0  ]
[ .  .  .         .   .                         .    |  . ]
[ 0  0  0  . . .  0   0        0        . . .   0    | 0  ]
where b1 , b2 , . . . , bm and the bij ’s all represent arbitrary real numbers (it is possible they are all zero).
Note that if there if k > n, that is, if there are more variables than equations, then there are no rows
of zeroes at the bottom of this matrix. Also, in this case, m < n.
Rewriting this augmented matrix as a linear system and rearranging as we did in Example 1.3.3, we
get the following:

x1 = b1 − b1,m+1 xm+1 − . . . − b1,k xk
x2 = b2 − b2,m+1 xm+1 − . . . − b2,k xk
. . .
xm = bm − bm,m+1 xm+1 − . . . − bm,k xk
From here, we can see that the variables xm+1 , xm+2 , . . . , xk are all free variables, meaning they can
take any real value and a new solution is obtained. This means the linear system represented by B has
infinitely many solutions and, so, the linear system represented by A also has infinitely many solutions
by Theorem 1.3.1.
The only case left out of this argument is when B consists entirely of zeroes. The deduction of what
happens in this case is left as an exercise.
Finally, a column of a matrix either is a pivot column or it is not, there is no in between. This means
the above 3 cases exhaust all possibilities for the pivot column structure of B. Therefore, the reduced
row echelon form of the augmented matrix for any linear system must fall into one of these three cases.
In each case, we have deduced the solution set. This means that there are only three possibilities for
solutions to linear systems:
1. No solutions;
which suffices for proof of Fact Number of Solutions of Linear Systems and the last statement of the
theorem.
Exercise
Suppose that the augmented matrix of a linear system is given by,
[ 0  0  . . .  0 | 0 ]
[ 0  0  . . .  0 | 0 ]
[ .  .         . | . ]
[ 0  0  . . .  0 | 0 ]
Determine the number of solutions this linear system has. To do this, first try writing out the linear
system the augmented matrix represents.
It is important to recognize that The Solutions Theorem only tells you the nature of the solution set for a
linear system. It does not tell you what the solution(s) is/are.
Example 1.3.5
The reduced row echelon form of the augmented matrix for the linear system in Example 1.3.3 is,
1 0 −2 0 −1 −5
0 1 1 0 2 3
0 0 0 1 1 4
0 0 0 0 0 0
Here, the pivot columns are the first, second, and fourth. The rightmost column is not a pivot
column, and there are two columns to the left of the bar (third and fifth) that are non-pivot columns.
Therefore, The Solutions Theorem immediately implies that the linear system has infinitely many
solutions, though you still need to write everything out to determine what the solution is.
Similarly, the reduced row echelon form of the augmented matrix for the linear system in Example
1.3.2 is,
1 0 1
0 1 1 .
0 0 −2
Here, we see that the rightmost column is a pivot column. Therefore, the corresponding linear system
has no solution by The Solutions Theorem. ♦
The Solutions Theorem can be generalized slightly. In particular, one only needs to look at any echelon form
of an augmented matrix to determine the nature of the solutions to a linear system. In particular, if B is an
echelon form of an augmented matrix A, then:
i) If the rightmost column of B is a pivot column, the corresponding linear system is inconsistent.
ii) If the rightmost column of B is not a pivot column and every column to the left of the vertical bar is
a pivot column, then the corresponding linear system is consistent and has exactly one solution.
iii) If the rightmost column of B is not a pivot column and at least one column to the left of the vertical
bar is not a pivot column, then the corresponding linear system is consistent and has infinitely many
solutions.
The proof of this is a straightforward application of Theorems 1.2.1 and 1.3.1 to The Solutions Theorem. I
leave the deduction of this as an exercise.
Exercise
Fill in the details for the argument needed to prove the statement above.
Solving a Linear System Using a Matrix
In order to determine the solution set of a linear system, do the following steps.
Step 1. Write down the augmented matrix A for the linear system.
Step 2. Use Gauss-Jordan Elimination to put A into its reduced row echelon form B.
Step 3. Determine which of the following cases applies to B.
i) The rightmost column of B is a pivot column. In this case, the linear system is inconsistent by
The Solutions Theorem. Stop here, you are done.
ii) The rightmost column of B is not a pivot column and every column to the left of the bar is a
pivot column. Then, the linear system has exactly one solution by The Solutions Theorem and
the solution can be easily read from the matrix.
iii) The rightmost column of B is not a pivot column and at least one of the columns to the left
of the vertical bar is a non-pivot column. In this case, the linear system has infinitely many
solutions. If you are in this case, move to Step 4.
Step 4. This step is only applicable if you have infinitely many solutions to the linear system. To
write down the solution, first note that the pivot columns correspond to basic variables in the solution
and the non-pivot columns to the left of the bar correspond to free variables in the solution. Write
out the linear system that the reduced row echelon form represents and solve for the basic variables
in terms of the free variables. The resulting values provide a solution to the linear system.
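If you want to check this classification on a computer, the following is a minimal sketch in Python using the sympy library (sympy is an assumption here, not part of these notes); it applies Steps 2 and 3 to the augmented matrix of the system in Example 1.3.6 below.

from sympy import Matrix

# Augmented matrix [coefficients | right-hand side] of the system in Example 1.3.6.
A = Matrix([[4, 1, -7, 23],
            [1, 0, -3,  6],
            [2, 5, 19,  7]])

B, pivot_cols = A.rref()    # reduced row echelon form and the indices of its pivot columns
rightmost = A.cols - 1      # index of the augmented column

if rightmost in pivot_cols:
    print("no solutions (inconsistent)")
elif len(pivot_cols) == A.cols - 1:
    print("exactly one solution")
else:
    print("infinitely many solutions")   # this case is reported for the matrix above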
Note
Example 1.3.6
Determine the solution set of the following linear system.
4x1 + x2 − 7x3 = 23
x1 − 3x3 = 6
2x1 + 5x2 + 19x3 = 7
Solution
Step 1. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 4 & 1 & -7 & 23 \\ 1 & 0 & -3 & 6 \\ 2 & 5 & 19 & 7 \end{array}\right]. \]
Step 2. Proceed with Gauss-Jordan Elimination to put A into reduced row echelon form. We start by
making a 1 in the (1,1)-position and then getting zeroes below:
\[ \begin{array}{l} R_1 \Leftrightarrow R_2 \\ R_2 \Rightarrow R_2 - 4R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; A \sim \left[\begin{array}{ccc|c} 1 & 0 & -3 & 6 \\ 0 & 1 & 5 & -1 \\ 0 & 5 & 25 & -5 \end{array}\right] = A_1. \]
Now start at the 1 in the (2,2)-position. We need a zero underneath this:
\[ R_3 \Rightarrow R_3 - 5R_2 : \; A_1 \sim \left[\begin{array}{ccc|c} 1 & 0 & -3 & 6 \\ 0 & 1 & 5 & -1 \\ 0 & 0 & 0 & 0 \end{array}\right] = A_2. \]
As one can check, the matrix is now in reduced row echelon form.
Step 3. The pivots are the 1’s in the (1, 1) and (2, 2)-positions. This means the rightmost column is
not a pivot column. Moreover, there is a non-pivot column to the left of the bar (column 3). By The
Solutions Theorem, the linear system has infinitely many solutions.
Step 4. We only do this step because the system has infinitely many solutions. Since the third column
is non-pivot, we see that x3 is the free variable whereas x1 and x2 are basic. Rewriting the matrix A2
as a linear system yields,
x1 − 3x3 = 6
x2 + 5x3 = −1
x3 is a free variable, so write x3 = s where s ∈ R. Substituting this into the above and solving for the
basic variables x1 and x2 in terms of x3 = s yields,
x1 − 3s = 6 =⇒ x1 = 6 + 3s
x2 + 5s = −1 =⇒ x2 = −1 − 5s
Therefore, the solution to the linear system is (x1, x2, x3) = (6 + 3s, -1 - 5s, s), s ∈ R. ♦
Note
The notation s ∈ R is important. Whenever there is a solution involving a free variable, you must write
this because it tells the reader that s can be any real value. If this is not present, the solution means
nothing.
Example 1.3.7
Determine the solution set of the following linear system.
x1 − (3/2)x2 + x3 = 2
5x1 − 7x2 + 5x3 = 10
2x1 − 3x2 + 2x3 = 6
Solution
Step 1. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 1 & -3/2 & 1 & 2 \\ 5 & -7 & 5 & 10 \\ 2 & -3 & 2 & 6 \end{array}\right]. \]
Step 2. We do Gauss-Jordan Elimination to put A into reduced row echelon form. The first step is
\[ \begin{array}{l} R_2 \Rightarrow R_2 - 5R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; A \sim \left[\begin{array}{ccc|c} 1 & -3/2 & 1 & 2 \\ 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 2 \end{array}\right] = A_1. \]
Note here that we are now in echelon form and the rightmost column is a pivot column. This is enough
to conclude the linear system is inconsistent. If you want to proceed to reduced row echelon form, we
do the following:
\[ \begin{array}{l} R_1 \Rightarrow R_1 + 3R_2 \\ R_2 \Rightarrow 2R_2 \\ R_3 \Rightarrow (1/2)R_3 \end{array} : \; A_1 \sim \left[\begin{array}{ccc|c} 1 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] = A_2. \]
Step 3. The rightmost column is a pivot column. Therefore, the linear system is inconsistent by The
Solutions Theorem. ♦
Example 1.3.8
Solution
\[ \begin{array}{l} R_2 \Rightarrow -R_2 \\ R_3 \Rightarrow R_3 + 3R_2 \\ R_4 \Rightarrow R_4 - 18R_2 \end{array} : \; A_2 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & -5 & 12 \\ 0 & 1 & 0 & 1 & -2 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & -5 & -17 & -4 \end{array}\right] = A_3. \]
The next pivot is the 1 in the (3, 3)-position. We need a zero below it.
\[ R_4 \Rightarrow R_4 + 5R_3 : \; A_3 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & -5 & 12 \\ 0 & 1 & 0 & 1 & -2 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & -2 & 1 \end{array}\right] = A_4. \]
This matrix is now in echelon form. To proceed to reduced row echelon form, start at the pivot in the
(4, 4)-position. It is a -2. Turn it into a 1 and make zeroes above it:
\[ \begin{array}{l} R_4 \Rightarrow (-1/2)R_4 \\ R_3 \Rightarrow R_3 - 3R_4 \\ R_2 \Rightarrow R_2 - R_4 \\ R_1 \Rightarrow R_1 + 5R_4 \end{array} : \; A_4 \sim \left[\begin{array}{cccc|c} 1 & -5 & 0 & 0 & 19/2 \\ 0 & 1 & 0 & 0 & -3/2 \\ 0 & 0 & 1 & 0 & 5/2 \\ 0 & 0 & 0 & 1 & -1/2 \end{array}\right] = A_5. \]
There is one final move to get to reduced row echelon form: zero out the (1, 2)-entry:
\[ R_1 \Rightarrow R_1 + 5R_2 : \; A_5 \sim \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & -3/2 \\ 0 & 0 & 1 & 0 & 5/2 \\ 0 & 0 & 0 & 1 & -1/2 \end{array}\right] = A_6. \]
This is reduced row echelon form.
Step 3. We see that all of the columns to the left of the bar are pivot columns and the rightmost one is
not. By The Solutions Theorem, the linear system has exactly one solution and it can be readily seen
by looking at the matrix: (x1 , x2 , x3 , x4 ) = (2, −3/2, 5/2, −1/2). ♦
Example 1.3.9
Solution
Move to the next pivot to the left in the (2, 2)-position. It is not a one, but the row operation is easier
to do if you leave it as a 3 and don’t introduce fractions:
\[ R_1 \Rightarrow R_1 - 5R_2 : \; A_4 \sim \left[\begin{array}{ccccc|c} -6 & 0 & -3 & 0 & -6 & 6 \\ 0 & 3 & -9 & 0 & 1 & -4 \\ 0 & 0 & 0 & 1 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] = A_5. \]
Finally, to get to reduced row echelon form, we need to make the pivots equal to 1. We do this via two
row scalings:
\[ \begin{array}{l} R_1 \Rightarrow -(1/6)R_1 \\ R_2 \Rightarrow (1/3)R_2 \end{array} : \; A_5 \sim \left[\begin{array}{ccccc|c} 1 & 0 & 1/2 & 0 & 1 & -1 \\ 0 & 1 & -3 & 0 & 1/3 & -4/3 \\ 0 & 0 & 0 & 1 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] = A_6. \]
We are now in reduced row echelon form.
Step 3. The rightmost column of the reduced row echelon form is not a pivot column and there
are columns to the left of the vertical bar that are non-pivot columns (columns 3 and 5). There-
fore, the linear system has infinitely many solutions by The Solutions Theorem and we proceed to step 4.
Step 4. In this step, we determine the solution to the linear system. Start by writing the reduced row
echelon form as a linear system:
x1 + (1/2)x3 + x5 = −1
x2 − 3x3 + (1/3)x5 = −4/3
x4 − 3x5 = 3
Since the non-pivot columns correspond to x3 and x5 , we take these as the free variables, so x1 , x2 , and
x4 are the basic variables. Letting x3 = s and x5 = t where s and t are real parameters, we rewrite the
above equations as follows:
x1 + (1/2)s + t = -1 =⇒ x1 = -1 - (1/2)s - t
x2 - 3s + (1/3)t = -4/3 =⇒ x2 = -4/3 + 3s - (1/3)t
x4 - 3t = 3 =⇒ x4 = 3 + 3t.
Therefore, the solution to the linear system is
(x1, x2, x3, x4, x5) = (-1 - (1/2)s - t, -4/3 + 3s - (1/3)t, s, 3 + 3t, t), s, t ∈ R.
There are more ways we could write this. Suppose we didn’t like the fractional coefficients on the free
variables. Since the parameters s and t can take any real value, we can write them as s = 2s0 and
t = 3t0 where s0 , t0 ∈ R with the coefficients chosen to clear denominators. Then, an equally valid way
to write the solution is,
(x1, x2, x3, x4, x5) = (-1 - s0 - 3t0, -4/3 + 6s0 - t0, 2s0, 3 + 9t0, 3t0), s0, t0 ∈ R.
Both ways of writing the solution are completely valid; you can use whichever you like. ♦
Note that a linear system must be consistent to have free variables - free variables occur only when a linear
system has infinitely many solutions.
Example 1.3.10
0 0 0 1
There is one non-pivot column to the left of the vertical bar, which typically would indicate the
existence of a free variable in the corresponding linear system. However, because the rightmost
column of this matrix is a pivot column, the linear system is inconsistent by The Solutions Theorem.
Hence, it has no free variables. ♦
Chapter 2
Vectors in Rn
Linear algebra is the study of vector spaces; these are abstract sets where vectors live. In this chapter, we
introduce vectors and some of their elementary properties.
That being said, just because I am using the word “elementary” does not mean that the forthcoming material
is easy. In fact, it is at this point where the material becomes more abstract. As you read on, I want
you to keep in mind that all of what is coming can be linked back to solving linear systems. In fact, a lot
of the challenge of the problems we solve isn't calculation; it is determining what the question is asking and
how to interpret the corresponding calculation.
We all know the standard Cartesian plane: it is the one you work in to draw graphs of functions with the x
and y axes that has coordinates (x, y) where x and y are real numbers. Another way to refer to the Cartesian
plane is R2 , pronounced “R two.”
Let A and B be two points in the Cartesian plane R2. The displacement vector from a point A to
a point B is an arrow with its tail at A and its head at B. This is denoted by $\vec{v} = \overrightarrow{AB}$. We have the
following terminology for vectors:
i) The point A is called the initial point of ~v and the point B is called the terminal point of ~v.
ii) The length of ~v is the distance between the points A and B and is represented by the length
of the arrow. This is denoted by k~v k.
iii) The direction of ~v is the direction the arrow is pointing in. Note that two vectors have the
same direction if they are parallel and pointing in the same direction.
Example 2.1.1
Determine the initial and terminal points of the following vectors. Also determine their lengths.
[Figure: the vector ~v drawn from (1, -1) to (4, 3) and the vector ~w drawn from (3, 4) to (-3, -1) in the x1x2-plane.]
Solution
The initial point is where the vector starts. We can see that the vector ~v starts at the point (1, −1)
and ~w starts at (3, 4). The terminal points are the points where the arrow heads touch. In this case,
the terminal point of ~v is (4, 3) and the terminal point of ~w is (-3, -1).
To find the lengths, we can use the Pythagorean theorem. Taking ~v for example, we can form the
following right angle triangle:
[Figure: the right triangle formed by ~v, with a horizontal leg of length 3 and a vertical leg of length 4.]
We can see from here that the base of the right angle triangle has length 3 and it has height 4. Then,
by the Pythagorean Theorem, the length of ~v is
\[ \|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5. \]
Note, the positive square root is always used because we are talking about a length.
[Figure: the right triangle formed by ~w, with a horizontal leg of length 6 and a vertical leg of length 5.]
We see the base of the triangle is 6 and the height of the triangle is 5. Therefore,
\[ \|\vec{w}\| = \sqrt{6^2 + 5^2} = \sqrt{61}. \]
Note these vectors do not share the same direction because they are not parallel. ♦
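For readers who like to check arithmetic on a computer, here is a small Python sketch (Python itself is an assumption, not part of the notes) that computes the two lengths above from the initial and terminal points.

import math

def length(tail, head):
    # Length of the displacement vector from tail to head, via the Pythagorean theorem.
    return math.hypot(head[0] - tail[0], head[1] - tail[1])

print(length((1, -1), (4, 3)))   # vector v: sqrt(3^2 + 4^2) = 5.0
print(length((3, 4), (-3, -1)))  # vector w: sqrt(6^2 + 5^2) = sqrt(61), about 7.81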
Example 2.1.2
The following two vectors have the same direction, but not the same length:
[Figure: two parallel vectors ~v and ~w that point in the same direction but have different lengths.]
The directions are the same because they are pointing in the same direction and are parallel. Note
that if we reverse the initial and terminal points of one of these vectors, they are still parallel, but no
longer point in the same direction, so the vectors themselves do not have the same direction. This is
pictured below:
[Figure: the same two vectors with the initial and terminal points of ~v reversed; ~v and ~w are still parallel but now point in opposite directions.]
The only quantities that matter when it comes to vectors are the length of the vector, also referred to as
magnitude, and direction. Indeed, two vectors ~v and w~ are equal when their magnitude and direction are
the same. We don’t care where they are positioned in R2 . For example, the following two vectors in R2 are
equal even though they have different initial and terminal points.
[Figure: two equal vectors ~v and ~w drawn with different initial and terminal points.]
It may be obvious from this discussion, but it is worth mentioning that vectors are not numbers. Therefore,
if we want to perform operations like addition and subtraction on vectors, then we need to define these operations.
Let’s look at addition first.
Suppose A, B, and C are three points in R2 and let $\vec{v} = \overrightarrow{AB}$ and $\vec{w} = \overrightarrow{BC}$. We give the following picture
below as an example to visualize the situation:
[Figure: points A, B, and C with $\vec{v} = \overrightarrow{AB}$ drawn from A to B and $\vec{w} = \overrightarrow{BC}$ drawn from B to C.]
Vectors represent quantities that have direction and magnitude. With this in mind, how should we define
the “sum” of two vectors? Think about adding two numbers as combining them together into a number that
represents the “total” of the two. If we think about how to extend this to vectors, we could reasonably think
that the sum of two vectors should be a vector that represents the total “effect” of the two being added.
To see how one might define this “total” vector, think about a truck starting at A, driving along ~v until it
hits the point B, then driving along ~w until it hits the point C. The total “effect” is a path from the point
A to the point C. This suggests we should define the sum of ~v and ~w as $\overrightarrow{AC}$: the vector whose initial point
is A and whose terminal point is C; this is depicted below:
[Figure: the triangle formed by $\vec{v} = \overrightarrow{AB}$, $\vec{w} = \overrightarrow{BC}$, and their sum $\vec{v} + \vec{w} = \overrightarrow{AC}$ drawn from A to C.]
Now, we present a way to construct the vector ~v + ~w. This construction will be quite convenient because,
after this section, all vectors will be assumed to have their initial point at the origin (0, 0).
First translate ~w so its initial point is the same as that of ~v:
[Figure: ~w translated so that its initial point coincides with the initial point of ~v.]
Remember that the actual position of the vectors doesn't matter, only magnitude and direction do, so the
~w above is the same as the one before, just in a different position.
Next, starting at the terminal point of ~v, draw a vector that is parallel to ~w and has the same length.
It should look like the following:
[Figure: a copy of ~w drawn starting at the terminal point of ~v; together with ~v and the original ~w it forms a parallelogram.]
If you have done this correctly, you will always create the above parallelogram. Then, the vector ~v + ~w is
the vector whose initial point is the common initial point of ~v and ~w and whose terminal point is the far corner of
this parallelogram:
[Figure: the parallelogram formed by ~v and ~w with the sum ~v + ~w drawn along its diagonal.]
It is evident that ~v + ~w constructed in this way is the same as the one constructed above. This is because
the vector starting at the terminal point of ~v is actually just ~w: it has the same length and points in the
same direction.
The above shows how to add vectors. Can we multiply them? The answer to that question is not clear at the
moment. There are notions of multiplication of vectors, but they all have some issues. Instead of defining
multiplication of vectors, we define a related notion called scalar multiplication.
A real number r ∈ R is called a scalar. If ~v is a displacement vector and r is a scalar, the scalar
multiple r~v is the displacement vector whose length is |r| times the length of ~v, and whose direction is the
same as that of ~v if r > 0 and opposite to that of ~v if r < 0.
Example 2.1.3
[Figure: the vector ~v together with the scalar multiples 2~v, (1/2)~v, -~v, and -4~v drawn in the x1x2-plane.]
Note that −~v is merely ~v pointing in the opposite direction. We call this the negative of ~v . ♦
If you think about the arithmetic of numbers, the operation of subtraction is the same as adding negative
numbers. We define subtraction of vectors in the same way:
\[ \vec{v} - \vec{w} = \vec{v} + (-\vec{w}). \]
We can calculate differences using the parallelogram law for addition. An example of this is shown below.
[Figure: ~v, ~w, and -~w drawn in the plane, with the difference ~v - ~w = ~v + (-~w) constructed using the parallelogram law.]
Like most things in mathematics there is a shortcut. Notice that the vector we have drawn as ~v - ~w is
exactly parallel to the vector we get if we start at the tip of ~w and join this to the tip of ~v. See below:
[Figure: the same picture with an additional vector drawn from the tip of ~w to the tip of ~v; it is parallel to, and has the same length as, ~v - ~w.]
This vector has the same magnitude and direction as ~v - ~w, thus it is equal to ~v - ~w. Therefore, we can
draw the difference ~v - ~w by drawing a vector from the terminal point of ~w to the terminal point of ~v. The
direction is reversed for ~w - ~v, which should make sense because ~w - ~v is the negative of ~v - ~w!
The previous examples show we can represent vectors in the Cartesian plane using matrices. The beautiful
thing about this observation is that we can use matrices to generalize vectors. Indeed, we could do the same
thing in the previous section for vectors in 3-dimensional space. And then we can generalize to 4-dimensional
space or 5-dimensional space, or any n-dimensional space we can think of. Therefore, we define vectors using
matrices as follows.
An n-dimensional vector ~v is an n × 1 matrix
\[ \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}. \]
The entries v1, . . . , vn are called the components of the vector ~v. The number of components n is
called the dimension of the vector. The set of all n-dimensional vectors with real components is
denoted by Rn.
In the above, the vertical-dots notation "⋮" is a placeholder that tells you to repeat the pattern up to the given last value.
We use this because, to work in total generality, we can not assign a specific positive integer value to n.
Indeed, if you are supposed to prove that a specific property holds for vectors in Rn , then you must work with
general vectors written down just like above. Proving the property holds by only considering vectors with a
fixed number of components, say 3, only proves that property holds for vectors in R3 , and not necessarily in
R4 , R5 , etc.
Example 2.2.1
R2 is 2-dimensional real space, which we know as the Cartesian plane. This is the space you work in
in Calculus 1 and 2. R3 is three dimensional real space. You’ve worked in R3 if you’ve done Calculus
3. R3 is also what we live in.
If n ≥ 4, then we can’t visualize Rn . This is why defining vectors in terms of matrices is so important.
In the previous section, we defined a vector as a 2-dimensional arrow with length and direction. We then
used our geometric intuition to derive how to add, subtract, and do scalar multiplication. We can do the
same thing in R3 , though addition and subtraction is a little more complicated to figure out geometrically.
But since we can’t visualize Rn for any n ≥ 4, we can’t repeat this same geometric intuition process for
higher dimensional spaces. What do length and direction even mean in a space like R4 ? Defining vectors as
matrices allows us to skirt around these problems. We can then define operations on n-dimensional vectors
via their matrix representations from a purely algebraic standpoint which is motivated by what we know
happens in the 2 and 3 dimensional cases.
3. For any scalar r ∈ R and any vector ~w ∈ Rn, the scalar multiplication of r and ~w, denoted r~w,
is defined as
\[ r\vec{w} = r\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} rw_1 \\ rw_2 \\ \vdots \\ rw_n \end{bmatrix}. \]
1) If you want to add or subtract two vectors, they must have the same dimension. If two vectors have
different dimension, then they can not be added or subtracted. Therefore, we can not add/subtract a
vector in R2 from a vector in R5 .
2) We do not define multiplication of two vectors. We have scalar-vector multiplication, but that is it.
This is a little different than what we are used to as we now have two different objects to deal with
(scalars and vectors).
Note
Under the operations of addition and scalar multiplication, the set Rn becomes a mathematical
structure called a vector space. We won't delve into the abstract notion of a vector space too much
in this course, but these are studied frequently in lots of areas of mathematics.
Example 2.2.2
Let $\vec{u} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$. Then ~u ± ~v is not defined.
Example 2.2.3
Let
\[ \vec{u} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix}. \]
Calculate 3~u, 2~v - ~w, and 2~u + 3~v + ~w.
Solution. Applying the definitions of vector addition and scalar multiplication yields
\[ 3\vec{u} = \begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix}, \]
\[ 2\vec{v} - \vec{w} = \begin{bmatrix} -4 \\ 2 \\ 6 \end{bmatrix} - \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix} = \begin{bmatrix} -4 - (-3) \\ 2 - 8 \\ 6 - (-1) \end{bmatrix} = \begin{bmatrix} -1 \\ -6 \\ 7 \end{bmatrix}, \]
\[ 2\vec{u} + 3\vec{v} + \vec{w} = \begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix} + \begin{bmatrix} -6 \\ 3 \\ 9 \end{bmatrix} + \begin{bmatrix} -3 \\ 8 \\ -1 \end{bmatrix} = \begin{bmatrix} -7 \\ 11 \\ 12 \end{bmatrix}. \; ♦ \]
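As a quick numerical check of Example 2.2.3, here is a short sketch using Python's numpy library (an assumption on my part; any component-wise arithmetic would do).

import numpy as np

u = np.array([1, 0, 2])
v = np.array([-2, 1, 3])
w = np.array([-3, 8, -1])

print(3 * u)              # [3 0 6]
print(2 * v - w)          # [-1 -6  7]
print(2 * u + 3 * v + w)  # [-7 11 12]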
There is a special vector that plays the same role that zero does for numbers. It is aptly called the zero
vector.
The zero vector in Rn, denoted ~0n, is the vector that contains only zeroes:
\[ \vec{0}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \]
When the dimension of the zero vector is clear from context, we drop the n and simply write ~0.
The following theorem gives some properties of vector addition and scalar multiplication. Many of these are
similar to those of real numbers. We need these in order to do algebra on vectors.
Properties of Vectors
Let ~u, ~v, ~w ∈ Rn be vectors and let r, s ∈ R be scalars. Then:
1. ~u + ~v = ~v + ~u (commutativity of addition);
2. r(~u + ~v) = r~u + r~v (distributivity over vector addition);
3. (r + s)~u = r~u + s~u (distributivity over scalar addition);
4. (~u + ~v) + ~w = ~u + (~v + ~w) (associativity of addition);
5. r(s~u) = (rs)~u;
6. ~u + (-~u) = ~u - ~u = ~0;
7. ~u + ~0 = ~0 + ~u = ~u;
8. 1~u = ~u.
Proof. To prove each property in this theorem, we need to show each vector on the left is equal to the vector
on the right. To do this, you pick arbitrary vectors, write them out, compute the expression on the left and
show it is equal to the expression on the right. I’ll do two of these proofs and leave the rest as exercises as
they all follow in a similar fashion.
Proof of Property 1. Let
\[ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}. \]
Then,
\[ \vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} \quad \text{by definition of vector addition,} \]
\[ = \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} \quad \text{by properties of real numbers,} \]
\[ = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{by definition of vector addition,} \]
= ~v + ~u.
Proof of Property 5. Let $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \in \mathbb{R}^n$ and r, s ∈ R. Then,
\[ s\vec{u} = \begin{bmatrix} su_1 \\ su_2 \\ \vdots \\ su_n \end{bmatrix}. \]
Therefore,
\[ r(s\vec{u}) = \begin{bmatrix} r(su_1) \\ r(su_2) \\ \vdots \\ r(su_n) \end{bmatrix} = \begin{bmatrix} (rs)u_1 \\ (rs)u_2 \\ \vdots \\ (rs)u_n \end{bmatrix} \quad \text{by properties of real numbers,} \]
\[ = (rs)\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{by definition of scalar multiplication,} \]
= (rs)~u.
Exercise
Prove the rest of Properties of Vectors.
It should be clear why all of these properties are true for 2-dimensional vectors given the geometric definitions
of addition and subtraction we derived in the last section. For example, ~u + ~v = ~v + ~u makes sense because
the parallelogram defined by ~u and ~v is the same as the parallelogram defined by ~v and ~u.
The idea is straightforward. Once we have a solution to a linear system, instead of putting the solution in
round brackets as an ordered tuple, put the solution into a vector. If a linear system has only one solution,
the difference between the two forms of the solution is negligible. For example, the vector form of the solution
to the linear system in Example 1.1.1 is the following,
" # " #
x1 152
= .
x2 78
Vector forms of solutions to linear systems with infinitely many solutions are a little bit different. In this
case, we split the vector apart using vector addition and scalar multiplication so that each free variable sits
inside of its own vector. This process becomes clear after seeing a few examples.
Example 2.2.4
Solution. Writing out the components explicitly, the solution to the linear system is,
\[ x_1 = -\frac{7}{25} - \frac{1}{25}s, \quad x_2 = \frac{58}{25} - \frac{31}{25}s, \quad x_3 = s. \]
Put this solution into a vector as follows,
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -(7/25) - (1/25)s \\ 58/25 - (31/25)s \\ s \end{bmatrix}. \]
Now pull apart the vector on the left using vector addition and scalar multiplication:
\[ \begin{bmatrix} -(7/25) - (1/25)s \\ 58/25 - (31/25)s \\ s \end{bmatrix} = \begin{bmatrix} -7/25 \\ 58/25 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1/25 \\ -31/25 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \]
The expression on the right hand side of the above equation is the vector form of the solution to the linear
system. ♦
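Parametric solutions like the one above can also be produced by software. The sketch below (assuming Python with the sympy library, which is not part of these notes) does this for the system of Example 1.3.6, whose solution was x1 = 6 + 3s, x2 = -1 - 5s, x3 = s; sympy reports the same solution with x3 playing the role of the parameter.

from sympy import Matrix, symbols, linsolve

x1, x2, x3 = symbols('x1 x2 x3')
A = Matrix([[4, 1, -7], [1, 0, -3], [2, 5, 19]])   # coefficient matrix
b = Matrix([23, 6, 7])                             # right-hand side

sol, = linsolve((A, b), x1, x2, x3)
print(sol)   # (3*x3 + 6, -5*x3 - 1, x3): x3 is free, matching the solution above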
Example 2.2.5
Express the solutions to the linear systems of Examples 1.3.8 and 1.3.9 in vector form.
Solution. The solution to the linear system in Example 1.3.8 was (x1 , x2 , x3 ) = (7/2, 1, 5/2). The vector
form of the solution is,
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7/2 \\ 1 \\ 5/2 \end{bmatrix}. \]
The solution to the linear system in Example 1.3.9 is
\[ (x_1, x_2, x_3, x_4) = \left( \tfrac{26}{11} - \tfrac{15}{11}s + \tfrac{1}{11}t, \; -\tfrac{1}{11} + \tfrac{1}{11}s - \tfrac{25}{11}t, \; s, \; t \right), \quad s, t \in \mathbb{R}. \]
To get the vector form, make x1, x2, x3, and x4 components of a vector and split the vector apart into the
fixed part of the solution and one vector for each free variable. Doing so yields the following.
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 26/11 - (15/11)s + (1/11)t \\ -1/11 + (1/11)s - (25/11)t \\ s \\ t \end{bmatrix} = \begin{bmatrix} 26/11 \\ -1/11 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -15/11 \\ 1/11 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1/11 \\ -25/11 \\ 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
The expression on the far right is the vector form of the solution. ♦
In general, if the solution to a linear system has m free variables, then there are m + 1 vectors in the vector
form of the solution. There is one for each free variable and one for the fixed part. If the fixed part of the
solution is the zero vector, there are still m + 1 vectors in the vector form of the solution, but we usually only
write m of them down because the fixed part is the zero vector. If you like, you can write the zero vector in
the vector form but it isn’t necessary.
Example 2.2.6
Suppose the solution to a linear system is (x1 , x2 , x3 , x4 ) = (2s + t, 3s, t, s), s, t ∈ R. The vector
form of the solution can be written as
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad s, t \in \mathbb{R}, \]
or as
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad s, t \in \mathbb{R}. \]
Either way is acceptable, though most references opt for the latter.
Example 2.2.7
Solution. There are three free variables so the vector form of the solution has 4 vectors. The vector form
of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{bmatrix} = \begin{bmatrix} \sqrt{2} \\ 3 \\ 62 \\ 0 \\ -8 \\ 0 \\ 0 \\ 0 \end{bmatrix} + r\begin{bmatrix} -14 \\ 0 \\ -1/11 \\ 1 \\ 82 \\ 19 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 2/3 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -13 \\ -\pi \\ -10 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad r, s, t \in \mathbb{R}. \; ♦ \]
Let ~v1, ~v2, . . . , ~vk ∈ Rn. A linear combination of ~v1, ~v2, . . . , ~vk is any vector ~b of the form
\[ \vec{b} = c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k, \]
where c1, c2, . . . , ck ∈ R are scalars. c1, c2, . . . , ck are called the coefficients of the linear combination.
Example 2.3.1
Let ~v1 , ~v2 ∈ Rn be any two vectors. The following are examples of linear combinations of ~v1 and ~v2 :
1) ~v1 + ~v2 ,
2) ~v1 − ~v2 ,
3) π~v1 - (345/23)~v2,
4) 0~v1 + 0~v2 = ~0.
Note
Notice that the vector ~b above is in Rn . This implies that any linear combination of vectors in Rn is
also a vector in Rn .
Let ~v1 , ~v2 , . . . , ~vk , ~b ∈ Rn be vectors and let x1 , x2 , . . . , xk be one-dimensional real variables. A vector
equation is an equation of the form
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{b}. \tag{2.2} \]
A solution to a vector equation is a k-tuple of real numbers (c1, c2, . . . , ck) such that when we make
the variable substitutions x1 = c1, x2 = c2, . . . , xk = ck, then the equation
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{b} \]
is true.
The difference between a linear combination of vectors and a vector equation is that a vector equation
involves variables and a linear combination is a fixed vector in Rn . Moreover, it follows immediately from
the definitions of linear combinations and solutions of vector equations that the vector equation (2.2) has a
solution if and only if ~b can be written as a linear combination of ~v1, ~v2, . . . , ~vk.
Every vector equation represents a linear system and, conversely, every linear system can be expressed using
a vector equation. Consider the linear system from Example 1.1.1,
x1 + x2 = 230
350x1 + 600x2 = 100,000
This system holds exactly when
\[ \begin{bmatrix} x_1 + x_2 \\ 350x_1 + 600x_2 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Since x1 and x2 are one-dimensional real variables they function like scalars. This allows us to split the
vector on the left apart using vector addition and scalar multiplication:
\[ \begin{bmatrix} x_1 + x_2 \\ 350x_1 + 600x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 350x_1 \end{bmatrix} + \begin{bmatrix} x_2 \\ 600x_2 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix}. \]
Therefore, the linear system can be expressed as the vector equation
\[ x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Example 2.3.2
Express the vector equation
\[ x_1\begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix} + x_2\begin{bmatrix} 10 \\ 7 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 2 \\ 1 \end{bmatrix} \]
as a linear system.
Solution. Combining the vectors on the left into one vector gives
\[ \begin{bmatrix} -x_1 + 10x_2 \\ 2x_1 + 7x_2 \\ 0x_1 - x_2 \\ x_1 - x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \\ 2 \\ 1 \end{bmatrix}. \]
Since the vectors are equal, the components of each must be equal. Writing out each equality yields the
following linear system.
−x1 + 10x2 = −2
2x1 + 7x2 = 1
−x2 = 2
x1 − x2 = 1 ♦
Example 2.3.3
2x1 + 3x2 − x3 = 6
−x1 + x2 = 0
4x1 + x2 = 5
Solve the linear system and show that the solution is also a solution to the corresponding vector
equation.
Solution. Following the above, the linear system can be expressed as the following vector equation,
\[ x_1\begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} + x_2\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 5 \end{bmatrix}. \tag{2.1} \]
We use Solving a Linear System Using a Matrix to solve the linear system. From now on, I will skip many
of the steps and leave it to the reader to fill in the blanks. The augmented matrix for the linear system is
\[ A = \left[\begin{array}{ccc|c} 2 & 3 & -1 & 6 \\ -1 & 1 & 0 & 0 \\ 4 & 1 & 0 & 5 \end{array}\right]. \]
The RREF of A is
\[ A \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right]. \]
The solution to the linear system is (x1 , x2 , x3 ) = (1, 1, −1). This solution is exactly a solution to the vector
equation in Equation (2.1) because, when we make the corresponding substitutions, we get,
\[ 1\begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} + 1\begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} + (-1)\begin{bmatrix} -1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} (1)2 + (1)3 + (-1)(-1) \\ (1)(-1) + (1)1 + (-1)(0) \\ (1)4 + (1)1 + (-1)(0) \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 5 \end{bmatrix}. \; ♦ \]
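The substitution at the end of Example 2.3.3 can also be verified numerically with a few lines of Python/numpy (assumed available, not part of the notes):

import numpy as np

v1 = np.array([2, -1, 4])
v2 = np.array([3, 1, 1])
v3 = np.array([-1, 0, 0])
b = np.array([6, 0, 5])

x1, x2, x3 = 1, 1, -1                                    # the solution found above
print(x1 * v1 + x2 * v2 + x3 * v3)                       # [6 0 5]
print(np.array_equal(x1 * v1 + x2 * v2 + x3 * v3, b))    # True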
Let ~v1 , ~v2 . . . , ~vk ∈ Rn . The n × k matrix whose columns are the vectors ~v1 , ~v2 . . . , ~vk in that order is
denoted as [ ~v1 ~v2 . . . ~vk ] . This is shorthand notation to make things slightly more compact. For example,
if $\vec{v}_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}$, then
\[ [\, \vec{v}_1 \; \vec{v}_2 \,] = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 4 \end{bmatrix}. \]
Example 2.3.3 shows the solution to a linear system is also a solution to the vector equation representing
the linear system. The converse to this is also true: The solution to a vector equation is a solution to the
linear system it represents. The next theorem summarizes this.
Theorem 2.3.1
Let ~v1, ~v2, . . . , ~vk, ~b ∈ Rn and consider the vector equation
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{b}. \]
The solution set of this vector equation is the same as the solution set of the linear system whose
augmented matrix is
\[ A = \left[\, \vec{v}_1 \;\; \vec{v}_2 \;\; \ldots \;\; \vec{v}_k \;\middle|\; \vec{b} \,\right]. \]
In particular, ~b can be written as a linear combination of ~v1 , ~v2 , . . . , ~vk if and only if the matrix A
corresponds to a consistent linear system.
Proof. Write
\[ \vec{v}_i = \begin{bmatrix} v_{1i} \\ \vdots \\ v_{ni} \end{bmatrix}, \; i = 1, 2, \ldots, k, \qquad \vec{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, \]
so that,
\[ A = \left[\, \vec{v}_1 \;\; \vec{v}_2 \;\; \ldots \;\; \vec{v}_k \;\middle|\; \vec{b} \,\right] = \left[\begin{array}{cccc|c} v_{11} & v_{12} & \ldots & v_{1k} & b_1 \\ v_{21} & v_{22} & \ldots & v_{2k} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v_{n1} & v_{n2} & \ldots & v_{nk} & b_n \end{array}\right]. \]
(c1, c2, . . . , ck) is a solution to the vector equation (2.2) if and only if
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{b}. \]
Equating components in this linear combination yields the following series of equations,
\[ v_{11}c_1 + v_{12}c_2 + \cdots + v_{1k}c_k = b_1, \quad v_{21}c_1 + v_{22}c_2 + \cdots + v_{2k}c_k = b_2, \quad \ldots, \quad v_{n1}c_1 + v_{n2}c_2 + \cdots + v_{nk}c_k = b_n, \]
which shows that (c1 , c2 , . . . , ck ) is a solution to the linear system whose augmented matrix is A. Reversing
all of these steps shows that any solution to the linear system that has augmented matrix A is also a solution
to the vector equation. Therefore, the solution sets are equal. The last statement of the theorem is immediate
from the definitions of linear combinations and consistent linear systems.
In this section, we introduce another equivalent way of representing linear systems. This one uses matrices.
First, we need to define how we multiply a matrix and a vector.
Let A = [ ~a1 ~a2 . . . ~ak ] be an n × k matrix with columns ~a1, ~a2, . . . , ~ak ∈ Rn and let ~v ∈ Rk have components v1, v2, . . . , vk. The matrix vector product A~v is defined to be the linear combination
\[ A\vec{v} = v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_k\vec{a}_k. \]
Note
We only define matrix vector multiplication if ~v has as many components as A has columns. Therefore,
if A is n × k, then A~v is defined only if ~v ∈ Rk .
Example 2.4.1
" # 6
1 2 3
Let A = and ~v = 1 . Then,
2 10 3
2
" # 6 " # " # " # " # " #
1 2 3 1 2 3 6+2+6 14
A~v = 1 =6 +1 +2 = = .
2 10 3 2 10 3 12 + 10 + 6 28
2
Example 2.4.2
Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 8 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Then,
\[ A\vec{v} = 0\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \\ 8 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 8 \end{bmatrix}. \]
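The product in Example 2.4.2 can be computed two ways in Python/numpy (assumed available): directly, and as the linear combination of the columns of A that the definition above describes. Both give the same vector.

import numpy as np

A = np.array([[1, 2],
              [2, 1],
              [1, 8]])
v = np.array([0, 1])

print(A @ v)                            # [2 1 8]
print(v[0] * A[:, 0] + v[1] * A[:, 1])  # the same linear combination of the columns: [2 1 8]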
Example 2.4.3
" # 1
0 1
Let A = and ~v = 1 . Since ~v ∈ R3 , but A only has 2 columns, the matrix vector
1 1
1
multiplication A~v is not defined.
Matrix Vector Multiplication Properties
Let A be an n × k matrix, let ~u, ~v ∈ Rk, and let r ∈ R. Then:
1. A(~u + ~v) = A~u + A~v;
2. A(r~v) = r(A~v).
Example 2.4.4
Calculate,
i) A(~u + ~v )
ii) A(5~v )
Solution.
Proof. Write $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_k \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix}$, so that $\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_k + v_k \end{bmatrix}$. Let ~a1, ~a2, . . . , ~ak denote the
columns of A, so A = [ ~a1 ~a2 . . . ~ak ]. Therefore,
\[ A(\vec{u} + \vec{v}) = [\, \vec{a}_1 \; \vec{a}_2 \; \ldots \; \vec{a}_k \,]\begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_k + v_k \end{bmatrix} \]
= (u1 + v1)~a1 + (u2 + v2)~a2 + . . . + (uk + vk)~ak by definition of matrix vector multiplication
= (u1~a1 + u2~a2 + . . . + uk~ak) + (v1~a1 + v2~a2 + . . . + vk~ak) by parts 1 and 4 of Properties of Vectors
= A~u + A~v.
Exercise
Prove part 2 of Matrix Vector Multiplication Properties.
x1 + x2 = 230
350x1 + 600x2 = 100, 000
As a vector equation,
\[ x_1\begin{bmatrix} 1 \\ 350 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 600 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}. \]
Let $A = \begin{bmatrix} 1 & 1 \\ 350 & 600 \end{bmatrix}$. Define the vector of one-dimensional variables $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and let
$\vec{b} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix}$. Then, by definition of matrix vector multiplication, we can write the vector equation that
represents this linear system as an equation involving matrix vector multiplication:
\[ A\vec{x} = \begin{bmatrix} 1 & 1 \\ 350 & 600 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 230 \\ 100{,}000 \end{bmatrix} = \vec{b}. \]
This is an example of a matrix equation and it gives us another way of representing linear systems.
Let A be an n × k matrix, let ~b ∈ Rn, and let ~x be a vector of k one-dimensional variables. A matrix equation is an equation of the form
\[ A\vec{x} = \vec{b}. \]
A solution to a matrix equation is a vector $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_k \end{bmatrix} \in \mathbb{R}^k$ such that when we make the variable
substitution ~x = ~v, the equation A~v = ~b is true.
Matrix equations are useful because they provide a very compact way of writing down linear systems. More-
over, every matrix equation can be written as a linear system.
Example 2.4.5
Express the following linear system as a matrix equation.
\[ \begin{aligned} x_1 + 4x_2 - ex_3 &= 1 \\ (2/3)x_1 + \sqrt{6}\,x_3 &= 2 \\ -x_2 + \pi x_3 &= 1 \end{aligned} \]
Solution. First write the linear system as a vector equation:
\[ x_1\begin{bmatrix} 1 \\ 2/3 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 4 \\ 0 \\ -1 \end{bmatrix} + x_3\begin{bmatrix} -e \\ \sqrt{6} \\ \pi \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}. \]
Now use the definition of matrix vector multiplication to translate this into a matrix equation.
\[ \underbrace{\begin{bmatrix} 1 & 4 & -e \\ 2/3 & 0 & \sqrt{6} \\ 0 & -1 & \pi \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vec{x}} = \underbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}}_{\vec{b}}. \; ♦ \]
Example 2.4.6
Write the matrix equation A~x = ~b as a linear system, where
\[ A = \begin{bmatrix} 2 & 1 & -6 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}, \quad \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]
Solution. We have
\[ A\vec{x} = \vec{b} \iff \begin{bmatrix} 2 & 1 & -6 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 2 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}. \]
Write this as a vector equation using the matrix vector product.
\[ x_1\begin{bmatrix} 2 \\ 0 \\ 0 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} + x_3\begin{bmatrix} -6 \\ 2 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 50 \\ 2 \\ 50 \\ -1 \end{bmatrix}. \]
Equating components yields the linear system:
2x1 + x2 − 6x3 = 50
x2 + 2x3 = 2
x3 = 50
2x1 + x2 + x3 = −1 ♦
Note
Given a linear system written as a matrix equation A~x = ~b, the matrix A is always the coefficient
matrix for the corresponding linear system.
Theorem 2.3.1 says the solution set of a linear system and the solution set of its corresponding vector equa-
tion are the same. It should come as no surprise that the solution set to the matrix equation representing a
linear system also coincides with the solution set to the linear system and conversely. The following theorem
ties all of this together.
Theorem 2.4.2
Let A be an n × k matrix with columns ~a1, ~a2, . . . , ~ak ∈ Rn and let ~b ∈ Rn be fixed. Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$.
Then, the solution set of the matrix equation A~x = ~b is the same as the solution set of the vector
equation
\[ x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_k\vec{a}_k = \vec{b}, \]
which is the same as the solution set of the linear system whose augmented matrix is
\[ \left[\, \vec{a}_1 \;\; \vec{a}_2 \;\; \ldots \;\; \vec{a}_k \;\middle|\; \vec{b} \,\right]. \]
Proof. We need only verify that the solution set of A~x = ~b is the same as the solution set of
x1~a1 + . . . + xk~ak = ~b
and the rest follows from Theorem 2.3.1. This follows immediately from the definition of matrix vector
multiplication.
Exercise
If the proof for Theorem 2.4.2 is not obvious, give a proof of it in order to convince yourself it is true.
We now have three different ways of representing a linear system:
1. As a linear system with an augmented matrix;
2. As a vector equation;
3. As a matrix equation.
All three of these representations have the exact same solution set and, the beautiful part is that, regardless
of what representation we pick, we always find these solutions using 1. This also allows us to answer three
seemingly unrelated questions with the exact same method!
We summarize the equivalence of the solution sets of these three types of equations as follows.
Equivalence of Solutions
Let A = [ ~a1 ~a2 . . . ~ak ] be an n × k matrix and let ~b ∈ Rn. The solution set of the matrix equation A~x = ~b, the solution set of the vector equation x1~a1 + x2~a2 + · · · + xk~ak = ~b, and the solution set of the linear system whose augmented matrix is [ ~a1 ~a2 . . . ~ak | ~b ] are all the same.
Example 2.5.1
Let
\[ \vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}. \]
Can ~b be expressed as a linear combination of ~v1 and ~v2? If it can, write one down.
Solution. By the Equivalence of Solutions, asking if ~b is a linear combination of ~v1 and ~v2 is the same as
asking if the linear system whose augmented matrix is [ ~v1 ~v2 | ~b ] has a solution. We know how to solve
this! Use Solving a Linear System Using a Matrix! Row reducing
\[ \left[\begin{array}{cc|c} -1 & 4 & 2 \\ 4 & 9 & 1 \\ 3 & -3 & 3 \end{array}\right] \]
to echelon form produces a matrix whose bottom row is [ 0 0 | -16 ].
The rightmost column of this echelon form is a pivot column. Therefore, the corresponding linear system
has no solution by The Solutions Theorem. This shows ~b can not be written as a linear combination of ~v1
and ~v2 . ♦
Note
Example 2.5.1 could be reworded as follows:
“Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$, $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$. Let $A = [\, \vec{v}_1 \; \vec{v}_2 \,] = \begin{bmatrix} -1 & 4 \\ 4 & 9 \\ 3 & -3 \end{bmatrix}$. Does A~x = ~b have
a solution?”
Example 2.5.2
Redo Example 2.5.1 with $\vec{b} = \begin{bmatrix} 9 \\ 14 \\ -9 \end{bmatrix}$.
Solution. For this example, the augmented matrix we work with is,
\[ A = \left[\begin{array}{cc|c} -1 & 4 & 9 \\ 4 & 9 & 14 \\ 3 & -3 & -9 \end{array}\right]. \]
The RREF of A is
\[ \left[\begin{array}{cc|c} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right]. \]
The rightmost column of the RREF of A is not a pivot column so the corresponding linear system is
consistent. Therefore, ~b is a linear combination of ~v1 and ~v2 . We now determine how to write ~b as a linear
combination of ~v1 and ~v2. This means we must find scalars c1, c2 ∈ R such that
\[ c_1\vec{v}_1 + c_2\vec{v}_2 = \vec{b}, \]
and we know the solution to this vector equation is the same as the solution to the corresponding linear
system. Reading from the RREF of A, we see that the solution to this linear system is (x1, x2) = (-1, 2).
Therefore, we write ~b as a linear combination of ~v1 and ~v2 as follows:
\[ \vec{b} = -\vec{v}_1 + 2\vec{v}_2. \]
Moreover, all of the columns to the left of the bar of the RREF are pivot columns. Therefore, there is only
one solution to the corresponding linear system. This means that the above is the only way we can write ~b
as a linear combination of ~v1 and ~v2 . ♦
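Finding the coefficients c1 and c2 in Example 2.5.2 amounts to solving a linear system, so it can also be done with sympy (an assumed tool, not part of these notes):

from sympy import Matrix, symbols, linsolve

c1, c2 = symbols('c1 c2')
A = Matrix([[-1, 4], [4, 9], [3, -3]])   # columns are v1 and v2
b = Matrix([9, 14, -9])

print(linsolve((A, b), c1, c2))   # {(-1, 2)}, so b = -v1 + 2*v2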
Determining which vectors can be written as linear combinations of others and how to do it is one of the
main themes of this course. This motivates the following definition.
Let ~v1, ~v2, . . . , ~vk ∈ Rn. The set of all linear combinations of the vectors ~v1, ~v2, . . . , ~vk is called the
span of ~v1, ~v2, . . . , ~vk and is denoted by span{~v1, ~v2, . . . , ~vk}.
In set notation,
\[ \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\} = \{\, c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k : c_1, c_2, \ldots, c_k \in \mathbb{R} \,\}. \]
If span{~v1, ~v2, . . . , ~vk} = Rn,
then we say the set {~v1, ~v2, . . . , ~vk} spans Rn. This means every vector in Rn is a linear combination
of the vectors ~v1, ~v2, . . . , ~vk.
Warning!
It is very important to remember that the span of a set of vectors is itself a set and not a single vector.
This is a common mistake.
This probably looks complicated. Remember: these definitions are only fancy language that allows us to
ask the same question in a different way, and nothing more. For example, we could have reworded Example
2.5.1 as follows.
“Determine if the vector $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ is in the span of the two vectors $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$.”
Using more compact notation, we could also reword Example 2.5.1 as the following.
“Let $\vec{v}_1 = \begin{bmatrix} -1 \\ 4 \\ 3 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 9 \\ -3 \end{bmatrix}$, and $\vec{b} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$. Is $\vec{b} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2\}$?”
Perhaps this seems like overkill but we use all of this language throughout the course. Hence, it is important
to get familiar with all of these different phrasings and know exactly what they mean.
In this section, we determine what a spanning set looks like in R2 . Of course, we can’t do this in Rn for
n ≥ 4 because we can’t visualize these Euclidean spaces.
First consider span{~0}. If ~v ∈ span{~0}, then ~v = c~0 for some c ∈ R, from which it follows that ~v = ~0 itself.
Therefore, span{~0} is always equal to the singleton set {~0}. That is, span{~0} is a single vector: the zero
vector, which is simply the origin.
Now suppose ~v ∈ R2 and ~v 6= ~0. Let ~u ∈ span {~v }. Then, by definition, there exists c ∈ R such that ~u = c~v .
Conversely, every scalar multiple of ~v is contained in span {~v }. Therefore, span {~v } is the set of all scalar
multiples of ~v . The following graph shows various scalar multiples of a single vector in R2 . Recall that all
vectors have the origin as their initial point.
[Figure: the scalar multiples 2~v, ~v, (1/2)~v, and -2~v all lying along the line through the origin parallel to ~v.]
If we could continue indefinitely, we’d see that every scalar multiple of ~v lies on the line through the origin
that is parallel to ~v . Since each of these vectors can be identified with a point in R2 , we interpret span {~v }
as the line through the origin parallel to ~v .
It should be clear from the above that there are always vectors in R2 that are not in span {~v }. That is, there
is no vector ~v ∈ R2 such that span {~v } = R2 . In other words, a single vector ~v never spans R2 .
Now suppose ~u, ~v ∈ R2 and ~u is not a scalar multiple of ~v. Then, given any other vector ~w in R2, we can
multiply ~u and ~v by appropriate scalars, say c1, c2 ∈ R, such that ~w is the fourth vertex of the parallelogram
whose three other vertices are the origin, c1~u, and c2~v. Thus, by the parallelogram law, ~w = c1~u + c2~v. This
means span{~u, ~v} = R2 so that ~u and ~v span R2. We'll prove this formally in the next section. In the
meantime, an example of this is shown below.
[Figure: a vector ~w = c1~u + c2~v realized as the fourth vertex of the parallelogram with vertices at the origin, c1~u, and c2~v.]
In summary, spans of vectors in R2 are either a single point, which is the span of the zero vector, a line
through the origin, or all of R2 itself.
In R3 , the situation is similar. If ~v ∈ R3 is non-zero, then span {~v } is interpreted as a line in R3 through the
origin that is parallel to ~v . If ~u, ~v ∈ R3 are not scalar multiples of one another, then span {~u, ~v } is not equal
to R3 , but it is interpreted as a plane through the origin that contains both of ~u and ~v . This shouldn’t be
completely unexpected because the span of two vectors in R2 that are not scalar multiples of one another is
all of R2, and we can interpret R2 as a plane in R3. Finally, if ~u, ~v, and ~w are vectors in R3, none are scalar
multiples of one another, and the 3 vectors don't lie in the same plane, then span{~u, ~v, ~w} = R3. All of these
assertions will be proved in the next section.
Example 2.5.3
Let $A = \begin{bmatrix} 2 & 4 & 4 \\ 3 & 0 & 3 \\ 1 & 2 & 2 \end{bmatrix}$. Does the equation A~x = ~b have a solution for any ~b ∈ R3?
Note
This question could have been asked in either of the following two ways.
“Let $\vec{v}_1 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 4 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 4 \\ 3 \\ 2 \end{bmatrix}$. Does {~v1, ~v2, ~v3} span R3?”
“Let $A = \begin{bmatrix} 2 & 4 & 4 \\ 3 & 0 & 3 \\ 1 & 2 & 2 \end{bmatrix}$. Do the columns of A span R3?”
Solution. Let $\vec{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$ be an arbitrary vector in R3. The solution to A~x = ~b is the same as the solution
to the linear system whose augmented matrix is
\[ B = \left[\begin{array}{ccc|c} 2 & 4 & 4 & b_1 \\ 3 & 0 & 3 & b_2 \\ 1 & 2 & 2 & b_3 \end{array}\right]. \]
Row reduce B to echelon form:
\[ \begin{array}{l} R_1 \Leftrightarrow R_3 \\ R_2 \Rightarrow R_2 - 3R_1 \\ R_3 \Rightarrow R_3 - 2R_1 \end{array} : \; B \sim \left[\begin{array}{ccc|c} 1 & 2 & 2 & b_3 \\ 0 & -6 & -3 & b_2 - 3b_3 \\ 0 & 0 & 0 & b_1 - 2b_3 \end{array}\right]. \]
By The Solutions Theorem, A~x = ~b is consistent if and only if the rightmost column of this matrix is not a
pivot column. This happens exactly when b1 - 2b3 = 0, that is, when b1 = 2b3. However, vectors in R3 do not
necessarily satisfy this condition. There are many vectors in R3 whose first component is not twice the third,
for example $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. For all such vectors, the rightmost
column is a pivot column and, consequently, the matrix equation doesn’t have a solution. Therefore, the
answer is no, A~x = ~b does not have a solution for all ~b ∈ R3 . ♦
Let’s look a little closer at the echelon form matrix in Example 2.5.3:
\[ \left[\begin{array}{ccc|c} 1 & 2 & 2 & b_3 \\ 0 & -6 & -3 & b_2 - 3b_3 \\ 0 & 0 & 0 & b_1 - 2b_3 \end{array}\right]. \]
The only thing preventing A~x = ~b from being solvable for any value of ~b is the bottom row of zeroes to the
left of the bar. If the third row contains a pivot to the left of the bar, then we can find a solution to A~x = ~b
for any value of ~b ∈ R3. Moreover, the converse also holds. That is, if A~x = ~b has a solution for every ~b ∈ R3, then every echelon form of A has a
pivot in each row. This is summarized in the following important theorem.
The Span Theorem
Let ~v1, ~v2, . . . , ~vk ∈ Rn, let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1, ~v2, . . . , ~vk, and let B be an echelon form of A. The following are equivalent (this means that if one of the statements is true, then
they are all true, and if one of the statements is false then they are all false):
1. The matrix equation A~x = ~b has a solution for every ~b ∈ Rn.
2. The columns of A span Rn. This means every vector ~b ∈ Rn is a linear combination of the
columns of A. Equivalently, {~v1, ~v2, . . . , ~vk} spans Rn.
3. B has a pivot in every row.
This is a most excellent result. It gives an easy test to see if a set of vectors in Rn spans Rn . Simply make
the vectors the columns of a matrix, row reduce the matrix to echelon form, and count pivots! We do some
examples before we give a proof.
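Here is a minimal sketch of that pivot-counting test in Python with sympy (an assumption, not part of these notes), applied to the matrix from Example 2.5.3 above.

from sympy import Matrix

A = Matrix([[2, 4, 4],
            [3, 0, 3],
            [1, 2, 2]])

_, pivot_cols = A.rref()          # pivot column indices; there is one pivot per pivot row
print(len(pivot_cols))            # 2 pivots but 3 rows
print(len(pivot_cols) == A.rows)  # False: the columns do not span R^3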
Example 2.5.4
Determine whether the following sets of vectors span R4.
\[ \text{i)} \; S_1 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ -1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 0 \\ 3 \end{bmatrix} \right\} \]
\[ \text{ii)} \; S_2 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 4 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ -1 \\ 1 \end{bmatrix} \right\} \]
\[ \text{iii)} \; S_3 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} -3 \\ 1 \\ 2 \\ 3 \end{bmatrix} \right\} \]
Solution. All three questions can be answered by forming the matrix whose columns are the vectors in each
set, row reducing to echelon form, and counting pivot rows.
i) We have
\[ A_1 = \begin{bmatrix} -2 & -2 & 3 \\ 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} -2 & -2 & 3 \\ 0 & -2 & 5/2 \\ 0 & 0 & 5/4 \\ 0 & 0 & 0 \end{bmatrix}. \]
The bottom row contains no pivots. Therefore, by The Span Theorem, the vectors in S1 do not span
R4 . ♦
ii) We have
\[ A_2 = \begin{bmatrix} -2 & 0 & 1 & -2 \\ 1 & 4 & 3 & 1 \\ 0 & 0 & 0 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \sim \begin{bmatrix} -2 & 0 & 1 & -2 \\ 0 & 4 & 7/2 & 0 \\ 0 & 0 & 5/8 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}. \]
There is a pivot in every row. Therefore, by The Span Theorem, the vectors in S2 do span R4 . ♦
iii) We have
\[ A_3 = \begin{bmatrix} -2 & 1 & -2 & -3 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 2 & 2 \\ 1 & 1 & 1 & 3 \end{bmatrix} \sim \begin{bmatrix} -2 & 1 & -2 & -3 \\ 0 & 1/2 & -1 & -1/2 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \]
The bottom row does not have a pivot. Therefore, by The Span Theorem, the vectors in S3 do not
span R4 . ♦
Proof. To prove a chain of equivalences like this, first prove that 1 implies 2, then that 2 implies 3, and
finally that 3 implies 1. The Equivalence of Solutions already shows 1 implies 2. Therefore, we only need to
prove that 2 implies 3 and that 3 implies 1.
2 =⇒ 3: We prove this using contraposition. This means we assume the negation of 3 and show that it
implies the negation of 2.
To this end, suppose B does not have a pivot in every row. We must show there is a vector in Rn that is
not a linear combination of the columns of A. Since B doesn't have a pivot in every row, the bottom row of
B contains only zeroes by definition of echelon form. Augment B with the column vector ~u = (0, 0, . . . , 0, 1) ∈ Rn, where
there are n - 1 zeroes above the 1, and denote this matrix by B′ = [ B | ~u ].
Since A ∼ B, there is a sequence of elementary row operations that transform A into B. If we apply the
opposite of each of these row operations in the reverse order to B, then we transform B into A. Applying
this procedure to B′, we get a matrix of the form A′ = [ A | ~w ] where ~w is some vector in Rn. Since B′ is
an echelon form of A′, and B′ contains a row of the form [ 0 . . . 0 | 1 ], The Solutions Theorem implies the
linear system with augmented matrix [ A | ~w ] is inconsistent. By Equivalence of Solutions, ~w is not a linear
combination of ~v1, ~v2, . . . , ~vk.
3 =⇒ 1: Suppose B has a pivot in each row. Pick any ~b ∈ Rn and form the augmented matrix A′ = [ A | ~b ].
Apply the sequence of elementary row operations required to turn A into B to get a matrix of the form
B′ = [ B | ~c ] where ~c ∈ Rn. This matrix is an echelon form of A′ and, since B has a pivot in every row,
B′ contains no row of the form [ 0 . . . 0 | m ] where m ≠ 0. By The Solutions Theorem, the linear system
whose augmented matrix is A′ is consistent, and so the Equivalence of Solutions implies the matrix equation
A~x = ~b has a solution. Since ~b ∈ Rn was arbitrary, we have shown that A~x = ~b has a solution for every
~b ∈ Rn.
Corollary 2.5.1
Let ~v1, ~v2, . . . , ~vk ∈ Rn be vectors. If k < n, then span{~v1, ~v2, . . . , ~vk} ≠ Rn; that is, if there are fewer
vectors in a set than components in each vector, that set of vectors does not span Rn.
Example 2.5.5
Solution. We have a set of two vectors, each of which has three components. Since there are fewer vectors
than components in each vector, Corollary 2.5.1 implies that S does not span R3. ♦
Proof. Form the n × k matrix A = [ ~v1 ~v2 . . . ~vk ]. Let B be an echelon form of A. If k < n, then B has
more rows than columns. Since B is in echelon form, it can have at most one pivot in each column. As there
are more rows than columns, B can not have a pivot in every row. Therefore, by The Span Theorem, the
columns of A, {~v1, ~v2, . . . , ~vk}, do not span Rn.
Unfortunately, if you have a set with at least as many vectors as there are components in each vector, there
is typically no way to tell whether or not that set spans Rn just by looking at the vectors (unless you are
really really good at mental calculation). There is one exception to this rule. Suppose you have some vectors
~v1 , ~v2 , . . . , ~vk ∈ Rn and assume they all have a zero in the same component. Then, the set doesn’t span Rn .
This is because any vector that has a non-zero entry in that component will never be a linear combination
of ~v1 , ~v2 , . . . , ~vk .
Example 2.5.6
Solution. S does not span R3 because every vector in S has a zero in the second component. Therefore,
any linear combination of these vectors has a zero in the second component. This means all vectors in R3
with a non-zero second component can not be written as a linear combination of the vectors in S, hence is
not in the span of S. An example of such a vector is $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. Therefore, S does not span R3. ♦
Let A be an n × k matrix. The homogeneous equation is the matrix equation A~x = ~0.
The homogeneous equation is always consistent. Certainly, if A = [ ~a1 ~a2 . . . ~ak ] and we substitute ~x = ~0,
then
\[ A\vec{0} = 0\vec{a}_1 + 0\vec{a}_2 + \cdots + 0\vec{a}_k = \vec{0}. \]
This shows ~0 is always a solution to the homogeneous equation, which means that the homogeneous equation
is always consistent. Therefore, there are only two possibilities for solution sets of the homogeneous equation:
1. The homogeneous equation has exactly one solution ~x = ~0. This solution is called the trivial solution.
2. The homogeneous equation has infinitely many solutions. A non-zero solution is referred to as a
non-trivial solution. Note that in this case, the solution to the matrix equation will contain a free
variable.
h i
To solve A~x = ~b, we form the augmented matrix A | ~b and follow Solving a Linear System Using a Ma-
h i
trix. For the homogeneous equation, ~b = ~0, so we form the augmented matrix A | ~0 . The elementary row
operations never change the last column because it is full of zeroes. Therefore, when we solve homogeneous
equations, we don’t usually augment A with ~0 and instead, we apply Solving a Linear System Using a Matrix
to the coefficient matrix A itself. If you really want to, you can still augment with the column of zeroes, but
going forward I will not.
Example 2.6.1
Solution. Row reducing A, the RREF is
\[ \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1/2 \\ 0 & 0 & 0 \end{bmatrix}. \]
Since column 3 is a non-pivot column, x3 is a free variable by The Solutions Theorem. Thus, the linear
system has infinitely many solutions and, hence, a non-trivial solution. To get the vector form, we look at
the RREF of the matrix. The RREF represents the following linear system.
x1 + x3 = 0
x2 + (1/2)x3 = 0.
Rearranging, we get
x1 = −x3
x2 = −(1/2)x3 .
Letting x3 = s, s ∈ R, the vector form of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -s \\ -(1/2)s \\ s \end{bmatrix} = s\begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \; ♦ \]
Example 2.6.2
Let
\[ A = \begin{bmatrix} 2 & 0 & 1 & 1 \\ -1 & 2 & 3 & -1 \\ 0 & 10 & 1 & 5 \\ 1 & 1 & 2 & 1 \end{bmatrix}. \]
Determine if A~x = ~0 has a non-trivial solution. Write the solution to
A~x = ~0 in vector form.
Solution. The RREF of A is the 4 × 4 identity matrix, so every column in the RREF of A is a pivot column. Therefore, A~x = ~0 has exactly one solution, the trivial
solution. This shows A~x = ~0 has no non-trivial solution. The vector form of the solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \vec{0}. \; ♦ \]
Let A be n × k and ~b ∈ Rn . Suppose A~x = ~b is consistent. The number of solutions to A~x = ~b is related to
the number of solutions to A~x = ~0. In fact, all you need is one particular solution to A~x = ~b and the whole
solution to A~x = ~0 in order to get every solution to A~x = ~b.
Theorem 2.6.1
Let A be an n × k matrix and let ~b ∈ Rn be a fixed vector. Suppose A~x = ~b is consistent. Let ~vp be
a fixed solution to A~x = ~b. Then, every solution of A~x = ~b can be written in the form w
~ = ~vp + ~vh
where ~vh is a solution to the homogeneous equation A~x = ~0.
Proof. Since A~x = ~b is consistent, it has at least one solution. Denote this solution by ~vp. Let ~w be any
solution to A~x = ~b (this could be ~vp itself). Define ~vh = ~w - ~vp. Then,
\[ A\vec{v}_h = A(\vec{w} - \vec{v}_p) = A\vec{w} - A\vec{v}_p = \vec{b} - \vec{b} = \vec{0}. \]
Therefore, ~vh is a solution to the homogeneous equation A~x = ~0 and ~w = ~vp + ~vh, which is the required form.
Side Note
A similar theorem holds for solving first order partial differential equations, and the proof in that
setting is the same as it is here! Cool!
The solutions ~vp and ~vh of Theorem 2.6.1 are called particular and homogeneous solutions respectively.
This idea of writing all solutions to a system as a sum of a fixed particular solution and a solution to the
corresponding homogeneous system appears in lots of areas of mathematics. It is a useful technique because,
in general, solving a homogeneous system tends to be easier than solving a non-homogeneous one.
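Here is a small numerical illustration of Theorem 2.6.1 in Python/numpy (assumed available); the matrix below encodes the reduced system from Example 1.3.6, namely x1 - 3x3 = 6 and x2 + 5x3 = -1.

import numpy as np

A = np.array([[1, 0, -3],
              [0, 1,  5]])
b = np.array([6, -1])

vp = np.array([6, -1, 0])   # a particular solution of A x = b
vh = np.array([3, -5, 1])   # a solution of the homogeneous equation A x = 0

print(A @ vp)               # [ 6 -1]
print(A @ vh)               # [0 0]
print(A @ (vp + 7 * vh))    # [ 6 -1]: vp plus any solution of A x = 0 still solves A x = b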
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and define the n × k matrix A = [ ~v1 ~v2 . . . ~vk ]. The set of vectors
{~v1 , ~v2 , . . . , ~vk } is called linearly independent if the homogeneous equation A~x = ~0 has only the
trivial solution. If the homogeneous equation has infinitely many solutions, then {~v1 , ~v2 , . . . , ~vk } is
called linearly dependent.
Solutions to matrix equations and solutions to the corresponding vector equations are the same by Equiva-
lence of Solutions. Therefore, an equivalent characterization of linear independence is that the vector equation
\[ x_1\vec{v}_1 + x_2\vec{v}_2 + \cdots + x_k\vec{v}_k = \vec{0} \]
has only the solution x1 = x2 = . . . = xk = 0; the set {~v1, ~v2, . . . , ~vk} is linearly dependent otherwise. Most
references define linear independence in terms of vector equations. This is because linear independence gen-
eralizes to a lot of different settings beyond matrices.
Example 2.6.3
The set of vectors $\vec{v}_1 = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 10 \\ 0 \\ 2 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 8 \\ 1 \\ 2 \end{bmatrix}$ in R3 is linearly dependent because
\[ \begin{bmatrix} 3 & 10 & 8 \\ 1 & 0 & 1 \\ 1 & 2 & 2 \end{bmatrix}\begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix} = -\vec{v}_1 - \tfrac{1}{2}\vec{v}_2 + \vec{v}_3 = \begin{bmatrix} -3 - 5 + 8 \\ -1 - 0 + 1 \\ -1 - 1 + 2 \end{bmatrix} = \vec{0}. \]
In the previous example, we wrote the zero vector as a non-trivial linear combination of non-zero vectors.
This is an example of a linear dependence relationship.
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be a set of non-zero vectors. A linear dependence relationship for
{~v1, ~v2, . . . , ~vk} is a linear combination of the form
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}, \]
where at least one of the scalars c1, c2, . . . , ck is non-zero.
Linear dependence relationships can be found by calculating a non-trivial solution to the homogeneous equa-
tion.
Example 2.6.4
Determine if the set of vectors $\vec{v}_1 = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} 5 \\ 2 \\ 1 \end{bmatrix}$ is linearly independent or linearly dependent.
If it is linearly dependent, write down a linear dependence relationship for the vectors.
Solution. Let $A = [\, \vec{v}_1 \; \vec{v}_2 \; \vec{v}_3 \,] = \begin{bmatrix} 0 & 3 & 5 \\ 4 & 0 & 2 \\ 1 & 1 & 1 \end{bmatrix}$. The RREF of A is
\[ A \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
This shows x1 = x2 = x3 = 0 is the only solution to the homogeneous equation A~x = ~0. Therefore, the set
{~v1 , ~v2 , ~v3 } is linearly independent. ♦
Example 2.6.5
Determine if the set of vectors $\vec{v}_1 = \begin{bmatrix} 0 \\ 4 \\ 1 \end{bmatrix}$, $\vec{v}_2 = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$, $\vec{v}_3 = \begin{bmatrix} -3 \\ 4 \\ 0 \end{bmatrix}$ is linearly independent or linearly dependent.
If it is linearly dependent, write down a linear dependence relationship for the vectors.
Solution. Form the matrix $A = \begin{bmatrix} 0 & 3 & -3 \\ 4 & 0 & 4 \\ 1 & 1 & 0 \end{bmatrix}$. We must determine the number of solutions to the
homogeneous equation A~x = ~0. The RREF of A is
\[ A \sim \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}. \]
Since there is a non-pivot column, The Solutions Theorem implies the homogeneous equation A~x = ~0
has infinitely many solutions. Therefore, the set of vectors {~v1 , ~v2 , ~v3 } is linearly dependent.
To find a linear dependence relationship, we need to find a non-trivial solution to A~x = ~0. Start by writing
out the vector form of the solution to A~x = ~0. The linear system corresponding to the RREF is
x1 + x3 = 0 ⇒ x1 = −x3
x2 − x3 = 0 ⇒ x2 = x3
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = s\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}. \]
Picking any non-zero value of s yields a non-trivial solution to the homogeneous equation. Taking s = 1 say,
we have
\[ A\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \vec{0} \implies -\vec{v}_1 + \vec{v}_2 + \vec{v}_3 = \vec{0}. \]
The equation
\[ -\vec{v}_1 + \vec{v}_2 + \vec{v}_3 = \vec{0} \]
is a linear dependence relationship for {~v1, ~v2, ~v3}. We could pick any non-zero value of s we want to get a
linear dependence relationship. For example, if s = 2,
\[ -2\vec{v}_1 + 2\vec{v}_2 + 2\vec{v}_3 = \vec{0}. \; ♦ \]
Note
There exist infinitely many linear dependence relationships for a set of linearly dependent vectors.
The Span Theorem provides an easy way of checking if the columns of an n × k matrix A span Rn : count
pivot rows in the echelon form of a matrix. There is a similar way to check if the columns of a matrix are
linearly independent.
The Linear Independence Theorem
Let ~v1, ~v2, . . . , ~vk ∈ Rn, let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1, ~v2, . . . , ~vk, and let B be an echelon form of A. The following are equivalent.
1. The matrix equation A~x = ~0 has only the trivial solution.
2. The columns of A are linearly independent; that is, {~v1, ~v2, . . . , ~vk} is a linearly independent set.
3. B has a pivot in every column.
Example 2.6.6
Determine if the set of vectors
\[ \vec{v}_1 = \begin{bmatrix} -1 \\ 2 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 7 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} -10 \\ -2 \\ -10 \\ 1 \end{bmatrix} \]
is linearly independent or linearly dependent.
Solution. Form the matrix
\[ A = \begin{bmatrix} -1 & 7 & -10 \\ 2 & -2 & -2 \\ 0 & 1 & -10 \\ 1 & 0 & 1 \end{bmatrix}. \]
An echelon form of A is
\[ A \sim \begin{bmatrix} -1 & 7 & -10 \\ 0 & 12 & -22 \\ 0 & 0 & -49/6 \\ 0 & 0 & 0 \end{bmatrix}. \]
Each column in this matrix contains a pivot. Therefore, the vectors {~v1 , ~v2 , ~v3 } are linearly independent by
The Linear Independence Theorem. ♦
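The check in Example 2.6.6 can be automated the same way as the span test. With sympy (an assumed tool, not part of these notes), the columns are linearly independent exactly when every column index shows up as a pivot.

from sympy import Matrix

A = Matrix([[-1,  7, -10],
            [ 2, -2,  -2],
            [ 0,  1, -10],
            [ 1,  0,   1]])

_, pivot_cols = A.rref()
print(len(pivot_cols) == A.cols)   # True: every column is a pivot column, so the set is independent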
2 =⇒ 3: Suppose the columns of A are linearly independent. By definition, the matrix equation A~x = ~0
has only the trivial solution. Therefore, the solution to the homogeneous equation does not contain a free
variable. Thus, The Solutions Theorem implies every echelon form of A has a pivot in every column.
3 =⇒ 1: Suppose B has a pivot in every column. Form the augmented matrix A′ = [ A | ~0 ]. An echelon
form of A′ is B′ = [ B | ~0 ]. Since B has a pivot in every column, every column to the left of the bar of
B′ is a pivot column. Therefore, the matrix equation A~x = ~0 has only the trivial solution by The Solutions
Theorem.
Corollary 2.6.1
A set {~v} ⊆ Rn containing a single vector is linearly independent if and only if ~v ≠ ~0.
Proof. Suppose ~v ≠ ~0 and consider the vector equation x1~v = ~0, that is,
\[ x_1\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} x_1 v_1 \\ \vdots \\ x_1 v_n \end{bmatrix} = \vec{0}. \]
Since ~v ≠ ~0, at least one of v1, v2, . . . , vn is not zero. Thus, the only way all the components of the vector
on the left can be zero is if x1 = 0. This shows the vector equation x1~v = ~0 has only the trivial solution, so
{~v} is linearly independent. On the other hand, if ~v = ~0, it is clear that the vector equation x1~v = ~0 has
infinitely many solutions, so that {~0} is linearly dependent.
Corollary 2.6.2
A set of two non-zero vectors {~v1 , ~v2 } ⊆ Rn is linearly dependent if and only if ~v1 is a scalar multiple
of ~v2. If one of ~v1 or ~v2 is ~0, then the set is linearly dependent.
Example 2.6.7
Proof. Define the matrix A = [ ~v1 ~v2 ] and let B be an echelon form of A. If one of ~v1 or ~v2 is the zero
vector, then it is clear that the corresponding column of B will also be zero, and hence B will not have a
pivot in every column. Therefore, if one of the vectors is the zero vector, then The Linear Independence
Theorem implies that ~v1 and ~v2 are linearly dependent.
Now assume both ~v1 and ~v2 are non-zero and that ~v1 and ~v2 are linearly dependent. Then, the matrix equation
A~x = ~0 has a non-trivial solution. Since A is n × 2, the solution is a vector in R2, call it
\[ \vec{w} = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]
Then,
\[ A\vec{w} = \vec{0}. \]
Rewriting this as a vector equation, we get
\[ w_1\vec{v}_1 + w_2\vec{v}_2 = \vec{0}. \]
Since ~w ≠ ~0, at least one of w1 or w2 is non-zero. If w1 = 0, then it must be the case that w2 ≠ 0, and the
above equation only holds if ~v2 = ~0, which is not possible. Therefore, w1 ≠ 0, and similarly w2 ≠ 0. Hence,
\[ \vec{v}_1 = -\frac{w_2}{w_1}\vec{v}_2, \]
so ~v1 is a scalar multiple of ~v2.
Conversely, suppose ~v1 is a scalar multiple of ~v2 . Say ~v1 = c~v2 for some non-zero c ∈ R. Rearranging this
vector equation yields
\[ \vec{v}_1 - c\vec{v}_2 = \vec{0}, \]
which shows that $\vec{u} = \begin{bmatrix} 1 \\ -c \end{bmatrix}$ is a non-trivial solution to A~x = ~0. Therefore, ~v1 and ~v2 are linearly depen-
dent.
In view of the previous two results, it is easy to tell when sets of 2 or fewer vectors are linearly independent.
For sets of three or more vectors, we generally have to use row reduction and count pivot columns to de-
termine linear independence. However, there are some special cases where we can quickly determine linear
dependence.
Corollary 2.6.3
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn . If one of the ~vi ’s is ~0, then the set is linearly dependent.
Exercise
Prove Corollary 2.6.3.
Corollary 2.6.4
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn . If k > n, then {~v1 , ~v2 , . . . , ~vk } is linearly dependent. That is, if you have
a set with more vectors than there are components in the vectors, then that set of vectors is linearly
dependent.
Example 2.6.8
Proof. Let A = [ ~v1 ~v2 . . . ~vk ]. Then A has more columns than rows. Since there can be at most one pivot in each row, an echelon form of A can have at most n pivots. Since k > n, it is impossible for A to have a pivot in each column, and so the set of vectors {~v1 , ~v2 , . . . , ~vk } is linearly dependent by The Linear Independence
Theorem.
Warning!
If the set of vectors contains fewer vectors than there are components in each vector, this does not
mean the set is linearly independent. All you can conclude is that if there are more vectors than
there are components, then the set is linearly dependent.
Theorem 2.6.3
Let S = {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be a set of vectors with k ≥ 2. Then, S is a linearly dependent set
if and only if at least one of the vectors in S is a linear combination of the others. In fact, if S is
linearly dependent and ~v1 6= ~0, then some ~vj with j > 1 is a linear combination of the preceding
vectors ~v1 , . . . , ~vj−1 . In other words, if the vectors in S are linearly dependent, then one of the vectors in S can be written as a linear combination of the vectors preceding it.
Proof. First suppose ~vj is a linear combination of the other vectors for some j = 1, . . . , k. If ~vj = ~0, then Corollary
2.6.3 implies S is linearly dependent. If ~vj 6= 0, then there exist scalars c1 , c2 , . . . , cj−1 , cj+1 , . . . , ck , not all
zero, such that
c1~v1 + c2~v2 + . . . + cj−1~vj−1 + cj+1~vj+1 + . . . + ck~vk = ~vj .
Subtracting ~vj from both sides gives
c1~v1 + c2~v2 + . . . + cj−1~vj−1 − ~vj + cj+1~vj+1 + . . . + ck~vk = ~0.    (2.3)
Since at least one of the scalars in Equation (2.3) is non-zero (the coefficient of ~vj is −1), this implies that
~u = (c1 , . . . , cj−1 , −1, cj+1 , . . . , ck ) ∈ Rk
is a non-trivial solution to the homogeneous equation A~x = ~0, where A = [ ~v1 ~v2 . . . ~vk ]. Therefore, {~v1 , ~v2 , . . . , ~vk } is linearly dependent.
Conversely, suppose that S is linearly dependent. If ~v1 = ~0, then S is linearly dependent by Corollary 2.6.3.
Hence, suppose ~v1 6= ~0. Since S is linearly dependent, there exist scalars d1 , d2 , . . . , dk ∈ R, not all zero,
such that
d1~v1 + d2~v2 + . . . + dk~vk = ~0.
Let j be the largest index such that dj ≠ 0. Note that j > 1: if j = 1, the equation reduces to d1~v1 = ~0 with d1 ≠ 0, which forces ~v1 = ~0, a contradiction. Since di = 0 for every i > j, the equation becomes
d1~v1 + d2~v2 + . . . + dj ~vj = ~0 =⇒ ~vj = −(d1 /dj )~v1 − (d2 /dj )~v2 − . . . − (dj−1 /dj )~vj−1 ,
so ~vj is a linear combination of the preceding vectors ~v1 , . . . , ~vj−1 .
This theorem tells us that if we have a set of vectors {~v1 , ~v2 , . . . , ~vk } in Rn , and one of the vectors is a linear
combination of the others, then the set is necessarily linearly dependent and vice versa.
Warning!
Theorem 2.6.3 does not guarantee that all of the vectors in the set are linear combinations of the
others. It only guarantees that at least one is.
The following provides some examples of using everything we’ve seen in this section so far.
Example 2.6.9
Determine whether the following sets of vectors are linearly independent or linearly dependent.
i) S1 = { (1, 2), (3, 1) } ⊆ R2
ii) S2 = { (2, 3, 1), (1, 0, 0), (0, 0, 0) } ⊆ R3
iii) S3 = { (3, 2, 1, 1/2), (2, 2, 2, 1), (1/2, π, 2, 7), (5, 12, 324/5, 1), (0, 1, 0, 0) } ⊆ R4
iv) S4 = { (2, 4, 1), (1, 2, 1/2), (5, 2, 1) } ⊆ R3
Solution. S1 is a set of two vectors and they are not scalar multiples of one another. Thus, by Corollary
2.6.2, S1 is linearly independent. S2 contains the zero vector, so is linearly dependent by Corollary 2.6.3.
There are more vectors in S3 than there are components in each vector. Therefore, S3 is linearly dependent
by Corollary 2.6.4. The first two vectors in S4 are scalar multiples of one another, hence one is a linear
combination of the other two. Therefore, Theorem 2.6.3 implies S4 is linearly dependent. ♦
Example 2.6.10
Let {~v1 , ~v2 , ~v3 } ⊆ Rn be linearly dependent. Show that {~v1 , ~v2 , ~v3 , ~v4 } is also linearly dependent for
any ~v4 ∈ Rn .
Solution. Let A = [ ~v1 ~v2 ~v3 ~v4 ]. Since the set {~v1 , ~v2 , ~v3 } is linearly dependent, by definition, there exist
scalars c1 , c2 , c3 , not all zero, such that
c1~v1 + c2~v2 + c3~v3 = ~0.
Now add 0 · ~v4 to both sides. Since 0 · ~v4 = ~0, this doesn't change the equation, so we get
c1~v1 + c2~v2 + c3~v3 + 0 · ~v4 = ~0 =⇒ A (c1 , c2 , c3 , 0) = ~0.
This shows that ~x = (c1 , c2 , c3 , 0) is a non-trivial solution to the homogeneous equation A~x = ~0, which means
{~v1 , ~v2 , ~v3 , ~v4 } is a linearly dependent set. ♦
2.7 Linear Transformations
2.7.1 Transformations
We start with an example.
Example 2.7.1
Let A = [ 1 3 5 ; 2 4 6 ] and let ~v = (v1 , v2 , v3 ) ∈ R3 . Then,
A~v = v1 (1, 2) + v2 (3, 4) + v3 (5, 6) = (v1 + 3v2 + 5v3 , 2v1 + 4v2 + 6v3 ).
Some specific examples:
A (1, 0, 0) = (1 · 1 + 3 · 0 + 5 · 0, 2 · 1 + 4 · 0 + 6 · 0) = (1, 2),
A (4, 1, 2) = (1 · 4 + 3 · 1 + 5 · 2, 2 · 4 + 4 · 1 + 6 · 2) = (17, 24).
This shows that multiplication by A assigns to each vector in R3 a vector in R2 , hence it is like a function with domain R3 and codomain R2 !
The previous example shows that we can think of matrix multiplication as some sort of function that takes
vectors as input and spits vectors out. Such functions are called transformations.
A transformation is merely a special type of function. It takes vectors from its domain as input and returns
vectors in its codomain. Consequently, the same language for functions carries over for transformations as
well.
Let F : Rk → Rn be a transformation. For any ~v ∈ Rk , the vector F (~v ) ∈ Rn is called the image of
~v under F . The set of all images F (~v ) is called the range of F .
Example 2.7.2
Given any n × k matrix A, and any ~v ∈ Rk , the matrix vector product A~v is a vector in Rn . Thus, matrix
vector multiplication defines a transformation FA : Rk → Rn given by FA (~v ) = A~v for all ~v ∈ Rk . These are
called matrix transformations.
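A matrix transformation is easy to implement on a computer. The following short Python sketch is not part of the original notes; it assumes the numpy library and uses the matrix from Example 2.7.1.

    import numpy as np

    A = np.array([[1.0, 3.0, 5.0],
                  [2.0, 4.0, 6.0]])

    def F_A(v):
        """Return the image A v of a vector v in R^3 under the matrix transformation F_A."""
        return A @ v

    print(F_A(np.array([1.0, 0.0, 0.0])))  # [1. 2.]
    print(F_A(np.array([4.0, 1.0, 2.0])))  # [17. 24.]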
Example 2.7.3
Let In denote the n × n matrix whose ith column has a 1 in the ith component and zeroes everywhere else, and define FI : Rn → Rn by FI (~v ) = In~v . Show that FI (~v ) = ~v for every ~v ∈ Rn .
Solution. Write ~v = (v1 , v2 , . . . , vn ) ∈ Rn . Then,
FI (~v ) = In~v = v1 (1, 0, . . . , 0) + v2 (0, 1, 0, . . . , 0) + . . . + vn (0, . . . , 0, 1) = (v1 , v2 , . . . , vn ) = ~v .
Note
The matrix In defined above is called the n × n identity matrix. The transformation FI is called
the identity transformation. The identity matrix plays a very important role in linear algebra.
We will see it more frequently later in the course.
Example 2.7.4
Let A = [ 1 3 5 ; 2 4 6 ] and let FA : R3 → R2 be the matrix transformation FA (~v ) = A~v .
i) Write down the domain and codomain of FA .
ii) Let ~u = (3, 2, 1). Compute FA (~u).
iii) Let ~b = (7, 8). Find ~v ∈ R3 such that FA (~v ) = ~b.
iv) Is there more than one vector ~v ∈ R3 such that FA (~v ) = ~b?
v) Is ~y = (0, 1) in the range of FA ?
Solution.
i) The matrix transformation takes vectors from R3 as input and outputs vectors in R2 . Therefore, the
domain of FA is R3 and the codomain of FA is R2 . ♦
ii)
FA (~u) = A (3, 2, 1) = (1 · 3 + 3 · 2 + 5 · 1, 2 · 3 + 4 · 2 + 6 · 1) = (14, 20). ♦
iii) The question asks for a vector ~v ∈ R3 such that FA (~v ) = ~b. Since FA (~v ) = A~v , this means we need to find a vector ~v ∈ R3 such that A~v = ~b; i.e. we need to find a solution to the matrix equation A~x = ~b. We know how to do this! Augment A with ~b and row reduce!
[ A | ~b ] = [ 1 3 5 | 7 ; 2 4 6 | 8 ] ∼ [ 1 0 −1 | −2 ; 0 1 2 | 3 ].
The vector form of the solution is evident from the RREF:
~x = (x1 , x2 , x3 ) = (−2, 3, 0) + s (1, −2, 1),   s ∈ R.
Therefore, any vector ~v of the above form is a solution to A~x = ~b; hence satisfies FA (~v ) = ~b. A specific
vector is found by picking any fixed value of s. If we take s = 0, then,
FA ((−2, 3, 0)) = (7, 8). ♦
iv) Yes because there are infinitely many solutions to the matrix equation A~x = ~b and each such solution
yields a vector ~v such that FA (~v ) = ~b. ♦
v) If ~y = (0, 1) is in the range of FA , then by definition there is a vector ~w ∈ R3 such that FA (~w) = ~y .
Such a vector is necessarily a solution to the matrix equation A~x = ~y . Therefore, we can answer the
question by determining whether or not this matrix equation has a solution. We know how to do this!
Form the augmented matrix and row reduce!
[ A | ~y ] = [ 1 3 5 | 0 ; 2 4 6 | 1 ] ∼ [ 1 0 −1 | 3/2 ; 0 1 2 | −1/2 ].
The vector form of the solution is
~x = (3/2, −1/2, 0) + s (1, −2, 1),   s ∈ R.
Since A~x = ~y has a solution, we conclude that yes, ~y is in the range of FA . For example, letting s = 1, a specific vector ~p such that FA (~p) = ~y is ~p = (5/2, −5/2, 1). ♦
Parts iii) and v) of Example 2.7.4 ask whether or not a specific vector is in the range of a matrix transfor-
mation. The question is answered by finding the solution to a matrix equation. Moreover, we know this is
equivalent to asking whether or not a given vector is in the span of some others. Therefore, we can now
phrase questions that we’ve seen in the last two sections in terms of transformations. For example, part iii)
of Example 2.7.4 can be stated in terms of spanning sets as follows.
Restatement
“Let ~v1 = (1, 2), ~v2 = (3, 4), ~v3 = (5, 6). Is ~b = (7, 8) in span{~v1 , ~v2 , ~v3 }?”
There are many types of transformations. Those of specific interest are the linear transformations.
A transformation F : Rk → Rn is called a linear transformation if the following two properties hold:
1. F (~u + ~v ) = F (~u) + F (~v ) for all ~u, ~v ∈ Rk ;
2. F (r~w) = r(F (~w)) for all ~w ∈ Rk and all scalars r ∈ R.
Note on Language
There are transformations that are not linear, as the following example shows.
Example 2.7.5
Solution. To show F is not linear, it suffices to find vectors that violate one of the two properties in the definition. To this end, let ~v1 = (1, 0, 0) and ~v2 = (0, 0, 1). Then, ~v1 + ~v2 = (1, 0, 1), so that
F (~v1 + ~v2 ) = (1, 1, 1).
On the other hand,
F (~v1 ) = (1, 1, 0)   and   F (~v2 ) = (0, 1, 1),
so that
F (~v1 ) + F (~v2 ) = (1, 2, 1).
This shows F (~v1 + ~v2 ) 6= F (~v1 ) + F (~v2 ) which means F is not a linear transformation. ♦
The solution to the previous example exhibits a violation of the first condition in the definition of linearity to show F is not linear. It is equally valid to exhibit a violation of the second condition to show a transformation is not linear.
Example 2.7.6
Solution. This time, we find a counterexample to the second condition in the definition of linear transfor-
mations. To this end, let
~v = (1, 0, 0).
Then,
F (2~v ) = F ((2, 0, 0)) = 2² = 4,
and
2F (~v ) = 2 · 1² = 2.
Therefore, F (2~v ) 6= 2F (~v ). This violates property 2 in the definition of linear transformations and, conse-
quently, F is not a linear transformation. ♦
The following examples show how to prove a transformation is linear, even if it is not given explicitly.
Example 2.7.7
Let c ∈ R be a fixed scalar and define F : Rn → Rn by F (~v ) = c~v for all ~v ∈ Rn . Show that F is a linear transformation.
Solution. To prove that F is linear, we must show that F satisfies the two conditions in the definition.
First, let ~u, ~v ∈ Rn . Then,
F (~u + ~v ) = c(~u + ~v ) = c~u + c~v = F (~u) + F (~v ),
where the middle equality follows from part 2 of Properties of Vectors. Therefore, the first condition for linear transformations is verified. For the second condition, let r ∈ R and ~w ∈ Rn . Then,
F (r~w) = c(r~w) = (cr)~w = (rc)~w = r(c~w) = r(F (~w)),
where the middle equality follows from part 5 of Properties of Vectors. Therefore, the second condition for
linear transformations is verified and so, F is a linear transformation.
Note
If 0 ≤ c < 1, then the linear transformation in Example 2.7.7 is called a contraction. If c ≥ 1, then
the linear transformation in Example 2.7.7 is called a dilation.
The following theorem is important. It shows that every matrix transformation is a linear transformation.
Theorem 2.7.1
Every matrix transformation is a linear transformation.
Proof. Let F : Rk → Rn be a matrix transformation; say F (~v ) = A~v for some n × k matrix A. Let ~u, ~v ∈ Rk . Then,
F (~u + ~v ) = A(~u + ~v ) = A~u + A~v = F (~u) + F (~v ),
where the middle equality follows from part 1 of Theorem 2.4.1. Thus, the first property in the definition of linear transformations holds. For the second property, let r ∈ R be any scalar and let ~w ∈ Rk . Then,
F (r~w) = A(r~w) = r(A~w) = r(F (~w)),
where the middle equality follows from part 2 of Theorem 2.4.1. Thus, the second condition in the definition
of linear transformation holds and, hence, F is a linear transformation.
An amazing fact is that the converse of Theorem 2.7.1 holds as well; that is, every linear transforma-
tion is a matrix transformation. We prove this formally in Section 2.7.4.
The following theorem gives two properties that every linear transformation satisfies. The first one is par-
ticularly useful for showing transformations are not linear.
Theorem 2.7.2
Example 2.7.8
Since F (~0) 6= ~0, part 1 of Theorem 2.7.2 implies that F is not linear. ♦
Proof. Note that ~0k + ~0k = ~0k . Since F is linear, property 1 in the definition implies
F (~0k ) = F (~0k + ~0k ) = F (~0k ) + F (~0k ).
Subtracting F (~0k ) from both sides gives F (~0k ) = ~0n .
Warning!
If asked to determine if a given transformation F is linear, the first thing you should always check is
F (~0) = ~0. But be careful. Theorem 2.7.2 can only be used to show that something is not linear. If
a transformation F satisfies F (~0) = ~0, this is not enough information to conclude that F is a linear
transformation.
Linear transformations possess the following property that allows you to “pull” it through a sum.
Theorem 2.7.3
Let F : Rk → Rn be a linear transformation. Let ~v1 , ~v2 , . . . , ~vm ∈ Rk and let c1 , c2 , . . . , cm be scalars.
Then,
F (c1~v1 + c2~v2 + . . . + cm~vm ) = c1 F (~v1 ) + c2 F (~v2 ) + . . . + cm F (~vm ).
Proof. We only prove this for 2 vectors. I leave the proof of the general case to the reader.
Let ~u, ~v ∈ Rk and r, s ∈ R. Then, r~u, s~v , r~u + s~v ∈ Rk and, since F is assumed linear, the first condition in
the definition implies
F (r~u + s~v ) = F (r~u) + F (s~v ),
and the second condition implies F (r~u) = rF (~u) and F (s~v ) = sF (~v ). Therefore,
F (r~u + s~v ) = rF (~u) + sF (~v ).
Example 2.7.9
Let A = [ 1 0 ; 0 −1 ] and define the linear transformation FA : R2 → R2 by FA (~v ) = A~v . Describe how FA acts on vectors in R2 geometrically.
Solution. Let ~v = (v1 , v2 ) ∈ R2 . Then,
FA (~v ) = A~v = (v1 , −v2 ).
If we consider the vector (v1 , v2 ) as a point in R2 , then (v1 , −v2 ) is the reflection of (v1 , v2 ) over the x-axis.
This means that FA is reflecting the given vector ~v over the x-axis. Therefore, FA acts on vectors in R2
by reflecting them over the x-axis. The following picture shows examples of FA applied to some vectors in R2 .
[Figure: two vectors ~v and ~w in the x1x2-plane together with their images FA (~v ) and FA (~w), reflected over the x1-axis.]
We can also apply FA to a set of vectors. Here is FA applied to the square S with vertices at (±1, 1) and
(±1, 3). In this case, the vectors are interpreted as points in R2 .
[Figure: the square S and its image FA (S), reflected over the x1-axis.]
Exercise
Reflections over the y-axis and reflections over the line x = y are matrix transformations as well. See if you can figure out the matrix that defines each of them.
Example 2.7.10
Let A = (1/√2) [ 1 −1 ; 1 1 ] and define the linear transformation FA : R2 → R2 by FA (~v ) = A~v . Describe how FA acts on vectors in R2 geometrically.
Solution. This one is a little trickier. To get an idea of what FA is doing, apply FA to ~v = (1, 0) a few times and plot the results.
[Figure: the vector ~v = (1, 0) together with FA (~v ) and FA (FA (~v )), each rotated a further 45 degrees counter-clockwise about the origin.]
We see that the linear transformation is rotating ~v counter-clockwise about the origin through an angle of
45 degrees. In fact, any counter-clockwise rotation of a vector about the origin through an angle ϕ, is a
matrix transformation. The matrix that defines the rotation is called a rotation matrix, denoted Rϕ , and
is given by,
Rϕ = [ cos(ϕ) −sin(ϕ) ; sin(ϕ) cos(ϕ) ]. ♦
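Here is a small Python check of the rotation matrix, not part of the original notes (it assumes numpy): rotating (1, 0) by 45 degrees twice lands, up to rounding, on (0, 1).

    import numpy as np

    phi = np.pi / 4
    R = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])

    v = np.array([1.0, 0.0])
    print(R @ v)        # [0.7071..., 0.7071...]: v rotated 45 degrees counter-clockwise
    print(R @ (R @ v))  # approximately [0, 1]: two 45-degree rotations give 90 degrees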
Two more families of linear transformations on R2 are given by the matrices [ 1 m ; 0 1 ] and [ 1 0 ; m 1 ], where m is some non-zero number. These are called shear transformations; the former matrix corresponds to a horizontal shear and the latter corresponds to a vertical shear. Here are a few examples of what shears look like after applying them to the square S with vertices (0, 0), (1, 0), (1, 1), and (0, 1).
Horizontal Shear 1, with matrix [ 1 2 ; 0 1 ]:
[Figure: the unit square S and its image F (S) under the horizontal shear with m = 2.]
Horizontal Shear 2, with matrix [ 1 −1/2 ; 0 1 ]:
[Figure: the unit square S and its image F (S) under the horizontal shear with m = −1/2.]
Vertical Shear 1, with matrix [ 1 0 ; −4 1 ]:
[Figure: the unit square S and its image F (S) under the vertical shear with m = −4.]
Vertical Shear 2, with matrix [ 1 0 ; 3 1 ]:
[Figure: the unit square S and its image F (S) under the vertical shear with m = 3.]
Here are the above shears applied to the square with vertices (1, −1), (1, 1), (−1, 1), and (−1, −1).
Horizontal Shear 1, with matrix [ 1 2 ; 0 1 ]:
[Figure: the larger square S and its image F (S) under the horizontal shear with m = 2.]
Horizontal Shear 2, with matrix [ 1 −1/2 ; 0 1 ]:
[Figure: the larger square S and its image F (S) under the horizontal shear with m = −1/2.]
Vertical Shear 1, with matrix [ 1 0 ; −4 1 ]:
[Figure: the larger square S and its image F (S) under the vertical shear with m = −4.]
Vertical Shear 2, with matrix [ 1 0 ; 3 1 ]:
[Figure: the larger square S and its image F (S) under the vertical shear with m = 3.]
Another useful family of linear transformations are the projections. For example, the matrix transformation P : R2 → R2 with matrix [ 1 0 ; 0 0 ] sends (v1 , v2 ) to (v1 , 0); it projects each vector onto the horizontal axis.
[Figure: vectors ~u, ~v , and ~w together with their projections P (~u), P (~v ), and P (~w) onto the x1-axis.]
Projections to the vertical axis are also possible. I leave it to the reader to explore such transformations.
Similarly, in R3 the matrix [ 1 0 0 ; 0 1 0 ; 0 0 0 ] sends ~v = (v1 , v2 , v3 ) to (v1 , v2 , 0), projecting ~v onto the xy-plane.
There are some other types of linear transformations as well, such as scaling in the horizontal and vertical
directions and squeezing. See if you can find matrices that represent these operations.
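A quick way to experiment with these geometric transformations is to apply them to a set of points on a computer. The sketch below is not from the notes; it assumes numpy and applies the horizontal shear with m = 2 to the corners of the unit square pictured above.

    import numpy as np

    S = np.array([[0.0, 1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])   # each column is one vertex of the unit square
    H = np.array([[1.0, 2.0],
                  [0.0, 1.0]])             # horizontal shear matrix with m = 2

    print(H @ S)  # columns are the sheared vertices: (0,0), (1,0), (3,1), (2,1)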
Theorem 2.7.1 shows that every matrix transformation is a linear transformation. In this section, we prove the converse to
this theorem. That is, every linear transformation is a matrix transformation. Even more amazingly, this
matrix is unique. First, we need a definition.
The standard basis for Rn is the set of vectors {~e1 , ~e2 , . . . , ~en } ⊆ Rn , where ~ei is the vector with a 1 in the ith component and zeroes everywhere else. That is, each ~ei has the form
~ei = (0, . . . , 0, 1, 0, . . . , 0),
with the 1 appearing in the ith position.
Example 2.7.11
Let ~v = (v1 , v2 ) ∈ R2 . Then,
~v = (v1 , v2 ) = v1 (1, 0) + v2 (0, 1) = v1~e1 + v2~e2 .
This shows every vector ~v ∈ R2 is a linear combination of ~e1 and ~e2 . It is not hard to generalize this argument to Rn for any arbitrary n ≥ 1. That is, if ~v ∈ Rn and {~e1 , ~e2 , . . . , ~en } is the standard basis of Rn , then
~v = v1~e1 + v2~e2 + . . . + vn~en ,
where v1 , v2 , . . . , vn are the components of ~v . Using this, we prove the converse to Theorem 2.7.1.
Theorem 2.7.4
Let F : Rk → Rn be a linear transformation. Then, there exists a unique n × k matrix A such that
F (~v ) = A~v for all ~v ∈ Rk .
Proof. Let ~v ∈ Rk . Using the observation above, write ~v = v1~e1 + v2~e2 + . . . + vk~ek , where v1 , v2 , . . . , vk are the components of ~v . Since F is linear, part 2 of Theorem 2.7.3 implies
F (~v ) = v1 F (~e1 ) + v2 F (~e2 ) + . . . + vk F (~ek ).    (2.4)
Since F (~ei ) ∈ Rn for each i = 1, . . . , k, the matrix A = [ F (~e1 ) F (~e2 ) . . . F (~ek ) ] is n × k. By definition of matrix vector multiplication, Equation (2.4) implies
F (~v ) = A~v .
Thus, we have found an n × k matrix A such that F (~v ) = A~v for all ~v ∈ Rk .
We now show A is unique. Suppose B is another matrix such that F (~v ) = B~v for all ~v ∈ Rk . Then A~v = B~v for all ~v ∈ Rk . In particular, A~ei = B~ei for each i = 1, . . . , k. Write A = [ ~a1 ~a2 . . . ~ak ] and B = [ ~b1 ~b2 . . . ~bk ]. Then,
A~ei = 0~a1 + 0~a2 + . . . + 0~ai−1 + 1~ai + 0~ai+1 + . . . + 0~ak = ~ai , for each i = 1, . . . , k
and similarly
B~ei = ~bi for each i = 1, . . . , k.
Therefore, ~ai = A~ei = B~ei = ~bi for each i = 1, . . . , k, and so A = B. This shows the matrix A is unique.
The unique matrix A = [ F (~e1 ) F (~e2 ) . . . F (~ek ) ] in Theorem 2.7.4 is called the standard matrix of F .
Example 2.7.12
Let F : R3 → R4 be the linear transformation given by
F ((v1 , v2 , v3 )) = (v1 , v1 + v2 , v1 + v2 + v3 , v1 − v3 ).
Find the standard matrix for F .
Solution. Apply F to the standard basis {~e1 , ~e2 , ~e3 } for R3 to get
F (~e1 ) = (1, 1 + 0, 1 + 0 + 0, 1 − 0) = (1, 1, 1, 1),
F (~e2 ) = (0, 0 + 1, 0 + 1 + 0, 0 − 0) = (0, 1, 1, 0),
F (~e3 ) = (0, 0 + 0, 0 + 0 + 1, 0 − 1) = (0, 0, 1, −1).
Therefore, the standard matrix of F is
A = [ F (~e1 ) F (~e2 ) F (~e3 ) ] = [ 1 0 0 ; 1 1 0 ; 1 1 1 ; 1 0 −1 ]. ♦
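The recipe "apply F to the standard basis vectors and use the results as columns" is easy to carry out by computer. The sketch below is not part of the notes; it assumes numpy and uses the map F from Example 2.7.12.

    import numpy as np

    def F(v):
        v1, v2, v3 = v
        return np.array([v1, v1 + v2, v1 + v2 + v3, v1 - v3])

    basis = np.eye(3)  # its columns are e1, e2, e3
    A = np.column_stack([F(basis[:, i]) for i in range(3)])
    print(A)
    # [[ 1.  0.  0.]
    #  [ 1.  1.  0.]
    #  [ 1.  1.  1.]
    #  [ 1.  0. -1.]]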
Example 2.7.1. Let f, g : R → R be differentiable functions and let r ∈ R. Recall from calculus that derivatives obey the following properties:
d/dx (f + g) = df/dx + dg/dx,   and   d/dx (rf ) = r df/dx.
This means that differentiation satisfies the two defining properties of a linear transformation (acting on differentiable functions rather than on vectors in Rn ). Can you find the standard matrix for d/dx?
A transformation F : Rk → Rn is called onto if every vector in the codomain is the image of at least one vector in the domain; that is, if for every ~b ∈ Rn there is some ~v ∈ Rk with F (~v ) = ~b.
Here is a way to visualize an onto transformation. Imagine you are at a school dance and that everyone
present is split into two groups; call the first group Rk and the second group Rn . Think of a transformation
F : Rk → Rn as a way of assigning to a person in Rk a dance partner in Rn . Then, F is onto if every person
in Rn has a dance partner in Rk under F .
Suppose F : Rk → Rn is an onto linear transformation with standard matrix A. Then, for every ~b ∈ Rn ,
there is a vector ~v ∈ Rk such that F (~v ) = A~v = ~b. Thus, the matrix equation A~x = ~b has a solution for every
vector ~b ∈ Rn and so the columns of A span Rn . This shows that if F is an onto linear transformation, then
the columns of its standard matrix span Rn . The converse is also true. This lets us add a new condition to The Span Theorem.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1 , ~v2 , . . . , ~vk . Let FA : Rk → Rn be the linear transformation whose standard matrix is A. The
following are equivalent.
1. The matrix equation A~x = ~b has at least one solution for every ~b ∈ Rn .
2. The columns of A span Rn . This means every vector ~b ∈ Rn is a linear combination of the columns of A. Equivalently, {~v1 , ~v2 , . . . , ~vk } spans Rn .
3. Every echelon form of A has a pivot in every row.
4. FA is onto.
Example 2.7.13
Is F onto?
This matrix has a pivot in every row. Therefore, by The Span Theorem, the linear transformation F is
onto.
Proof. We’ve already proved the first three statements are equivalent. Therefore, it suffices to show the
fourth is equivalent to any of the first three. We show it is equivalent to 1.
First suppose A~x = ~b has a solution for every ~b ∈ Rn . Pick any ~b ∈ Rn ; then there is ~v ∈ Rk such that A~v = ~b. As A is the standard matrix for FA , it follows that FA (~v ) = ~b. This shows every ~b ∈ Rn is in the range of FA , so that
FA is onto.
Conversely, suppose FA is onto. Pick any ~c ∈ Rn . Then, there is w ~ ∈ Rk such that FA (w)
~ = ~c. As A is the
standard matrix of FA , it follows that Aw ~ = ~c, which shows w
~ is a solution to the matrix equation A~x = ~c.
Therefore, A~x = ~c has a solution for all ~c ∈ Rn .
This is wonderful! We now have an easy way to test if a linear transformation is onto: calculate its standard
matrix, put it into echelon form, and see if there is a pivot in every row!
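A computer can do the pivot count for you: an echelon form of A has a pivot in every row exactly when the rank of A equals the number of rows. The sketch below is not from the notes and assumes numpy; it uses the matrix from Example 2.7.4.

    import numpy as np

    A = np.array([[1.0, 3.0, 5.0],
                  [2.0, 4.0, 6.0]])

    print(np.linalg.matrix_rank(A) == A.shape[0])  # True: F_A : R^3 -> R^2 is onto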
We now have four distinct ways of asking questions about spanning sets in Rn . These are summarized as
follows.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn . Let A = [ ~v1 ~v2 . . . ~vk ]. The following questions are all different ways to
ask the same thing.
1. Does the matrix equation A~x = ~b have at least one solution for every ~b ∈ Rn ?
2. Is every vector ~b ∈ Rn a linear combination of the columns of A?
3. Does {~v1 , ~v2 , . . . , ~vk } span Rn ? or does span {~v1 , ~v2 , . . . , ~vk } = Rn ?
4. Is FA onto?
All four of these question are solved the same way: row reduce the appropriate matrix and count
pivot rows.
Onto is a property of linear transformations that is related to spanning. One-to-one is related to linear independence. A transformation F : Rk → Rn is called one-to-one if distinct vectors in Rk have distinct images under F ; that is, if F (~u) = F (~v ) implies ~u = ~v .
Note
The definitions of one-to-one and onto hold true for any function, not just linear transformations. For
example, the function f : R → R given by f (x) = x² is neither one-to-one nor onto, but the function g : R → R given by g(x) = x³ is both one-to-one and onto. See if you can prove these claims!
The Span Theorem provides an easy criterion for checking if a linear transformation is onto. We prove a
similar criterion for one-to-one.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn and let A = [ ~v1 ~v2 . . . ~vk ] be the n × k matrix whose columns are
~v1 , ~v2 , . . . , ~vk . Let FA : Rk → Rn be the linear transformation whose standard matrix is A. The
following are equivalent.
1. The matrix equation A~x = ~0 has only the trivial solution.
2. The columns of A are linearly independent; that is, {~v1 , ~v2 , . . . , ~vk } is a linearly independent set.
3. Every echelon form of A has a pivot in every column.
4. FA is one-to-one.
Example 2.7.14
Since this matrix does not have a pivot in every column, the linear transformation F is not one-to-one. ♦
Proof. We’ve already proved the first three are equivalent. Therefore, it suffices to show 4 is equivalent to 1.
Suppose A~x = ~0 has only the trivial solution. Pick any ~u, ~v ∈ Rk that satisfy FA (~u) = FA (~v ). It suffices to
show that ~u = ~v . Rearranging this equation and using linearity of FA yields
FA (~u − ~v ) = FA (~u) − FA (~v ) = ~0.
Since A is the standard matrix of FA , this means
A(~u − ~v ) = ~0.
Therefore, ~u − ~v is a solution to the homogeneous equation A~x = ~0. By assumption, this equation has only
the trivial solution ~0. Therefore, we conclude that ~u − ~v = ~0 =⇒ ~u = ~v . This shows FA is one-to-one.
Conversely, suppose FA is one-to-one. Then, the equation FA (~x) = ~0 has at most one solution. Since ~x = ~0
is always a solution to this equation, it must be the only one. This exactly means A~x = ~0 has only the trivial
solution.
Example 2.7.15
The echelon form has a pivot in every row and in every column. Therefore, F is both one-to-one and onto. ♦
Example 2.7.16
We now have four distinct ways to ask questions regarding linear independence. These are summarized as
follows.
Let ~v1 , ~v2 , . . . , ~vk ∈ Rn . Let A = [ ~v1 ~v2 . . . ~vk ]. The following questions are all different ways to
ask the same thing.
1. Are the vectors ~v1 , ~v2 , . . . , ~vk linearly independent?
2. Does the matrix equation A~x = ~0 have only the trivial solution?
3. Does every echelon form of A have a pivot in every column?
4. Is FA one-to-one?
All four of these questions are solved in the same way: row reduce the appropriate matrix and count
pivot columns.
Sometimes you can tell if a linear transformation is one-to-one or onto just by looking at the domain and
codomain.
Corollary 2.7.1
Let F : Rk → Rn be a linear transformation.
1. If n > k, then F is not onto.
2. If k > n, then F is not one-to-one.
3. If n = k, then F is either both one-to-one and onto or it is neither.
Proof. Let A be the standard matrix for F . Then, A is n × k. There are three cases.
Case 1. If n > k, then A has more rows than columns. By Corollary 2.5.1, the columns of A do not span
Rn . Therefore, F is not onto by The Span Theorem.
Case 2. If k > n, then A has more columns than rows. By Corollary 2.6.4, the columns of A are linearly
dependent. Therefore, F is not one-to-one by The Linear Independence Theorem Version.
Case 3. If n = k, then A has the same number of rows as columns. Therefore, an echelon form of A either
has a pivot in every row AND column, or there is a column and a row without a pivot. Thus, by The Span
Theorem and The Linear Independence Theorem Version, F is either both one-to-one and onto or it is
neither.
There is a practical (but not at all rigorous) way to think about and remember Corollary 2.7.1. Suppose
F : Rk → Rn is a linear transformation and that n > k. Then, the vectors in Rn have more components than
the vectors in Rk . Intuitively, it seems like there should be no way to “cover” all of Rn with Rk . Likewise, if
k > n, the vectors in the domain have more components than the vectors in the codomain, so the domain is
“too big” to cover each vector in the codomain uniquely. This is only a practical way to think about this as
the reasoning is unsound. Indeed, both Rk and Rn have the same number of vectors in them for any positive integers k and n, surprising as that may seem. This fact, however, is beyond the scope of this document.
Warning!
One-to-one and onto are not opposites. Just because a linear transformation is not one-to-one doesn't mean it's onto, and vice versa. The only case in which the two notions constrain each other is when the dimension of the domain is equal to the dimension of the codomain. Otherwise, they are generally not related.
The following example ties all that we’ve been doing so far in this section together.
Example 2.7.17
For each of the following matrices:
i) Write down the domain and codomain of the corresponding linear transformation.
ii) Determine if the corresponding linear transformation is one-to-one, onto, both, or neither.
a) A = [ −1 0 ; 0 2 ; 3 9 ]
b) B = [ 8 1 5 7 ; 1 −2 4 −1 ]
c) C = [ 2 −2 −8 ; 3 1 0 ; 1 0 −1 ]
d) D = [ 1 −1 ; 0 1 ]
e) E = [ −1 3 −1 ; 0 0 0 ]
Solution.
a) The domain of FA is R2 and the codomain is R3 . An echelon form of A is [ −1 0 ; 0 2 ; 0 0 ]. This echelon form has a pivot in every column. Therefore, FA is one-to-one by The Linear Independence Theorem Version. ♦
b) The domain of FB is R4 and the codomain is R2 . An echelon form of B has a pivot in each row. Therefore, FB is onto by The Span Theorem. ♦
c) The domain of FC is R3 and the codomain is R3 . An echelon form of C is [ 1 0 −1 ; 0 1 3 ; 0 0 0 ]. This echelon form does not have a pivot in every row nor in every column. Therefore, FC is neither one-to-one nor onto by The Span Theorem and The Linear Independence Theorem Version.
Algebra of Matrices
In this chapter, we develop the algebra of matrices. Some of the algebraic properties of matrices are similar
to those of real numbers, but there are some unfamiliar properties as well.
Recall that an n × k matrix A is a rectangular array of numbers with n rows and k columns; the entry in row i and column j is denoted aij . To simplify notation, we write A = [aij ], 1 ≤ i ≤ n, 1 ≤ j ≤ k, or simply A = [aij ] when the size of the matrix is clear from context.
1. A matrix is called square if it has the same number of rows as it does columns.
2. Let A = [aij ], 1 ≤ i ≤ n, 1 ≤ j ≤ k be a n × k matrix. The aii terms are called the main
diagonal of A.
3. The n × k matrix whose entries are all zero is called the n × k zero matrix. This is denoted
by [0]nk , or 0nk , or merely [0] if the context is clear.
4. The n × n identity matrix, denoted In , is the matrix with ones on its main diagonal and
zeroes everywhere else.
Example 3.1.1
The main diagonal of each of the following three matrices is listed after it.
A1 = [ 1 2 1 ; 0 0 1 ; −1 1 2 ]   (main diagonal 1, 0, 2),
A2 = [ 1 2 ; 0 1 ; 0 −1 ; 10 1 ]   (main diagonal 1, 1),
A3 = [ −1 2 1 −1 ; 2 −1 1 1 ]   (main diagonal −1, −1).
Example 3.1.2
The identity matrices of sizes 1, 2, and 3 are
I1 = [ 1 ],   I2 = [ 1 0 ; 0 1 ],   I3 = [ 1 0 0 ; 0 1 0 ; 0 0 1 ],
and so on.
Identity matrices play the same role for square matrices that 1 does for real numbers. We’ll see this when
we define matrix multiplication. Moreover, notice the columns of In form the standard basis for Rn . Thus,
we can write,
In = [ ~e1 ~e2 . . . ~en ] .
Before we can perform algebra on matrices, we need operations: addition, subtraction, multiplication,
etcetera. Since matrices are new mathematical objects, we need to define these operations from scratch.
Equality, addition, and subtraction are all defined in the obvious way. Multiplication is a little bit different.
Let A = [aij ] and B = [bij ] be two n × k matrices.
1. A and B are equal if aij = bij for each i ∈ {1, 2, . . . , n} and each j ∈ {1, 2, . . . , k}. That is, A and B are equal componentwise. If two matrices are equal, we write A = B.
2. The sum A + B and the difference A − B are the n × k matrices defined by A + B = [aij + bij ] and A − B = [aij − bij ]. That is, the sum/difference of matrices is computed by adding/subtracting the terms componentwise.
Note
Like with vectors, we do not define the sum/difference of two matrices if they have different sizes.
Example 3.1.3
Let A = [ 1 7 2 ; 1 2 −1 ] and B = [ −2 0 1 ; 1 1 1 ]. Then,
A + B = [ 1 + (−2)  7 + 0  2 + 1 ; 1 + 1  2 + 1  (−1) + 1 ] = [ −1 7 3 ; 2 3 0 ],
A − B = [ 1 − (−2)  7 − 0  2 − 1 ; 1 − 1  2 − 1  (−1) − 1 ] = [ 3 7 1 ; 0 1 −2 ],
B − A = [ −2 − 1  0 − 7  1 − 2 ; 1 − 1  1 − 2  1 − (−1) ] = [ −3 −7 −1 ; 0 −1 2 ].
Let A = [aij ] be an n × k matrix and let r ∈ R be any scalar. The scalar multiplication of r and A
is the n × k matrix
rA = [r aij ] = [ ra11 ra12 . . . ra1k ; ra21 ra22 . . . ra2k ; . . . ; ran1 ran2 . . . rank ].
Example 3.1.4
Let A = [ 1 3 2 ; 2 4 7 ]. Then,
(1/2)A = [ 1/2 3/2 1 ; 1 2 7/2 ],   3A = [ 3 9 6 ; 6 12 21 ],   πA = [ π 3π 2π ; 2π 4π 7π ].
Example 3.1.5
Let A = [ −1 2 ; 0 1 ] and B = [ −1 −1 ; 2 3 ]. Then,
3A − 2B = [ −3 6 ; 0 3 ] − [ −2 −2 ; 4 6 ] = [ −1 8 ; −4 −3 ].
Vectors are matrices with a single column so it should come as no surprise that the definitions of addition,
subtraction, and scalar multiplication are the same for general matrices as they are for vectors. The following
analogue of Properties of Vectors should come as no surprise either.
Let A, B, C be three matrices of the same size. Let r, s ∈ R be any scalars. Then,
1. A + B = B + A (commutativity of addition);
2. A + (B + C) = (A + B) + C (associativity of addition);
3. A + [0] = [0] + A = A;
4. r(A + B) = rA + rB;
5. (r + s)A = rA + sA;
6. (rs)A = r(sA);
7. A − A = [0] .
Proof. The proofs of these properties all follow immediately from the definitions of addition, subtraction, and scalar multiplication of matrices and from properties of real numbers. Here is an example of two of them; the rest are similar and are left as exercises. For property 1, write A = [aij ] and B = [bij ]; then A + B = [aij + bij ] = [bij + aij ] = B + A, since addition of real numbers is commutative. For property 7, A − A = [aij − aij ] = [0].
Exercise
Prove the rest of Matrix Addition and Scalar Multiplication Properties.
Addition and scalar multiplication for matrices are defined in the obvious way. Matrix multiplication, on the other hand, is defined in a bit of a different way.
Let A be an n × k matrix and let B = [ ~b1 ~b2 . . . ~bm ] be a k × m matrix with columns ~b1 , ~b2 , . . . , ~bm ∈ Rk . The product AB is the n × m matrix
AB = [ (A~b1 ) (A~b2 ) . . . (A~bm ) ].
By definition, matrix multiplication is not defined for all matrices A and B. For AB to be defined, B must have the same number of rows as A has columns. In this case, the size of AB is easy to calculate. If A is n × k and B is k × m, then AB is n × m. A mnemonic for remembering this is to write
(n × k)(k × m) = n × m,
where the size of AB is determined by “cancelling off” the two inner k’s.
Example 3.1.6
Let A = [ 2 1 ; −1 0 ; 3 −2 ] and B = [ 2 8 1 −2 ; 0 1 7 −1 ]. Compute AB.
Solution. By definition,
AB = [ (A~b1 ) (A~b2 ) (A~b3 ) (A~b4 ) ].
We compute each column in turn:
A~b1 = 2 (2, −1, 3) + 0 (1, 0, −2) = (4, −2, 6),
A~b2 = 8 (2, −1, 3) + 1 (1, 0, −2) = (17, −8, 22),
A~b3 = 1 (2, −1, 3) + 7 (1, 0, −2) = (9, −1, −11),
A~b4 = −2 (2, −1, 3) − 1 (1, 0, −2) = (−5, 2, −4).
Therefore,
AB = [ 4 17 9 −5 ; −2 −8 −1 2 ; 6 22 −11 −4 ]. ♦
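The column-by-column definition is easy to mirror in code. The sketch below is not part of the notes; it assumes numpy and checks that assembling the columns A~b_j reproduces the built-in product for the matrices of Example 3.1.6.

    import numpy as np

    A = np.array([[ 2,  1],
                  [-1,  0],
                  [ 3, -2]])
    B = np.array([[2, 8, 1, -2],
                  [0, 1, 7, -1]])

    AB_by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
    print(AB_by_columns)
    print(np.array_equal(AB_by_columns, A @ B))  # True: the two computations agree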
There is a shortcut for calculating AB that is less work than calculating from the definition. It is easiest
seen with an example.
Example 3.1.7
Let A = [ −1 8 ; 2 −3 ; 4 1 ] and B = [ 7 8 −3 ; 7 2 1 ]. Calculate the product AB.
4 1
Solution. The shortcut requires a little bit of imagination. First, AB is 3 × 3 because A is 3 × 2 and B is
2 × 3. Write the following.
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ _ _ _ ; _ _ _ ; _ _ _ ],
with the nine entries of the product left blank for now.
Start in the (1, 1)-entry of AB. To calculate this, take the first row of A and, in your mind, rotate it 90 degrees clockwise so it looks like the vector (−1, 8). Lay this vector on top of the first column of B. You should now be imagining the vector (−1, 8) sitting on top of the vector (7, 7). To calculate the (1, 1)-entry,
multiply the numbers sitting on top of each other, and add down the vector. In this example, we calculate
(−1) · 7 + 8 · 7 = −7 + 56 = 49.
Therefore, we have
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 _ _ ; _ _ _ ; _ _ _ ].
Now move to the (1, 2)-entry. To calculate this, take the first row of A, rotate it 90 degrees clockwise, and lay it on top of the second column of B. You should be imagining (−1, 8) laying on top of (8, 2). Now
8 2
proceed the same way as before: multiply the numbers sitting on top of one another and add down the
vector. You should be calculating
(−1)(8) + 8(2) = −8 + 16 = 8.
This yields,
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 _ ; _ _ _ ; _ _ _ ].
Repeat this process to get the (1, 3)-entry. Doing so, you’ll find
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; _ _ _ ; _ _ _ ].
Now move to the (2, 1)-entry. To calculate this, take the second row of A, rotate it 90 degrees clockwise, and lay it on top of the first column of B. You should be imagining (2, −3) laying on top of (7, 7). You
know what to do now: multiply the numbers sitting on top of one another and add down the vector. This
yields,
2(7) + (−3)(7) = 14 − 21 = −7,
and so
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; −7 _ _ ; _ _ _ ].
Now iterate this process to fill in the rest of AB. Each time you move down a row in AB, you move down
a row in A. In general, the (i, j)-entry of AB is calculated by taking the ith row of A, rotating it 90 degrees clockwise,
laying it on top of the jth column of B, multiplying the numbers sitting on top of one another, and adding
down the vector. Finish this process yourself to calculate the final matrix product:
AB = [ −1 8 ; 2 −3 ; 4 1 ] [ 7 8 −3 ; 7 2 1 ] = [ 49 8 11 ; −7 10 −9 ; 35 34 −11 ]. ♦
Example 3.1.8
Let A = [ −1 2 1 3 ; 4 0 −8 −9 ; −9 10 5 6 ] and B = [ 2 1 1 ; −8 7 0 ; 0 −2 −5 ; 1 1 1 ]. The product BA is a 4 × 4 matrix. Use the shortcut described in the previous example to calculate the (1, 2), (2, 1), (4, 3), and (3, 2) entries of BA.
Solution. We calculate the (1, 2)-entry by taking the first row of B, rotating it 90 degrees clockwise, laying it on top of the second column of A, multiplying the numbers sitting on top of one another, and adding down the vector. Therefore, you should imagine (2, 1, 1) sitting on top of (2, 0, 10) and then calculate
(2)(2) + (1)(0) + (1)(10) = 14.
This is the (1, 2)-entry. For the (2, 1)-entry, lay the second row of B on the first column of A and repeat. This yields,
(−8)(−1) + (7)(4) + (0)(−9) = 36.
The (4, 3)-entry is calculated by laying the fourth row of B on top of the third column of A to get,
(1)(1) + (1)(−8) + (1)(5) = −2.
Finally, the (3, 2)-entry is calculated by laying the third row of B on the second column of A to get,
(0)(2) + (−2)(0) + (−5)(10) = −50. ♦
The formula for calculating entries in a matrix product as specified by the shortcut above is easy to prove.
I leave this as an exercise to the reader.
Theorem 3.1.2
Let A be n × k and suppose that B is a matrix with k rows, so that AB is defined. Let (AB)ij denote
the (i, j)-entry of the matrix AB. Then
(AB)ij = ai1 b1j + ai2 b2j + . . . + aik bkj = Σ_{ℓ=1}^{k} aiℓ bℓj .
Exercise
Prove Theorem 3.1.2.
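As a quick sanity check of the entry formula (not part of the notes; it assumes numpy), the function below computes a single entry of AB by the sum in Theorem 3.1.2 and compares it against numpy's built-in product, using the matrices from Example 3.1.7.

    import numpy as np

    def entry_of_product(A, B, i, j):
        # (AB)_ij = sum over l of a_il * b_lj (0-based indices here)
        return sum(A[i, l] * B[l, j] for l in range(A.shape[1]))

    A = np.array([[-1, 8],
                  [ 2, -3],
                  [ 4,  1]])
    B = np.array([[7, 8, -3],
                  [7, 2,  1]])

    print(entry_of_product(A, B, 0, 0))  # 49, the (1,1)-entry in the notes' 1-based numbering
    print(all(entry_of_product(A, B, i, j) == (A @ B)[i, j]
              for i in range(3) for j in range(3)))  # True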
Matrix Multiplication Properties
Let A be n × k, let B and C be matrices whose sizes make all of the following sums and products defined, and let r ∈ R be any scalar. Then,
1. A(BC) = (AB)C; in particular, (AB)~v = A(B~v ) for any vector ~v ;
2. A(B + C) = AB + AC;
3. (B + C)A = BA + CA;
4. r(AB) = (rA)B = A(rB);
5. In A = A Ik = A.
Of particular note, pay attention to property 5. If A is n × n, this says that AIn = In A = A. This means
that the identity matrix functions the same way for square matrices that 1 does for real numbers.
Proof. We only prove a couple of these and the rest are left as exercises for the reader.
Proof of 1. First let A be n × k, B be k × m, and let ~v ∈ Rm . We first show that (AB)~v = A(B~v ). Start by writing
B = [ ~b1 ~b2 . . . ~bm ],   ~v = (v1 , v2 , . . . , vm ).
By definition of matrix multiplication, AB = [ (A~b1 ) (A~b2 ) . . . (A~bm ) ]. So,
(AB)~v = v1 (A~b1 ) + v2 (A~b2 ) + . . . + vm (A~bm )   by definition of matrix vector multiplication,
= A(v1~b1 + v2~b2 + . . . + vm~bm )   by Matrix Vector Multiplication Properties,
= A(B~v )   by definition of matrix vector multiplication.
The general statement A(BC) = (AB)C now follows by applying this identity to each column of C.
Proof of 2. Let B and C both be k × m and write B = [ ~b1 ~b2 . . . ~bm ] and C = [ ~c1 ~c2 . . . ~cm ]. Then,
A(B + C) = [ A(~b1 + ~c1 ) A(~b2 + ~c2 ) . . . A(~bm + ~cm ) ]   by definition of matrix multiplication,
= [ (A~b1 + A~c1 ) (A~b2 + A~c2 ) . . . (A~bm + A~cm ) ]   by Matrix Vector Multiplication Properties,
= [ A~b1 A~b2 . . . A~bm ] + [ A~c1 A~c2 . . . A~cm ]   by definition of matrix addition,
= AB + AC.
Warning!
Here are 3 properties that hold for multiplication of real numbers but do not hold for matrix multi-
plication.
1) AB is NOT generally equal to BA. In fact, if AB is defined, BA generally isn’t.
2) AB = AC does NOT imply that B = C; there is no cancellation law for matrix multiplication.
3) If AB = 0, then it is NOT always the case that one of A or B is the zero matrix.
Do not fall into the trap of thinking these hold for matrices. This is a very common mistake.
Example 3.1.9
Let A = [ 1 0 ; 0 0 ] and B = [ 0 0 ; 1 0 ]. Then, AB = [0] but neither A nor B is a zero matrix.
Example 3.1.10
Let A = [ 1 2 ; 0 2 ] and B = [ 2 1 ; 1 0 ]. Then,
AB = [ 4 1 ; 2 0 ]   and   BA = [ 2 6 ; 1 2 ],
so AB ≠ BA.
Example 3.1.11
Let A = [ 1 1 ; 0 0 ], B = [ 2 0 ; 3 1 ], C = [ 5 1 ; 0 0 ]. Then,
AB = [ 5 1 ; 0 0 ] = AC,
but B ≠ C.
Matrix multiplication is defined in a bit of a strange way. We give some justification for why this definition
is chosen. Let F : Rk → Rn and G : Rm → Rk be linear transformations with standard matrices A and B respectively; note that A is n × k and B is k × m. Consider the composition transformation
F ◦ G : Rm → Rn ,   (F ◦ G)(~v ) = F (G(~v )).
It is a fact that F ◦ G is a linear transformation (try to prove this!). Therefore, there exists an n × m matrix C such that
(F ◦ G)(~v ) = F (G(~v )) = C~v for all ~v ∈ Rm .
Then,
C~v = (F ◦ G)(~v ) = F (G(~v )) = F (B~v ) = A(B~v ) = (AB)~v ,
where the last equality follows by part 1 of Matrix Multiplication Properties. Since this holds for all vectors
~v ∈ Rm , this shows that C = AB; that is, AB is the standard matrix for F ◦ G, which shows that the
standard matrix of a composition of functions is calculated by taking the product of the standard matrices.
This is one of the reasons why matrix multiplication is defined in the way it is.
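Here is a quick numerical illustration of this fact, not part of the notes (it assumes numpy; the matrices A and B are arbitrary stand-ins for standard matrices of F and G).

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [0.0, 3.0]])      # standard matrix of some F : R^2 -> R^3
    B = np.array([[1.0, -1.0, 0.0],
                  [4.0,  0.0, 2.0]])  # standard matrix of some G : R^3 -> R^2

    v = np.array([1.0, 2.0, 3.0])
    print(np.allclose(A @ (B @ v), (A @ B) @ v))  # True: F(G(v)) = (AB) v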
Exercise
Note
There is a definition of matrix multiplication wherein matrix products are calculated using componentwise multiplication. This is called the Hadamard product. Unfortunately, it’s not a great definition
for matrix multiplication. For example, we lose the ability to calculate standard matrices of compo-
sitions by multiplication if we use this product. You can read more about this on Wikipedia if you
like, the link is here: Hadamard Product.
3.2 Matrix Inverses
Let A be an n × n matrix. An n × n matrix B is called an inverse of A if
AB = BA = In .
If such a matrix B exists, we say that A is invertible.
This definition of inverses of matrices lines up with how real number inverses are defined. If r ∈ R is non-zero,
then r−1 = 1/r is defined as the unique number satisfying r r−1 = 1. Since In is the matrix analogue of 1, we see that
the two definitions line up with one another.
Note
Matrix inverses are only defined for square matrices. If a matrix is not square, it does not have an
inverse in the way I just defined them. It may have a left inverse, a right inverse, or none at all, but
these are beyond the scope of this course.
Example 3.2.1
Let A = [ 3 2 ; 1 −2 ]. Then, an inverse of A is B = [ 1/4 1/4 ; 1/8 −3/8 ] because
AB = [ 3 2 ; 1 −2 ] [ 1/4 1/4 ; 1/8 −3/8 ] = [ 3/4 + 1/4   3/4 − 6/8 ; 1/4 − 2/8   1/4 + 6/8 ] = [ 1 0 ; 0 1 ] = I2
and
BA = [ 1/4 1/4 ; 1/8 −3/8 ] [ 3 2 ; 1 −2 ] = [ 3/4 + 1/4   2/4 − 2/4 ; 3/8 − 3/8   2/8 + 6/8 ] = [ 1 0 ; 0 1 ] = I2 .
Theorem 3.2.1
An invertible n × n matrix A has exactly one inverse.
Proof. Suppose B and C are both inverses of A. By definition,
AB = BA = In and AC = CA = In .
Then,
B = B In = B(AC) = (BA)C = In C = C.
Therefore, A has exactly one inverse.
Since matrix inverses are unique, we denote the inverse of an invertible A by A−1 . Some matrices do not have inverses. Such matrices are called non-invertible.
Matrix Inverse Properties
Let A and B be invertible n × n matrices. Then,
1. (A−1 )−1 = A
2. In−1 = In
3. (AB)−1 = B −1 A−1
Proof.
1. By definition of inverses, AA−1 = A−1 A = In . Thus, the definition of matrix inverses with C = A and
A = A−1 implies that A is the inverse of A−1 . That is, (A−1 )−1 = A.
3. Since A and B are both invertible, their inverses B −1 and A−1 exist and both are n × n. Thus, the
product B −1 A−1 is defined. We then have,
(AB)(B −1 A−1 ) = A(BB −1 )A−1 = A In A−1 = AA−1 = In ,
and similarly (B −1 A−1 )(AB) = In . Therefore, AB is invertible and (AB)−1 = B −1 A−1 .
Property 3 of Matrix Inverse Properties shows that a product of invertible matrices is invertible. This
property generalizes to any finite product of matrices. Suppose A1 , A2 , . . . , Am are invertible n × n matrices.
Then, the product A1 A2 . . . Am−1 Am is an invertible n × n matrix and its inverse is
(A1 A2 . . . Am−1 Am )^{−1} = Am^{−1} Am−1^{−1} . . . A2^{−1} A1^{−1} .
Inverse of 2 × 2 Matrix
Let A = [ a b ; c d ]. If ad − bc ≠ 0, then A is invertible and
A−1 = (1/(ad − bc)) [ d −b ; −c a ].
If ad − bc = 0, then A is not invertible.
Example 3.2.2
Let A = [ 3 2 ; 1 −2 ]. Use the formula to calculate the inverse of A.
Solution. Since
ad − bc = 3(−2) − 2(1) = −8 ≠ 0,
A is invertible and
A−1 = (1/−8) [ −2 −2 ; −1 3 ] = [ 1/4 1/4 ; 1/8 −3/8 ].
This is exactly the matrix we showed was the inverse for A in Example 3.2.1. ♦
Example 3.2.3
Let A = [ 2 6 ; −1 7 ]. Find the inverse for A and check that it satisfies the inverse relations AB = BA = I2 .
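A numerical check of the 2 × 2 formula for this matrix (not the notes' worked solution; it assumes numpy) looks like this:

    import numpy as np

    A = np.array([[ 2.0, 6.0],
                  [-1.0, 7.0]])
    a, b = A[0]
    c, d = A[1]

    det = a * d - b * c                     # 2*7 - 6*(-1) = 20, non-zero, so A is invertible
    B = (1.0 / det) * np.array([[ d, -b],
                                [-c,  a]])

    print(B)                                # [[ 0.35 -0.3 ], [ 0.05  0.1 ]]
    print(np.allclose(A @ B, np.eye(2)))    # True
    print(np.allclose(B @ A, np.eye(2)))    # True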
Proof (of Inverse of 2 × 2 Matrix, sketch). Write A = [ a b ; c d ] and suppose B = [ p r ; s t ] satisfies AB = I2 . Comparing entries gives the system
ap + bs = 1,   ar + bt = 0,   cp + ds = 0,   cr + dt = 1.
Case 1. c ≠ 0. Subtracting a times the third equation from c times the first yields (cb − ad)s = c, and one can solve the system from here.
Case 2. c = 0. Then the third and fourth equations of the system become ds = 0 and dt = 1. Since dt = 1, d ≠ 0. It then follows from the third equation that s = 0. Thus, c = s = 0 and the system reduces to
ap = 1,   ar + bt = 0,   dt = 1.
Exercise
Fill in the details of the proof of Inverse of 2 × 2 Matrix.
If A is n×n and n ≥ 3, there are no nice formulas for the inverse of a matrix. There is, however, an algorithm
that determines if a given matrix is invertible and, if it is, outputs its inverse.
The Matrix Inverse Algorithm
Let A be an n × n matrix.
Step 1. Form the augmented matrix [ A | In ].
Step 2. Row reduce [ A | In ] until the block to the left of the bar is in RREF.
Step 3. If the RREF of A is In , then [ A | In ] ∼ [ In | A−1 ]; A is invertible and the block to the right of the bar is A−1 . If the RREF of A is not In , then A is not invertible.
We’ll see the proof of this algorithm in the section on elementary matrices. For now, we do some examples.
Example 3.2.4
Find the inverse for A = [ 2 1 −1 ; 0 2 −1 ; 1 0 0 ] if it exists.
Solution.
Step 1. Form the augmented matrix A′ = [ A | I3 ].
Step 2. Perform the same row operations on [ A | I3 ] that would transform A into RREF. We use Gauss-
Jordan Elimination to do this.
A′ = [ 2 1 −1 | 1 0 0 ; 0 2 −1 | 0 1 0 ; 1 0 0 | 0 0 1 ]
∼ [ 1 0 0 | 0 0 1 ; 0 2 −1 | 0 1 0 ; 2 1 −1 | 1 0 0 ]   (R1 ⇐⇒ R3)
∼ [ 1 0 0 | 0 0 1 ; 0 2 −1 | 0 1 0 ; 0 1 −1 | 1 0 −2 ]   (R3 ⇒ R3 − 2R1)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 1 −1 | 1 0 −2 ]   (R2 ⇒ (1/2)R2)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 0 −1/2 | 1 −1/2 −2 ]   (R3 ⇒ R3 − R2)
∼ [ 1 0 0 | 0 0 1 ; 0 1 −1/2 | 0 1/2 0 ; 0 0 1 | −2 1 4 ]   (R3 ⇒ −2R3)
∼ [ 1 0 0 | 0 0 1 ; 0 1 0 | −1 1 2 ; 0 0 1 | −2 1 4 ]   (R2 ⇒ R2 + (1/2)R3).
Step 3. Since A ∼ I3 , The Matrix Inverse Algorithm implies [ A | I3 ] ∼ [ I3 | A−1 ]. Thus,
A−1 = [ 0 0 1 ; −1 1 2 ; −2 1 4 ]. ♦
Note
If you’re not sure that you’ve done this correctly, multiply AA−1 out to check that you get In .
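That check is easy to automate. The following sketch is not from the notes (it assumes numpy) and verifies the inverse found in Example 3.2.4.

    import numpy as np

    A = np.array([[2.0, 1.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 0.0,  0.0]])
    A_inv = np.array([[ 0.0, 0.0, 1.0],
                      [-1.0, 1.0, 2.0],
                      [-2.0, 1.0, 4.0]])

    print(np.allclose(A @ A_inv, np.eye(3)))  # True, so the computed inverse is correct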
Example 3.2.5
Find the inverse of the matrix A = [ −1 0 2 ; 1 5 3 ; 1 −3 −5 ] if it exists.
Solution.
Step 1. Form the augmented matrix A′ = [ A | I3 ].
Step 2. Apply the row operations on A′ that would put A into RREF. The first three in order are
R1 ⇒ −R1 ,   R2 ⇒ R2 − R1 ,   R3 ⇒ R3 − R1 .
Performing these three operations yields
[ 1 0 −2 | −1 0 0 ; 0 5 5 | 1 1 0 ; 0 −3 −3 | 1 0 1 ].
The second row of the block to the left of the bar is a scalar multiple of the third. Thus, if we do the next row operation R3 ⇒ R3 + (3/5)R2 , we get a row of zeroes to the left of the bar. Thus, A is not row equivalent to I3 and so The Matrix Inverse Algorithm implies A is not invertible. ♦
Example 3.2.6
Find the inverse of A = [ 2 0 1 4 ; 1 7 1 0 ; −1 0 −2 3 ; 8 6 5 8 ] if it exists.
Step 2. Apply the row operations on [ A | I4 ] that would put A in RREF. There are 16 row operations to
perform. They are,
R2 ⇐⇒ R1 ; R2 ⇒ R2 − 2R1 ; R3 ⇒ R3 + R1 ; R4 ⇒ R4 − 8R1 ; R2 ⇐⇒ R3 ;
R1 ⇒ R1 − 7R2
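The hand computation here is long, so a numerical check is convenient. The sketch below is not from the notes (it assumes numpy); it confirms that this A is in fact invertible and that the numerically computed inverse satisfies A A−1 = I4 .

    import numpy as np

    A = np.array([[ 2.0, 0.0,  1.0, 4.0],
                  [ 1.0, 7.0,  1.0, 0.0],
                  [-1.0, 0.0, -2.0, 3.0],
                  [ 8.0, 6.0,  5.0, 8.0]])

    A_inv = np.linalg.inv(A)                  # raises LinAlgError if A were not invertible
    print(np.allclose(A @ A_inv, np.eye(4)))  # True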
Theorem 3.2.4
Let A be an n × n invertible matrix. Then, for each ~b ∈ Rn , the equation A~x = ~b has exactly one
solution.
Theorem 3.2.4 provides another method for finding a solution to a linear system. Determine if the coefficient
matrix A is invertible. If it is, then A−1~b is the only solution to the linear system.
Example 3.2.7
Use a matrix inverse to solve the linear system
3x1 − x2 = 5
−4x1 + x2 = 8.
Solution. Let A = [ 3 −1 ; −4 1 ] and ~b = (5, 8). The solution to the linear system is the same as the solution to the matrix equation A~x = ~b. Since 3(1) − (−4)(−1) = −1 ≠ 0, Inverse of 2 × 2 Matrix implies A is invertible. Its inverse is
A−1 = (1/−1) [ 1 1 ; 4 3 ] = [ −1 −1 ; −4 −3 ].
Therefore,
~x = A−1~b = [ −1 −1 ; −4 −3 ] (5, 8) = (−13, −44);
that is, (x1 , x2 ) = (−13, −44) is the solution to the linear system. ♦
Proof. Since A is invertible, A−1 exists, and the product A−1~b ∈ Rn . Substitute ~x = A−1~b into the matrix
equation A~x = ~b. Then,
A~x = A(A−1~b) = (AA−1 )~b = In~b = ~b.
Thus, A~x = ~b has solution ~x = A−1~b.
We now show A−1~b is the only solution. Let ~c ∈ Rn be any other solution to the matrix equation. Then A~c = ~b, and multiplying both sides by A−1 yields
A−1 (A~c) = A−1~b =⇒ ~c = A−1~b.
Therefore the solution is unique.
We now give the first version of the Invertible Matrix Theorem. This is one of the most important results
in this course.
The Invertible Matrix Theorem
Let A be an n × n matrix and let FA : Rn → Rn be the matrix transformation FA (~v ) = A~v . The following are equivalent.
1. A is invertible.
2. The RREF of A is In .
3. A has n pivot positions.
4. The homogeneous equation A~x = ~0 has only the trivial solution.
5. The columns of A are linearly independent.
6. FA is one-to-one.
7. The matrix equation A~x = ~b has exactly one solution for every ~b ∈ Rn .
8. The columns of A span Rn .
9. FA is onto.
10. There is an n × n matrix C such that AC = In .
11. There is an n × n matrix D such that DA = In .
Note
Conditions 4 - 6 are The Linear Independence Theorem Version and Conditions 7 - 9 are The Span Theorem.
Example 3.2.8
Let A = [ 7 −1 8 ; 0 1 1 ; 1 −1 0 ] and B = [ −1 2 −6 5 ; 0 1 −1 2 ; 0 2 −2 4 ; 2 3 5 4 ]. Determine if A or B are invertible.
Solution. Row reducing each matrix to echelon form, we find that
B does not have a pivot in every column but A does. By The Invertible Matrix Theorem, B is not invertible
and A is. ♦
Proof. Many of these are straight forward and/or we have already seen their proofs. We will go through
this in whole for completeness and refer to other theorems if we have already proved the given statement.
1 =⇒ 2: Suppose A is invertible. By Theorem 3.2.4, the homogeneous equation A~x = ~0 has only the trivial
solution. By The Linear Independence Theorem Version, A has a pivot in every column. Since A is square
it must have a pivot in every row as well. Therefore, the only possible RREF is In .
2 =⇒ 3: Suppose A ∼ In . Then In is the RREF of A and has n pivot positions. Since pivot positions are
invariant among echelon forms, it follows that any echelon form B of A has n pivot positions.
3 =⇒ 4: Suppose B has n pivot positions. Since A is n × n, this implies B has a pivot in every column. By
The Linear Independence Theorem Version, the homogeneous equation A~x = ~0 has only the trivial solution.
4 =⇒ 5: Suppose the homogeneous equation A~x = ~0 has only the trivial solution. By The Linear Indepen-
dence Theorem Version, the columns of A are linearly independent.
5 =⇒ 6: Suppose the columns of A are linearly independent. Then, by The Linear Independence Theorem
Version, the linear transformation FA is one-to-one.
6 =⇒ 7: Suppose FA is one-to-one. Then, since the domain and codomain of F have the same dimension,
Corollary 2.7.1 implies FA is onto. Therefore, A~x = ~b has at least one solution for all ~b ∈ Rn by The Span
Theorem. Since F is assumed one-to-one, the matrix equation has exactly one solution for all ~b ∈ Rn .
7 =⇒ 8 : Suppose A~x = ~b has exactly one solution for all ~b ∈ Rn . Then the columns of A span Rn by the
definition of spanning.
8 =⇒ 9: Suppose the columns of A span Rn . By The Span Theorem, FA is onto.
9 =⇒ 10: Suppose FA is onto. Let {~e1 , ~e2 , . . . , ~en } be the standard basis for Rn . Since FA is onto, there
exist vectors ~c1 , ~c2 , . . . , ~cn such that FA (~ci ) = ~ei for each i = 1, 2, . . . , n. Therefore, A~ci = ~ei for each
i = 1, 2, . . . , n. Let C = [ ~c1 ~c2 . . . ~cn ] . Then C is n × n and, by definition of matrix multiplication, we
have
AC = A [ ~c1 ~c2 . . . ~cn ] = [ A~c1 A~c2 . . . A~cn ] = [ ~e1 ~e2 . . . ~en ] = In .
Now suppose C~x = ~0. Multiplying both sides on the left by A gives ~x = In ~x = (AC)~x = A(C~x) = A~0 = ~0. This shows C~x = ~0 has only the trivial solution. By The Linear Independence Theorem Version, the lin-
ear transformation FC : Rn → Rn with standard matrix C is one-to-one, and since C is square Corollary
2.7.1 implies FC is onto as well. Therefore, there exists ~vi ∈ Rn such that FC (~vi ) = C~vi = ~ei , for each
i = 1, 2, . . . , n.
Now consider the product CAC. Since AC = In ,
CAC = C(AC) = C In = C.
Subtracting C from both sides and using distributivity of matrix multiplication yields
(CA − In )C = [0].
Write D = CA − In , so that DC = [0]. Then,
D~ei = D(C~vi ) = (DC)~vi = [0]~vi = ~0
for each i = 1, 2, . . . , n. Thus, every column of D is equal to the zero vector, which implies D = [0]. Hence,
D = [0] =⇒ CA − In = [0] =⇒ CA = In ,
so there is an n × n matrix (namely C) whose product with A on the left is In ; this proves 11.
11 =⇒ 1: Suppose there is an n × n matrix D such that DA = In . With the roles of A and D reversed, the
same proof as in 10 =⇒ 11 shows that AD = In . Therefore, there exists an n × n matrix D such that
AD = DA = In . By definition, A is invertible with inverse D.
Warning!
The Invertible Matrix Theorem only works for square matrices. Be careful not to apply this to
matrices that are not square.
This theorem ties everything together for n × n matrices. It is a really beautiful theorem. It states that The Span
Theorem and The Linear Independence Theorem Version are equivalent in the context of n × n matrices.
This is really cool because, taken at face value, the two concepts these theorems deal with have nothing to
do with one another. Moreover, condition 10 of The Invertible Matrix Theorem implies we need only check
that there exists an n × n matrix B such that AB = In to conclude invertibility of A and, in this case,
A−1 = B.
Every matrix defines a linear transformation. Invertible matrices define really useful linear transformations.
A linear transformation F : Rk → Rn is called invertible if there is a transformation G : Rn → Rk such that
(F ◦ G)(~v ) = F (G(~v )) = ~v for all ~v ∈ Rn
and
(G ◦ F )(~u) = G(F (~u)) = ~u for all ~u ∈ Rk .
The transformation G is called an inverse of F .
Exercise
Suppose F is an invertible linear transformation with inverse G. Prove that G is an invertible linear
transformation.
It is not true that every linear transformation is invertible. In fact, a linear transformation F is invertible
if and only if it is one-to-one and onto. This is a fact that is true for any function, not just linear transfor-
mations. Therefore, by The Invertible Matrix Theorem, F is invertible if and only if its standard matrix is
invertible. From this, we might suspect the standard matrix of the inverse of F is the inverse of the standard
matrix of F . This is exactly right.
Theorem 3.2.6
A linear transformation F : Rk → Rn is invertible if and only if it is one-to-one and onto. In this case, k = n, the standard matrix A of F is invertible, and F has a unique inverse G : Rn → Rn , namely the linear transformation whose standard matrix is A−1 .
Proof. First we prove that F : Rk → Rn is invertible if and only if F is one-to-one and onto.
First suppose F is invertible with inverse G. To show F is one-to-one, suppose F (~w1 ) = F (~w2 ) for some ~w1 , ~w2 ∈ Rk . Applying G to both sides gives
G(F (~w1 )) = G(F (~w2 )) =⇒ ~w1 = ~w2 .
To show F is onto, let ~w3 ∈ Rn . Then, G(~w3 ) ∈ Rk and
F (G(~w3 )) = ~w3 ,
so ~w3 is in the range of F . Therefore, F is onto.
Now suppose F is both one-to-one and onto. Then, for every ~u ∈ Rn , there exists exactly one ~vu ∈ Rk such
that F (~vu ) = ~u. Define a transformation G : Rn → Rk by G(~u) = ~vu for each ~u ∈ Rn . Since F is one-to-one,
this function is well defined. Let ~v ∈ Rk . Then, F (~v ) ∈ Rn , and by definition of G, G(F (~v )) = ~v . Thus,
(G ◦ F )(~v ) = ~v for all ~v ∈ Rk . Secondly, for any ~u ∈ Rn , G(~u) = ~vu . By definition,
F (G(~u)) = F (~vu ) = ~u,
so (F ◦ G)(~u) = ~u for all ~u ∈ Rn . Therefore, F is invertible with inverse G.
We’ve shown that F : Rk → Rn is an invertible linear transformation if and only if it is one-to-one and onto.
The only instance in which a linear transformation can be both one-to-one and onto is when the dimensions of its domain and codomain are the same (Corollary 2.7.1). Therefore, if F is invertible, then k = n, and so its standard matrix A is square. Furthermore, by The Invertible Matrix Theorem, A is invertible. Let G : Rn → Rn be
the linear transformation with standard matrix A−1 . Then,
(G ◦ F )(~u) = (A−1 A)~u = ~u, and (F ◦ G)(~u) = (AA−1 )~u = ~u for all ~u ∈ Rn .
Finally, to show G is unique, suppose H : Rn → Rn is another inverse for F with standard matrix B. Then,
for any ~v ∈ Rn ,
(H ◦ F )(~v ) = (BA)~v = ~v .
Therefore, applying H ◦ F to every element in the standard basis for Rn yields BA = In . Hence, by The
Invertible Matrix Theorem, B = A−1 and consequently, H = G.
Note
This theorem implies that any linear transformation F : Rn → Rn that is either one-to-one or onto
is automatically invertible.
An elementary matrix E is an n×n matrix that results from performing one elementary row operation
on In .
Example 3.2.9
The following are examples of 3 × 3 elementary matrices with the corresponding row operation per-
formed on the identity matrix.
E1 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]   (R2 ⇐⇒ R3),
E2 = [ 1 0 0 ; k 1 0 ; 0 0 1 ]   (R2 ⇒ R2 + kR1),
E3 = [ m 0 0 ; 0 1 0 ; 0 0 1 ]   (R1 ⇒ mR1), m ≠ 0.
Let A = [ a b c ; d e f ; g h i ]. Describe the multiplications E1 A, E2 A, E3 A, E1 E2 A, E3 E2 A, E1 E3 E2 A.
Solution. We calculate
E1 A = [ a b c ; g h i ; d e f ],
E2 A = [ a b c ; ka + d  kb + e  kc + f ; g h i ],
E3 A = [ ma mb mc ; d e f ; g h i ],
E1 E2 A = [ a b c ; g h i ; ka + d  kb + e  kc + f ],
E3 E2 A = [ ma mb mc ; ka + d  kb + e  kc + f ; g h i ],
E1 E3 E2 A = [ ma mb mc ; g h i ; ka + d  kb + e  kc + f ].
Notice that multiplication by the elementary matrices E1 , E2 , E3 is equivalent to performing the correspond-
ing row operations on A. ♦
Example 3.2.9 gives reason to believe that multiplying a matrix A on the left by an elementary matrix E
produces a matrix that results from performing the row operation on A that was used to create E. This is
true in general.
Fact 3.2.1
Let A be an n × k matrix. Then, performing an elementary row operation on A is equivalent to
performing that same row operation on In to get an elementary matrix E, and then performing the matrix multiplication EA.
Exercise
Prove Fact 3.2.1.
Theorem 3.2.7
Every elementary matrix E is invertible. Its inverse is the elementary matrix corresponding to the row operation that reverses the one used to create E.
Proof. The RREF of E is In because E differs from In by a single row operation. Therefore, Part 2 of
The Invertible Matrix Theorem implies E is invertible. The row operation used to create E can be reversed
by performing another row operation. Let E 0 be the elementary matrix corresponding to the row operation
that reverses the one used to create E. Then, E 0 E = In , so that E 0 = E −1 by part 11 of The Invertible
Matrix Theorem.
Example 3.2.10
Find the inverses of the elementary matrices E1 , E2 , and E3 from Example 3.2.9.
Solution. The row operation performed on I3 to get E1 is swapping rows 2 and 3. To get this back to the
identity matrix, we perform the operation again. This means
E1−1 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ] = E1 .
E2 is obtained by adding k times row 1 to row 2. To reverse this, subtract k times row 1 from row 2.
Therefore,
E2−1 = [ 1 0 0 ; −k 1 0 ; 0 0 1 ].
Finally, E3 is obtained from I3 by multiplying row 1 by m ≠ 0. To reverse this, multiply row 1 by 1/m to get
E3−1 = [ 1/m 0 0 ; 0 1 0 ; 0 0 1 ]. ♦
Proof of The Matrix Inverse Algorithm. Let A be an n × n matrix. If the RREF of A is not In , then A
is not invertible by The Invertible Matrix Theorem. If A is invertible, then A ∼ In by The Invertible Matrix
Theorem. Let E1 , . . . , Em be a sequence of elementary matrices that represent row operations needed to
transform A into In , where we start with the row operation represented by E1 and proceed in order until
we get to Em . Then, Fact 3.2.1 implies
Em Em−1 . . . E2 E1 A = In .
By part 11 of The Invertible Matrix Theorem, this means Em Em−1 . . . E2 E1 = A−1 ; that is,
Em Em−1 . . . E1 In = A−1 .
It now follows from Fact 3.2.1 that performing the elementary row operations needed to transform A to In
on In itself exactly results in A−1 . This is exactly The Matrix Inverse Algorithm.
3.3 Transpose
In this section, we introduce a new matrix operation called the transpose. This is different from anything we have for real numbers.
The transpose of an n × k matrix A = [aij ], denoted AT , is the k × n matrix whose (i, j)-entry is aji . In other words, the rows of AT are the columns of A.
Example 3.3.1
Let A = [ 2 1 ; 3 −1 ; 1 4 ] and B = [ 8 −1 7 1 ; 0 0 8 1 ; 1 2 1 −4 ; 0 −4 −8 0 ]. Calculate AT and B T .
Solution. The rows of the transpose are the columns of the original matrix:
AT = [ 2 3 1 ; 1 −1 4 ],
B T = [ 8 0 1 0 ; −1 0 2 −4 ; 7 8 1 −8 ; 1 1 −4 0 ]. ♦
Let A and B denote matrices that have appropriate sizes so that the following expressions are defined.
Then,
1. (AT )T = A
2. InT = In
3. (A + B)T = AT + B T
4. (rA)T = r AT for any scalar r ∈ R
5. (AB)T = B T AT
6. A is invertible if and only if AT is invertible; in this case, (AT )−1 = (A−1 )T .
Proof. The first four properties are straightforward. We only prove one of them and leave the other three
as exercises for the reader. The last two are more involved so we prove them.
Proof of 3. Write A = [aij ] and B = [bij ]. Then,
(A + B)T = ([aij ] + [bij ])T = ([aij + bij ])T = [aji + bji ] = [aji ] + [bji ] = AT + B T .
Proof of 5. In order to prove this, we show that each entry of (AB)T is the same as that of B T AT . Let A be n × k and B be k × m, so that AB is n × m. From Theorem 3.1.2,
(AB)ij = Σ_{ℓ=1}^{k} aiℓ bℓj .
Write B T = [b̂ij ] and AT = [âij ], where b̂ij = bji and âij = aji . Therefore,
(B T AT )ij = Σ_{ℓ=1}^{k} b̂iℓ âℓj = Σ_{ℓ=1}^{k} bℓi ajℓ = Σ_{ℓ=1}^{k} ajℓ bℓi = (AB)ji = ((AB)T )ij .
Thus, the (i, j)-entry of (AB)T is the same as the (i, j)-entry of B T AT for each 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Proof of 6. Suppose A is invertible. Apply the transpose on both sides of AA−1 = In and use parts 2 and 5
of this theorem to get
(AA−1 )T = InT =⇒ (A−1 )T AT = In .
Therefore, AT is invertible and (A−1 )T = (AT )−1 by parts 11 and 12 of The Invertible Matrix Theorem.
Conversely, if AT is invertible, then AT (AT )−1 = In . Taking transposes of both sides again gives
((AT )−1 )T A = In ,
so A is invertible by part 11 of The Invertible Matrix Theorem.
Part 6 of Transpose Properties implies that A being invertible is equivalent to AT being invertible. We
update the Invertible Matrix Theorem with this condition
Let A be an n × n matrix and let FA : Rn → Rn be the matrix transformation FA (~v ) = A~v . The following are equivalent.
1. A is invertible.
2. The RREF of A is In .
3. A has n pivot positions.
4. The homogeneous equation A~x = ~0 has only the trivial solution.
5. The columns of A are linearly independent.
6. FA is one-to-one.
7. The matrix equation A~x = ~b has exactly one solution for every ~b ∈ Rn .
8. The columns of A span Rn .
9. FA is onto.
10. There is an n × n matrix C such that AC = In .
11. There is an n × n matrix D such that DA = In .
12. AT is invertible.
Similar to products of inverses, part 5 of Transpose Properties generalizes to any finite product of matrices
(as long as the product is defined). Indeed, let A1 , A2 , . . . , Am−1 , Am be matrices of appropriate size so that
the product A1 A2 . . . Am−1 Am is defined. Then,
(A1 A2 . . . Am−1 Am )^T = Am^T Am−1^T . . . A2^T A1^T .
Example 3.4.1
Let A, B, C, and X be n × n matrices with A and X invertible. Solve the following equation for X
BX = A + CX. (3.2)
Solution. If we were given this question with real numbers instead of matrices, you would solve for X by
isolating for the variable. The process is the same in this case, except there are two key differences we have
to be careful of:
1. We don’t know that all of the matrices involved are invertible. Therefore, if we need to invert a matrix
we must ensure it is invertible first.
2. Matrix multiplication is not commutative. This means that we have to keep track of which side
multiplication is occurring. In particular, if we wish to take out a common matrix from an expression,
we have to make sure that term is on the same side of all quantities we are factoring it out of.
The first step we take in isolating for X is to move all the X’s to one side of the equation. We can do this
by subtracting CX from both sides of Equation (3.2):
BX = A + CX =⇒ BX − CX = A. (3.3)
Both B and C are being multiplied by X on the right. Therefore, we can factor it out.
BX − CX = A =⇒ (B − C)X = A. (3.4)
If this were real numbers, we would divide by B − C to solve for X. However, we are working with matrices,
so we can’t divide. We can, however, multiply by the inverse of B − C if it exists. Therefore, we must show
that B − C is invertible.
We are given that B and C are invertible but, in general, sums of invertible matrices are not invertible, so
we can’t immediately conclude B − C is invertible. To show B − C is invertible, recall that X is invertible.
Therefore, we can multiply both sides of Equation (3.4) on the right by $X^{-1}$. This yields
$$(B - C)XX^{-1} = AX^{-1} \implies (B - C)(XX^{-1}) = AX^{-1} \implies B - C = AX^{-1}. \qquad (3.5)$$
Since $A$ and $X^{-1}$ are invertible, their product $AX^{-1}$ is invertible, and Equation (3.5) shows that $B - C$ equals this product. Now that we've shown $B - C$ is invertible, we are free to multiply both sides of Equation (3.4) on the left by $(B - C)^{-1}$ to get
$$X = (B - C)^{-1}A. \qquad ♦$$
Note
Pay attention to how the multiplication and factoring was done in Example 3.4.1. In particular, notice how $X$ needed to be factored out on the right in Equation (3.4). The only reason this is possible is because both $B$ and $C$ are multiplied by $X$ on the right. If one of them had been multiplied by $X$ on the left (say the equation contained $BX - XC$ instead), then factoring out $X$ in this way would not be possible.
Moreover, notice how whenever we multiplied by a matrix in Example 3.4.1, we were always
consistent with what side we multiplied on. Even though it seems like it shouldn’t be that big of a
deal, it is absolutely necessary because matrix multiplication is not commutative.
Another thing to make note of is how we needed to show B − C is invertible before we could
multiply by its inverse. This is very important. You must always argue why a matrix is invertible
before multiplying by its inverse.
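As a numerical sanity check (not part of the notes; it assumes numpy and uses random matrices for which $B - C$ happens to be invertible), the solution $X = (B - C)^{-1}A$ from Example 3.4.1 can be verified directly:

```python
# Verify that X = (B - C)^{-1} A satisfies BX = A + CX for sample matrices.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

X = np.linalg.inv(B - C) @ A          # the candidate solution
print(np.allclose(B @ X, A + C @ X))  # True
```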
Example 3.4.2
Let A, B, C, and X be n × n matrices with A, B, and X invertible. Solve the following equation for
X.
AX + B T = AXC. (3.6)
Solution. First, subtract $AX$ from both sides of Equation (3.6) to get
$$B^T = AXC - AX. \qquad (3.7)$$
Since $A$ is invertible, we can multiply both sides of Equation (3.7) on the left by $A^{-1}$:
$$A^{-1}(AXC - AX) = A^{-1}B^T \implies A^{-1}(AXC) - A^{-1}(AX) = A^{-1}B^T \implies XC - X = A^{-1}B^T. \qquad (3.8)$$
Both terms on the left have $X$ multiplied on the left, so we can factor it out:
$$X(C - I_n) = A^{-1}B^T. \qquad (3.9)$$
We need $C - I_n$ to be invertible in order to solve for $X$. Since $X$ is assumed invertible, we can multiply both sides of Equation (3.9) by $X^{-1}$ on the left to get
$$C - I_n = X^{-1}A^{-1}B^T.$$
$B^T$ is invertible because $B$ is. Therefore, $C - I_n$ is a product of invertible matrices and, therefore, it too is invertible. This means we can multiply both sides of Equation (3.9) by $(C - I_n)^{-1}$ on the right to get
$$X = A^{-1}B^T(C - I_n)^{-1}. \qquad ♦$$
Subspaces
In this chapter, we study special subsets of Rn called subspaces. These are one of the primary areas of
studies in linear algebra.
A subset $S \subseteq \mathbb{R}^n$ of vectors is called a subspace of $\mathbb{R}^n$ if the following three properties hold for $S$:
1. $\vec{0} \in S$,
2. For any two $\vec{u}, \vec{v} \in S$, the sum $\vec{u} + \vec{v}$ is contained in $S$ (closure under addition),
3. For any $\vec{v} \in S$ and any scalar $r \in \mathbb{R}$, the product $r\vec{v}$ is contained in $S$ (closure under scalar multiplication).
Example 4.1.1
There are two obvious subspaces of $\mathbb{R}^n$: $\mathbb{R}^n$ itself and $\{\vec{0}\}$. These are referred to as "improper" subspaces of $\mathbb{R}^n$.
Exercise
Prove that $\{\vec{0}\}$ and $\mathbb{R}^n$ are subspaces of $\mathbb{R}^n$.
Showing that subsets of Rn are subspaces is generally not too hard. Exactly like showing something is linear,
all we need to do is verify the conditions in the definition.
Example 4.1.2
Let
$$S = \left\{ \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix} : a, b, c \in \mathbb{R} \right\} \subseteq \mathbb{R}^3.$$
Show that $S$ is a subspace of $\mathbb{R}^3$.

Solution. We verify the three conditions in the definition of a subspace.

1. Taking $a = b = c = 0$ gives the zero vector, so $\vec{0} \in S$.

2. For the second condition, pick any two vectors $\vec{v}_1, \vec{v}_2 \in S$. Write
$$\vec{v}_1 = \begin{bmatrix} 2a_1 - b_1 \\ 0 \\ a_1 + b_1 + c_1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 2a_2 - b_2 \\ 0 \\ a_2 + b_2 + c_2 \end{bmatrix}, \quad a_1, a_2, b_1, b_2, c_1, c_2 \in \mathbb{R}.$$
We must show the sum of these two vectors is in $S$. Calculating their sum yields
$$\vec{v}_1 + \vec{v}_2 = \begin{bmatrix} 2a_1 - b_1 + 2a_2 - b_2 \\ 0 \\ a_1 + b_1 + c_1 + a_2 + b_2 + c_2 \end{bmatrix} = \begin{bmatrix} 2(a_1 + a_2) - (b_1 + b_2) \\ 0 \\ (a_1 + a_2) + (b_1 + b_2) + (c_1 + c_2) \end{bmatrix} = \begin{bmatrix} 2a_3 - b_3 \\ 0 \\ a_3 + b_3 + c_3 \end{bmatrix},$$
where $a_3 = a_1 + a_2 \in \mathbb{R}$, $b_3 = b_1 + b_2 \in \mathbb{R}$, and $c_3 = c_1 + c_2 \in \mathbb{R}$. Since $\vec{v}_1 + \vec{v}_2$ satisfies the criteria for a vector to be in $S$, the sum $\vec{v}_1 + \vec{v}_2$ is in $S$. Therefore, the sum of any two vectors in $S$ is also in $S$, which verifies the second condition in the definition of subspaces.

3. For the third condition, pick any vector $\vec{v} \in S$ and any scalar $r$. Write
$$\vec{v} = \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix}, \quad a, b, c \in \mathbb{R}.$$
We must show $r\vec{v} \in S$. Calculating this product yields
$$r\vec{v} = \begin{bmatrix} r(2a - b) \\ 0 \\ r(a + b + c) \end{bmatrix} = \begin{bmatrix} 2(ra) - (rb) \\ 0 \\ (ra) + (rb) + (rc) \end{bmatrix} = \begin{bmatrix} 2d - e \\ 0 \\ d + e + f \end{bmatrix},$$
where $d = ra$, $e = rb$, and $f = rc$ are real numbers, so $r\vec{v} \in S$.

All three conditions have now been verified, therefore we conclude that $S$ is a subspace of $\mathbb{R}^3$. ♦
Theorem 4.1.1
Let {~v1 , ~v2 , . . . , ~vk } ⊆ Rn be any subset of vectors in Rn . Then, S = span {~v1 , ~v2 , . . . , ~vk } is a subspace
of Rn .
Proof. We check the three conditions in the definition of a subspace.
1. $\vec{0} \in S$ because
$$\vec{0} = 0\vec{v}_1 + 0\vec{v}_2 + \ldots + 0\vec{v}_k \in S.$$
2. Let $\vec{u}, \vec{v} \in S$. Then there exist scalars so that $\vec{u} = a_1\vec{v}_1 + \ldots + a_k\vec{v}_k$ and $\vec{v} = b_1\vec{v}_1 + \ldots + b_k\vec{v}_k$. Calculating $\vec{u} + \vec{v}$ yields
$$\vec{u} + \vec{v} = (a_1 + b_1)\vec{v}_1 + (a_2 + b_2)\vec{v}_2 + \ldots + (a_k + b_k)\vec{v}_k.$$
Since $a_i + b_i \in \mathbb{R}$ for each $i \in \{1, 2, \ldots, k\}$, this shows $\vec{u} + \vec{v}$ is a linear combination of $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$. Therefore, $\vec{u} + \vec{v} \in S$.
3. Let $\vec{w} \in S$ and let $r$ be any scalar. Write
$$\vec{w} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_k\vec{v}_k.$$
Then
$$r\vec{w} = r(c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_k\vec{v}_k) = (rc_1)\vec{v}_1 + (rc_2)\vec{v}_2 + \ldots + (rc_k)\vec{v}_k,$$
which is a linear combination of $\vec{v}_1, \ldots, \vec{v}_k$, so $r\vec{w} \in S$.
All three conditions in the definition of subspace are verified. Therefore, $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is a subspace of $\mathbb{R}^n$.
Theorem 4.1.1 gives another way of showing a subset of Rn is a subspace: write it as the span of a set of
vectors in Rn .
Example 4.1.3
Let $S$ be the subspace from Example 4.1.2. Any vector $\vec{v} \in S$ can be written
$$\vec{v} = \begin{bmatrix} 2a - b \\ 0 \\ a + b + c \end{bmatrix} = a\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} + b\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + c\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad a, b, c \in \mathbb{R}.$$
This shows any vector $\vec{v} \in S$ is in $\operatorname{span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$. Conversely, it is obvious that any vector in this span is also in $S$. Therefore,
$$S = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},$$
so $S$ is a subspace of $\mathbb{R}^3$ by Theorem 4.1.1.
Example 4.1.4
The span of a single, non-zero vector in R2 or R3 is interpreted as a line through the origin. The span
of two non-zero, linearly independent vectors in R3 is interpreted as a plane through the origin. These
are subspaces by Theorem 4.1.1. However, arbitrary lines in R2 or R3 are not generally subspaces
because they may not pass through the origin. A similar argument is true for arbitrary planes in R3 .
Let $A$ be an $n \times k$ matrix and write $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_k\,]$. The column space of $A$, denoted Col(A), is the span of the columns of $A$. That is,
$$\operatorname{Col}(A) = \operatorname{span}\{\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_k\}.$$
Exercise
Let A be an n × k matrix and let F : Rk → Rn be the linear transformation with standard matrix
A. Prove that Col(A) is equal to the range of F . (Note that this implies the range of a linear
transformation is always a subspace of its codomain).
Example 4.2.1
Let $A = \begin{bmatrix} -2 & 3 & -20 \\ 1 & 3 & -17 \\ -7 & -1 & -1 \end{bmatrix}$. Determine whether the vectors $\vec{b} = \begin{bmatrix} 8 \\ 5 \\ 5 \end{bmatrix}$, $\vec{c} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$ are in the column space of $A$. If they are, write the vector as a linear combination of the columns of $A$.
Solution. By definition, Col(A) is equal to the span of the columns of $A$. Therefore, asking if a vector $\vec{b}$ is in Col(A) is the same as asking if $\vec{b}$ can be written as a linear combination of the columns of $A$. We know how to answer this! Form the augmented matrix $[\,A \mid \vec{b}\,]$ and row reduce!
$$[\,A \mid \vec{b}\,] = \begin{bmatrix} -2 & 3 & -20 & 8 \\ 1 & 3 & -17 & 5 \\ -7 & -1 & -1 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & -6 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The corresponding linear system is consistent, hence the equation $A\vec{x} = \vec{b}$ has a solution. This shows that $\vec{b} \in \operatorname{Col}(A)$. The vector form of the solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 0 \end{bmatrix} + s\begin{bmatrix} -1 \\ 6 \\ 1 \end{bmatrix}, \quad s \in \mathbb{R}.$$
Taking $s = 0$, for example, gives $\vec{b} = -\vec{a}_1 + 2\vec{a}_2$, which writes $\vec{b}$ as a linear combination of the columns of $A$.
Next, we check $\vec{c}$. Form the augmented matrix $[\,A \mid \vec{c}\,]$ and row reduce:
$$[\,A \mid \vec{c}\,] = \begin{bmatrix} -2 & 3 & -20 & 0 \\ 1 & 3 & -17 & 2 \\ -7 & -1 & -1 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Since the rightmost column is a pivot column, this matrix corresponds to an inconsistent linear system. Thus, $A\vec{x} = \vec{c}$ has no solution and, hence, $\vec{c}$ is not a linear combination of the columns of $A$. This shows $\vec{c} \notin \operatorname{Col}(A)$. ♦
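For readers who want to check membership in a column space by computer, here is a small sketch (not part of the notes, and assuming sympy is available). It uses the fact that $\vec{v} \in \operatorname{Col}(A)$ exactly when appending $\vec{v}$ to $A$ does not increase the rank, which is equivalent to the consistency test above:

```python
# Column space membership test for the matrices of Example 4.2.1.
import sympy as sp

A = sp.Matrix([[-2, 3, -20], [1, 3, -17], [-7, -1, -1]])
b = sp.Matrix([8, 5, 5])
c = sp.Matrix([0, 2, 1])

def in_column_space(A, v):
    # v is in Col(A) iff augmenting A with v does not raise the rank
    return A.row_join(v).rank() == A.rank()

print(in_column_space(A, b))  # True
print(in_column_space(A, c))  # False
```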
The second important subspace attached to any matrix is called the null space.
Let $A$ be an $n \times k$ matrix. The null space of $A$, denoted Null(A), is the set of all solutions to the homogeneous equation $A\vec{x} = \vec{0}$. In set notation,
$$\operatorname{Null}(A) = \{\vec{v} \in \mathbb{R}^k : A\vec{v} = \vec{0}\}.$$
Determining if a given vector ~v is in the null space of a matrix A is easy. All you need to do is calculate A~v
and see if it is zero. If it’s zero, then ~v is in Null(A). If it’s not zero, then ~v is not in Null(A).
Example 4.2.2
" # 0 2
2 1 3
Let A = . Determine if the vectors ~u = 1 , ~v = −1 are in Null(A).
1 2 0
−1 −1
Solution. We calculate A~u and A~v and see if either of these products is zero. For the first,
" # 0 " # " #
2 1 3 2(0) + 1(1) + 3(−1) −2
A~u = 1 = = 6 ~0.
=
1 2 0 1(0) + 2(1) + 0(−1) 2
−1
The definition of Null(A) does not immediately imply it is a subspace. Therefore, we need to prove this.
Theorem 4.2.1
Let $A$ be an $n \times k$ matrix. Then, Null(A) is a subspace of $\mathbb{R}^k$.
Proof. We need to check the three conditions in the definition of a subspace. The first condition is trivial
because A~0 = ~0 always, so ~0 ∈ Null(A).
For the second, let $\vec{u}, \vec{v} \in \operatorname{Null}(A)$. Then $A\vec{u} = \vec{0}$ and $A\vec{v} = \vec{0}$. Then, using part 1 of Matrix Vector Multiplication Properties yields
$$A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v} = \vec{0} + \vec{0} = \vec{0},$$
so $\vec{u} + \vec{v} \in \operatorname{Null}(A)$. For the third, let $\vec{w} \in \operatorname{Null}(A)$ and let $r$ be any scalar. Then, by Matrix Vector Multiplication Properties,
$$A(r\vec{w}) = rA\vec{w} = r\vec{0} = \vec{0},$$
so $r\vec{w} \in \operatorname{Null}(A)$.
All three conditions in the definition of a subspace have been verified. Therefore, Null(A) is a subspace of
Rk .
Warning!
For an n × k matrix A, Col(A) is subspace of Rn and Null(A) is a subspace of Rk . Try not to confuse
these.
4.3 Bases for Subspaces
Example 4.3.1
Theorem 4.3.1
Let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m \in \mathbb{R}^n$. Let $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ and suppose $\vec{v} \in S$. Then,
$$\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\} = S.$$
Proof. Clearly $S \subseteq \operatorname{span}\{\vec{v}_1, \ldots, \vec{v}_m, \vec{v}\}$, since any linear combination of $\vec{v}_1, \ldots, \vec{v}_m$ is also a linear combination of $\vec{v}_1, \ldots, \vec{v}_m, \vec{v}$ (with the coefficient of $\vec{v}$ equal to zero). For the other containment, since $\vec{v} \in S$ there exist scalars $c_1, \ldots, c_m$ such that $\vec{v} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m$. Now suppose $\vec{w} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\}$. Then, there exist scalars $e_1, e_2, \ldots, e_m, e$ such that
$$\vec{w} = e_1\vec{v}_1 + e_2\vec{v}_2 + \ldots + e_m\vec{v}_m + e\vec{v}.$$
Substituting the expression for $\vec{v}$ gives
$$\vec{w} = e_1\vec{v}_1 + \ldots + e_m\vec{v}_m + e(c_1\vec{v}_1 + \ldots + c_m\vec{v}_m) = (e_1 + ec_1)\vec{v}_1 + \ldots + (e_m + ec_m)\vec{v}_m \in S.$$
Therefore, $\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\} \subseteq S$. This shows the two sets are equal.
1. If we know a basis for a subspace S, then we can reconstruct all of S from the vectors in the basis.
Thus, if we know a basis, we know everything about the subspace.
2. The linear independence condition in the definition tells us that any representation of a vector ~v ∈ S
as a linear combination of the basis vectors is unique. To see this, let S be a subspace of Rn and let
$\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a basis for $S$. Let $\vec{v} \in S$ and suppose there are two representations of $\vec{v}$ as a linear combination of the basis vectors:
$$\vec{v} = c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m = d_1\vec{v}_1 + d_2\vec{v}_2 + \ldots + d_m\vec{v}_m.$$
Subtracting one representation from the other gives
$$(c_1 - d_1)\vec{v}_1 + (c_2 - d_2)\vec{v}_2 + \ldots + (c_m - d_m)\vec{v}_m = \vec{0}.$$
Since $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m$ are linearly independent, all the scalars in the above equation must be zero. This implies
$$c_1 = d_1, \quad c_2 = d_2, \quad \ldots, \quad c_m = d_m,$$
so that the two representations are the same. Therefore, any $\vec{v} \in S$ has a unique representation as a linear combination of the basis vectors.
Does every subspace have a basis? If the subspace is not the zero subspace, then the answer is yes.
Theorem 4.3.2
Every non-zero subspace S ⊆ Rn has a basis.
Proof. Start by picking a set of linearly independent vectors B = {~v1 , ~v2 , . . . , ~vm } in S that is maximal in
the following sense: if we add any other vector from S into B, then B becomes linearly dependent. Such a set
necessarily exists because S ⊆ Rn and, therefore, any linearly independent subset of vectors in S contains
at most n vectors by Corollary 2.6.4.
Pick a vector $\vec{v} \in S$ not in $B$. Then, the set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{v}\}$ is linearly dependent. Therefore, there exist scalars $c_1, c_2, \ldots, c_m, c$, not all zero, such that
$$c_1\vec{v}_1 + c_2\vec{v}_2 + \ldots + c_m\vec{v}_m + c\vec{v} = \vec{0}.$$
If $c = 0$, this equation would exhibit a non-trivial dependence among $\vec{v}_1, \ldots, \vec{v}_m$, contradicting the linear independence of $B$. Therefore $c \neq 0$ and we may solve for $\vec{v}$:
$$\vec{v} = -\frac{c_1}{c}\vec{v}_1 - \frac{c_2}{c}\vec{v}_2 - \ldots - \frac{c_m}{c}\vec{v}_m.$$
This shows $\vec{v} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$. Since $\vec{v} \in S$ is arbitrary, this shows $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ and, therefore, $B$ is a basis for $S$.
Exercise
Use Theorem 4.3.2 to prove every subspace can be written as the span of a finite set of vectors. This
is the converse to Theorem 4.1.1.
What about the zero subspace $\{\vec{0}\}$? Does this subspace have a basis? I leave the determination of this question as an exercise to the reader.
Exercise
Determine if the zero subspace has a basis.
The rest of this section is dedicated to methods for calculating bases of Col(A) and Null(A). We begin with
calculating bases for null spaces. This is easiest shown using an example.
Example 4.3.2
" #
1 3 0 −1
Let A = . Find a basis for Null(A).
3 2 1 −4
Solution. The first step is to determine the vector form of the solution to the homogeneous equation. The RREF of $A$ is
$$A \sim \begin{bmatrix} 1 & 0 & 3/7 & -10/7 \\ 0 & 1 & -1/7 & 1/7 \end{bmatrix}.$$
The vector form of the solution to $A\vec{x} = \vec{0}$ is
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} -3/7 \\ 1/7 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 10/7 \\ -1/7 \\ 0 \\ 1 \end{bmatrix} = s\vec{v}_1 + t\vec{v}_2, \quad s, t \in \mathbb{R}.$$
This shows every solution to the homogeneous equation is a linear combination of $\vec{v}_1$ and $\vec{v}_2$ above. Furthermore, the vectors $\vec{v}_1$ and $\vec{v}_2$ are linearly independent by construction; $s = t = 0$ can be the only solution to $s\vec{v}_1 + t\vec{v}_2 = \vec{0}$ due to the third and fourth components. Hence, a basis for Null(A) is
$$\left\{ \begin{bmatrix} -3/7 \\ 1/7 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 10/7 \\ -1/7 \\ 0 \\ 1 \end{bmatrix} \right\}. \quad ♦$$
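The same basis can be reproduced by computer. The following sketch (not part of the notes; it assumes sympy is installed) uses sympy's nullspace routine, which carries out exactly the procedure above:

```python
# Reproducing the Null(A) basis from Example 4.3.2.
import sympy as sp

A = sp.Matrix([[1, 3, 0, -1], [3, 2, 1, -4]])
basis = A.nullspace()   # list of column vectors spanning Null(A)
for v in basis:
    print(v.T)          # Matrix([[-3/7, 1/7, 1, 0]]) and Matrix([[10/7, -1/7, 0, 1]])
```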
This method to find a basis for the null space of a matrix always works. All you need to do is write the vector form of the solution to the homogeneous equation $A\vec{x} = \vec{0}$. The vectors that appear in the vector form of the solution then form a basis for Null(A).
Theorem 4.3.3
Let A be an n × k matrix. Let B be an echelon form of A. Then, the columns of A that correspond
to the pivot columns in B form a basis for the column space of A.
Note
Be careful with this theorem. When you pick the basis for Col(A), make sure you are taking the
columns of A that correspond to the pivot columns in B. Do not take the pivot columns of B.
This will give you the wrong answer in general.
Example 4.3.3
Let $A = \begin{bmatrix} 2 & 8 & -2 \\ -1 & -4 & 1 \\ 3 & -6 & 3 \\ 5 & 5 & 0 \\ 1 & -2 & 1 \end{bmatrix}$. Find a basis for Col(A).

Solution. Row reducing $A$ gives
$$A \sim \begin{bmatrix} 1 & 0 & 1/3 \\ 0 & 1 & -1/3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
The first two columns of the echelon form are pivot columns. Therefore, by Theorem 4.3.3, a basis for the
column space of A consists of the first two columns of A. That is, a basis for Col(A) is
$$\left\{ \begin{bmatrix} 2 \\ -1 \\ 3 \\ 5 \\ 1 \end{bmatrix}, \begin{bmatrix} 8 \\ -4 \\ -6 \\ 5 \\ -2 \end{bmatrix} \right\}. \quad ♦$$
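Theorem 4.3.3 is easy to automate. The sketch below (not part of the notes, assuming sympy) asks for the pivot columns of the RREF and then selects the corresponding columns of the original matrix $A$:

```python
# Basis for Col(A) via pivot columns, as in Theorem 4.3.3 and Example 4.3.3.
import sympy as sp

A = sp.Matrix([[2, 8, -2], [-1, -4, 1], [3, -6, 3], [5, 5, 0], [1, -2, 1]])
_, pivot_cols = A.rref()            # pivot_cols == (0, 1)
basis = [A[:, j] for j in pivot_cols]   # take columns of the ORIGINAL A
for v in basis:
    print(v.T)
```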
Proof. Switching the order of the columns of A if necessary, we lose no generality in assuming that the
first m columns of B are pivot columns, and the last n − m columns of B are non-pivot columns. Consider
the matrix A0 = [ ~a1 ~a2 . . . ~am ] . Then, the same row operations that transform A into B will put A0 into
echelon form. By construction, this echelon form of $A'$ has a pivot in every column. Therefore, $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_m$ are linearly independent.
Now consider A0j = [ ~a1 ~a2 . . . ~am | ~am+j ] for any j ∈ {1, 2, . . . , n − m} . Once again, the same row op-
erations that transform A into B will put A0j into echelon form. By construction, the first m columns of
this echelon form are pivot columns, so the last column is a non-pivot column. Therefore, ~am+j is a linear
combination of ~a1 , ~a2 , . . . , ~am for every j ∈ {1, 2, . . . , n − m} . Repeated applications of Theorem 4.3.1 then
yields
Col(A) = span {~a1 , ~a2 , . . . , ~am , ~am+1 , . . . , ~an } = span {~a1 , ~a2 , . . . , ~am } .
This shows the columns of A that correspond to the pivot columns of B form a basis for Col(A).
Suppose you are given a subspace S and a set of vectors that spans that subspace. You can always reduce
the spanning set to a basis of S using a column space argument.
Example 4.3.4
Let
$$\vec{v}_1 = \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} -6 \\ 3 \\ 1 \\ 5 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 10 \\ -5 \\ 1 \\ -9 \end{bmatrix}, \quad \vec{v}_4 = \begin{bmatrix} -2 \\ 1 \\ -3 \\ 6 \end{bmatrix},$$
and let $S$ be the subspace of $\mathbb{R}^4$ spanned by these vectors. In other words, $S = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$. Determine a basis for $S$.
Solution. Form the matrix A = [~v1 ~v2 ~v3 ~v4 ] . Then, Col(A) = span {~v1 , ~v2 , ~v3 , ~v4 } = S. Therefore,
determining a basis for S is the same as determining a basis for Col(A)! The RREF of A is,
$$A = \begin{bmatrix} -2 & -6 & 10 & -2 \\ 1 & 3 & -5 & 1 \\ 3 & 1 & 1 & -3 \\ 1 & 5 & -9 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The first, second, and fourth columns are pivot columns. Therefore, by Theorem 4.3.3, a basis for Col(A) = S
is
$$\{\vec{v}_1, \vec{v}_2, \vec{v}_4\} = \left\{ \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} -6 \\ 3 \\ 1 \\ 5 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \\ -3 \\ 6 \end{bmatrix} \right\}. \quad ♦$$
We end this section by summarizing the steps for finding bases of column and null space.
Let $A$ be an $n \times k$ matrix.

To find a basis for Col(A):
Step 1: Row reduce $A$ to an echelon form $B$.
Step 2: Identify the pivot columns of $B$.
Step 3: A basis for Col(A) consists of the columns of the original matrix $A$ that correspond to the pivot columns of $B$.

To find a basis for Null(A):
Step 1: Row reduce $A$ to its RREF.
Step 2: Use the RREF of $A$ to write down the vector form of the solution to the homogeneous equation $A\vec{x} = \vec{0}$.
Step 3: The vectors that appear in the vector form of the solution calculated in Step 2 are a basis for Null(A).
4.4 Dimension of Subspaces
Theorem 4.4.1
Let S ⊆ Rn be a non-zero subspace. Then, any two bases for S contain the same number of vectors.
Proof. Let B1 = {~u1 , ~u2 , . . . , ~ur } and B2 = {~v1 , ~v2 , . . . , ~vs } be two bases for S. First, suppose r < s. Since
the vectors in B2 are contained in S, and B1 spans S, there exist scalars cij for each i ∈ {1, 2, . . . , s} and
j ∈ {1, 2, . . . , r} such that
~v1 = c11 ~u1 + c12 ~u2 + . . . + c1r ~ur ,
~v2 = c21 ~u1 + c22 ~u2 + . . . + c2r ~ur ,
..
.
~vs = cs1 ~u1 + cs2 ~u2 + . . . + csr ~ur .
Let $x_1, x_2, \ldots, x_s$ be variables and consider the vector equation
$$x_1\vec{v}_1 + x_2\vec{v}_2 + \ldots + x_s\vec{v}_s = \vec{0}. \qquad (4.2)$$
The ~vi ’s form a basis for S, hence are a linearly independent set. Therefore, the only solution to this vector
equation is x1 = x2 = . . . = xs = 0. Now substitute the expressions for the ~vi ’s in terms of the ~ui ’s into the
vector equation (4.2)
x1 (c11 ~u1 + c12 ~u2 + . . . + c1r ~ur ) + x2 (c21 ~u1 + c22 ~u2 + . . . + c2r ~ur ) + . . . + xs (cs1 ~u1 + cs2 ~u2 . . . + csr ~ur ) = ~0.
Rearranging gives
(c11 x1 +c21 x2 +. . .+cs1 xs )~u1 +(c12 x1 +c22 x2 +. . .+cs2 xs )~u2 +. . .+(c1r x2 +c2r x2 +. . .+csr xs )~ur = ~0 (4.3)
Since the ~ui ’s also form a linear independent set, the coefficients on each must be zero. Therefore, we get
the following linear system from Equation (4.3).
This linear system has r equation and s variables. Since r < s, the system necessarily has infinitely many
solutions. Each such non-trivial solution gives a non-trivial solution to the vector equation in (4.2), which is
not possible. Therefore, it must be the case that r ≥ s. In this case, if r > s the exact same argument with
the roles of the two bases reversed gives the same contradiction we derived above. The only option left is
that r = s which means B1 and B2 have the same number of elements. Since these were any two arbitrary
bases of S, the result follows.
Theorem 4.4.1 states that all bases of a non-zero subspace $S$ have the same number of elements. Therefore, we are permitted to make the following definition.

Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace. The dimension of $S$, denoted $\dim(S)$, is the number of vectors in any basis for $S$.
Example 4.4.1
The standard basis {~e1 , ~e2 , . . . , ~en } is a basis for Rn for each positive integer n. This means the
dimension of Rn is n (whence the name n-dimensional Euclidean space).
Example 4.4.2
The dimension of the column space in Example 4.3.3 is 2. The dimension of the null space in Example
4.3.2 is 2. The dimension of the subspace in Example 4.3.4 is 3.
Exercise
What is the dimension of the zero subspace $\{\vec{0}\}$?
4.4.1 The Basis Theorem and Three Useful Corollaries for Calculating Bases
and Dimensions
In this section, we introduce a theoretical result called The Basis Theorem. This is a theorem that tells
you how to construct a basis of a subspace S from any spanning set of S and how to turn any linearly
independent set of S into a basis of S.
The Basis Theorem
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace and let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a set of vectors in $S$.
1. If B is linearly independent, then either B is a basis for S, or at most finitely many vectors from
S can be added to B to make it a basis.
2. If the vectors in B span S, then either B is a basis for S, or vectors can be removed from B to
turn it into a basis for S.
Proof.
1. If B is already a basis there is nothing to prove. Thus, suppose B is a linearly independent set that
is not a basis. Pick any w~ 1 ∈ S that does not belong to span {B} = span {~v1 , ~v2 , . . . , ~vm } . Then, the
set $B_1 = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{w}_1\}$ is linearly independent. The proof of this is left as an exercise. If $B_1$ is a basis, then we are done. If not, pick another vector $\vec{w}_2 \in S$ that is not in $\operatorname{span}\{B_1\}$. Then, $B_2 = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m, \vec{w}_1, \vec{w}_2\}$ is a linearly independent set. Iterate this process; it must terminate
eventually because all of these vectors are in Rn and, in Rn , a set with more than n vectors is not
linearly independent by Corollary 2.6.4. Once the process terminates, we are left with a set of linearly
independent vectors whose span is equal to S, i.e. a basis for S. Therefore, we have added finitely
many vectors to B to get a basis for S.
2. Suppose B spans S. Form the matrix A = [ ~v1 ~v2 . . . ~vm ]. Then, S = Col(A). Row reduce A to
an echelon form B. Switching the order of the columns if necessary, we lose no generality in assum-
ing that the first ` columns of B are pivot columns. By Theorem 4.3.3, a basis for Col(A) = S is
{~v1 , ~v2 , . . . , ~v` } ⊆ B where ` ≤ m. Therefore, we have found a basis for S by removing finitely many
vectors from B.
Exercise
Let $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k \in \mathbb{R}^n$ be a linearly independent set of vectors and suppose that $\vec{v} \in \mathbb{R}^n$ is such that
$$\vec{v} \notin \operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}.$$
Prove that $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k, \vec{v}\}$ is a linearly independent set.
The Basis Theorem is a useful result, particularly from a theoretical standpoint. Our main use of it is that
it leads to the following three results that allow us to easily answer a wide variety of questions about bases
and dimension of subspaces.
The Basis Corollary
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace of dimension $m$. Let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\}$ be a set of $m$ vectors in $S$.
1. If B is linearly independent, then B is a basis for S.
2. If B spans S (so $\operatorname{span}\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_m\} = S$), then B is a basis for S.
Example 4.4.3
Is $B = \left\{ \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 9 \\ 1 \end{bmatrix} \right\}$ a basis for $\mathbb{R}^3$?
Solution. Since $\mathbb{R}^3$ has dimension 3, by The Basis Corollary, we need only check that $B$ spans $\mathbb{R}^3$. Row reduce $A = [\,\vec{v}_1\ \vec{v}_2\ \vec{v}_3\,]$ to an echelon form:
$$A = \begin{bmatrix} -1 & 2 & 0 \\ 2 & -1 & 9 \\ 1 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} -1 & 2 & 0 \\ 0 & 3 & 9 \\ 0 & 0 & -5 \end{bmatrix}.$$
This echelon form of A has a pivot in every row. Therefore, its columns span R3 by The Span Theorem.
Hence, by The Basis Corollary, B is a basis for R3 . ♦
Example 4.4.4
Let $S \subseteq \mathbb{R}^4$ be a subspace with $\dim(S) = 3$, and suppose the vectors
$$\vec{v}_1 = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 5 \end{bmatrix}$$
are all contained in $S$. Show that $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a basis for $S$.
Solution. Form the matrix A = [ ~v1 ~v2 ~v3 ] and row reduce to an echelon form.
$$A = \begin{bmatrix} -1 & 0 & 1 \\ 2 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 2 & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$
The echelon form has a pivot in every column. By The Linear Independence Theorem Version, {~v1 , ~v2 , ~v3 }
is a linearly independent set. Therefore, since dim(S) = 3, The Basis Corollary implies they form a basis for
S. ♦
Proof. I prove part 1 and leave part 2 as an exercise for the reader because it is similar.
Suppose B is a linearly independent set. By way of contradiction, suppose B does not span S. Then, part
1 of The Basis Theorem implies we can add a finite number of vectors in S to B in order to get a basis for
S. But then, a basis for S would consist of more than m vectors. This is impossible as S has dimension m.
Therefore, B must span S, so it is a basis.
Exercise
Prove the second part of The Basis Corollary.
Corollary 4.4.1
Let $S \subseteq \mathbb{R}^n$ be a non-zero subspace of dimension $m$. Let $B = \{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_\ell\}$ be a set of vectors in $S$.
1. If $\ell > m$, then $B$ is linearly dependent. In other words, any linearly independent subset of $S$ contains at most $m$ vectors.
2. If $\ell < m$, then $B$ does not span $S$.
Example 4.4.5
By The Linear Independence Theorem Version, {~v1 , ~v2 , ~v3 } is linearly independent set. Since dim(S) = 2,
part 1 of Corollary implies any linearly independent subset of S contains at most 2 vectors. Therefore,
~v1 , ~v2 , ~v3 can not all simultaneously be in S. ♦
Proof. I only prove the first part and leave the second as an exercise for the reader.
Suppose B is a linearly independent set. Part 1 of The Basis Theorem implies either B is a basis for S or
that we may add vectors from S to B in order to get a basis for S. B clearly can’t be a basis because it has
more vectors in it than the dimension of S. Moreover, we can’t add vectors from S to B to get a basis for S
because B already has too many vectors in it. Thus, B can not be linearly independent.
Exercise
Prove the second part of Corollary 4.4.1.
The final corollary tells us how subspaces of different dimensions “fit inside” of one another.
The Nested Dimensions Corollary
Let $S_1 \subseteq S_2 \subseteq \mathbb{R}^n$ be subspaces. Then, $\dim(S_1) \le \dim(S_2)$. Moreover, $\dim(S_1) = \dim(S_2)$ if and only if $S_1 = S_2$.
The Nested Dimensions Corollary allows us to answer a number of questions about subspaces even in cases
where it seems like we don’t have enough information.
Example 4.4.6
Let S ⊆ R4 be a subspace with S 6= R4 . Suppose ~v ∈ S and ~v 6= ~0. What are the possible dimensions
of S.
Solution. Since S 6= R4 , The Nested Dimensions Corollary implies that dim(S) < dim(R4 ) = 4. Since
~0 6= ~v ∈ S, S is not the zero subspace. Therefore, dim(S) ≥ 1. Hence, 1 ≤ dim(S) < 4 so the possible values
for dim(S) are 1, 2, and 3. ♦
Proof. Let dim(S1 ) = m1 and dim(S2 ) = m2 . Let B1 = {~v1 , ~v2 , . . . , ~vm1 } be a basis for S1 . Since S1 is a
subset of S2 , it follows that B1 ⊆ S2 . As B1 is a linearly independent set, either B1 is a basis for S2 , in which
case m1 = m2 , or, by part 1 of The Basis Theorem, we can add vectors from S2 to B1 to get a basis for S2 ,
in which case m1 < m2 . In either case, m1 ≤ m2 .
For the last statement, first suppose dim(S1 ) = dim(S2 ). Let B1 be as above. Then, B1 ⊆ S1 ⊆ S2 , so that
B1 is a linearly independent set of vectors in S2 , and the number of vectors in B1 is equal to the dimension
of S2 . Therefore, by The Basis Corollary, B1 is a basis for S2 , so span {B1 } = S2 . But span {B1 } = S1 as B1
is a basis for S1 . Therefore, S1 = S2 .
Conversely, suppose S1 = S2 . Then, since B1 is a basis for S1 , B1 is also a basis for S2 (since the two
subspaces are equal). Thus, dim(S2 ) is equal to the number of vectors in B1 , which is exactly dim(S1 ).
The following exercise is useful for getting used to the The Nested Dimensions Corollary.
Exercise
Prove that the only subspace S of Rn that has dimension n is Rn itself.
Here are some more examples of the types of questions we can solve using all three of the corollaries from
this section.
Example 4.4.7
Let $S \subseteq \mathbb{R}^3$ be a non-zero subspace of $\mathbb{R}^3$. Suppose $\vec{v}_1, \vec{v}_2 \in S$ with $\vec{v}_1 \neq c\vec{v}_2$ for any scalar $c \in \mathbb{R}$. Furthermore, suppose $S \neq \mathbb{R}^3$. Is $\{\vec{v}_1, \vec{v}_2\}$ necessarily a basis for $S$? Explain your answer.
Solution. Yes it is. Since ~v1 6= c~v2 for any c ∈ R, {~v1 , ~v2 } is a linearly independent set of vectors by
Corollary 2.6.2. Therefore, by part 1 of The Basis Theorem, dim(S) ≥ 2. Since S ⊆ R3 , but S 6= R3 , The
Nested Dimensions Corollary implies dim(S) < 3. Therefore, 2 ≤ dim(S) < 3 which implies dim(S) = 2.
Therefore, {~v1 , ~v2 } is a set of 2 vectors in a subspace of dimension 2, hence it is a basis for S by part 1 of
The Basis Corollary. ♦
Example 4.4.8
Let $S \subseteq \mathbb{R}^5$ be a subspace that contains 4 linearly independent vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4$. Suppose the vector $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ is not in $S$. Is $\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\}$ necessarily a basis for $S$?
Solution. Yes. Since S contains 4 linearly independent vectors, part 1 of The Basis Theorem implies
4 ≤ dim(S). Since S ⊆ R5 , The Nested Dimensions Corollary implies dim(S) ≤ 5. If dim(S) = 5, then
S = R5 by The Nested Dimensions Corollary. This is impossible because ~v is in R5 but is not in S. There-
fore, dim(S) 6= 5 so that dim(S) = 4. Therefore, since {~v1 , ~v2 , ~v3 , ~v4 } is a linearly independent set, part 1 of
The Basis Corollary implies these vectors are a basis for S. ♦
4.5 The Rank-Nullity Theorem
Let A be an n × k matrix. The rank of A, denoted rank(A), is the dimension of the column space
of A. The nullity of A, denoted nul(A), is the dimension of the null space of A.
Example 4.5.1
The rank of the matrix in Example 4.3.3 is 2. The nullity of the matrix in Example 4.3.2 is 2.
Let A be an n × k matrix and let B be an echelon form of A. rank(A) is equal to the number of pivot
columns of B. What about nul(A)? A basis for Null(A) consists of the vectors in the vector form of the
solution to A~x = ~0. Each of these vectors corresponds to a non-pivot column of B. Therefore, nul(A) equals
the number of non-pivot columns of B. With this observation, the following important theorem is evident.
The Rank-Nullity Theorem
Let $A$ be an $n \times k$ matrix. Then,
$$\operatorname{rank}(A) + \operatorname{nul}(A) = k.$$

Example 4.5.2
Example 4.5.3
Let A be 8 × 6. Can rank(A) = 7? Justify your answer using The Rank-Nullity Theorem.
Solution. No it can't. By The Rank-Nullity Theorem, rank(A) + nul(A) = 6. Thus, if rank(A) = 7, then nul(A) = 6 − 7 = −1, which is impossible because dimensions cannot be negative. Therefore, rank(A) can not be 7. In fact, the maximum that rank(A) could be is 6. ♦
Proof. Let B be an echelon form of A. Every column of B is either a pivot column or a non-pivot column.
rank(A) is equal to the number of pivot columns of B and nul(A) is the number of non-pivot columns of B.
Therefore,
rank(A) + nul(A) = total number of columns of B = k.
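As a quick computational check of the theorem (not part of the notes; it assumes sympy), the rank and nullity of the matrix from Example 4.3.2 do add up to its number of columns:

```python
# Rank-Nullity check for the matrix of Example 4.3.2.
import sympy as sp

A = sp.Matrix([[1, 3, 0, -1], [3, 2, 1, -4]])
rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity, A.shape[1])       # 2 2 4
print(rank + nullity == A.shape[1])    # True
```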
In the context of this course, The Rank-Nullity Theorem might seem a bit trivial. This is because we've reduced many of the problems to row reducing a matrix and counting pivot columns and rows. However, this theorem is very important: it holds in much more general settings. In our case, it is easy because we have matrix representations of linear transformations. In many situations, however, matrices are a luxury that is not available.
4.6 Row Space
Let A be an n × k matrix. The row space of A, denoted Row(A), is the span of the rows of A when
they’re considered as vectors in Rk .
Note
Example 4.6.1
Let $A = \begin{bmatrix} 2 & 7 & 0 \\ 9 & -1 & 7 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$. The rows of $A$, considered as vectors in $\mathbb{R}^3$, are
$$\begin{bmatrix} 2 \\ 7 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 9 \\ -1 \\ 7 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Therefore,
$$\operatorname{Row}(A) = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 7 \\ 0 \end{bmatrix}, \begin{bmatrix} 9 \\ -1 \\ 7 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
The main result of this section is dim(Row(A)) = rank(A). This is a very interesting result. Here is why:
Let $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_k\,]$ be $n \times k$ and let $m = \operatorname{rank}(A)$. Then, any set of linearly independent columns of $A$ contains at most $m$ vectors. The result $\dim(\operatorname{Row}(A)) = \operatorname{rank}(A)$ means that the same relation holds for the
rows of A as well. This means, regardless of how we construct A, the maximal number of linearly indepen-
dent columns of A is always the same as the maximal number of linearly independent rows! Here, a subset
S of the columns/rows of A is maximally linearly independent if S is linearly independent, but if any other
column/row is added to S, then the subset becomes linearly dependent. How cool is this!?
We need a number of preliminary results to build to the main result of this section. The aim of these results
is to show how we can find a basis of the row space of a matrix using Gauss-Jordan Elimination. The first
result shows that the non-zero rows of an echelon form of a matrix are linearly independent (when considered
as column vectors).
Theorem 4.6.1
Let A be an n×k matrix and let B be an echelon form of A. Then, the non-zero rows of B, considered
as vectors in Rk , are a linearly independent set.
Proof. Let ~r1 , ~r2 , . . . , ~rm denote the non-zero rows of B, in the correct order, considered as column vectors
in Rk , 1 ≤ m ≤ n. Then, ~r1 is the first non-zero row of B considered as a column vector, ~r2 is the second
non-zero row of B considered as a column vector, and so on. Because B is in echelon form, every non-zero
row has a pivot position, and every pivot position is in a column to the right of the pivot position above it.
We exploit this fact to show these vectors are linearly independent.
Suppose the first non-zero entry in ~r1 occurs in the i1 th component, where 1 ≤ i1 ≤ k. Denote this entry by
a1 . Then a1 is the left most pivot in B. Because B is in echelon form, every entry in B in the same column
as $a_1$ below $a_1$ is zero. Therefore, the $i_1$th component in all of $\vec{r}_2, \ldots, \vec{r}_m$ is zero. Consider the vector equation
$$c_1\vec{r}_1 + c_2\vec{r}_2 + \ldots + c_m\vec{r}_m = \vec{0}. \qquad (4.5)$$
Looking at the $i_1$th component of both sides, only $\vec{r}_1$ contributes, so $c_1 a_1 = 0$ and, since $a_1 \neq 0$, we conclude $c_1 = 0$.
Now let $a_2$ be the first non-zero entry in $\vec{r}_2$, and suppose it occurs in the $i_2$th component, $i_1 < i_2 \le k$. As before, the $i_2$th component must be equal to zero in $\vec{r}_3, \ldots, \vec{r}_m$ because $a_2$ corresponds to a pivot in $B$. Thus, if we look at the $i_2$th component of the vector equation in Equation (4.5), we have $c_2 a_2 = 0$ (using $c_1 = 0$), so $c_2 = 0$.
Repeating this process for each ~ri yields c1 = c2 = . . . = cm = 0. Therefore, the vector equation
has only the trivial solution. This shows {~r1 , . . . , ~rm } is linearly independent. ♦
Theorem 4.6.2
Let A be an n × k matrix and let B be an echelon form of A. Then, the non-zero rows of B, when
considered as vectors in Rk , form a basis for Row(A).
Example 4.6.2
Let $A = \begin{bmatrix} -2 & 2 & 0 & -5 \\ 3 & 1 & -1 & 2 \\ 4 & 2 & -4 & 1 \end{bmatrix}$. Calculate a basis for Row(A).

Solution. Row reducing $A$ to an echelon form gives
$$A \sim \begin{bmatrix} -2 & 2 & 0 & -5 \\ 0 & 4 & -1 & -11/2 \\ 0 & 0 & -5/2 & -3/4 \end{bmatrix}.$$
By Theorem 4.6.2, the non-zero rows of this echelon form, considered as vectors in $\mathbb{R}^4$, form a basis for Row(A):
$$\left\{ \begin{bmatrix} -2 \\ 2 \\ 0 \\ -5 \end{bmatrix}, \begin{bmatrix} 0 \\ 4 \\ -1 \\ -11/2 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ -5/2 \\ -3/4 \end{bmatrix} \right\}. \quad ♦$$
Warning!
The row space basis consists of the rows of the echelon form of A, not the rows in the original matrix.
This is different from the column space basis. Be careful not to confuse the two!
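To see the row space procedure in code, here is a sketch (not part of the notes; it assumes sympy). It uses the RREF, which is one valid choice of echelon form, and keeps the non-zero rows:

```python
# Basis for Row(A) from the non-zero rows of an echelon form (Theorem 4.6.2).
import sympy as sp

A = sp.Matrix([[-2, 2, 0, -5], [3, 1, -1, 2], [4, 2, -4, 1]])
rref, _ = A.rref()
basis = [rref.row(i).T for i in range(rref.rows) if any(rref.row(i))]
for v in basis:
    print(v.T)   # three non-zero rows, since rank(A) = 3
```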
Proof. Let ~a1 , ~a2 , . . . , ~an denote the rows of A considered as column vectors in Rk and, let ~r1 , ~r2 , . . . , ~rm
denote the non-zero rows of $B$ considered as column vectors in $\mathbb{R}^k$, $1 \le m \le n$. We start with a claim.

Claim: $\operatorname{Row}(A) = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m\}$.
Proof. The rows of B are obtained from the rows of A using elementary row operations. It suffices to show
the span of the rows of A is unchanged under these elementary row operations.
Swapping row. If we interchange two rows ~ai and ~aj , this clearly does not change span {~a1 , ~a2 , . . . , ~an } .
Scaling rows. Suppose we scale the row ~ai by a non-zero scalar c. Then, it is a fact that
span {~a1 , ~a2 , . . . , ~ai , . . . , ~an } = span {~a1 , ~a2 , . . . , c~ai , . . . , ~an } .
Row replacement. Suppose we perform the row operation ai + raj for some non-zero scalar r. We show
S1 = span {~a1 , ~a2 , . . . , ~ai , . . . , ~an } = span {~a1 , . . . , ~ai + r~aj , . . . , ~an } = S2 .
First, let ~v ∈ S1 . Without loss of generality, suppose i < j. Then, there exist scalars c1 , c2 , . . . , ci , . . . , cj , . . . cn
such that
~v = c1~a1 + c2~a2 . . . + ci~ai + . . . + cj~aj + . . . + cn~an .
Adding and subtracting $rc_i\vec{a}_j$ on the right hand side yields
$$\vec{v} = c_1\vec{a}_1 + c_2\vec{a}_2 + \ldots + c_i(\vec{a}_i + r\vec{a}_j) + \ldots + (c_j - rc_i)\vec{a}_j + \ldots + c_n\vec{a}_n,$$
which is a linear combination of the vectors $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_i + r\vec{a}_j, \ldots, \vec{a}_n$. Thus, $\vec{v} \in S_2$. The other direction
follows similarly (convince yourself!)
This shows the elementary row operations do not change the span of the rows of A. The rows of B are
constructed from the rows of A using elementary row operations. Therefore,
$$\operatorname{span}\{\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n\} = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m, \vec{0}, \ldots, \vec{0}\} = \operatorname{span}\{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_m\},$$
which proves the claim. By the claim,
$$\operatorname{Row}(A) = \operatorname{span}\{\vec{r}_1, \ldots, \vec{r}_m\}.$$
By Theorem 4.6.1, {~r1 , . . . , ~rm } is a linearly independent set. Thus, {~r1 , . . . , ~rm } is a basis for Row(A).
Exercise
Prove the 3 things left as exercises in the proof of Theorem 4.6.2.
Example 4.6.3
$$A_1 \sim \begin{bmatrix} 1 & 2 & 1 \\ 0 & -4 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
By Theorem 4.6.2, the non-zero rows of this matrix, when considered as column vectors, form a basis for Row(A₁). Since Row(A₁) = S₁, a basis for S₁ is
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -4 \\ 0 \end{bmatrix} \right\}.$$
Note
When using row space to calculate bases for subspaces, we can row reduce to whatever echelon form
we want. This will give different bases, of course, but they are all correct.
Theorem 4.6.3
Let $A$ be an $n \times k$ matrix. Then, $\dim(\operatorname{Row}(A)) = \operatorname{rank}(A)$.
Proof. Let B be an echelon form of A. Theorem 4.6.2 implies that dim(Row(A)) is the number of non-zero
rows in B. By Theorem 4.3.3, rank(A) is the number of columns of B with a pivot in them. Each non-zero
row in B has exactly one pivot, and each of these different pivots have to occur in different columns by
definition of the echelon form. Thus, the number of columns with a pivot is exactly equal to the number of
non-zero rows of B; i.e. dim(Row(A)) = rank(A).
The following relation between the rank of A and its transpose is easily proved using Theorem 4.6.3.
Exercise
Prove that $\operatorname{rank}(A^T) = \operatorname{rank}(A)$ for any matrix $A$.

In this section, we give an alternative method for calculating a basis for Col(A). This method calculates one
particular basis, called the canonical basis for Col(A). This is the basis most computer software will output
if you ask it to calculate a basis for the column space of a matrix.
Let A be n × k. By definition, the rows of A are the columns of AT , and vice versa. Therefore, the span
of the rows of A, when considered as vectors in Rk , is the same as the span of the columns of AT . That
is, Row(A) = Col(AT ) and, similarly, Row(AT ) = Col(A). With this observation in mind, we define the
following.
Let A be an n × k matrix. The canonical basis for Col(A) is the basis obtained from the non-zero
rows of the reduced row echelon form of AT .
It is clear that the non-zero rows of the RREF of AT form a basis for Col(A) because Row(AT ) = Col(A).
We give some examples of calculating the canonical basis.
Example 4.6.4
Let
$$A = \begin{bmatrix} 3 & 2 \\ 4 & 3 \\ 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & -1 & -2 & 3 \\ 2 & 0 & 3 & -1 \end{bmatrix}, \quad C = \begin{bmatrix} -2 & -4 & -1 & 3 & -7 \\ 6 & 12 & -1 & 11 & 17 \\ 2 & 4 & -1 & 7 & 5 \\ 0 & 0 & 2 & -10 & 2 \\ 5 & 10 & 1 & 0 & 16 \end{bmatrix}.$$
Calculate the canonical bases for $\operatorname{Col}(A)$, $\operatorname{Col}(B)$, and $\operatorname{Col}(C)$.

For $B$,
$$B^T = \begin{bmatrix} 1 & 2 \\ -1 & 0 \\ -2 & 3 \\ 3 & -1 \end{bmatrix},$$
and its RREF is
$$B^T \sim \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Thus, the canonical basis for Col(B) is
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
For $C$,
$$C^T = \begin{bmatrix} -2 & 6 & 2 & 0 & 5 \\ -4 & 12 & 4 & 0 & 10 \\ -1 & -1 & -1 & 2 & 1 \\ 3 & 11 & 7 & -10 & 0 \\ -7 & 17 & 5 & 2 & 16 \end{bmatrix},$$
and the canonical basis for Col(C) is obtained from the non-zero rows of the RREF of $C^T$.
Why do we bother with the canonical basis? There are a couple reasons. First, it is unique. Secondly, if $A$ is $n \times k$ and $\operatorname{Col}(A) = \mathbb{R}^n$, then the canonical basis is always the standard basis for $\mathbb{R}^n$. The algorithm for column space bases we developed prior will not return this in general.
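The canonical basis computation is easy to script. The sketch below (not part of the notes; it assumes sympy) row reduces $A^T$ and keeps the non-zero rows, here for the matrix $B$ of Example 4.6.4:

```python
# Canonical basis for Col(B): non-zero rows of the RREF of B^T.
import sympy as sp

B = sp.Matrix([[1, -1, -2, 3], [2, 0, 3, -1]])
rref_BT, _ = B.T.rref()
canonical = [rref_BT.row(i).T for i in range(rref_BT.rows) if any(rref_BT.row(i))]
for v in canonical:
    print(v.T)   # [1, 0] and [0, 1], matching Example 4.6.4
```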
4.7 The Invertible Matrix Theorem
The Invertible Matrix Theorem
Let $A$ be an $n \times n$ matrix. The following statements are equivalent.
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. $A^T$ is invertible.
13. The columns of $A$ form a basis for $\mathbb{R}^n$.
14. $\operatorname{Col}(A) = \mathbb{R}^n$.
15. rank(A) = n.
16. nul(A) = 0.
17. $\operatorname{Null}(A) = \{\vec{0}\}$.
18. dim(Row(A)) = n.
19. Row(A) = Rn
Proof. We have already shown the first 12 conditions are equivalent. Therefore, it doesn’t matter which
one of these we pick to prove the chain of equivalences. For convenience, we show the new conditions are
equivalent to 5, the columns of A being linearly independent.
5 =⇒ 13: Suppose the columns of A are linearly independent. Then, the columns of A are a set of n
linearly independent vectors in Rn . Since dim(Rn ) = n, part 1 of The Basis Corollary implies the columns
of A are a basis for Rn .
Determinants
In this chapter, we introduce determinants. The idea of a determinant goes back some 2000 years. Ancient
mathematicians had some idea of what a determinant was in the context of solutions to linear systems, but
didn’t have the language of matrices to make it precise. In fact, Leibniz knew that if certain coefficient iden-
tities were satisfied, then homogeneous linear systems with three equations in three variables had non-trivial
solutions. These identities are essentially what the determinant of a 3 × 3 matrix is.
Determinants have many uses which we will see in this document. For example, they can be used to deter-
mine whether or not a square matrix is invertible and they generalize volume to higher dimensional Euclidean
spaces. Another important property of determinants is that they are continuous. This is useful for theoretical purposes that, unfortunately, extend beyond the scope of this document.
Note
From now on, unless otherwise stated, all matrices are square.
Defining determinants is a little bit tricky. The definition I’ll use in this document is based off of a recursion.
There are other ways to define determinants, but for us, this one will suffice.
First we introduce some notation. Let A be an n × n matrix. Denote by Aij the (n − 1) × (n − 1) sub-matrix
of A that results from deleting the ith row jth column of A.
Example 5.1.1
Let $A = \begin{bmatrix} 2 & 7 & 1 & 0 \\ 0 & 8 & -2 & 6 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & -7 & 0 \end{bmatrix}$. Determine $A_{32}$, $A_{14}$, $A_{23}$, $A_{11}$, and $A_{21}$.

Solution. $A_{32}$ is the $3 \times 3$ sub-matrix of $A$ that results from deleting the third row and second column. Thus,
$$A_{32} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & -2 & 6 \\ 1 & -7 & 0 \end{bmatrix}.$$
Similarly,
$$A_{14} = \begin{bmatrix} 0 & 8 & -2 \\ 1 & 1 & 1 \\ 1 & 2 & -7 \end{bmatrix}, \quad A_{23} = \begin{bmatrix} 2 & 7 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}, \quad A_{11} = \begin{bmatrix} 8 & -2 & 6 \\ 1 & 1 & 1 \\ 2 & -7 & 0 \end{bmatrix}, \quad A_{21} = \begin{bmatrix} 7 & 1 & 0 \\ 1 & 1 & 1 \\ 2 & -7 & 0 \end{bmatrix}. \quad ♦$$
Let $A = [a_{ij}]$ be an $n \times n$ matrix with $n \ge 2$. The $(i,j)$-cofactor of $A$ is the number
$$C^A_{ij} = (-1)^{i+j}\det(A_{ij}).$$
For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ we define $\det(A) = ad - bc$, and for $n \ge 3$ we define
$$\det(A) = a_{11}C^A_{11} + a_{12}C^A_{12} + \ldots + a_{1n}C^A_{1n}.$$

This definition probably looks a bit dubious. How can we define cofactors based off of determinants, and then define determinants based off of cofactors? Here is how it works: cofactors of $3 \times 3$ matrices can be calculated because they are determinants of $2 \times 2$ matrices. Therefore, we can calculate $3 \times 3$ determinants. To calculate the determinant of a $4 \times 4$ matrix, we need to calculate the cofactors, which are determinants of $3 \times 3$ matrices. We know we can calculate these, therefore we can calculate determinants of $4 \times 4$ matrices. The same goes for $5 \times 5$ matrices, $6 \times 6$ matrices, and so on.
For a general $n \times n$ matrix $A$, calculation of the determinant is based on calculating $n$ cofactors, each of which is a determinant of an $(n-1) \times (n-1)$ matrix. To calculate each of these cofactors, we need to calculate determinants of $(n-2) \times (n-2)$ matrices, and so on, until we reach $2 \times 2$ determinants.
Theorem (Cofactor Expansion)
Let $A$ be an $n \times n$ matrix. For any fixed row index $i_0$,
$$\det(A) = \sum_{j=1}^{n} a_{i_0 j}C^A_{i_0 j}.$$
This is called cofactor expansion across the $i_0$th row. Similarly, for any fixed column index $j_0$,
$$\det(A) = \sum_{i=1}^{n} a_{i j_0}C^A_{i j_0}.$$
This is called cofactor expansion down the $j_0$th column.
Cofactor Expansion implies we can expand across/down which ever row/column we like to calculate det(A)
so we aren’t restricted to doing it along the first row. Generally, Cofactor Expansion greatly reduces the
amount of work needed to calculate det(A). The best strategy is to do Cofactor Expansion across/down the
row/column with the most zeroes.
Example 5.1.2
Let $A = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & -7 & 10 & 0 \\ 0 & 2 & 0 & -3 \\ 4 & 1 & -1 & 0 \end{bmatrix}$. Calculate $\det(A)$ using Cofactor Expansion.

Solution. The fourth column contains three zeroes, which is the most of any row or column in the matrix. Therefore, we perform cofactor expansion down the fourth column:
$$\det(A) = \sum_{i=1}^{4} a_{i4}C^A_{i4} = a_{14}C^A_{14} + a_{24}C^A_{24} + a_{34}C^A_{34} + a_{44}C^A_{44} = 0 \cdot C^A_{14} + 0 \cdot C^A_{24} - 3 \cdot C^A_{34} + 0 \cdot C^A_{44} = -3 \cdot C^A_{34}.$$
Now we need to calculate the $(3,4)$-cofactor for the matrix $A$. From the definition, it is
$$C^A_{34} = (-1)^{3+4}\det\underbrace{\begin{bmatrix} 1 & 0 & 2 \\ 0 & -7 & 10 \\ 4 & 1 & -1 \end{bmatrix}}_{=B} = -\det(B).$$
To calculate $\det(B)$, we do cofactor expansion again. This time, there is at most one zero in any row or column. We can pick any of these to do cofactor expansion across/down. Let's do cofactor expansion across the first row:
$$\det(B) = \sum_{j=1}^{3} b_{1j}C^B_{1j} = b_{11}C^B_{11} + b_{12}C^B_{12} + b_{13}C^B_{13} = 1 \cdot C^B_{11} + 0 \cdot C^B_{12} + 2 \cdot C^B_{13} = C^B_{11} + 2 \cdot C^B_{13}.$$
We need to calculate the two above cofactors for $B$. We get
$$C^B_{11} = (-1)^{1+1}\det\begin{bmatrix} -7 & 10 \\ 1 & -1 \end{bmatrix} = (7 - 10) = -3, \qquad C^B_{13} = (-1)^{1+3}\det\begin{bmatrix} 0 & -7 \\ 4 & 1 \end{bmatrix} = (0 - (-28)) = 28.$$
Therefore,
$$\det(B) = -3 + 2(28) = 53.$$
Putting it all together with what we calculated above, we have
$$\det(A) = -3 \cdot C^A_{34} = -3 \cdot (-\det(B)) = -3 \cdot (-53) = 159. \quad ♦$$
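The recursive definition translates directly into code. The following sketch (not part of the notes; plain Python, deliberately naive and far slower than row reduction for large matrices) expands along the first row exactly as in the definition:

```python
# Determinant by cofactor expansion along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for j in range(n):
        # A_1j: delete row 1 and column j+1
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        cofactor = (-1) ** j * det(minor)   # (-1)^(1 + (j+1)) = (-1)^j
        total += A[0][j] * cofactor
    return total

A = [[1, 0, 2, 0],
     [0, -7, 10, 0],
     [0, 2, 0, -3],
     [4, 1, -1, 0]]
print(det(A))   # 159, as in Examples 5.1.2 and 5.1.3
```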
Next, we calculate the determinant of A from Example 5.1.2 using the definition of determinants to ensure
that Cofactor Expansion gives the correct answer. To ease the calculation, we give a formula for determi-
nants of 3 × 3 matrices. This formula can be derived using the definition of the determinant. The proof is
left as an exercise.
Let $A = [a_{ij}]$ be a $3 \times 3$ matrix. Then,
$$\det(A) = (a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}) - (a_{13}a_{22}a_{31} + a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32}).$$
Example 5.1.3
Calculate the determinant of the matrix A from Example 5.1.2 using the definition of a determinant.
Solution. By definition, we expand across the first row:
$$\det(A) = \sum_{j=1}^{4} a_{1j}C^A_{1j} = 1 \cdot C^A_{11} + 0 \cdot C^A_{12} + 2 \cdot C^A_{13} + 0 \cdot C^A_{14} = C^A_{11} + 2C^A_{13}.$$
We need to calculate the two cofactors. Using the formula for a $3 \times 3$ determinant,
$$C^A_{11} = (-1)^{1+1}\det\begin{bmatrix} -7 & 10 & 0 \\ 2 & 0 & -3 \\ 1 & -1 & 0 \end{bmatrix} = 1 \cdot (-9) = -9, \qquad C^A_{13} = (-1)^{1+3}\det\begin{bmatrix} 0 & -7 & 0 \\ 0 & 2 & -3 \\ 4 & 1 & 0 \end{bmatrix} = 1 \cdot 84 = 84.$$
Therefore, $\det(A) = -9 + 2 \cdot 84 = 168 - 9 = 159$, which is the answer we arrived at in Example 5.1.2. ♦
Example 5.1.4
Find the determinant of $A = \begin{bmatrix} 3 & 2 & 0 & 1 \\ 4 & 0 & 1 & 2 \\ 3 & 0 & 2 & 1 \\ 9 & 2 & 3 & 1 \end{bmatrix}$.

Solution. The second column has the most zeroes. Therefore, we do Cofactor Expansion down the second column:
$$\det(A) = -2\det\begin{bmatrix} 4 & 1 & 2 \\ 3 & 2 & 1 \\ 9 & 3 & 1 \end{bmatrix} + 2\det\begin{bmatrix} 3 & 0 & 1 \\ 4 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix} = -2(-16) + 2(-4) = 32 - 8 = 24. \quad ♦$$
We end this section with the following result that follows immediately from Cofactor Expansion.
Corollary 5.1.1
Let A be an n × n matrix that has either a row or a column of all zeroes. Then det(A) = 0.
Proof. If A has a row/column of all zeroes, do Cofactor Expansion across/down this row/column. It follows
immediately from the formulas in Cofactor Expansion that det(A) = 0.
5.2 Properties of Determinants
Let A be an n × n matrix. A is called upper triangular if every entry below the main diagonal is
zero. A is called lower triangular if every entry above the main diagonal is zero. A triangular
matrix is a one that is either upper triangular or lower triangular.
Example 5.2.1
Note
The definition of triangular matrices places no restriction on what the entries on the main diagonal
or above/below the main diagonal can be. For example, the zero matrix is both upper and lower
triangular. So is the identity matrix In . Every elementary matrix is either upper or lower triangular.
Triangular matrices are useful because they’re easy to work with. Many of the quantities associated to them
are easy to calculate. For example, determinants of triangular matrices are very easy.
Theorem 5.2.1
Let A be an n × n triangular matrix. Then, the determinant of A is the product of the entries on the
main diagonal.
Example 5.2.2
Let $A = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 1 & -10 & 8 & 0 & 0 \\ -6 & -6 & 2 & 3 & 0 \\ 0 & -2 & -3 & 8 & 10 \end{bmatrix}$. Calculate $\det(A)$.

Solution. $A$ is lower triangular, so by Theorem 5.2.1, $\det(A) = 2 \cdot 2 \cdot 8 \cdot 3 \cdot 10 = 960$. ♦
Example 5.2.3
In is an n × n matrix with all ones on the diagonal and zeroes elsewhere. Therefore In is triangular.
Then, by Theorem 5.2.1, det(In ) = 1, and
Proof. The proof for upper triangular matrices is similar for lower triangular matrices. Therefore, I only
give it for lower triangular matrices and leave the proof for upper triangular matrices to the reader.
For the base case, let $A = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix}$ be a $2 \times 2$ lower triangular matrix. By definition of determinants, $\det(A) = ad - 0 \cdot c = ad$, the product of the entries on the main diagonal.
Now assume the determinant of any n × n lower triangular matrix A is equal to the product of the en-
tries on the main diagonal. This is the induction hypothesis. We must show that the determinant of any
(n + 1) × (n + 1) lower triangular matrix is the product of the entries along the main diagonal.
Let $A = [a_{ij}]$ be an $(n+1) \times (n+1)$ lower triangular matrix. Since $A$ is lower triangular, the only entry in its first row that can be non-zero is $a_{11}$. Doing cofactor expansion across the first row gives
$$\det(A) = a_{11}(-1)^{1+1}\det(A_{11}) + 0\cdot(-1)^{1+2}\det(A_{12}) + \ldots + 0\cdot(-1)^{1+n+1}\det(A_{1,n+1}) = a_{11}\det(A_{11}). \qquad (5.1)$$
$A_{11}$ is equal to
$$A_{11} = \begin{bmatrix} a_{22} & 0 & 0 & \ldots & 0 \\ a_{32} & a_{33} & 0 & \ldots & 0 \\ a_{42} & a_{43} & a_{44} & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n+1,2} & a_{n+1,3} & a_{n+1,4} & \ldots & a_{n+1,n+1} \end{bmatrix}.$$
This is an n×n lower triangular matrix. Therefore, by the induction hypothesis, det(A11 ) = a22 a33 . . . ann an+1,n+1 .
Thus,
det(A) = a11 det(A11 ) = a11 a22 a33 . . . ann an+1,n+1
which completes the proof.
Exercise
Prove Theorem 5.2.1 for upper triangular matrices.
Theorem 5.2.2
Let $A$ be an $n \times n$ matrix. Then, $\det(A^T) = \det(A)$.

Proof. We use induction on $n$. For the base case, let
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{so that} \quad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}.$$
Thus,
$$\det(A^T) = ad - cb = ad - bc = \det(A),$$
which verifies the base case.
Assume that $\det(A) = \det(A^T)$ for all $n \times n$ matrices $A$. This is the induction hypothesis.
Let $A$ be $(n+1) \times (n+1)$. Performing Cofactor Expansion along the first row of $A$ yields
$$\det(A) = a_{11}C^A_{11} + a_{12}C^A_{12} + \ldots + a_{1,n+1}C^A_{1,n+1}.$$
Then, $C^A_{1j} = (-1)^{1+j}\det(A_{1j}) = (-1)^{1+j}\det\big((A_{1j})^T\big)$ by the induction hypothesis, because $A_{1j}$ is $n \times n$ for each $j$. Therefore,
$$\det(A) = a_{11}(-1)^{1+1}\det\big((A_{11})^T\big) + a_{12}(-1)^{1+2}\det\big((A_{12})^T\big) + \ldots + a_{1,n+1}(-1)^{1+n+1}\det\big((A_{1,n+1})^T\big). \qquad (5.2)$$
Now, deleting the first row and $j$th column of $A$ and then transposing is the same as transposing $A$ and then deleting the $j$th row and first column; that is, $(A_{1j})^T = (A^T)_{j1}$. Substituting this into Equation (5.2) gives
$$\det(A) = a_{11}(-1)^{1+1}\det\big((A^T)_{11}\big) + a_{12}(-1)^{1+2}\det\big((A^T)_{21}\big) + \ldots + a_{1,n+1}(-1)^{1+n+1}\det\big((A^T)_{n+1,1}\big).$$
Since the $a_{1j}$ are the entries in the first column of $A^T$, the above expression is just the cofactor expansion for $\det(A^T)$ down the first column of $A^T$. Hence, $\det(A) = \det(A^T)$, which completes the induction.
Theorem 5.2.3
Let A be an n × n matrix. Let B be a matrix produced by swapping two rows of A. Then det(B) =
− det(A).
Note
Proof. It suffices to show the result after swapping row 1 with the kth row for some k > 1. This is
because swapping any 2 rows, say the kth row with the `th row is equivalent to doing three row swaps
R1 ⇐⇒ Rk , R1 ⇐⇒ R` , R1 ⇐⇒ Rk .
We use induction on n. If n = 1, this is trivial because no row swaps can be made. If n = 2, write
" #
a b
A= .
c d
Then, det(A) = ad − bc. Swapping the only two rows we can yields
" #
c d
B= .
a b
Thus, det(B) = bc − ad = −(ad − bc) = − det(A). This verifies the base case.
Now assume if A is n × n, then making the row swap R1 ⇐⇒ Rk produces a matrix B with det(B) =
− det(A). This is the induction hypothesis.
If i 6= 1, k, then it is obvious Bi1 differs from Ai1 by one row interchange. Therefore, det(Ai1 ) = − det(Bi1 )
by the induction hypothesis since Ai1 and Bi1 are n × n matrices.
The result will follow given we can show
−ak1 (−1)k+1 det(Ak1 ) = b11 (−1)1+1 det(B11 ), mboxand − a11 (−1)1+1 det(A11 ) = bk1 (−1)k+1 det(Bk1 ).
For the first equation, notice $b_{11} = a_{k1}$ by construction of $B$. Consider $A_{k1}$. We can transform $A_{k1}$ into $B_{11}$ by swapping its first row with the row directly below it, then swapping that row with the next one, and iterating until the original first row sits in the $(k-1)$th position. Tracking the first column of $A_{k1}$, the entry $a_{12}$ moves down one position with each swap, past $a_{22}, a_{32}, \ldots, a_{k-1,2}$, until the column reads
$$\begin{bmatrix} a_{22} \\ a_{32} \\ \vdots \\ a_{k-1,2} \\ a_{12} \\ a_{k+1,2} \\ \vdots \\ a_{n,2} \end{bmatrix}.$$
This is precisely the first column of B11 . To move the first row down to the (k − 1)th position, we performed
$k - 2$ different row swaps. Since $A_{k1}$ is $n \times n$, repeated application of the induction hypothesis implies that
$$\det(B_{11}) = (-1)^{k-2}\det(A_{k1}) = (-1)^{k}\det(A_{k1}).$$
Thus,
$$b_{11}(-1)^{2}\det(B_{11}) = a_{k1}(-1)^{k}\det(A_{k1}) = -\big(a_{k1}(-1)^{k+1}\det(A_{k1})\big),$$
which is what we wanted to prove. The second equation we need to prove is true by the same argument as
above replacing Ak1 with Bk1 and B11 with A11 . This proves the theorem.
Corollary 5.2.1
Let $A$ be an $n \times n$ matrix. If $A$ has two identical rows or two identical columns, then $\det(A) = 0$.
Proof. Suppose A has two identical rows. Let B be the matrix that results from interchanging these two
rows. Then, det(A) = − det(B). But the two rows are identical, thus B = A and so, det(A) = − det(A).
This implies det(A) = 0. The proof for two identical columns follows from the fact that det(A) = det(AT ).
I leave the details to the reader.
Exercise
Finish the proof of Corollary 5.2.1.
Showing how the other two row operations change the determinant is much easier. First, we need a lemma.
Lemma 5.2.1
Let $A = [a_{ij}]$ be an $n \times n$ matrix and consider the sum
$$a_{i1}C^A_{j1} + a_{i2}C^A_{j2} + \ldots + a_{in}C^A_{jn},$$
where $C^A_{ji}$ denotes the $(j,i)$-cofactor of $A$. If $i \neq j$, then this sum is zero.
Proof. Fix $i_0 \neq j_0$ and let $B$ be the matrix obtained from $A$ by replacing the $j_0$th row of $A$ with a copy of the $i_0$th row of $A$. Then, the $i_0$th row of $B$ is the same as the $j_0$th. Therefore, by Corollary 5.2.1, $\det(B) = 0$. Since $A$ differs from $B$ only in the $j_0$th row, and we are deleting the $j_0$th row in the calculation of $C^A_{j_0 k}$ for any $k$ between 1 and $n$, it follows that the $(j_0, k)$-cofactors for $B$ are the same as they are for $A$. Hence, taking the cofactor expansion for $B$ along the $j_0$th row gives
$$0 = \det(B) = a_{i_0 1}C^A_{j_0 1} + a_{i_0 2}C^A_{j_0 2} + \ldots + a_{i_0 n}C^A_{j_0 n},$$
which is exactly the sum in the statement of the lemma with $i = i_0$ and $j = j_0$.
Theorem 5.2.4
Let $A$ be an $n \times n$ matrix.
1. Let $B$ be the matrix produced by adding a multiple of one row of $A$ to another row ($R_i \Rightarrow R_i + cR_j$ with $i \neq j$). Then, $\det(B) = \det(A)$.
2. Let $B$ be the matrix produced by multiplying one row of $A$ by a scalar $c$. Then, $\det(B) = c\det(A)$.
Proof.
1. Suppose $B$ is produced from $A$ by the row operation $R_i \Rightarrow R_i + cR_j$ with $i \neq j$. Doing cofactor expansion along the $i$th row of $B$ gives
$$\det(B) = (ca_{j1} + a_{i1})C^B_{i1} + (ca_{j2} + a_{i2})C^B_{i2} + \ldots + (ca_{jn} + a_{in})C^B_{in}.$$
Rearranging,
$$\det(B) = (a_{i1}C^B_{i1} + a_{i2}C^B_{i2} + \ldots + a_{in}C^B_{in}) + c(a_{j1}C^B_{i1} + a_{j2}C^B_{i2} + \ldots + a_{jn}C^B_{in}). \qquad (5.3)$$
Since we are doing cofactor expansion along the $i$th row of $B$, and $A$ differs from $B$ only in this row, the cofactors in Equation (5.3) for $B$ are the same as they are for $A$. Therefore, $(a_{i1}C^B_{i1} + \ldots + a_{in}C^B_{in}) = \det(A)$, as this is just the cofactor expansion of $A$ along the $i$th row, and $a_{j1}C^B_{i1} + \ldots + a_{jn}C^B_{in} = 0$ by Lemma 5.2.1 since $i \neq j$. Thus, Equation (5.3) implies $\det(B) = \det(A)$.
2. Suppose $B$ is produced from $A$ by scaling the $i$th row by $c$. Doing cofactor expansion along the $i$th row of $B$, whose cofactors are the same as those of $A$, gives
$$\det(B) = b_{i1}C^B_{i1} + \ldots + b_{in}C^B_{in} = ca_{i1}C^A_{i1} + \ldots + ca_{in}C^A_{in} = c(a_{i1}C^A_{i1} + \ldots + a_{in}C^A_{in}) = c\det(A).$$
Echelon forms of matrices are triangular and we know how elementary row operations affect determinants.
Therefore, we can calculate the determinant of a matrix by row reducing it to echelon form, making the
appropriate alterations along the way, and then using Theorem 5.2.1.
Example 5.2.4
Let $A = \begin{bmatrix} -2 & 2 & -1 \\ 3 & 9/2 & -1 \\ 1 & -11/2 & 2 \end{bmatrix}$. Calculate $\det(A)$.

Solution. Row reduce $A$ towards an echelon form, keeping track of the row operations used:
$$A \underset{R_1 \Leftrightarrow R_3}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 3 & 9/2 & -1 \\ -2 & 2 & -1 \end{bmatrix} = A_1, \qquad A_1 \underset{R_2 \Rightarrow R_2 - 3R_1,\ R_3 \Rightarrow R_3 + 2R_1}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & 21 & -7 \\ 0 & -9 & 3 \end{bmatrix} = A_2,$$
$$A_2 \underset{R_2 \Rightarrow -(1/7)R_2}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & -3 & 1 \\ 0 & -9 & 3 \end{bmatrix} = A_3, \qquad A_3 \underset{R_3 \Rightarrow (1/3)R_3}{\sim} \begin{bmatrix} 1 & -11/2 & 2 \\ 0 & -3 & 1 \\ 0 & -3 & 1 \end{bmatrix} = A_4.$$
Since $A_4$ has two identical rows, $\det(A_4) = 0$ by Corollary 5.2.1. From Theorems 5.2.3 and 5.2.4, the determinant of $A_4$ is related to the determinant of $A$ as follows:
$$\det(A_4) = \tfrac{1}{3}\det(A_3) \ \text{(part 2 of Theorem 5.2.4)}, \quad \det(A_3) = -\tfrac{1}{7}\det(A_2) \ \text{(part 2 of Theorem 5.2.4)},$$
$$\det(A_2) = \det(A_1) \ \text{(part 1 of Theorem 5.2.4)}, \quad \det(A_1) = -\det(A) \ \text{(Theorem 5.2.3)}.$$
Putting these together, $0 = \det(A_4) = \tfrac{1}{3}\big(-\tfrac{1}{7}\big)(-1)\det(A) = \tfrac{1}{21}\det(A)$, so $\det(A) = 0$. ♦
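This row-reduction strategy is how determinants are computed in practice. Here is a sketch (not part of the notes; plain Python with exact fractions) that performs forward elimination, flips the sign for each swap, never scales rows, and multiplies the diagonal of the resulting triangular matrix:

```python
# Determinant by row reduction, in the spirit of Example 5.2.4.
from fractions import Fraction

def det_by_elimination(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    n, sign = len(A), 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # no pivot available: determinant is 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                    # a row swap changes the sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]  # replacement: det unchanged
    result = Fraction(sign)
    for i in range(n):
        result *= A[i][i]                   # triangular determinant = product of diagonal
    return result

print(det_by_elimination([[-2, 2, -1],
                          [3, Fraction(9, 2), -1],
                          [1, Fraction(-11, 2), 2]]))   # 0, as in Example 5.2.4
```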
Example 5.2.5
Use elementary row operations to calculate the determinant of $A = \begin{bmatrix} 2 & 7 & 6 & 1 \\ 1 & 2 & 9 & -1 \\ 8 & 6 & 2 & -1 \\ 2 & 3 & 3 & 0 \end{bmatrix}$.
We can use this technique to calculate determinants of matrices whose entries aren’t specified.
Example 5.2.6
Let $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ and $B = \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ -a & -b & -c \\ d & e & f \end{bmatrix}$. If $\det(A) = 2$, calculate $\det(B)$.

Solution. We have
$$A \underset{R_1 \Leftrightarrow R_2}{\sim} \begin{bmatrix} d & e & f \\ a & b & c \\ g & h & i \end{bmatrix} = A_1, \qquad A_1 \underset{R_1 \Leftrightarrow R_3}{\sim} \begin{bmatrix} g & h & i \\ a & b & c \\ d & e & f \end{bmatrix} = A_2, \qquad A_2 \underset{R_1 \Rightarrow R_1 + R_2}{\sim} \begin{bmatrix} g+a & h+b & i+c \\ a & b & c \\ d & e & f \end{bmatrix} = A_3,$$
$$A_3 \underset{R_1 \Rightarrow 2R_1}{\sim} \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ a & b & c \\ d & e & f \end{bmatrix} = A_4, \qquad A_4 \underset{R_2 \Rightarrow -R_2}{\sim} \begin{bmatrix} 2(g+a) & 2(h+b) & 2(i+c) \\ -a & -b & -c \\ d & e & f \end{bmatrix} = B.$$
Using Theorems 5.2.3 and 5.2.4 yields
$$\det(A_1) = -\det(A), \quad \det(A_2) = -\det(A_1) = \det(A), \quad \det(A_3) = \det(A_2), \quad \det(A_4) = 2\det(A_3), \quad \det(B) = -\det(A_4).$$
Therefore, $\det(B) = -2\det(A) = -4$. ♦
The following are immediate from the results in this section. The proofs of these are left as exercises to the reader.
Corollary 5.2.2
Let A be an n × n matrix.
1. Let B be a matrix row equivalent to A. Then, det(B) = C det(A) for some non-zero C ∈ R.
Exercise
Prove Corollary 5.2.2.
Example 5.2.7
Let A be n × n. Suppose that we perform row operations on A to get B and that det(B) = 0. Then,
applying part 1 of Corollary 5.2.2, we immediately conclude that det(A) = 0, so we don’t have to
trace back through each row operation.
Theorem 5.2.5
Let $A$ be an $n \times n$ matrix. Then, $A$ is invertible if and only if $\det(A) \neq 0$.
Proof. Suppose A is invertible. Then, the RREF of A is equal to In by part 2 of The Invertible Matrix
Theorem. By part 1 of Corollary 5.2.2, det(A) = C det(In ) = C where C is some non-zero scalar. Thus,
det(A) 6= 0.
We prove the converse with contraposition. Assume A is not invertible. Let B be the RREF of A. By part 3
of The Invertible Matrix Theorem, B does not have a pivot in every row and, therefore, must contain a row
of zeroes. By Corollary 5.2.1, det(B) = 0. By part 1 of Corollary 5.2.2, det(A) = C det(B) for a non-zero
scalar C. Hence, det(A) = 0.
In view of Theorem 5.2.5, we can add another condition to The Invertible Matrix Theorem.
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. AT is invertible.
14. Col(A) = Rn .
15. rank(A) = n.
16. nul(A) = 0.
17. $\operatorname{Null}(A) = \{\vec{0}\}$.
18. dim(Row(A)) = n.
19. Row(A) = Rn .
20. det(A) 6= 0.
Lemma 5.2.2
Let A be an n × n matrix and let E be any n × n elementary matrix. Then, det(EA) = det(E) det(A).
Proof. First suppose E is an elementary matrix produced from In by swapping the ith and jth rows. Then
det(E) = − det(In ) = −1 by Theorem 5.2.3. The product EA is the same matrix as A after Ri ⇐⇒ Rj .
Thus, $\det(EA) = -\det(A)$ by Theorem 5.2.3 and so $\det(EA) = \det(E)\det(A)$.
Now suppose E is an elementary matrix obtained from In by multiplying the ith row of In by a non-zero scalar
c. Then, det(E) = c det(In ) = c by Theorem 5.2.4. Moreover, EA is the matrix produced by multiplying
the $i$th row of $A$ by $c$. Thus, $\det(EA) = c\det(A)$ by Theorem 5.2.4. Hence, $\det(EA) = \det(E)\det(A)$.
Finally, suppose $E$ is an elementary matrix produced from $I_n$ by adding $c$ times row $j$ to row $i$. Then $\det(E) = \det(I_n) = 1$ by Theorem 5.2.4. Moreover, $EA$ is the matrix produced by performing $R_i \Rightarrow R_i + cR_j$ on $A$, so $\det(EA) = \det(A)$ by Theorem 5.2.4, and again $\det(EA) = \det(E)\det(A)$.
Lemma 5.2.2 generalizes to any finite set of elementary matrices using induction. That is, if $E_1, \ldots, E_k$ is any sequence of elementary matrices, then
$$\det(E_1 E_2 \cdots E_k A) = \det(E_1)\det(E_2)\cdots\det(E_k)\det(A).$$
Theorem 5.2.7 (The Multiplicative Property of Determinants)
Let $A$ and $B$ be $n \times n$ matrices. Then, $\det(AB) = \det(A)\det(B)$.

Proof. If $A$ and $B$ are $n \times n$ matrices with at least one of $A$ or $B$ not invertible, then $AB$ is not invertible.
We leave the proof of this as an exercise. In this case, it follows that det(AB) = 0 = det(A) det(B) by
Theorem 5.2.5.
Now assume both A and B are invertible. By part 2 of The Invertible Matrix Theorem, the RREF of A is In .
Therefore, there exists a sequence of elementary matrices $E_1, E_2, \ldots, E_{p-1}, E_p$ such that $E_p E_{p-1}\cdots E_2 E_1 A = I_n$. Thus, $A = E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}$ and, therefore, $AB = E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}B$.
By Theorem 3.2.7, Ei−1 is an elementary matrix for each i = 1, 2, . . . , p. Taking determinants of both sides
gives
$$\det(AB) = \det\big(E_1^{-1}E_2^{-1}\cdots E_{p-1}^{-1}E_p^{-1}B\big) = \det(E_1^{-1})\det(E_2^{-1})\cdots\det(E_p^{-1})\det(B) = \det\big(E_1^{-1}E_2^{-1}\cdots E_p^{-1}\big)\det(B) = \det(A)\det(B),$$
where the middle equalities follow from the generalization of Lemma 5.2.2 stated above.
Exercise
Prove that if A and B are n × n matrices with one of A or B not invertible, then the product AB is
not invertible.
Warning!
In general, the property det(A + B) = det(A) + det(B) does not hold. Do not confuse this with the
multiplicative property.
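A quick numerical contrast of the two properties (not part of the notes; it assumes numpy and uses arbitrary random matrices):

```python
# det(AB) = det(A)det(B) holds; det(A + B) = det(A) + det(B) generally does not.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A + B), np.linalg.det(A) + np.linalg.det(B)))  # False (almost surely)
```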
Corollary 5.2.3
Let $A$ and $B$ be $n \times n$ matrices.
1. $\det(AB) = \det(BA)$.
2. If $A$ is invertible, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
Proof. We prove 1 and leave the proof of the second as an exercise for the reader. By Theorem 5.2.7, we have
$$\det(AB) = \det(A)\det(B).$$
But $\det(A)$ and $\det(B)$ are real numbers, so we can swap their order of multiplication. Therefore,
$$\det(AB) = \det(B)\det(A) = \det(BA),$$
where the last equality follows from another application of Theorem 5.2.7.
Exercise
Prove the second part of Corollary 5.2.3.
5.3 Cramer's Rule

Let $A$ be an $n \times n$ matrix and let $\vec{b} \in \mathbb{R}^n$. For each $i \in \{1, 2, \ldots, n\}$, denote by $A_i(\vec{b})$ the matrix obtained from $A$ by replacing the $i$th column of $A$ with $\vec{b}$.
Example 5.3.1
Let $A = \begin{bmatrix} -2 & 4 & 6 & 9 \\ 9 & 0 & 1 & 1 \\ -1 & 2 & 3 & -2 \\ -8 & -7 & 6 & 6 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} -9 \\ 0 \\ 0 \\ 2 \end{bmatrix}$. Then,
$$A_1(\vec{b}) = \begin{bmatrix} -9 & 4 & 6 & 9 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 3 & -2 \\ 2 & -7 & 6 & 6 \end{bmatrix}, \quad A_2(\vec{b}) = \begin{bmatrix} -2 & -9 & 6 & 9 \\ 9 & 0 & 1 & 1 \\ -1 & 0 & 3 & -2 \\ -8 & 2 & 6 & 6 \end{bmatrix},$$
$$A_3(\vec{b}) = \begin{bmatrix} -2 & 4 & -9 & 9 \\ 9 & 0 & 0 & 1 \\ -1 & 2 & 0 & -2 \\ -8 & -7 & 2 & 6 \end{bmatrix}, \quad A_4(\vec{b}) = \begin{bmatrix} -2 & 4 & 6 & -9 \\ 9 & 0 & 1 & 0 \\ -1 & 2 & 3 & 0 \\ -8 & -7 & 6 & 2 \end{bmatrix}.$$
Cramer's Rule
Let $A$ be an invertible $n \times n$ matrix and let $\vec{b} \in \mathbb{R}^n$. Let $\vec{v} \in \mathbb{R}^n$ denote the unique solution to $A\vec{x} = \vec{b}$, and write
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$v_i = \frac{\det(A_i(\vec{b}))}{\det(A)}, \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
Example 5.3.2
Thus,
$$\det A_1(\vec{b}) = 40 + 1 = 41, \quad \text{and} \quad \det A_2(\vec{b}) = 2 - 15 = -13,$$
and so,
$$v_1 = \frac{\det(A_1(\vec{b}))}{\det(A)} = -\frac{41}{19}, \quad \text{and} \quad v_2 = \frac{\det(A_2(\vec{b}))}{\det(A)} = \frac{13}{19}. \quad ♦$$
Example 5.3.3
Use Cramer's Rule to solve $A\vec{x} = \vec{b}$, where
$$A = \begin{bmatrix} -2 & -11 & 2 \\ 1 & 1 & 1 \\ 4 & 1 & 1 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} 0 \\ 3 \\ 1 \end{bmatrix}.$$

Solution. We have
$$A_1(\vec{b}) = \begin{bmatrix} 0 & -11 & 2 \\ 3 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad A_2(\vec{b}) = \begin{bmatrix} -2 & 0 & 2 \\ 1 & 3 & 1 \\ 4 & 1 & 1 \end{bmatrix}, \quad \text{and} \quad A_3(\vec{b}) = \begin{bmatrix} -2 & -11 & 0 \\ 1 & 1 & 3 \\ 4 & 1 & 1 \end{bmatrix}.$$
Thus,
$$\det(A) = -39, \quad \det A_1(\vec{b}) = 26, \quad \det A_2(\vec{b}) = -26, \quad \text{and} \quad \det A_3(\vec{b}) = -117,$$
and so,
$$v_1 = \frac{\det(A_1(\vec{b}))}{\det(A)} = -\frac{26}{39} = -\frac{2}{3}, \quad v_2 = \frac{\det(A_2(\vec{b}))}{\det(A)} = \frac{26}{39} = \frac{2}{3}, \quad \text{and} \quad v_3 = \frac{\det(A_3(\vec{b}))}{\det(A)} = \frac{117}{39} = 3. \quad ♦$$
Proof. Write $A = [\,\vec{a}_1\ \vec{a}_2\ \ldots\ \vec{a}_n\,]$ and let $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ denote the standard basis of $\mathbb{R}^n$. Then, $I_n = [\,\vec{e}_1\ \vec{e}_2\ \ldots\ \vec{e}_n\,]$ and
$$A \cdot (I_n)_i(\vec{v}) = [\,A\vec{e}_1\ \ldots\ A\vec{v}\ \ldots\ A\vec{e}_n\,] = [\,\vec{a}_1\ \ldots\ \vec{b}\ \ldots\ \vec{a}_n\,] = A_i(\vec{b}).$$
Therefore,
$$\det\big(A \cdot (I_n)_i(\vec{v})\big) = \det(A_i(\vec{b})) \implies \det(A)\det\big((I_n)_i(\vec{v})\big) = \det(A_i(\vec{b})) \implies \det\big((I_n)_i(\vec{v})\big) = \frac{\det(A_i(\vec{b}))}{\det(A)}.$$
Doing Cofactor Expansion along the $i$th row of $(I_n)_i(\vec{v})$ yields
$$\det\big((I_n)_i(\vec{v})\big) = v_i\det(I_{n-1}) = v_i \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
Hence,
$$v_i = \frac{\det(A_i(\vec{b}))}{\det(A)} \quad \text{for each } i \in \{1, 2, \ldots, n\}.$$
This method for computing solutions to A~x = ~b isn’t too bad for small matrices but, if the matrix is large,
then there are too many determinant calculations to justify using Cramer’s Rule in practice. That said,
Cramer’s Rule is used in theory a lot. In particular, there are a number of proofs in algebraic number theory
that make use of it.
Cramer's Rule also leads to a formula for the inverse of a matrix. Let $A$ be an invertible $n \times n$ matrix and write $A^{-1} = [\,\vec{b}_1\ \vec{b}_2\ \ldots\ \vec{b}_n\,]$. Then,
$$AA^{-1} = I_n \implies [\,A\vec{b}_1\ A\vec{b}_2\ \ldots\ A\vec{b}_n\,] = [\,\vec{e}_1\ \vec{e}_2\ \ldots\ \vec{e}_n\,],$$
so that $\vec{b}_j$ is the unique solution to $A\vec{x} = \vec{e}_j$ for each $j \in \{1, 2, \ldots, n\}$. Therefore, by Cramer's Rule, the $(i,j)$-entry of $A^{-1}$, $b_{ij}$, is given by
$$b_{ij} = \frac{\det(A_i(\vec{e}_j))}{\det(A)}, \quad \text{for all } i, j \in \{1, 2, \ldots, n\}.$$
Compute det (Ai (e~j )) using Cofactor Expansion down the ith column of Ai (~ej ). Because the only non-zero
entry in this column is a 1 in the jth position, it follows that
Therefore,
A
Cji
bij = for all i, j ∈ {1, 2, . . . , n} ,
det(A)
and so, a formula for A−1 is
A A A
C11 C21 ... Cn1
A A A
C12 C22 ... Cn2
−1 1
A = .. .. .. .. .
det(A)
. . . .
A A A
C1n C2n ... Cnn
Example 5.3.4
Let
1 −1 2
A= 0 2 1 .
2 0 4
i) Calculate adj(A).
5.3. CRAMER’S RULE 219
Solution.
" #! " #!
A 2 1 A 0 1
C11 = det(A11 ) = det = 8, C12 = − det(A12 ) = − det = 2,
0 4 2 4
" #! " #!
A 0 2 A −1 2
C13 = det(A13 ) = det = −4, C21 = − det(A21 ) = − det = 4,
2 0 0 4
" #! " #!
A 1 2 A 1 −1
C22 = det(A22 ) = det = 0, C23 = − det(A23 ) = − det − 2,
2 4 2 0
" #! " #!
A −1 2 A 1 2
C31 = det(A31 ) = det = −5, C32 = − det(A32 ) = − det − 1,
2 1 0 1
" #!
A 1 −1
C33 = det(A33 ) = det = 2.
0 2
Therefore,
A A A
C11 C21 C31 8 4 −5
A A A =
adj(A) = C12 C22 C32 2 0 −1 .
A A A
C13 C23 C33 −4 −2 2
det(A) = (8 − 2 + 0) − (8 + 0 + 0) = −2.
Therefore,
8 4 −5
1 1
A−1 = adj(A) = − 2 0 −1 . ♦
det(A) 2
−4 −2 2
220 CHAPTER 5. DETERMINANTS
Theorem 5.4.1
Let ~v1 , ~v2 ∈ R2 and let A = [ ~v1 ~v2 ] . Let P(~v1 , ~v2 ) be the parallelogram in R2 determined by ~v1 and
~v2 . Then,
area(P(~v1 , ~v2 )) = |det(A)| .
Example 5.4.1
" # " #
−2 3
Calculate the area of the parallelogram defined by the vectors ~v1 = and ~v2 = .
1 1
Proof. If {~v1 , ~v2 } is linearly dependent, then P(~v1 , ~v2 ) is a line, hence has zero area, and det(A) = 0, so the
theorem is verified. Therefore, we assume {~v1 , ~v2 } is a linearly independent set. Write
" # " #
a b
~v1 = , and ~v2 = .
c d
Let Fθ : R2 → R2 be a linear transformation that rotates vectors counterclockwise through an angle θ where
θ is the angle between ~v1 and the positive horizontal axis. Then,
" #! " # " #! " #
a a0 b b0
Fθ = = ~u1 , and Fθ = = ~u2 ,
c 0 d d0
Let B = [ ~u1 ~u2 ] . Then the base length of P(~u1 , ~u2 ) is a0 , and the vertical height is |d0 |, so
" #!
0 0 0 0 a0 b0
area(P(~u1 , ~u2 )) = a |d | = |a d | = det = |det(B)| ,
0 d0
Let Rθ be the standard matrix for Fθ . Since Fθ is a rotation, it follows that Rθ has the form,
" #
cos(θ) − sin(θ)
Rθ = .
sin(θ) cos(θ)
Now,
B = [ ~u1 ~u2 ] = [ Fθ (~v1 ) Fθ (~v2 ) ] = [ Rθ ~v1 Rθ ~v2 ] = Rθ A.
Hence,
|det(B)| = |det(Rθ A)| = |det(Rθ ) det(A)| = |det(Rθ )| · |det(A)| .
Finally,
det(Rθ ) = cos2 (θ) − (− sin2 (θ)) = cos2 (θ) + sin2 (θ) = 1.
Therefore,
area(P(~v1 , ~v2 )) = area(P(~u1 , ~u2 )) = |det(B)| = |det(Rθ )| · |det(A)| = |det(A)|
Similarly, given ~v1 , ~v2 , ~v3 ∈ R3 , we denote the parallelepiped having ~v1 , ~v2 , ~v3 as edges by P(~v1 , ~v2 , ~v3 ). The
volume of this parallelepiped is also related to determinants of matrices.
Theorem 5.4.2
Let ~v1 , ~v2 , ~v3 ∈ R3 and let P(~v1 , ~v2 , ~v3 ) be the parallelepiped defined above. Let A = [ ~v1 ~v2 ~v3 ] .
222 CHAPTER 5. DETERMINANTS
Then,
volume(P(~v1 , ~v2 , ~v3 )) = |det(A)| .
where · denotes the vector dot product, and × denotes the vector cross product.
Write
v11 v12 v13
A = [ ~v1 ~v2 ~v3 ] = v21 v22 v23 .
Then,
v22 v33 − v32 v23
~v2 × ~v3 = v32 v13 − v12 v33 ,
and
~v1 · (~v2 × ~v3 ) = v11 (v22 v33 − v32 v23 ) + v21 (v32 v13 − v12 v33 ) + v31 (v12 v23 − v22 v13 )
= (v11 v22 v33 + v21 v13 v32 + v31 v12 v23 ) − (v31 v22 v13 + v12 v21 v33 + v11 v23 v32 )
= det(A).
The set F (S) is called the image of S under the linear transformation F , or S under F for short.
A similar definition holds for subsets of R3 and linear transformations mapping R3 to itself. Let’s see what
some of these image sets look like.
5.4. DETERMINANTS AS AREAS AND VOLUMES 223
Example 5.4.2
where a, b > 0 are real numbers. Then, F scales vectors by a factor of a in the horizontal direction,
and scales vectors by a factor of b in the vertical direction. Therefore, F (S1 ) is an ellipse centred at
the origin whose horizontal radius is a and whose vertical radius is b.
Example 5.4.3
Let Fx1 ,θ : R3 → R3 be the linear transformation that rotates vectors in R3 about the x1 -axis
counter-clockwise through an angle of θ. Then, the standard matrix for Fx1 ,θ is
1 0 0
Ax1 ,θ = 0 cos(θ) − sin(θ)
0 sin(θ) cos(θ)
Linear transformations play nicely with parallelograms/parallelepipeds as the next result shows.
Theorem 5.4.3
and
volume(G(P)) = |det(B)| · volume(P).
224 CHAPTER 5. DETERMINANTS
Example 5.4.4
" # " #
−2 3
Let ~v1 = and ~v2 = . Let F be the linear transformation whose standard matrix is
1 1
" #
a 0
where a and b are non-zero real numbers. Calculate the area of F (P(~v1 , ~v2 )).
0 b
Solution. We’ve already calculated that P(~v1 , ~v2 ) has area 5. Therefore,
" #!
a 0
area(F (P(~v1 , ~v2 )) = det · 5 = 5 |ab| .
0 b
" #
3 0
Here is an example of the transformation applied to P(~v1 , ~v2 ).
0 −1
Here, the black parallelogram is P(~v1 , ~v2 ) and the blue parallelogram is the transformation. The transformed
parallelogram has area 15. ♦
Proof. Given any parallelogram P in R2 , there is a vector p~ ∈ R2 that translates P so its leftmost vertex
is at the origin. This new parallelogram P + p~ has the same area as P and, moreover, because F is linear,
we have F (P + p~) = F (P) + F (~
p). Therefore,
p + P)) = area(F (~
area(F (~ p) + F (P)) = area(F (P)),
Let ~v1 , ~v2 be the two vectors that define the parallelogram P + p~, and let M1 = [ ~v1 ~v2 ]. Then,
P + p~ = {s1~v1 + s2~v2 : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1} .
Therefore,
F (P + p~) = {s1 F (~v1 ) + s2 F (~v2 ) : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1}
is a parallelogram defined by the vectors F (~v1 ) and F (~v2 ). Let M2 = [ F (~v1 ) F (~v2 ) ], so that M2 = AM1 .
Then, by Theorem 5.4.1,
area(F (P)) = area(F (P + p~)) = |det(M2 )| = |det(A)| |det(M1 )| = |det(A)| · area(P + p~) = area(P),
where the last equality follows because translation does not change area. The proof for the 3-dimensional
case is the same.
Exercise
Let F : Rn → Rn be a linear transformation and let S ⊆ Rn . For any ~v ∈ Rn , prove that
F (S + ~v ) = F (S) + F (~v ).
Theorem 5.4.3 gives a formula for the area/volume of a parallelogram/parallelepiped under a linear trans-
formation. The question now is how do linear transformations effect the area of any region of bounded
area/volume? The answer lies in approximation of such areas by infinitesimally small squares/cubes. The
argument involves a limiting process similar to what you see in calculus when you prove formulas for inte-
grals. The formula is given in the following theorem which we state without proof.
Theorem 5.4.4
and
volume(G(R2 )) = |det(A)| · volume(R2 ).
Example 5.4.5
Let S2 denote the unit sphere in R3 and let Fx1 ,θ be the linear transformation in Example 5.4.3.
Calculate the volume of F (S2 ).
Solution. We have,
1 0 0 " #!
cos(θ) − sin(θ)
det(Ax1 ,θ ) = det 0 cos(θ) − sin(θ) = det = 1.
sin(θ) cos(θ)
0 sin(θ) cos(θ)
226 CHAPTER 5. DETERMINANTS
Example 5.4.6
We can use Theorem 5.4.4 to derive common formulas for areas and volumes of shapes. Use this
theorem to derive a formula for the volume of f an ellipse with horizontal radius a and vertical radius
b.
Solution. Denote such an ellipse by Eab . From Example 5.4.2, Eab is obtained by applying the linear
transformation " #
a 0
A=
0 b
to the unit circle S1 . Then, by Theorem 5.4.4,
" #!
a 0
area(Eab ) = det · area(S1 ) = abπ
0 b
In this chapter, we introduce eigenvalues and eigenvectors. Eigenvalues are special scalars associated with
matrices. Eigenvalues and eigenvectors are used a lot in mathematics. We start with an example.
Example 6.0.1
" # " √ #
2 3 3
Let A = and let ~v = . Describe the vector A~v .
1 2 1
The scalar in the previous example is an example of an eigenvalue and the corresponding vector is an example
of an eigenvector. We make these terms precise.
Let A be an n × n matrix. An eigenvector of A is a non-zero vector ~v such that A~v = λ~v for some
scalar λ ∈ R. The value λ is called an eigenvalue.
229
230 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
It is important that eigenvectors are defined to be non-zero. If ~0 were an eigenvector, then since A~0 = ~0 is
always true, the relation A~0 = r~0 is true for all scalars r ∈ R. This would imply all scalars are eigenvalues
of A which would make their definition trivial.
Note
Just because eigenvectors must be non-zero does not mean eigenvalues must be non-zero. It is
perfectly reasonable for a matrix to have zero as an eigenvalue.
Example 6.1.1
1 −3 3
Let A = 3 −5 3 . Determine which of the following vectors (if any) are eigenvectors of A and
6 −6 4
determine their corresponding eigenvalue:
2 2 3
~v1 = 7 , ~v2 = 3 , ~v3 = 3 .
1 1 6
6 −6 4 1 −26
As A~v1 is not a scalar multiple of ~v1 , it is not an eigenvector of A.
For ~v2 ,
1 −3 3 2 −4 2
A~v2 = 3 −5 3 3 = −6 = −2 3 = −2~v2 .
6 −6 4 1 −2 1
Since A~v2 = −2~v2 , ~v2 is an eigenvector of A with corresponding eigenvalue −2.
For ~v3 ,
1 −3 3 3 12 3
A~v3 = 3 −5 3 3 = 12 = 4 3 = 4~v3 .
6 −6 4 6 24 6
6.1. CALCULATING EIGENVECTORS AND EIGENSPACES 231
Finding eigenvectors of matrices is a little bit more involved than checking to see if a given vector is an
eigenvector. If you’re given an eigenvalue, it is not too bad as the next theorem shows.
Theorem 6.1.1
Let A be an n × n matrix with eigenvalue λ. Then, ~v is an eigenvector of A corresponding to λ if and
only if ~v ∈ Null(A − λIn ) and ~v 6= ~0.
Example 6.1.2
7 12 8
Let A = 14 29 20 . It is a fact that λ = 1 is an eigenvalue of A. Find an eigenvector
The RREF of A − I3 is
1 2 0
A − I3 ∼ 0 0 1 .
0 0 0
x3 0
Any non-zero value of s ∈ R produces a solution to (A − I3 )~x = ~0. Thus, if say s = 1, an eigenvector for A
corresponding to λ = 1 is
−2
1 ,
0
so a basis for Null(A − I3 ) is,
−2
1 .
0
232 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
−2
Therefore, 1 is an eigenvector of A corresponding to the eigenvalue λ = 1 and so is any scalar multiple
0
of this vector. ♦
Note
We can always check that we have a correct eigenvector corresponding to an eigenvalue λ by simply
evaluating A~v and making sure that it equals λ~v .
A~v = λ~v .
A~v − λ~v = ~0 =⇒ A~v − λ(In~v ) = ~0 =⇒ A~v − (λIn )~v = ~0 =⇒ (A − λIn )~v = ~0.
so that ~v ∈ Null(A − λIn ). The converse is proved by tracing through these steps in the reverse order and
is left as an exercise to the reader.
Exercise
Finish the proof of Theorem 6.1.1.
Let A be an n × n matrix with eigenvalue λ. The null space Null(A − λIn ) is called the eigenspace of
A corresponding to λ. This is denoted by EλA and it consists of all eigenvectors of A that correspond
to λ along with the zero vector.
6.1. CALCULATING EIGENVECTORS AND EIGENSPACES 233
Example 6.1.3
23 18 −36 36
−36 −31 36 −36
Let A = . Given λ = −13 is an eigenvalue for A, find a basis for the
12 6 −25 12
20 10 −20 7
A
eigenspace E−13 .
A
Solution. We need to find a basis for E−13 . Start by calculating,
36 18 −36 36
−36 −18 36 −36
A + 13I4 = .
12 6 −12 12
20 10 −20 20
A
Therefore, a basis for E−13 is,
−1/2 1 1
1 0
0
, , . ♦
0 1 0
0 0 1
Example 6.1.4
−14 14 −63 14
30 −30 135 −30
Let A = . Given that λ = 0 is an eigenvalue of A, find a basis for the
32 −32 144 −32
−60 60 −270 60
corresponding eigenspace E0A .
234 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Solution. For this one, we need to find a basis for the null space of A − 0I4 = A. The RREF of A is
1 −1 9/2 −1
0 0 0 0
A∼ .
0 0 0 0
0 0 0 0
In each of the examples we’ve done, the eigenspaces we’ve calculated have dimension at least 1. This is
necessarily the case: if λ is an eigenvalue of A, then the dimension of EλA is at least one. I leave it as an
exercise to determine why this is true.
Theorem 6.1.1 implies that the set of all eigenvectors of a matrix corresponding to a specific eigenvalue λ,
along with ~0, form a subspace of Rn . This can be shown via the definition of a subspace as well and is left
as an exercise for the reader.
Exercise
Let A be an n × n matrix with eigenvalue λ.
ii) Show that the set of all eigenvectors of A corresponding to λ, along with ~0, are a subspace of
Rn using the definition of a subspace.
6.2. CALCULATING EIGENVALUES 235
Theorem 6.2.1
Example 6.2.1
" #
2 −1
Find the eigenvalues of A = .
0 2
This is a polynomial in λ. Clearly, the only root of this polynomial is λ = 2. Therefore, λ = 2 is the only
eigenvalue of A. ♦
Proof. First suppose λ is an eigenvalue of A with eigenvector ~v . Then, ~v ∈ Null(A − λIn ) is non-zero so
that Null(A − λIn ) is not the zero subspace. Thus, by The Invertible Matrix Theorem, det(A − λIn ) = 0.
The converse is similar and its proof is left as an exercise to the reader.
Exercise
Finish the proof of Theorem 6.2.1.
Example 6.2.2
4 20 2
Find the eigenvalues of A = 0 −3 0 .
1 26 5
236 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
1 26 5−λ
= −60 + 7λ + 6λ2 − λ3 + 6 + 2λ
= −54 + 9λ + 6λ2 − λ3
−54 + 9λ + 6λ2 − λ3 = 0.
That is, we need to find the roots of the above polynomial. Factoring this polynomial yields,
Can we always use this method of factoring a polynomial to find eigenvalues? The answer is yes!
Theorem 6.2.2
Proof. We prove this using induction on n. If n = 1, then det(A − λI1 ) = a11 − λ which is a degree one
polynomial.
Now suppose det(A − λIn ) is a polynomial in λ of degree n for all n × n matrices A. This is the induction
hypothesis.
det(A − λIn+1 ) = (a11 − λ)(−1)1+1 det((A − λIn+1 )11 ) + . . . + a1n (−1)1+n det((A − λIn+1 )1n ).
Each of (A − λIn+1 )1j is an n × n matrix. Furthermore, regardless of which column of A − λIn+1 we delete, it
is easy to see that (A−λIn+1 )1j is a matrix of the form Bj −λIn where Bj is an n×n matrix for j = 1, . . . , n.
6.2. CALCULATING EIGENVALUES 237
Therefore, by the induction hypothesis, each of the above determinants is a polynomial in λ of degree n.
Letting det((A − λIn+1 )1j = pj (λ) for j = 1, . . . , n, we have
Every term in this sum is a polynomial of degree n, except for the first, which is a polynomial in λ of degree
n + 1 (because λp1 (λ) has degree n + 1). Thus, when we sum everything together, we get a polynomial in λ
of degree n + 1.
Let A be an n × n matrix. The degree n polynomial det(A − λIn ) is called the characteristic
polynomial for A. The equation
det(A − λIn ) = 0
It is clear from Theorem 6.2.1 that the eigenvalues of an n × n matrix A are exactly the roots of the char-
acteristic polynomial. Combining this with 6.2.2, the following is evident.
Corollary 6.2.1
Exercise
Prove Corollary 6.2.1.
We have seen that calculating determinants of triangular matrices is really easy. Finding their eigenvalues
is easy as well.
Theorem 6.2.3
The eigenvalues of a triangular matrix are the entries along the main diagonal.
238 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Example 6.2.3
1 0 0 0 0
0 −1 0 0 0
Find the eigenvalues of A =
2 .
3 4 0 0
1 −1 0 0 0
1 1 1 1 1
Solution. This is a 5 × 5 matrix, so the characteristic polynomial for A has degree 5. Typical, this is not
easy to factor. Luckily A is lower triangular so, by Theorem 6.2.3, the eigenvalues are the entries on the
main diagonal: 1, −1, 4, and 0. ♦
Proof. Let λ be an eigenvalue of A. Then, A − λIn differs from A only on the main diagonal, hence is also
triangular. By Theorem 5.2.1, its determinant is the product of the entries on the main diagonal. Thus,
Clearly, the roots of this polynomial are aii for i = 1, 2, . . . , n. Therefore, the eigenvalues of A, are the entries
on the main diagonal.
Warning!
Do not fall into the trap of row reducing a matrix A to echelon form, and then taking the elements
on the main diagonal as the eigenvalues of A. You will get the wrong answer.
The zero vector can not be an eigenvector, but an eigenvalue of 0 is perfectly fine. In fact, an eigenvalue of
0 tells you something about a matrix.
Theorem 6.2.4
Let A be an n × n matrix. Then, λ = 0 is an eigenvalue of A if and only if A is not invertible.
Proof. 0 is an eigenvalue of A if and only if Null(A − 0In ) = Null(A) contains a non-zero vector, which is
equivalent to A being non-invertible. You fill in the details.
With Theorem 6.2.4 in tow, we make one final amendment to the Invertible Matrix Theorem.
6.2. CALCULATING EIGENVALUES 239
1. A is invertible.
2. The RREF of A is In .
6. FA is one-to-one.
9. FA is onto.
12. AT is invertible.
14. Col(A) = Rn .
15. rank(A) = n.
16. nul(A) = 0.
n o
17. Null(A) = ~0 .
18. dim(Row(A)) = n.
19. Row(A) = Rn .
20. det(A) 6= 0.
Note
In this course, we will refer to the algebraic multiplicity of an eigenvalue simply as multiplicity.
Calculating multiplicities is easy as long as we can factor the characteristic polynomial. Suppose that
λ1 , λ2 , . . . , λk are the distinct eigenvalues of an n × n matrix A. Then, the characteristic polynomial for A
factors as
det(A − λIn ) = (λ1 − λ)r1 (λ2 − λ)r2 ) . . . (λk − λ)rk
Example 6.2.4
41 27 −18
Determine the multiplicity of the eigenvalue λ = 50 for the matrix A = 0 50 0 .
−8 24 34
Therefore, m(50) = 2. ♦
Example 6.2.5
Suppose that the characteristic polynomial of a matrix A is given by λ8 − 17λ7 + 80λ6 − 64λ5 .
Determine the eigenvalues of A and their corresponding multiplicities.
The roots of this polynomial are 0, 8, and 1. Therefore, the eigenvalues of A are 0, 8, and 1 and, their
corresponding multiplicities are
We end this section with the following fact that relates algebraic multiplicities of eigenvalues to the dimen-
sions of the corresponding eigenspaces.
Fact 6.2.1
The quantity dim(EλA ) is called the geometric multiplicity of λ. This is is a really nice result because it
relates something purely algebraic to something purely geometric. However, the proof of this is hard and is
well beyond the scope of the course.
242 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
6.3 Diagonalization
Analogous to how we can factor numbers and polynomials, matrices can be factored as well. A diagonaliza-
tion is a special type of matrix factorization. The goal of this section is to give criteria for determining when
a matrix is diagonalizable and to determine such a factorization.
An n × n matrix D is called diagonal if it has all zeroes both above and below the main diagonal.
Diagonal matrices have nice properties. In particular, it is really easy to perform many types of calculations
with them. Here are some examples of such properties. We leave their proofs as exercises for the reader.
is given by
dk1 0 ... 0
0 dk2 ... 0
k
D = .. .. .. .. .
.
. . .
0 0 ... dkn
If di 6= 0 for each i = 1, 2, . . . , n, this definition can be extended to any negative integer −m,
where m ≥ 1, and
1
dm 0 ... 0
1
1
0 ... 0
−m dm2
D = . .
.. .. .. ..
. . .
0 0 . . . d1m
n
5. Diagonal matrices are simultaneously upper and lower triangular, so the determinant of a di-
agonal matrix is the product of the entries on the diagonal and its eigenvalues are the entries
on its diagonal.
Functions that are familiar to us from caclulus can be defined on matrices as well as long as we are clever
√
about it. For example, one can define square roots of matrices as follows: if A is n × n, define A to be the
√ √
n × n matrix that satisfies A · A = A.
Similarly, one can define functions on matrices using Taylor series. For example, recall that the Taylor series
for ex is:
∞
X xn
ex = .
n=0
n!
244 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
In a similar fashion, one can define trig functions on matrices, logarithms, you name it!
The reason diagonal matrices are so useful is because they are so easy to calculate with. Indeed, suppose you
wanted to compute the first 100 terms of the power series for eA . If A is not diagonal, then this calculation
is really intensive because large powers of matrices require lots of computation to calculate. However, if we
calculate the exponential of a diagonal matrix,
d1 0 . . . 0
0 d2 . . . 0
D= .. .. . . . ,
. . . ..
0 0 . . . dn
then the terms in the power series of eD , are easy to calculate! In particular,
k
d1 0 . . . 0
∞
Dk
∞ 0 dk2 . . . 0
X X 1
eD = = . .. . . .
k! ..
k! . ..
k=0 k=0 .
0 0 . . . dkn
because then,
√ 2
d1 0 ... 0 d1 0 ... 0
√ 2
√ √ 0 d2 ... 0 0 d2 ... 0
D D= .. .. .. .. = .. .. .. .. = D.
. .
. . . . . .
√ 2
0 0 ... dn 0 0 ... dn
So, we can see that all of these things are easier to work with as long as we are working with diagonal
matrices. But certainly not all matrices are diagonal. This is where diagonalization comes in! If A is
diagonalizable, then it can be written in a special form using a diagonal matrix. Such matrices are not quite
as easy as diagonal matrices to calculate with, but it is the next best thing.
Example 6.3.1
1 −4 12 9 0 0
The matrix A = −4 7 18 is similar to B = 0 1 0 because A = P BP −1 where
0 0 1 0 0 −1
−1 9 2
P = 2 3 1 .
0 1 0
Similarity defines something called an equivalence relation on the set of n × n matrices. We leave the proof
of this as an exercise.
Exercise
Prove that similarity is an equivalence relation. To do this, you must prove the following three things
for all n × n matrices A, B, C
1. A ≡ A
2. If A ≡ B, then B ≡ A
3. If A ≡ B and B ≡ C, then A ≡ C.
If A and B are similar, then they share some of the same properties.
Theorem 6.3.1
Let A and B be two n × n similar matrices.
2. det(A) = det(B).
Proof. The proof of 1 is beyond the scope of this course. The proof of 2 is left as an exercise for the reader.
We prove part 3.
246 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Suppose A is similar to B. Then, there exists an invertible n × n matrix P such that A = P BP −1 . Noting
In = P P −1 , we have
Therefore, (A − λIn ) ≡ (B − λIn ). By part 2, det(A − λIn ) = det(B − λIn ). Therefore, the characteristic
polynomials for A and B are the same, so they have the same roots, which implies that A and B have the
same eigenvalues.
Exercise
Show that if A and B are similar then they have the same determinant.
The converse of this theorem is not true. That is, two matrices with the same rank, determinant, and
eigenvalues are not necessarily similar. Here is a counter example.
Example 6.3.2
Both of these matrices have one eigenvalue, 2, of multiplicity 2, they both have determinant equal to
4, and
" they both
# have rank 2, but they are not similar. To see this, suppose there exists a matrix
a b
P = such that
c d
" # " #
2 1 2 0
P = P.
0 2 0 2
Doing the matrix multiplication gives
" #" # " #" # " # " #
a b 2 1 2 0 a b 2a a + 2b 2a 2b
= =⇒ = .
c d 0 2 0 2 c d 2c c + 2d 2c 2d
which is not invertible. Therefore, the two matrices are not similar despite having the same eigenval-
ues.
Suppose that A and B are similar. Then, there exists an n × n matrix P such that A = P BP −1 . Suppose
6.3. DIAGONALIZATION 247
For A3 ,
A3 = A2 · A = (P BP 2 P −1 )(P BP −1 ) = P B 2 (P −1 P )BP −1 = P B 3 P −1 .
Repeating the pattern, we can see that for any positive integer k ≥ 1 that
Ak = P B k P −1 .
6.3.3 Diagonalization
In this section, we show how to determine if a matrix is diagonalizable and, if it is, how to find the diago-
nalization.
A = P DP −1 ⇐⇒ AP = P D.
A given n × n matrix A is not guaranteed to be diagonalizable. The next theorem gives an exact criterion
for determining when a matrix is diagonalizable and, even better, its proof gives a recipe for finding the
diagonalization!
Generally we would do an example before the proof but, because the proof of this theorem also gives a recipe
for the diagonalization, we’ll do the proof first and follow it with an example.
248 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Proof. Let λ1 , λ2 , . . . , λn be the eigenvalues of A counted with multiplicity and suppose that {~v1 , ~v2 , . . . , ~vn }
is a linearly independent set of eigenvectors where ~vi is an eigenvector of A corresponding to λi for each
i = 1, 2, . . . , n. Consider the matrix P = [ ~v1 ~v2 . . . ~vn ] and define,
λ1 0 ... 0
0 λ2 ... 0
D= .. .. .. .. .
.
. . .
0 0 ... λn
where ~e1 , ~e2 , . . . , ~en is the standard basis for Rn . Since λi is an eigenvalue of A corresponding to the
eigenvector ~vi , the relation A~vi = λi~vi is satisfied for each i = 1, 2, . . . , n. Therefore, we have
AP = A [ ~v1 ~v2 . . . ~vn ] = [ A~v1 A~v2 . . . A~vn ] = [ λ1~v1 λ2~v2 . . . λn~vn ] . (6.1)
P D = [ P (λ~e1 ) P (λ~e2 ) . . . P (λn~en ) ] = [ λ1 (P ~e1 ) λ2 (P ~e2 ) . . . λn (P ~en ) ] = [ λ1~v1 λ2~v2 . . . λn~vn ] . (6.2)
Combining Equations (6.1) and (6.2) gives AP = P D. Since the columns of P are linearly independent
by assumption, P is invertible by The Invertible Matrix Theorem. This shows that A ≡ D so that A is
diagonalizable.
Conversely, suppose A is diagonalizable. Then, there exists an n × n invertible matrix P and a diagonal
matrix D such that AP = P D. Write P = [ ~v1 ~v2 . . . ~vn ] and
d1 0 ... 0
0 d2 ... 0
D= .. .. .. .. .
.
. . .
0 0 ... dn
Then,
AP = [A~v1 A~v2 . . . A~vn ] = [ d1~v1 d2~v2 . . . dn~vn ] = P D.
Since these matrices are equal, their columns must be equal. This means,
Since P is invertible, none of the ~vi ’s are ~0, and so each di is an eigenvalue of A with corresponding eigen-
vector ~vi . Finally, since P is invertible, the ~vi ’s must be linearly independent by The Invertible Matrix
Theorem. Thus, there exists a linearly independent set of n eigenvectors of A.
6.3. DIAGONALIZATION 249
Example 6.3.3
" #
−2 3
Diagonalize the matrix A = .
−1 2
This shows the eigenvalues of A are ±1. Now, we find bases for the eigenspaces. For λ = 1,
" # " #
−3 3 1 −1
A − I2 = ∼ .
−1 1 0 0
(" #)
1
Thus, a basis for the eigenspace E1A is .
1
the diagaonlization is A = P DP −1 . ♦
This might seem a bit complicated, but it’s not too bad. Before we continue on with an algorithm for
diagonalizing matrices, we have a few results.
Theorem 6.3.3
Let A be an n × n matrix with distinct eigenvalues λ1 , λ2 , . . . , λk , 1 ≤ k ≤ n. Let ~v1 , ~v2 , . . . ~vk be
eigenvectors of A where ~vi is an eigenvector of A corresponding to the eigenvalue λi , i = 1, . . . , k.
Then, {~v1 , ~v2 , . . . , ~vk } is a linearly independent set.
Proof. By way of contradiction, suppose {~v1 , ~v2 , . . . , ~vk } is a linearly dependent set. Since ~v1 is non-zero,
Theorem 2.6.3 implies the following:
250 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
1. There is a smallest index p, 1 ≤ p ≤ k − 1, such that ~vp+1 is a linear combination of the previous
vectors ~v1 , ~v2 , . . . , ~vp ,
Multiplying both sides of Equation (6.3) by A and using the relation A~vi = λi~vi , for each i = 1, 2, . . . , p, we
get
c1 (A~v1 ) + c2 (A~v2 ) + . . . + cp (A~vp ) = A~vp+1 =⇒ (c1 λ1 )~v1 + (c2 λ2 )~v2 + . . . + (cp λp )~vp = λp+1~vp+1 . (6.4)
Multiplying Equation (6.3) by λp+1 and subtracting from the right hand side of Equation (6.4) gives
c1 (λ1 − λp+1 )~v1 + c2 (λ2 − λp+1 )~v2 + . . . + cp (λp − λp+1 )~vp = ~0.
Since {~v1 , . . . , ~vp } is linearly independent, this equation implies ci (λi − λp+1 ) = 0 for each i = 1, 2, . . . , p.
Since one of the ci ’s is non-zero by assumption, this implies that λi0 = λp+1 for some i0 ≤ p < p + 1. This
contradicts the assumption that the λi ’s are distinct for each i = 1, 2, . . . , k. Therefore, we conclude that
{~v1 , ~v2 , . . . , ~vk } is linearly independent.
Corollary 6.3.1
Proof. Suppose A has n distinct eigenvalues. Then, any set of eigenvectors corresponding to these eigenval-
ues is linearly independent by Theorem 6.3.3. Thus, there exists a set of n linearly independent eigenvectors
of A, so it is diagonalizable by The Diagonalization Theorem.
1. A is diagonalizable,
Let Bi be a basis for EλAi for each i = 1, 2, . . . , k. Then, each Bi is a linearly independent set containing
dim(EλAi ) eigenvectors of A. Furthermore, from the definition of basis, these are the largest sets of linearly
independent eigenvectors each eigenspace can provide. Take all of these bases and combine them into one set
S. Then, since eigenvectors corresponding to distinct eigenvalues are linearly independent, Theorem 6.3.3
implies that S is a linearly independent set, and there are
To see this, we first note that S is the largest set of linearly independent eigenvectors corresponding to A
that we can find. Any other set with more eigenvectors would imply the existence of an eigenspace, say
EλAi , with dim(EλAi ) + 1 linearly independent eigenvectors in it; a contradiction due to construction of S.
Therefore, S is the largest possible set of linearly independent eigenvectors of A. Since the existence of a set
of n linearly independent eigenvectors of A is assumed, it follows that
2 =⇒ 3 : We prove the contrapositive. By Fact 6.2.1, we may assume dim(EλAi ) < m(λi ) for some i. Then,
As before, let Bi be a basis of EλAi for each i = 1, 2, . . . , k. Then, each Bi is a set of dim(EλAi ) linearly
independent eigenvectors of A. Put all of these vectors into a set S. Such a set is necessarily linearly
independent by Theorem 6.3.3 and contains n elements by assumption. Therefore, there is a set of n lin-
early independent eigenvectors for A, and therefore A is diagonalizable by The Diagonalization Theorem.
Diagonalization and Dimension is a great theorem because it relates diagonalizability to a completely geo-
metric concept: dimensions of eigenspaces! It also allows us a bit of a shortcut in finding diagonalizations.
252 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Step 1. Calculate the distinct eigenvalues of A and their respective multiplicities. List them, taking
into account their multiplicities, as λ1 , λ2 , . . . , λn . This list will not, in general, be distinct.
Step 2. Calculate bases for eigenspaces corresponding to each distinct eigenvalue. Start with
eigenvalues that have algebraic multiplicity greater than 1. If any of the geometric multiplicities
are strictly less than the corresponding algebraic multiplicities, then stop. The matrix is not
diagonalizable by Diagonalization and Dimension. If the geometric multiplicity equals the algebraic
multiplicity fore each distinct eigenvalue, then A is diagonalizable by Diagonalization and Dimension,
so go to the next step.
Step 3. Let ~v1 , ~v2 , . . . , ~vn denote the basis vectors you found in Step 2 labelled so that A~vi = λi~vi
for each i = 1, 2, . . . , n. Form the matrix
Example 6.4.1
−λ 0
det(A − λI2 ) = = −λ(2 − λ) − (−1)(0) = λ(λ − 2).
−1 2−λ
6.4. AN ALGORITHM FOR DIAONGALIZATION & EXAMPLES 253
The roots of this polynomial are 0 and 2, hence the eigenvalues of A are 0 and 2. Since A has 2 distinct
eigenvalues, A is diagonalizable by Corollary 6.3.1.
Step 2. Now calculate bases for the eigenspaces corresponding to the two eigenvalues in step 1. For λ = 0,
" # " #
0 0 1 −2
A − 0I2 = ∼ .
−1 2 0 0
For λ = 2,
" # " #
−2 0 1 0
A − 2I2 = ∼ .
−1 0 0 0
Step 3. Let
" # " #
2 0 0 0
P = [ ~v1 ~v2 ] = , D= .
1 1 0 2
Then A = P DP −1 . ♦
Note
Example 6.4.2
73 25 67
A
Step 2. We calculate a basis for the eigenspace E57 .
−80 −125 −50 1 0 0
A − 57I3 = 22 70 28 ∼ 0 1 2/5 .
73 25 10 0 0 0
x3 1
A
Thus, dim(E57 ) = 1 < 3 = m(57), therefore A is not diagonalizable by 6.3.4. ♦
Example 6.4.3
4 0 −1
Thus, the eigenvalues of A are -1 and 3 with multiplicities m(−1) = 1 and m(3) = 2.
6.4. AN ALGORITHM FOR DIAONGALIZATION & EXAMPLES 255
Step 2. Calculate bases for the eigenspaces corresponding to each eigenvalue. For λ = 3,
0 0 0 1 0 −1
A − 3I3 = 0 0 0 ∼ 0 0 0 .
4 0 −4 0 0 0
x3 0 1
For λ = −1,
4 0 0 1 0 0
A + I3 = 0 4 0 ∼ 0 1 0 .
4 0 0 0 0 0
x3 1
A
Thus, a basis for E−1 = Null(A + I3 ) is
0
0 .
1
A
and we have dim(E−1 ) = 1 = m(−1). Therefore, the matrix is diagonalizable by Diagonalization and Di-
mension.
Step 3. Let
0 0 1 −1 0 0
P = 0 1 0 , D= 0 3 0 .
1 0 1 0 0 3
Then A = P DP −1 . Other valid diagonalizations for A are
1 0 0 3 0 0
P = 0 1 0 , D = 0 3 0 ,
1 0 1 0 0 −1
256 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
or
1 0 0 3 0 0
P = 0 0 1 , D= 0 −1 0 . ♦
1 1 0 0 0 3
Example 6.4.4
Therefore, the eigenvalues of A are −76 and −38 with multiplicities m(−38) = 3 and m(−76) = 2.
A
and dim(E−76 ) = 2 = m(−76).
Step 3. Let
0 7 −5 9 −3 −76 0 0 0 0
0 2 1 0 0 0 −76 0 0 0
P =
1/2 1 0 0 ,
0 D=
0 0 −38 0 0 .
1 0 0 1 0 0 0 0 −38 0
0 1 0 0 1 0 0 0 0 −38
Then A = P DP −1 . ♦
Exercise
Write down some other valid diagonalizations for the matrix in Example 6.4.4.
Example 6.4.5
Let A be a 5×5 matrix and suppose that A has eigenvalues 2 and 3. If the dimension of the eigenspace
corresponding to 2 is 3, is A diagonalizable? What if it has dimension 4?
258 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Solution. If the eigenspace corresponding to 2 has dimension 3, then this is not enough to conclude that A
is diagonalizable because it is possible the dimension of the eigenspace corresponding to 3 has dimension 1.
In this case, the sum of the dimensions of the eigenspaces would be 3 + 1 = 4, so that A is not diagonalizable
by Diagonalization and Dimension. However, if the eigenspace corresponding to 2 has dimension 4, then the
eigenspace corresponding to 3 necessarily has dimension exactly 1. Therefore, the sum of the dimensions of
the eigenspaces is 4 + 1 = 5 so that A is diagonalizable by Diagonalization and Dimension. ♦
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 259
Example 6.5.1
Solution. A solution to this differential equation is a function x(t) whose derivative is equal to a times
itself. To find such a function, divide both sides by x(t) to get
x0 (t)
= a.
x(t)
x0 (t)
By the chain rule, it is clear that is the derivative of ln(x(t)). Therefore, if we integrate both sides with
x(t)
respect to t, we get Z 0 Z
x (t)
dt = a dt =⇒ ln(x(t)) = at + C
x(t)
where C is an arbitrary constant. Solving for x(t) yields,
where A is an arbitrary real constant. We leave it to the reader to verify that this function does in fact
satisfy the above differential equation. ♦
In Example 6.5.1, we get an entire family of solutions to the differential equation x0 (t) = ax(t): one for each
real value of A. If we want a specific solution, we introduce a condition x(t) must satisfy at t = 0. Such a
condition is called an initial condition.
Example 6.5.2
Find a solution to the following differential equation subject to the given initial condition.
Solution. Example 6.5.1 shows that the solution to the differential equation is x(t) = Ae2t where A is an
arbitrary constant. If the initial condition is also to be satisfied, we must have
x(0) = 4 =⇒ Ae2·0 = 4 =⇒ A = 4.
260 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Therefore, a solution to the differential equation subject to the initial condition is x(t) = 4e2t . ♦
Note
A problem that asks to find a solution to a differential equation subject to an initial condition is
called an initial value problem.
The matrix A is called the coefficient matrix for the system. We introduce some notation that allows us
to further simplify this equation.
Let f1 (t), f2 (t), . . . , fn (t) be differentiable functions of a real variable t. Consider the vector of func-
tions
f1 (t)
f2 (t)
f~(t) =
.
..
.
fn (t)
The derivative of f~(t), denoted f~ 0 (t), is the vector obtained by differentiating each component of
f~(t) component-wise,
0
f1 (t)
0
f2 (t)
f~ 0 (t) =
.. .
.
fn0 (t)
The vector derivative satisfies the following two properties. Both are based off of properties of derivatives so
the proofs are left to the reader.
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 261
Let f1 (t), f2 (t), . . . , fn (t), g1 (t), g2 (t), . . . , gn (t) be differentiable functions of a real variable t. Define
f1 (t) g1 (t)
f2 (t) g2 (t)
~
f (t) =
.. ,
and ~g (t) =
.. .
. .
fn (t) gn (t)
Then,
Exercise
Prove Properties of Vector Derivatives.
and this matrix equation represents the linear system of differential equations in Equation (6.5).
Notice the similarity between Equation (6.7) and the differential equation in Example 6.5.1. The solution
to the differential equation suggests that a solution to Equation (6.7) may have the form
v1 eλt v1
v2 eλt v2
= eλt = eλt~v
~x(t) =
.. ..
. .
vn eλt vn
where λ ∈ R and ~v ∈ Rn . We show this is the case under the assumption that A is diagonalizable.
(e i v1i )0
λt
(λi eλi t )v1i
(e v2i )0 (λi eλi t )v2i
λi t
v~i 0 (t) =
..
= .. = λi eλi t~vi = eλi t (λi v~i ) = eλi t (A~vi ) = A(eλi t~vi ) = A~vi (t) i = 1, 2, . . . , n.
. .
λi t 0 λi t
(e vni ) (λi e )vni
262 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Therefore, ~x(t) = ~vi (t) is a solution to the matrix equation in (6.7) for each i = 1, 2, . . . , n. Furthermore, any
linear combination of the ~vi (t)’s is a solution to this equation. This is called the superposition principle.
To see this, define
~v (t) = c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t)
where c1 , c2 , . . . , cn ∈ R. Then,
~v 0 (t) = (c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t))0
= c1~v1 0 (t) + c2~v2 0 (t) + . . . + cn~vn 0 (t) by repeated application of Properties of Vector Derivatives
= A~v (t).
This shows ~x(t) = ~v (t) is a solution to the matrix differential equation in (6.7). In fact, if A is diagonalizable,
every solution to this equation has the form given (this isn’t proved in this course). Therefore, ~v (t) is called
the general solution to the matrix differential equation. We summarize all of this in the following theorem.
Theorem 6.5.1
Consider a system of first order, linear, differential equations
where
a11 a12 ... a1n x1 (t) b1
a21 a22 ... a2n x2 (t) b2
~b =
A= .. .. .. .. , ~x(t) = .. , .. .
.
. . . . .
an1 an2 . . . ann xn (t) bn
Assume A is diagonalizable. Let λ1 , λ2 , . . . , λn be the eigenvalues of A, counted with multiplicity, and
let {~v1 , ~v2 , . . . , ~vn } be a set of n linearly independent eigenvectors of A, where ~vi is an eigenvector
corresponding to λi for each i = 1, 2, . . . , n. Then, the general solution to the matrix differential
equation is
~v (t) = c1~v1 (t) + c2~v2 (t) + . . . + cn~vn (t)
where c1 , c2 , . . . , cn are arbitrary real constants. The ci ’s can be solved by applying the initial condi-
tion ~v (0) = ~b and solving the resulting linear system.
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 263
Example 6.5.3
Solution.
" #
1 4
i) The coefficient matrix for the system is A = . We leave it as an exercise to show that a
1 1
diagonalization of A is " # " #
−1 1 −1 0
P = , D= .
2 2 0 3
The columns of P are linearly independent eigenvectors, with the first corresponding to the eigenvalue
−1 and the second corresponding to the eigenvalue 3. Therefore, the general solution to the system is
" # " #
−1 1
~v (t) = c1 e−t + c2 e3t , c1 , c2 ∈ R.
2 2
Example 6.5.4
7 12 5
A diagonalization for A is
−6 −4 −1 −1 0 0
P = 1 1 0 , D= 0 1 0 .
5 4 1 0 0 −2
5 4 1
5 4 1 −1
5 4 1 −1 0 0 1 −4
5 4 1
The initial condition does not need to be at t = 0. It can be at t = a for any constant a.
Example 6.5.5
2
Repeat Example 6.5.4 with the initial condition ~x(1) = 1 .
−1
6.5. SYSTEMS OF FIRST ORDER DIFFERENTIAL EQUATIONS 265
5 4 1
5 4 1 −1
5 4 1
266 CHAPTER 6. EIGENVALUES AND DIAGONALIZATION
Bibliography
[1] Buss, S. Some proofs about determinants. Link here. Document pulled May 3, 2017.
[2] Holt, J. Linear Algebra with Applications, 2nd Edition. W.H Freeman and Company, New York NY,
2017.
[3] Lay, D.C. Linear Algebra and its Applications, 3rd Edition. Pearson Education Inc. Boston MA, 2006.
267