Anne Schilling, Isaiah Lankham, and Bruno Nachtergaele, Linear Algebra: As an Introduction to Abstract Mathematics (World Scientific, 2016)
Published by World Scientific Publishing Co. Pte. Ltd., Toh Tuck Link, Singapore, with offices in Hackensack, NJ (USA) and Covent Garden, London (UK).
Printed in Singapore.
Preface
B. Nachtergaele
A. Schilling
Contents
Preface v
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4. Vector Spaces 29
4.1 Definition of vector spaces . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Elementary properties of vector spaces . . . . . . . . . . . . . . . . 31
4.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4 Sums and direct sums . . . . . . . . . . . . . . . . . . . . . . . . . 34
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6. Linear Maps 51
6.1 Definition and elementary properties . . . . . . . . . . . . . . . . . 51
6.2 Null spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.4 Homomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.5 The dimension formula . . . . . . . . . . . . . . . . . . . . . . . . . 56
6.6 The matrix of a linear map . . . . . . . . . . . . . . . . . . . . . . 57
6.7 Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Appendices 129
Index 191
Chapter 1
What is Linear Algebra?
1.1 Introduction
This book aims to bridge the gap between the mainly computation-oriented lower
division undergraduate classes and the abstract mathematics encountered in more
advanced mathematics courses. The goal of this book is threefold:
(1) You will learn Linear Algebra, which is one of the most widely used math-
ematical theories around. Linear Algebra finds applications in virtually every
area of mathematics, including multivariate calculus, differential equations, and
probability theory. It is also widely applied in fields like physics, chemistry,
economics, psychology, and engineering. You are even relying on methods from
Linear Algebra every time you use an internet search like Google, the Global
Positioning System (GPS), or a cellphone.
(2) You will acquire computational skills to solve linear systems of equations,
perform operations on matrices, calculate eigenvalues, and find determinants of
matrices.
(3) In the setting of Linear Algebra, you will be introduced to abstraction. As
the theory of Linear Algebra is developed, you will learn how to make and use
definitions and how to write proofs.
The exercises for each Chapter are divided into more computation-oriented exercises
and exercises that focus on proof-writing.
Example 1.2.1. Let us take the following system of two linear equations in the two unknowns x1 and x2:
    2x1 + x2 = 0
    x1 − x2 = 1.
One way to solve this system is to solve one of the equations for one of the unknowns and then substitute the result into the other equation. Here, we obtain
    x1 = 1 + x2
from the second equation. Then, substituting this in place of x1 in the first equation, we have
    2(1 + x2) + x2 = 0,
so that x2 = −2/3 and, by back substitution, x1 = 1/3. The system therefore has a unique solution.
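For readers who want a quick computational cross-check, the small Python sketch below (our illustration, using NumPy, rather than anything prescribed by the text) solves the same 2 × 2 system numerically.

```python
import numpy as np

# Numerically solve the system of Example 1.2.1:
#   2*x1 + 1*x2 = 0
#   1*x1 - 1*x2 = 1
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([0.0, 1.0])

x = np.linalg.solve(A, b)   # unique solution since det(A) = -3 is nonzero
print(x)                    # approximately [ 0.333..., -0.666...], i.e. (1/3, -2/3)
```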
Example 1.2.2. Take the following system of two linear equations in the two
unknowns x1 and x2 :
    x1 + x2 = 1
    2x1 + 2x2 = 1.
We can eliminate variables by adding −2 times the first equation to the second
equation, which results in 0 = −1. This is obviously a contradiction, and hence this
system of equations has no solution.
Example 1.2.3. Let us take the following system of one linear equation in the two
unknowns x1 and x2 :
x1 − 3x2 = 0.
In this case, there are infinitely many solutions given by the set {x2 = (1/3)x1 | x1 ∈ R}. You can think of this solution set as a line in the Euclidean plane R²:
[Figure: the line x2 = (1/3)x1 in the (x1, x2)-plane.]
More generally, a system of m linear equations in n unknowns x1, . . . , xn has the form
    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
        ⋮
    am1 x1 + am2 x2 + · · · + amn xn = bm,        (1.1)
where the aij's are the coefficients (usually real or complex numbers) in front of the
unknowns xj , and the bi ’s are also fixed real or complex numbers. A solution is a
set of numbers s1 , s2 , . . . , sn such that, substituting x1 = s1 , x2 = s2 , . . . , xn = sn
for the unknowns, all of the equations in System (1.1) hold. Linear Algebra is a
theory that concerns the solutions and the structure of solutions for linear equations.
As we progress, you will see that there is a lot of subtlety in fully understanding
the solutions for such equations.
Consider a function
    f : X → Y    (1.2)
from a set X to a set Y. The set X is called the domain of the function, and the
set Y is called the target space or codomain of the function. An equation is
f (x) = y, (1.3)
where x ∈ X and y ∈ Y . (If you are not familiar with the abstract notions of sets
and functions, please consult Appendix B.)
The set R2 can be viewed as the Euclidean plane. In this context, linear functions of
the form f : R2 → R or f : R2 → R2 can be interpreted geometrically as “motions”
in the plane and are called linear transformations.
Example 1.3.3. Recall the following linear system from Example 1.2.1:
    2x1 + x2 = 0
    x1 − x2 = 1.
Each equation can be interpreted as a straight line in the plane, with solutions
(x1 , x2 ) to the linear system given by the set of all points that simultaneously lie on
both lines. In this case, the two lines meet in only one location, which corresponds
to the unique solution to the linear system as illustrated in the following figure:
[Figure: the lines x2 = −2x1 and x2 = x1 − 1 in the plane, meeting in a single point.]
Example 1.3.4. The linear map f (x1 , x2 ) = (x1 , −x2 ) describes the “motion” of
reflecting a vector across the x-axis, as illustrated in the following figure:
[Figure: a point (x1, x2) and its reflection (x1, −x2) across the x-axis.]
Example 1.3.5. The linear map f(x1, x2) = (−x2, x1) describes the “motion” of rotating a vector by 90° counterclockwise, as illustrated in the following figure:
[Figure: a point (x1, x2) and its image (−x2, x1) under a 90° counterclockwise rotation.]
This example can easily be generalized to rotation by any arbitrary angle using
Lemma 2.3.2. In particular, when points in R2 are viewed as complex numbers,
then we can employ the so-called polar form for complex numbers in order to model
the “motion” of rotation. (Cf. Proof-Writing Exercise 5 on page 20.)
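As a concrete illustration of this idea, the following short Python sketch (ours, not the book's) rotates a point by multiplying the corresponding complex number by e^{iθ}; the function name rotate is just a convenient label.

```python
import cmath

def rotate(p, theta):
    # View the point p = (p1, p2) as the complex number p1 + p2*i and multiply
    # by e^{i*theta}, which rotates it counterclockwise by the angle theta.
    z = complex(p[0], p[1]) * cmath.exp(1j * theta)
    return (z.real, z.imag)

print(rotate((1, 2), cmath.pi / 2))   # approximately (-2.0, 1.0), matching f(x1, x2) = (-x2, x1)
```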
Linear equations pop up in many different contexts. For example, you can view the derivative (df/dx)(x) of a differentiable function f : R → R as a linear approximation of f. This becomes apparent when you look at the Taylor series of the function f(x) centered around the point x = a (as seen in calculus):
    f(x) = f(a) + (df/dx)(a)(x − a) + · · · .    (1.11)
In particular, we can graph the linear part of the Taylor series versus the original function, as in the following figure:
[Figure: the graph of f(x) together with the graph of the linear approximation f(a) + (df/dx)(a)(x − a) near x = a.]
Since f(a) and (df/dx)(a) are merely real numbers, f(a) + (df/dx)(a)(x − a) is a linear function in the single variable x.
Similarly, if f : Rn → Rm is a multivariate function, then one can still view the
derivative of f as a form of a linear approximation for f (as seen in a multivariate
calculus course).
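To see the linear approximation at work numerically, here is a minimal Python sketch (an illustration only; the particular choice f(x) = x² and a = 1 is ours).

```python
# Compare f with its linear approximation f(a) + f'(a)(x - a) near the point a.
f = lambda x: x**2
df = lambda x: 2 * x     # derivative of f, computed by hand for this particular f
a = 1.0

for x in [0.9, 1.0, 1.1, 1.5]:
    linear = f(a) + df(a) * (x - a)
    print(x, f(x), linear)   # the two values are close for x near a and drift apart farther away
```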
What if there are infinitely many variables x1 , x2 , . . .? In this case, the system
of equations has the form
    a11 x1 + a12 x2 + · · · = y1
    a21 x1 + a22 x2 + · · · = y2
        ⋮
Hence, the sums in each equation are infinite, and so we would have to deal with
infinite series. This, in particular, means that questions of convergence arise, where
convergence depends upon the infinite sequence x = (x1 , x2 , . . .) of variables. These
questions will not occur in this course since we are only interested in finite systems
of linear equations in a finite number of variables. Other subjects in which these
questions do arise, though, include
• Differential equations;
• Fourier analysis;
• Real and complex analysis.
In algebra, Linear Algebra is also seen to arise in the study of symmetries, linear
transformations, and Lie algebras.
Calculational Exercises
(1) Solve the following systems of linear equations and characterize their solution
sets. (I.e., determine whether there is a unique solution, no solution, etc.)
Also, write each system of linear equations as an equation for a single function
f : Rn → Rm for appropriate choices of m, n ∈ Z+ .
(a) System of 3 equations in the unknowns x, y, z, w:
    x + 2y − 2z + 3w = 2
    2x + 4y − 3z + 4w = 5
    5x + 10y − 8z + 11w = 12.
(b) System of 4 equations in the unknowns x, y, z:
    x + 2y − 3z = 4
    x + 3y + z = 11
    2x + 5y − 4z = 13
    2x + 6y + 2z = 22.
(c) System of 3 equations in the unknowns x, y, z:
    x + 2y − 3z = −1
    3x − y + 2z = 7
    5x + 3y − 4z = 2.
(2) Find all pairs of real numbers x1 and x2 that satisfy the system of equations
x1 + 3x2 = 2, (1.12)
x1 − x2 = 1. (1.13)
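One hedged way to check your answers to these exercises is to compare matrix ranks numerically; the helper below (our sketch, not part of the text) uses the standard fact that a system is solvable exactly when the coefficient matrix and the augmented matrix have equal rank, and has a unique solution when that rank equals the number of unknowns.

```python
import numpy as np

def classify(A, b):
    # Classify the solution set of A x = b by comparing rank(A), rank([A | b]),
    # and the number of unknowns.
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    r_A = np.linalg.matrix_rank(A)
    r_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    if r_A < r_Ab:
        return "no solution"
    return "unique solution" if r_A == A.shape[1] else "infinitely many solutions"

# System (a) from Calculational Exercise 1:
print(classify([[1, 2, -2, 3], [2, 4, -3, 4], [5, 10, -8, 11]], [2, 5, 12]))
```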
Proof-Writing Exercises
(1) Let a, b, c, and d be real numbers, and consider the system of equations given
by
ax1 + bx2 = 0, (1.14)
cx1 + dx2 = 0. (1.15)
Note that x1 = x2 = 0 is a solution for any choice of a, b, c, and d. Prove that if ad − bc ≠ 0, then x1 = x2 = 0 is the only solution.
Chapter 2
Introduction to Complex Numbers
Let R denote the set of real numbers, which should be a familiar collection of
numbers to anyone who has studied calculus. In this chapter, we use R to build the
equally important set of so-called complex numbers.
Definition 2.1.1. The set of complex numbers C is defined as
    C = {(x, y) | x, y ∈ R}.
Given a complex number z = (x, y), we call Re(z) = x the real part of z and
Im(z) = y the imaginary part of z.
Even though we have formally defined C as the set of all ordered pairs of real
numbers, we can nonetheless extend the usual arithmetic operations on R so that
they also make sense on C. We discuss such extensions in this section, along with
several other important operations on complex numbers.
As with the real numbers, subtraction is defined as addition with the so-called
additive inverse, where the additive inverse of z = (x, y) is defined as −z =
(−x, −y).
Example 2.2.3. (π, √2) − (π/2, √19) = (π, √2) + (−π/2, −√19), where
    (π, √2) + (−π/2, −√19) = (π − π/2, √2 − √19) = (π/2, √2 − √19).
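The componentwise definitions are easy to experiment with; the following Python sketch (an illustration of ours) mirrors Example 2.2.3 and compares the ordered-pair arithmetic with Python's built-in complex type.

```python
import math

def add(z, w):
    # Addition of complex numbers viewed as ordered pairs.
    return (z[0] + w[0], z[1] + w[1])

def neg(z):
    # Additive inverse of z = (x, y).
    return (-z[0], -z[1])

z = (math.pi, math.sqrt(2))
w = (math.pi / 2, math.sqrt(19))
print(add(z, neg(w)))               # (pi/2, sqrt(2) - sqrt(19)), as in Example 2.2.3
print(complex(*z) - complex(*w))    # the same numbers via the built-in complex type
```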
The addition of complex numbers shares many of the same properties as the
addition of real numbers, including associativity, commutativity, the existence and
uniqueness of an additive identity, and the existence and uniqueness of additive
inverses. We summarize these properties in the following theorem, which you should
prove for your own practice.
Theorem 2.2.4. Let z1, z2, z3 ∈ C be any three complex numbers. Then the following statements are true.
(1) (Associativity) (z1 + z2) + z3 = z1 + (z2 + z3).
(2) (Commutativity) z1 + z2 = z2 + z1.
(3) (Additive Identity) There is a unique complex number, denoted 0, such that, given any z ∈ C, 0 + z = z. Moreover, 0 = (0, 0).
(4) (Additive Inverses) Given any z ∈ C, there is a unique complex number, denoted −z, such that z + (−z) = 0. Moreover, if z = (x, y) with x, y ∈ R, then −z = (−x, −y).
The proof of this theorem is straightforward and relies solely on the definition of
complex addition along with the familiar properties of addition for real numbers.
Given a complex number z = (x, y) = x + yi, its complex conjugate is defined to be \overline{z} = x − yi. Complex conjugation satisfies the following properties, which you should also prove for your own practice. Let z1, z2 ∈ C. Then:
(1) \overline{z1 + z2} = \overline{z1} + \overline{z2}.
(2) \overline{z1 z2} = \overline{z1} \overline{z2}.
(3) \overline{1/z1} = 1/\overline{z1}, for all z1 ≠ 0.
(4) \overline{z1} = z1 if and only if Im(z1) = 0.
(5) \overline{\overline{z1}} = z1.
(6) The real and imaginary parts of z1 can be expressed as
    Re(z1) = (1/2)(z1 + \overline{z1})   and   Im(z1) = (1/(2i))(z1 − \overline{z1}).
Given z = (x, y) ∈ C, the modulus (or absolute value) of z is defined as
    |z| = √(x² + y²).
In particular, given x ∈ R, note that |(x, 0)| = √(x²) = |x|, so the complex modulus agrees with the usual absolute value on R.
[Figure (Example 2.2.11): the point (3, 4) plotted in the plane, together with the right triangle it forms with the origin and the point (3, 0).]
We can then apply the Pythagorean theorem to the resulting right triangle in order to find the distance from the origin to the point (3, 4), namely |(3, 4)| = √(3² + 4²) = 5.
The following theorem lists the fundamental properties of the modulus, especially as they relate to complex conjugation. You should provide a proof for your own practice. Let z1, z2 ∈ C be arbitrary. Then:
(1) |z1 z2| = |z1| · |z2|.
(2) |z1/z2| = |z1|/|z2|, assuming that z2 ≠ 0.
(3) |\overline{z1}| = |z1|.
(4) |Re(z1)| ≤ |z1| and |Im(z1)| ≤ |z1|.
(5) (Triangle Inequality) |z1 + z2| ≤ |z1| + |z2|.
(6) (Another Triangle Inequality) |z1 − z2| ≥ | |z1| − |z2| |.
(7) (Formula for Multiplicative Inverse) z1 \overline{z1} = |z1|², from which
    z1^{−1} = \overline{z1} / |z1|²
when z1 ≠ 0.
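These identities can be spot-checked numerically; the short Python sketch below (ours, and of course no substitute for a proof) tests several of them on randomly chosen complex numbers.

```python
import random

random.seed(0)
for _ in range(5):
    z1 = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    z2 = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    assert abs(abs(z1 * z2) - abs(z1) * abs(z2)) < 1e-9           # |z1 z2| = |z1| |z2|
    assert abs(abs(z1.conjugate()) - abs(z1)) < 1e-9              # |conj(z1)| = |z1|
    assert abs(z1 + z2) <= abs(z1) + abs(z2) + 1e-9               # triangle inequality
    assert abs(z1.conjugate() / abs(z1) ** 2 - 1 / z1) < 1e-9     # z1^(-1) = conj(z1)/|z1|^2
print("all sampled identities hold")
```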
When complex numbers are viewed as points in the Euclidean plane R2 , several
of the operations defined in Section 2.2 can be directly visualized as if they were
operations on vectors.
For the purposes of this Chapter, we think of vectors as directed line segments
that start at the origin and end at a specified point in the Euclidean plane. These
line segments may also be moved around in space as long as the direction (which we
will call the argument in Section 2.3.1 below) and the length (a.k.a. the modulus)
are preserved. As such, the distinction between points in the plane and vectors is
merely a matter of convention as long as we at least implicitly think of each vector
as having been translated so that it starts at the origin.
As we saw in Example 2.2.11 above, the modulus of a complex number can be
viewed as the length of the hypotenuse of a certain right triangle. The sum and
difference of two vectors can also each be represented geometrically as the lengths
of specific diagonals within a particular parallelogram that is formed by copying
and appropriately translating the two vectors being combined.
Example 2.2.13. We illustrate the sum (3, 2) + (1, 3) = (4, 5) as the main, dashed
diagonal of the parallelogram in the left-most figure below. The difference (3, 2) −
(1, 3) = (2, −1) can also be viewed as the shorter diagonal of the same parallelogram,
though we would properly need to insist that this shorter diagonal be translated so
that it starts at the origin. The latter is illustrated in the right-most figure below.
[Figure: the parallelogram spanned by (1, 3) and (3, 2); the sum (4, 5) appears as its main (dashed) diagonal, and the difference (2, −1) as its other diagonal, translated to the origin in the right-hand copy.]
As mentioned above, C coincides with the plane R2 when viewed as a set of ordered
pairs of real numbers. Therefore, we can use polar coordinates as an alternate
way to uniquely identify a complex number. This gives rise to the so-called polar
form for a complex number, which often turns out to be a convenient representation
for complex numbers.
[Figure: a complex number z with polar coordinates (r, θ) and rectangular coordinates x = r cos(θ), y = r sin(θ).]
We call the ordered pair (x, y) the rectangular coordinates for the complex
number z.
We also call the ordered pair (r, θ) the polar coordinates for the complex
number z. The radius r = |z| is called the modulus of z (as defined in Section 2.2.4
above), and the angle θ = Arg(z) is called the argument of z. Since the argument
of a complex number describes an angle that is measured relative to the x-axis, it is
important to note that θ is only well-defined up to adding multiples of 2π. As such,
we restrict θ ∈ [0, 2π) and add or subtract multiples of 2π as needed (e.g., when
multiplying two complex numbers so that their arguments are added together) in
order to keep the argument within this range of values.
It is straightforward to transform polar coordinates into rectangular coordinates
using the equations
x = r cos(θ) and y = r sin(θ). (2.1)
In order to transform rectangular coordinates into polar coordinates, we first note that r = √(x² + y²) is just the complex modulus. Then θ must be chosen so that it satisfies the bounds 0 ≤ θ < 2π in addition to the simultaneous equations (2.1), where we are assuming that z ≠ 0.
Summarizing:
z = x + yi = r cos(θ) + r sin(θ)i = r(cos(θ) + sin(θ)i).
Part of the utility of this expression is that the size r = |z| of z is explicitly part of
the very definition since it is easy to check that | cos(θ) + sin(θ)i| = 1 for any choice
of θ ∈ R.
Closely related is the exponential form for complex numbers, which does
nothing more than replace the expression cos(θ)+sin(θ)i with eiθ . The real power of
this definition is that this exponential notation turns out to be completely consistent
with the usual usage of exponential notation for real numbers.
Example 2.3.1. The complex number i in polar coordinates is expressed as e^{iπ/2}, whereas the number −1 is given by e^{iπ}.
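Python's standard cmath module performs these conversions directly, as in the sketch below (an aside of ours; note that cmath reports the argument in the range (−π, π] rather than [0, 2π)).

```python
import cmath

z = 1 + 1j
r, theta = cmath.polar(z)            # r = |z| = sqrt(2), theta = Arg(z) = pi/4
print(r, theta)
print(cmath.rect(r, theta))          # back to rectangular form, approximately (1+1j)
print(cmath.exp(1j * cmath.pi / 2))  # e^{i pi/2} is approximately i, as in Example 2.3.1
```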
If z1 = r1(cos(θ1) + sin(θ1)i) and z2 = r2(cos(θ2) + sin(θ2)i), then Lemma 2.3.2 states that the product is
    z1 z2 = r1 r2 (cos(θ1 + θ2) + sin(θ1 + θ2)i),
which follows from the usual formulas for the sine and cosine of the sum of two angles.
In particular, Lemma 2.3.2 shows that the modulus |z1 z2| of the product is the product of the moduli r1 and r2 and that the argument Arg(z1 z2) of the product is the sum of the arguments θ1 + θ2.
Note, in particular, that we are not only always guaranteed the existence of an
nth root for any complex number, but that we are also always guaranteed to have
exactly n of them. This level of completeness in root extraction is in stark contrast
with roots of real numbers (within the real numbers) which may or may not exist
and may be unique or not when they exist.
An important special case of de Moivre’s Formula yields n nth roots of unity.
By unity, we just mean the complex number 1 = 1 + 0i, and by the nth roots of
unity, we mean the n numbers
    z_k = 1^{1/n} ( cos(0/n + 2πk/n) + sin(0/n + 2πk/n) i )
        = cos(2πk/n) + sin(2πk/n) i
        = e^{2πi(k/n)},
where k = 0, 1, 2, . . . , n − 1. The fact that these numbers are precisely the complex numbers solving the equation z^n = 1 has many interesting applications.
Example 2.3.4. To find all solutions of the equation z³ + 8 = 0 for z ∈ C, we may write z = re^{iθ} in polar form with r > 0 and θ ∈ [0, 2π). Then the equation z³ + 8 = 0 becomes z³ = r³e^{i3θ} = −8 = 8e^{iπ}, so that r = 2 and 3θ = π + 2πk for k = 0, 1, 2. This means that there are three distinct solutions when θ ∈ [0, 2π), namely θ = π/3, θ = π, and θ = 5π/3.
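The three roots found in Example 2.3.4 are easily verified numerically, as in this brief Python sketch (ours).

```python
import cmath

# The solutions are z = 2 e^{i theta} with theta = (pi + 2*pi*k)/3 for k = 0, 1, 2.
for k in range(3):
    theta = (cmath.pi + 2 * cmath.pi * k) / 3
    z = 2 * cmath.exp(1j * theta)
    print(z, z ** 3)    # each z**3 equals -8 up to rounding error
```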
Calculational Exercises
(1) Express the following complex numbers in the form x + yi for x, y ∈ R:
(a) (2 + 3i) + (4 + i)
(b) (2 + 3i)²(4 + i)
(c) (2 + 3i)/(4 + i)
(d) 1/i + 3/(1 + i)
(e) (−i)⁻¹
(f) (−1 + i√3)³
(2) Compute the real and imaginary parts of the following expressions, where z is
the complex number x + yi and x, y ∈ R:
(a) 1/z²
(b) 1/(3z + 2)
(c) (z + 1)/(2z − 5)
(d) z³
(3) Find r > 0 and θ ∈ [0, 2π) such that (1 − i)/√2 = re^{iθ}.
(4) Solve the following equations for z a complex number:
(a) z⁵ − 2 = 0
(b) z⁴ + i = 0
(c) z⁶ + 8 = 0
(d) z³ − 4i = 0
(5) Calculate the
(a) complex conjugate of the fraction (3 + 8i)⁴/(1 + i)¹⁰.
(b) complex conjugate of the fraction (8 − 2i)¹⁰/(4 + 6i)⁵.
(c) complex modulus of the fraction i(2 + 3i)(5 − 2i)/(−2 − i).
(d) complex modulus of the fraction (2 − 3i)²/(8 + 6i)².
(6) Compute the real and imaginary parts:
(a) e2+i
(b) sin(1 + i)
(c) e3−i
(d) cos(2 + 3i)
(7) Compute the real and imaginary parts of e^{e^z} for z ∈ C.
Proof-Writing Exercises
(1) Let a ∈ R and z, w ∈ C. Prove that
(a) Re(az) = aRe(z) and Im(az) = aIm(z).
(b) Re(z + w) = Re(z) + Re(w) and Im(z + w) = Im(z) + Im(w).
(2) Let z ∈ C. Prove that Im(z) = 0 if and only if Re(z) = z.
(3) Let z, w ∈ C. Prove the parallelogram law |z − w|2 + |z + w|2 = 2(|z|2 + |w|2 ).
(4) Let z, w ∈ C with \overline{z}w ≠ 1 such that either |z| = 1 or |w| = 1. Prove that
    | (z − w)/(1 − \overline{z}w) | = 1.
(5) For an angle θ ∈ [0, 2π), find the linear map fθ : R2 → R2 , which describes the
rotation by the angle θ in the counterclockwise direction.
Hint: For a given angle θ, find a, b, c, d ∈ R such that fθ (x1 , x2 ) = (ax1 +
bx2 , cx1 + dx2 ).
Chapter 3
The Fundamental Theorem of Algebra and Factoring Polynomials
The similarities and differences between R and C are elegant and intriguing, but
why are complex numbers important? One possible answer to this question is the
Fundamental Theorem of Algebra. It states that every polynomial equation
in one variable with complex coefficients has at least one complex solution. In
other words, polynomial equations formed over C can always be solved over C.
This amazing result has several equivalent formulations in addition to a myriad of
different proofs, one of the first of which was given by the eminent mathematician
Carl Friedrich Gauss (1777-1855) in his doctoral thesis.
The aim of this section is to provide a proof of the Fundamental Theorem of Algebra
using concepts that should be familiar from the study of Calculus, and so we begin
by providing an explicit formulation.
Theorem 3.1.1 (Fundamental Theorem of Algebra). Given any positive integer n ≥ 1 and any choice of complex numbers a0, a1, . . . , an with an ≠ 0, the polynomial equation
    an z^n + · · · + a1 z + a0 = 0    (3.1)
has at least one solution z ∈ C.
Equivalently, every polynomial function defined on the complex plane C (when thought of as R²) has at least one root, i.e., vanishes in at least one place. It is in this form that we will provide a proof for Theorem 3.1.1.
Given how long the Fundamental Theorem of Algebra has been around, you
should not be surprised that there are many proofs of it. There have even been
entire books devoted solely to exploring the mathematics behind various distinct
proofs. Different proofs arise from attempting to understand the statement of the
theorem from the viewpoint of different branches of mathematics. This quickly
leads to many non-trivial interactions with such fields of mathematics as Real and
Complex Analysis, Topology, and (Modern) Abstract Algebra. The diversity of
proof techniques available is yet another indication of how fundamental and deep
the Fundamental Theorem of Algebra really is.
To prove the Fundamental Theorem of Algebra using Differential Calculus, we
will need the Extreme Value Theorem for real-valued functions of two real vari-
ables, which we state without proof. In particular, we formulate this theorem in
the restricted case of functions defined on the closed disk D of radius R > 0 and
centered at the origin, i.e., on the set
    D = {(x, y) ∈ R² | x² + y² ≤ R²}.
We will also use the following fact (Lemma 3.1.3): given any polynomial function
    f(z) = an z^n + · · · + a1 z + a0
with n ≥ 1 and an ≠ 0, the real-valued function |f| attains its minimum value at some point z0 ∈ C.
For all z ∈ C such that |z| ≥ 2, we can further simplify this expression and obtain
    |f(z)| ≥ |an| |z|^n ( 1 − 2A/(|an| |z|) ).
It follows from this inequality that there is an R > 0 such that |f (z)| > |f (0)|, for
all z ∈ C satisfying |z| > R. Let D ⊂ R2 be the disk of radius R centered at 0, and
define a function g : D → R, by
g(x, y) = |f (x + iy)|.
Since g is continuous, we can apply Theorem 3.1.2 in order to obtain a point
(x0 , y0 ) ∈ D such that g attains its minimum at (x0 , y0 ). By the choice of R
we have that for z ∈ C \ D, |f (z)| > |g(0, 0)| ≥ |g(x0 , y0 )|. Therefore, |f | attains its
minimum at z = x0 + iy0 .
Proof of Theorem 3.1.1. For our argument, we rely on the fact that the function
|f | attains its minimum value by Lemma 3.1.3. Let z0 ∈ C be a point where the
minimum is attained. We will show that if f(z0) ≠ 0, then z0 is not a minimum, thus proving by contraposition that the minimum value of |f(z)| is zero. Therefore, f(z0) = 0.
If f(z0) ≠ 0, then we can define a new function g : C → C by setting
    g(z) = f(z + z0) / f(z0),   for all z ∈ C.
Note that g is a polynomial of degree n, and that the minimum of |f | is attained at
z0 if and only if the minimum of |g| is attained at z = 0. Moreover, it is clear that
g(0) = 1.
More explicitly, g is given by a polynomial of the form
    g(z) = bn z^n + · · · + bk z^k + 1,
with n ≥ 1 and bk ≠ 0, for some 1 ≤ k ≤ n. Let bk = |bk| e^{iθ}, and consider z of the form
    z = r |bk|^{−1/k} e^{i(π−θ)/k},    (3.2)
with r > 0. For z of this form we have
    g(z) = 1 − r^k + r^{k+1} h(r),
where h is a polynomial. Then, for r < 1, we have by the triangle inequality that
    |g(z)| ≤ |1 − r^k| + r^{k+1} |h(r)| = 1 − r^k + r^{k+1} |h(r)|.
For r > 0 sufficiently small we have r|h(r)| < 1, by the continuity of the function rh(r) and the fact that it vanishes in r = 0. Hence
    |g(z)| ≤ 1 − r^k (1 − r |h(r)|) < 1
for some z having the form in Equation (3.2) with r ∈ (0, r0) and r0 > 0 sufficiently small. But then the minimum of the function |g| : C → R cannot possibly be equal to 1.
As a consequence of the Fundamental Theorem of Algebra, polynomials with complex coefficients can be completely factored. Given a positive integer n ≥ 1 and any choice of complex numbers a0, a1, . . . , an with an ≠ 0, consider the polynomial function
    f(z) = an z^n + · · · + a1 z + a0.
Then:
(1) given any complex number w ∈ C, we have that f (w) = 0 if and only if there
exists a polynomial function g : C → C of degree n − 1 such that
f (z) = (z − w)g(z), ∀ z ∈ C.
(2) there are at most n distinct complex numbers w for which f (w) = 0. In other
words, f has at most n distinct roots.
(3) (Fundamental Theorem of Algebra, restated) there exist exactly n + 1 complex
numbers w0 , w1 , . . . , wn ∈ C (not necessarily distinct) such that
f (z) = w0 (z − w1 )(z − w2 ) · · · (z − wn ), ∀ z ∈ C.
In other words, every polynomial function with coefficients over C can be fac-
tored into linear factors over C.
Proof.
Calculational Exercises
(1) Let n ∈ Z+ be a positive integer, let w0 , w1 , . . . , wn ∈ C be distinct complex
numbers, and let z0 , z1 , . . . , zn ∈ C be any complex numbers. Then one can
prove that there is a unique polynomial p(z) of degree at most n such that, for
each k ∈ {0, 1, . . . , n}, p(wk ) = zk .
(a) Find the unique polynomial of degree at most 2 that satisfies p(0) = 0,
p(1) = 1, and p(2) = 2.
(b) Can your result in Part (a) be easily generalized to find the unique poly-
nomial of degree at most n satisfying p(0) = 0, p(1) = 1, . . . , p(n) = n?
(2) Given any complex number α ∈ C, show that the coefficients of the polynomial
    (z − α)(z − \overline{α})
are real numbers.
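For Exercise 2, a quick numerical sanity check (an illustration of ours, not a proof) is to expand (z − α)(z − \overline{α}) = z² − 2 Re(α) z + |α|² for a sample value of α and observe that the imaginary parts of the coefficients vanish.

```python
alpha = 3 - 2j    # an arbitrary sample value of alpha
coefficients = (1, -(alpha + alpha.conjugate()), alpha * alpha.conjugate())
print(coefficients)    # the imaginary part of every coefficient is zero
```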
Proof-Writing Exercises
(1) Let m, n ∈ Z+ be positive integers with m ≤ n. Prove that there is a degree n
polynomial p(z) with complex coefficients such that p(z) has exactly m distinct
roots.
Chapter 4
Vector Spaces
With the background developed in the previous chapters, we are ready to begin the
study of Linear Algebra by introducing vector spaces. Vector spaces are essential
for the formulation and solution of linear algebra problems and they will appear on
virtually every page of this book from now on.
As we have seen in Chapter 1, a vector space is a set V with two operations defined
upon it: addition of vectors and multiplication by scalars. These operations must
satisfy certain properties, which we are about to discuss in more detail. The scalars
are taken from a field F, where for the remainder of this book F stands either for
the real numbers R or for the complex numbers C. The sets R and C are examples
of fields. The abstract definition of a field along with further examples can be found
in Appendix C.
Vector addition can be thought of as a function + : V × V → V that maps
two vectors u, v ∈ V to their sum u + v ∈ V . Scalar multiplication can similarly
be described as a function F × V → V that maps a scalar a ∈ F and a vector v ∈ V
to a new vector av ∈ V . (More information on these kinds of functions, also known
as binary operations, can be found in Appendix C.) It is when we place the right
conditions on these operations, also called axioms, that we turn V into a vector
space.
Definition 4.1.1. A vector space over F is a set V together with the operations
of addition V × V → V and scalar multiplication F × V → V satisfying each of the
following properties.
(1) Commutativity: u + v = v + u for all u, v ∈ V;
(2) Associativity: (u + v) + w = u + (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and a, b ∈ F;
(3) Additive identity: There exists an element 0 ∈ V such that 0 + v = v for all v ∈ V;
(4) Additive inverse: For every v ∈ V , there exists an element w ∈ V such that
v + w = 0;
(5) Multiplicative identity: 1v = v for all v ∈ V ;
(6) Distributivity: a(u + v) = au + av and (a + b)u = au + bu for all u, v ∈ V
and a, b ∈ F.
A vector space over R is usually called a real vector space, and a vector space
over C is similarly called a complex vector space. The elements v ∈ V of a vector
space are called vectors.
Even though Definition 4.1.1 may appear to be an extremely abstract definition,
vector spaces are fundamental objects in mathematics because there are count-
less examples of them. You should expect to see many examples of vector spaces
throughout your mathematical life.
Example 4.1.2. Consider the set Fn of all n-tuples with elements in F. This is a
vector space with addition and scalar multiplication defined componentwise. That
is, for u = (u1 , u2 , . . . , un ), v = (v1 , v2 , . . . , vn ) ∈ Fn and a ∈ F, we define
u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ),
au = (au1 , au2 , . . . , aun ).
It is easy to check that each property of Definition 4.1.1 is satisfied. In par-
ticular, the additive identity 0 = (0, 0, . . . , 0), and the additive inverse of u is
−u = (−u1 , −u2 , . . . , −un ).
Example 4.1.4. Verify that V = {0} is a vector space! (Here, 0 denotes the zero
vector in any vector space.)
Example 4.1.5. Let F[z] be the set of all polynomial functions p : F → F with
coefficients in F. As discussed in Chapter 3, p(z) is a polynomial if there exist
a0 , a1 , . . . , an ∈ F such that
p(z) = an z n + an−1 z n−1 + · · · + a1 z + a0 . (4.1)
We are going to prove several important, yet simple, properties of vector spaces.
From now on, V will denote a vector space over F.
Proposition 4.2.1. In every vector space the additive identity is unique.
Since the additive inverse of v is unique, as we have just shown, it will from now
on be denoted by −v. We also define w − v to mean w + (−v). We will, in fact,
show in Proposition 4.2.5 below that −v = −1v.
Proposition 4.2.3. 0v = 0 for all v ∈ V .
Note that the 0 on the left-hand side in Proposition 4.2.3 is a scalar, whereas
the 0 on the right-hand side is a vector.
4.3 Subspaces
As mentioned in the last section, there are countless examples of vector spaces. One
particularly important source of new vector spaces comes from looking at subsets
of a set that is already known to be a vector space.
Definition 4.3.1. Let V be a vector space over F, and let U ⊂ V be a subset of
V . Then we call U a subspace of V if U is a vector space over F under the same
operations that make V into a vector space over F.
To check that a subset U ⊂ V is a subspace, it suffices to check only a few of
the conditions of a vector space.
Lemma 4.3.2. Let U ⊂ V be a subset of a vector space V over F. Then U is a
subspace of V if and only if the following three conditions hold.
(1) additive identity: 0 ∈ U ;
(2) closure under addition: u, v ∈ U implies u + v ∈ U ;
(3) closure under scalar multiplication: a ∈ F, u ∈ U implies that au ∈ U .
Proof. Condition 1 implies that the additive identity exists. Condition 2 implies that vector addition is well-defined, and Condition 3 ensures that scalar multiplication is well-defined. All other conditions for a vector space are inherited from V
since addition and scalar multiplication for elements in U are the same when viewed
as elements in either U or V .
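The three conditions of Lemma 4.3.2 lend themselves to quick numerical spot checks. The Python sketch below (our illustration; the particular subset U of R³ is a hypothetical choice, and sampling finitely many vectors is of course not a proof) tests them for U = {(x, y, z) ∈ R³ | x + 2y + 2z = 0}.

```python
import numpy as np

in_U = lambda v: abs(v[0] + 2 * v[1] + 2 * v[2]) < 1e-9   # membership test for U

rng = np.random.default_rng(0)
assert in_U(np.zeros(3))                                  # condition (1): 0 is in U
for _ in range(100):
    u = rng.normal(size=3); u[0] = -2 * u[1] - 2 * u[2]   # sample a vector of U
    v = rng.normal(size=3); v[0] = -2 * v[1] - 2 * v[2]   # sample another vector of U
    a = rng.normal()
    assert in_U(u + v)                                    # condition (2): closed under addition
    assert in_U(a * u)                                    # condition (3): closed under scaling
print("all sampled checks passed")
```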
Example 4.3.4. In every vector space V , the subsets {0} and V are easily verified
to be subspaces. We call these the trivial subspaces of V .
Example 4.3.9. The subspaces of R2 consist of {0}, all lines through the origin,
and R2 itself. The subspaces of R3 are {0}, all lines through the origin, all planes
through the origin, and R3 . In fact, these exhaust all subspaces of R2 and R3 ,
respectively. To prove this, we will need further tools such as the notion of bases
and dimensions to be discussed soon. In particular, this shows that lines and planes
that do not pass through the origin are not subspaces (which is not so hard to
show!).
4.4 Sums and direct sums
Definition 4.4.1. Let U1, U2 ⊂ V be subspaces of V. Define the (subspace) sum of U1 and U2 to be the set
    U1 + U2 = {u1 + u2 | u1 ∈ U1, u2 ∈ U2}.
For example, let
    U1 = {(x, 0, 0) ∈ F³ | x ∈ F},
    U2 = {(0, y, 0) ∈ F³ | y ∈ F}.
Then
    U1 + U2 = {(x, y, 0) ∈ F³ | x, y ∈ F}.
[Figure: two distinct one-dimensional subspaces U and U′ of R², with vectors u ∈ U and v ∈ U′ whose sum u + v lies in neither; the union of two subspaces is in general not a subspace.]
If every vector u ∈ U1 + U2 can be written uniquely in the form u = u1 + u2 with u1 ∈ U1 and u2 ∈ U2, then the sum is called a direct sum, and we write U = U1 ⊕ U2. For example, the sum of U1 = {(x, y, 0) ∈ R³ | x, y ∈ R} and
    U2 = {(0, w, z) ∈ R³ | w, z ∈ R}
is all of R³, but it is not a direct sum since, for instance, the vector (0, 1, 0) lies in both U1 and U2 and hence can be decomposed in more than one way.
Proposition 4.4.6. Let U1, U2 ⊂ V be subspaces of V. Then V = U1 ⊕ U2 if and only if the following two conditions hold:
(1) V = U1 + U2;
(2) If 0 = u1 + u2 with u1 ∈ U1 and u2 ∈ U2, then u1 = u2 = 0.
Proof.
(“=⇒”) Suppose V = U1 ⊕ U2 . Then Condition 1 holds by definition. Certainly
0 = 0 + 0, and, since by uniqueness this is the only way to write 0 ∈ V , we have
u1 = u2 = 0.
(“⇐=”) Suppose Conditions 1 and 2 hold. By Condition 1, we have that, for all
v ∈ V , there exist u1 ∈ U1 and u2 ∈ U2 such that v = u1 + u2 . Suppose v = w1 + w2
with w1 ∈ U1 and w2 ∈ U2 . Subtracting the two equations, we obtain
0 = (u1 − w1 ) + (u2 − w2 ),
where u1 − w1 ∈ U1 and u2 − w2 ∈ U2 . By Condition 2, this implies u1 − w1 = 0
and u2 − w2 = 0, or equivalently u1 = w1 and u2 = w2 , as desired.
The direct sum can also be characterized in terms of intersections: V = U1 ⊕ U2 if and only if the following two conditions hold: (1) V = U1 + U2; and (2) U1 ∩ U2 = {0}.
Proof.
(“=⇒”) Suppose V = U1 ⊕ U2. Then Condition 1 holds by definition. If u ∈ U1 ∩ U2,
then 0 = u + (−u) with u ∈ U1 and −u ∈ U2 (why?). By Proposition 4.4.6, we have
u = 0 and −u = 0 so that U1 ∩ U2 = {0}.
Calculational Exercises
(1) For each of the following sets, either show that the set is a vector space or
explain why it is not a vector space.
(a) The set R of real numbers under the usual operations of addition and mul-
tiplication.
(b) The set {(x, 0) | x ∈ R} under the usual operations of addition and multi-
plication on R2 .
(c) The set {(x, 1) | x ∈ R} under the usual operations of addition and multi-
plication on R2 .
(d) The set {(x, 0) | x ∈ R, x ≥ 0} under the usual operations of addition and
multiplication on R2 .
(e) The set {(x, 1) | x ∈ R, x ≥ 0} under the usual operations of addition and multiplication on R².
(f) The set of 2 × 2 matrices of the form
        ( a        a + b )
        ( a + b    a     )
    with a, b ∈ R, under the usual operations of addition and multiplication on R^{2×2}.
(g) The set of 2 × 2 matrices of the form
        ( a        a + b + 1 )
        ( a + b    a         )
    with a, b ∈ R, under the usual operations of addition and multiplication on R^{2×2}.
(2) Show that the space V = {(x1 , x2 , x3 ) ∈ F3 | x1 + 2x2 + 2x3 = 0} forms a vector
space.
(3) For each of the following sets, either show that the set is a subspace of C(R) or
explain why it is not a subspace.
(a) The set {f ∈ C(R) | f (x) ≤ 0, ∀ x ∈ R}.
(b) The set {f ∈ C(R) | f (0) = 0}.
(c) The set {f ∈ C(R) | f (0) = 2}.
(d) The set of all constant functions.
(e) The set {α + β sin(x) | α, β ∈ R}.
(4) Give an example of a nonempty subset U ⊂ R2 such that U is closed under
scalar multiplication but is not a subspace of R2 .
(5) Let F[z] denote the vector space of all polynomials with coefficients in F, and
define U to be the subspace of F[z] given by
U = {az 2 + bz 5 | a, b ∈ F}.
Find a subspace W of F[z] such that F[z] = U ⊕ W .
Proof-Writing Exercises
(1) Let V be a vector space over F. Then, given a ∈ F and v ∈ V such that av = 0,
prove that either a = 0 or v = 0.
(2) Let V be a vector space over F, and suppose that W1 and W2 are subspaces of
V . Prove that their intersection W1 ∩ W2 is also a subspace of V .
(3) Prove or give a counterexample to the following claim:
Claim. Let V be a vector space over F, and suppose that W1 , W2 , and W3 are
subspaces of V such that W1 + W3 = W2 + W3 . Then W1 = W2 .
(4) Prove or give a counterexample to the following claim:
Claim. Let V be a vector space over F, and suppose that W1 , W2 , and W3 are
subspaces of V such that W1 ⊕ W3 = W2 ⊕ W3 . Then W1 = W2 .
Chapter 5
Span and Bases
The intuitive notion of dimension of a space as the number of coordinates one needs
to uniquely specify a point in the space motivates the mathematical definition of
dimension of a vector space. In this Chapter, we will first introduce the notions of
linear span, linear independence, and basis of a vector space. Given a basis, we
will find a bijective correspondence between coordinates and elements in a vector
space, which leads to the definition of dimension of a vector space.
A vector v ∈ V is called a linear combination of the vectors v1, . . . , vm ∈ V if there exist scalars a1, . . . , am ∈ F such that
    v = a1 v1 + a2 v2 + · · · + am vm.
Definition 5.1.1. The linear span (or simply span) of (v1, . . . , vm) is defined as
    span(v1, . . . , vm) := {a1 v1 + · · · + am vm | a1, . . . , am ∈ F}.
Lemma 5.1.2. Let V be a vector space and v1, v2, . . . , vm ∈ V. Then:
(1) vj ∈ span(v1, v2, . . . , vm) for each j.
(2) span(v1, v2, . . . , vm) is a subspace of V.
(3) If U ⊂ V is a subspace such that v1, v2, . . . , vm ∈ U, then span(v1, v2, . . . , vm) ⊂ U.
Example 5.1.5. The vectors v1 = (1, 1, 0) and v2 = (1, −1, 0) span a subspace of
R3 . More precisely, if we write the vectors in R3 as 3-tuples of the form (x, y, z),
then span(v1 , v2 ) is the xy-plane in R3 .
For example, let Fm[z] denote the set of all polynomials in F[z] of degree at most m. Then Fm[z] ⊂ F[z] is a subspace since Fm[z] contains the zero polynomial and is
closed under addition and scalar multiplication. In fact, Fm [z] is a finite-dimensional
subspace of F[z] since
Fm [z] = span(1, z, z 2 , . . . , z m ).
At the same time, though, note that F[z] itself is infinite-dimensional. To see this, assume the contrary, namely that
    F[z] = span(p1(z), . . . , pk(z))
for some finite list of polynomials p1(z), . . . , pk(z). Letting m denote the largest degree occurring among these polynomials, the polynomial z^{m+1} cannot lie in their span, which is a contradiction.
We are now going to define the notion of linear independence of a list of vectors.
This concept will be extremely important in the sections that follow, and especially
when we introduce bases and the dimension of a vector space.
Definition 5.2.1. A list of vectors (v1, . . . , vm) is called linearly independent if the only solution for a1, . . . , am ∈ F to the equation
    a1 v1 + · · · + am vm = 0
is a1 = · · · = am = 0. Otherwise, the list (v1, . . . , vm) is called linearly dependent.
Lemma 5.2.6. The list of vectors (v1 , . . . , vm ) is linearly independent if and only
if every v ∈ span(v1 , . . . , vm ) can be uniquely written as a linear combination of
(v1 , . . . , vm ).
Proof.
(“=⇒”) Assume that (v1, . . . , vm) is a linearly independent list of vectors. Suppose there are two ways of writing v ∈ span(v1, . . . , vm) as a linear combination of the vi:
    v = a1 v1 + · · · + am vm,
    v = a1′ v1 + · · · + am′ vm.
Subtracting the two equations yields 0 = (a1 − a1′)v1 + · · · + (am − am′)vm. Since (v1, . . . , vm) is linearly independent, the only solution to this equation is a1 − a1′ = 0, . . . , am − am′ = 0, or equivalently a1 = a1′, . . . , am = am′.
(“⇐=”) Now assume that, for every v ∈ span(v1 , . . . , vm ), there are unique
a1 , . . . , am ∈ F such that
v = a1 v1 + · · · + am vm .
This implies, in particular, that the only way the zero vector v = 0 can be written
as a linear combination of v1 , . . . , vm is with a1 = · · · = am = 0. This shows that
(v1 , . . . , vm ) are linearly independent.
Example 5.2.8. The list (v1 , v2 , v3 ) = ((1, 1), (1, 2), (1, 0)) of vectors spans R2 . To
see this, take any vector v = (x, y) ∈ R2 . We want to show that v can be written as
a linear combination of (1, 1), (1, 2), (1, 0), i.e., that there exist scalars a1 , a2 , a3 ∈ F
such that
v = a1 (1, 1) + a2 (1, 2) + a3 (1, 0),
or equivalently that
(x, y) = (a1 + a2 + a3 , a1 + 2a2 ).
Clearly a1 = y, a2 = 0, and a3 = x − y form a solution for any choice of x, y ∈ R,
and so R2 = span((1, 1), (1, 2), (1, 0)). However, note that
2(1, 1) − (1, 2) − (1, 0) = (0, 0), (5.2)
which shows that the list ((1, 1), (1, 2), (1, 0)) is linearly dependent. The Linear
Dependence Lemma 5.2.7 thus states that one of the vectors can be dropped from
((1, 1), (1, 2), (1, 0)) and that the resulting list of vectors will still span R2 . Indeed,
by Equation (5.2),
v3 = (1, 0) = 2(1, 1) − (1, 2) = 2v1 − v2 ,
and so span((1, 1), (1, 2), (1, 0)) = span((1, 1), (1, 2)).
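The dependence relation in Equation (5.2) can also be checked by a one-line matrix computation, as in this small Python sketch (ours).

```python
import numpy as np

# Columns of A are the vectors (1, 1), (1, 2), (1, 0) from Example 5.2.8.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
c = np.array([2.0, -1.0, -1.0])   # the coefficients from Equation (5.2)
print(A @ c)                      # [0. 0.], confirming 2(1,1) - (1,2) - (1,0) = (0,0)
```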
The next result shows that linearly independent lists of vectors that span a
finite-dimensional vector space are the smallest possible spanning sets.
Theorem 5.2.9. Let V be a finite-dimensional vector space. Suppose that
(v1 , . . . , vm ) is a linearly independent list of vectors that spans V , and let
(w1 , . . . , wn ) be any list that spans V . Then m ≤ n.
Proof. The proof uses the following iterative procedure: start with an arbitrary
list of vectors S0 = (w1 , . . . , wn ) such that V = span(S0 ). At the k th step of the
procedure, we construct a new list Sk by replacing some vector wjk by the vector
vk such that Sk still spans V . Repeating this for all vk then produces a new list Sm
5.3 Bases
A basis of a finite-dimensional vector space V is a list of vectors that both spans V and is linearly independent; that is, (v1, . . . , vn) is a basis for V if span(v1, . . . , vn) = V and (v1, . . . , vn) is linearly independent.
Example 5.3.5. To see how Basis Reduction Theorem 5.3.4 works, consider the
list of vectors
S = ((1, −1, 0), (2, −2, 0), (−1, 0, 1), (0, −1, 1), (0, 1, 0)).
This list does not form a basis for R3 as it is not linearly independent. However,
it is clear that R3 = span(S) since any arbitrary vector v = (x, y, z) ∈ R3 can be
written as the following linear combination over S:
    (x, y, z) = (x + z)(1, −1, 0) + 0(2, −2, 0) + z(−1, 0, 1) + 0(0, −1, 1) + (x + y + z)(0, 1, 0).
In fact, since the coefficients of (2, −2, 0) and (0, −1, 1) in this linear combination are both zero, it suggests that they add nothing to the span of the subset
    B = ((1, −1, 0), (−1, 0, 1), (0, 1, 0))
of S. Moreover, one can show that B is a basis for R³, and it is exactly the basis
produced by applying the process from the proof of Theorem 5.3.4 (as you should
be able to verify).
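A standard numerical way to confirm that three vectors form a basis of R³ is to check that the matrix having them as columns has nonzero determinant; the sketch below (ours) does this for the list B above.

```python
import numpy as np

B = np.array([[1, -1, 0],
              [-1, 0, 1],
              [0, 1, 0]], dtype=float).T   # columns are the vectors of B
print(np.linalg.det(B))                    # -1.0, which is nonzero, so B is a basis of R^3
```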
5.4 Dimension
Definition 5.4.1. We call the length of any basis for V (which is well-defined by
Theorem 5.4.2 below) the dimension of V , and we denote this by dim(V ).
Note that Definition 5.4.1 only makes sense if, in fact, every basis for a given
finite-dimensional vector space has the same length. This is true by the following
theorem.
Theorem 5.4.2. Let V be a finite-dimensional vector space. Then any two bases
of V have the same length.
We conclude this chapter with some additional interesting results on bases and
dimensions. The first one combines the concepts of basis and direct sum.
v = a 1 u 1 + · · · + a m u m = b 1 w 1 + · · · + bn w n ,
or equivalently that
a1 u1 + · · · + am um − b1 w1 − · · · − bn wn = 0.
    B = (v1, . . . , vn, u1, . . . , uk, w1, . . . , wℓ)
    a1 v1 + · · · + an vn + b1 u1 + · · · + bk uk + c1 w1 + · · · + cℓ wℓ = 0,    (5.3)
Calculational Exercises
(1) Show that the vectors v1 = (1, 1, 1), v2 = (1, 2, 3), and v3 = (2, −1, 1) are
linearly independent in R3 . Write v = (1, −2, 5) as a linear combination of v1 ,
v2 , and v3 .
(2) Consider the complex vector space V = C3 and the list (v1 , v2 , v3 ) of vectors in
V , where
v1 = (i, 0, 0), v2 = (i, 1, 0), v3 = (i, i, −1) .
(a) Prove that span(v1 , v2 , v3 ) = V .
(b) Prove or disprove: (v1 , v2 , v3 ) is a basis for V .
(3) Determine the dimension of each of the following subspaces of F4 .
(a) {(x1 , x2 , x3 , x4 ) ∈ F4 | x4 = 0}.
(b) {(x1 , x2 , x3 , x4 ) ∈ F4 | x4 = x1 + x2 }.
(c) {(x1 , x2 , x3 , x4 ) ∈ F4 | x4 = x1 + x2 , x3 = x1 − x2 }.
(d) {(x1 , x2 , x3 , x4 ) ∈ F4 | x4 = x1 + x2 , x3 = x1 − x2 , x3 + x4 = 2x1 }.
(e) {(x1 , x2 , x3 , x4 ) ∈ F4 | x1 = x2 = x3 = x4 }.
(4) Determine the value of λ ∈ R for which each of the following lists of vectors is linearly dependent.
(a) ((λ, −1, −1), (−1, λ, −1), (−1, −1, λ)) as a subset of R³.
(b) (sin²(x), cos(2x), λ) as a subset of C(R).
(5) Consider the real vector space V = R4 . For each of the following five statements,
provide either a proof or a counterexample.
(a) dim V = 4.
(b) span((1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)) = V .
(c) The list ((1, −1, 0, 0), (0, 1, −1, 0), (0, 0, 1, −1), (−1, 0, 0, 1)) is linearly inde-
pendent.
(d) Every list of four vectors v1 , . . . , v4 ∈ V , such that span(v1 , . . . , v4 ) = V , is
linearly independent.
(e) Let v1 and v2 be two linearly independent vectors in V . Then, there exist
vectors u, w ∈ V , such that (v1 , v2 , u, w) is a basis for V .
Proof-Writing Exercises
(1) Let V be a vector space over F and define U = span(u1 , u2 , . . . , un ), where for
each i = 1, . . . , n, ui ∈ V . Now suppose v ∈ U . Prove
U = span(v, u1 , u2 , . . . , un ) .
(2) Let V be a vector space over F, and suppose that the list (v1 , v2 , . . . , vn ) of
vectors spans V , where each vi ∈ V . Prove that the list
(v1 − v2 , v2 − v3 , v3 − v4 , . . . , vn−2 − vn−1 , vn−1 − vn , vn )
also spans V .
(3) Let V be a vector space over F, and suppose that (v1 , v2 , . . . , vn ) is a linearly
independent list of vectors in V . Given any w ∈ V such that
(v1 + w, v2 + w, . . . , vn + w)
is a linearly dependent list of vectors in V , prove that w ∈ span(v1 , v2 , . . . , vn ).
(4) Let V be a finite-dimensional vector space over F with dim(V ) = n for some
n ∈ Z+ . Prove that there are n one-dimensional subspaces U1 , U2 , . . . , Un of V
such that
V = U1 ⊕ U2 ⊕ · · · ⊕ Un .
(5) Let V be a finite-dimensional vector space over F, and suppose that U is a
subspace of V for which dim(U ) = dim(V ). Prove that U = V .
(6) Let Fm[z] denote the vector space of all polynomials with degree less than or equal to m ∈ Z+ and having coefficients in F, and suppose that p0, p1, . . . , pm ∈ Fm[z] satisfy pj(2) = 0 for each j = 0, 1, . . . , m. Prove that (p0, p1, . . . , pm) is a linearly dependent list of vectors in Fm[z].
(7) Let U and V be five-dimensional subspaces of R⁹. Prove that U ∩ V ≠ {0}.
(8) Let V be a finite-dimensional vector space over F, and suppose that
U1 , U2 , . . . , Um are any m subspaces of V . Prove that
dim(U1 + U2 + · · · + Um ) ≤ dim(U1 ) + dim(U2 ) + · · · + dim(Um ).
Chapter 6
Linear Maps
As discussed in Chapter 1, one of the main goals of Linear Algebra is the charac-
terization of solutions to a system of m linear equations in n unknowns x1 , . . . , xn ,
    a11 x1 + · · · + a1n xn = b1
        ⋮
    am1 x1 + · · · + amn xn = bm,
where each of the coefficients aij and bi is in F. Linear maps and their properties
give us insight into the characteristics of solutions to linear systems.
Throughout this chapter, V and W denote vector spaces over F. We are going
to study functions from V into W that have the special properties given in the
following definition.
Definition 6.1.1. A function T : V → W is called a linear map (or a linear transformation) if
(1) T(u + v) = T(u) + T(v) for all u, v ∈ V (additivity), and
(2) T(av) = aT(v) for all a ∈ F and v ∈ V (homogeneity).
The set of all linear maps from V to W is denoted by L(V, W), and we often write Tv for T(v).
Example 6.1.2. The map T : R² → R² defined by T(x, y) = (x − 2y, 3x + y) is linear; this map will be used repeatedly in the examples below.
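As a quick sanity check of linearity (an illustration of ours, not a proof), one can test the two defining conditions on randomly sampled inputs.

```python
import numpy as np

T = lambda v: np.array([v[0] - 2 * v[1], 3 * v[0] + v[1]])   # T(x, y) = (x - 2y, 3x + y)

rng = np.random.default_rng(1)
for _ in range(100):
    u, v = rng.normal(size=2), rng.normal(size=2)
    a = rng.normal()
    assert np.allclose(T(u + v), T(u) + T(v))   # additivity
    assert np.allclose(T(a * u), a * T(u))      # homogeneity
print("T passed every sampled linearity check")
```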
Theorem 6.1.3. Let V and W be vector spaces over F, let (v1, . . . , vn) be a basis of V, and let w1, . . . , wn ∈ W be arbitrary vectors. Then there exists a unique linear map T : V → W such that T(vi) = wi for each i = 1, . . . , n.
Proof. First we verify that there is at most one linear map T with T(vi) = wi. Take
any v ∈ V . Since (v1 , . . . , vn ) is a basis of V , there are unique scalars a1 , . . . , an ∈ F
such that v = a1 v1 + · · · + an vn . By linearity, we have
T (v) = T (a1 v1 + · · · + an vn ) = a1 T (v1 ) + · · · + an T (vn ) = a1 w1 + · · · + an wn , (6.3)
and hence T (v) is completely determined. To show existence, use Equation (6.3) to
define T . It remains to show that this T is linear and that T (vi ) = wi . These two
conditions are not hard to show and are left to the reader.
The set of linear maps L(V, W ) is itself a vector space. For S, T ∈ L(V, W )
addition is defined as
(S + T )v = Sv + T v, for all v ∈ V .
For a ∈ F and T ∈ L(V, W ), scalar multiplication is defined as
(aT )(v) = a(T v), for all v ∈ V .
You should verify that S + T and aT are indeed linear maps and that all properties
of a vector space are satisfied.
In addition to the operations of vector addition and scalar multiplication, we
can also define the composition of linear maps. Let V, U, W be vector spaces
over F. Then, for S ∈ L(U, V) and T ∈ L(V, W), we define the composition T ◦ S ∈ L(U, W) by
    (T ◦ S)(u) = T(S(u)),   for all u ∈ U.
The map T ◦ S is often also called the product of T and S, denoted by TS. It has the following properties:
(1) Associativity: (T1 T2)T3 = T1(T2 T3), whenever the compositions are defined.
(2) Identity: TI = IT = T, where the I in TI is the identity map on the domain of T and the I in IT is the identity map on the codomain of T.
(3) Distributivity: (T1 + T2)S = T1 S + T2 S and T(S1 + S2) = TS1 + TS2, whenever the compositions are defined.
Note that the product of linear maps is not always commutative. For example, if we take T ∈ L(F[z], F[z]) to be the differentiation map Tp(z) = p′(z) and S ∈ L(F[z], F[z]) to be the map Sp(z) = z²p(z), then
    (S ◦ T)p(z) = z² p′(z)   but   (T ◦ S)p(z) = z² p′(z) + 2z p(z).
6.2 Null spaces
Definition 6.2.1. Let T : V → W be a linear map. Then the null space (or kernel) of T is the set of all vectors in V that are mapped to zero by T:
    null(T) = {v ∈ V | Tv = 0}.
Example 6.2.2. Let T ∈ L(F[z], F[z]) be the differentiation map Tp(z) = p′(z). Then
    null(T) = {p ∈ F[z] | p(z) is constant}.
Example 6.2.3. Consider the linear map T (x, y) = (x−2y, 3x+y) of Example 6.1.2.
To determine the null space, we need to solve T (x, y) = (0, 0), which is equivalent
to the system of linear equations
    x − 2y = 0
    3x + y = 0.
We see that the only solution is (x, y) = (0, 0) so that null (T ) = {(0, 0)}.
Proposition 6.2.4. Let T : V → W be a linear map. Then null(T) is a subspace of V.
Proof. We need to show that 0 ∈ null(T) and that null(T) is closed under addition and scalar multiplication. By linearity, we have
    T(0) = T(0 + 0) = T(0) + T(0)
so that T (0) = 0. Hence 0 ∈ null (T ). For closure under addition, let u, v ∈ null (T ).
Then
T (u + v) = T (u) + T (v) = 0 + 0 = 0,
and hence u + v ∈ null (T ). Similarly, for closure under scalar multiplication, let
u ∈ null (T ) and a ∈ F. Then
T (au) = aT (u) = a0 = 0,
and so au ∈ null (T ).
Definition 6.2.5. The linear map T : V → W is called injective if, for all u, v ∈ V ,
the condition T u = T v implies that u = v. In other words, different vectors in V
are mapped to different vectors in W .
Proposition 6.2.6. Let T : V → W be a linear map. Then T is injective if and only if null(T) = {0}.
Proof.
(“=⇒”) Suppose that T is injective. Since null (T ) is a subspace of V , we know
that 0 ∈ null (T ). Assume that there is another vector v ∈ V that is in the kernel.
Then T (v) = 0 = T (0). Since T is injective, this implies that v = 0, proving that
null (T ) = {0}.
(“⇐=”) Assume that null (T ) = {0}, and let u, v ∈ V be such that T u = T v. Then
0 = T u − T v = T (u − v) so that u − v ∈ null (T ). Hence u − v = 0, or, equivalently,
u = v. This shows that T is indeed injective.
Example 6.2.7.
(1) The differentiation map p(z) ↦ p′(z) is not injective since p′(z) = q′(z) implies that p(z) = q(z) + c, where c ∈ F is a constant.
(2) The identity map I : V → V is injective.
(3) The linear map T : F[z] → F[z] given by T (p(z)) = z 2 p(z) is injective since it is
easy to verify that null (T ) = {0}.
(4) The linear map T (x, y) = (x − 2y, 3x + y) is injective since null (T ) = {(0, 0)},
as we calculated in Example 6.2.3.
6.3 Range
Definition 6.3.1. Let T : V → W be a linear map. The range of T, denoted range(T), is the subset of vectors in W that are in the image of T:
    range(T) = {Tv | v ∈ V} = {w ∈ W | there exists v ∈ V such that Tv = w}.
Example 6.3.2. The range of the differentiation map T : F[z] → F[z] is range(T) = F[z] since, for every polynomial q ∈ F[z], there is a p ∈ F[z] such that p′ = q.
Example 6.3.3. The range of the linear map T(x, y) = (x − 2y, 3x + y) is R² since, for any (z1, z2) ∈ R², we have T(x, y) = (z1, z2) if (x, y) = (1/7)(z1 + 2z2, −3z1 + z2).
Proposition 6.3.4. Let T : V → W be a linear map. Then range(T) is a subspace of W.
Proof. We need to show that 0 ∈ range(T) and that range(T) is closed under
addition and scalar multiplication. We already showed that T 0 = 0 so that 0 ∈
range (T ).
For closure under addition, let w1 , w2 ∈ range (T ). Then there exist v1 , v2 ∈ V
such that T v1 = w1 and T v2 = w2 . Hence
T (v1 + v2 ) = T v1 + T v2 = w1 + w2 ,
and so w1 + w2 ∈ range (T ).
For closure under scalar multiplication, let w ∈ range (T ) and a ∈ F. Then there
exists a v ∈ V such that T v = w. Thus
T (av) = aT v = aw,
and so aw ∈ range (T ).
Definition 6.3.5. A linear map T : V → W is called surjective if range(T) = W. A linear map that is both injective and surjective is called bijective.
Example 6.3.6.
(1) The differentiation map T : F[z] → F[z] is surjective since range (T ) = F[z].
However, if we restrict ourselves to polynomials of degree at most m, then the
differentiation map T : Fm [z] → Fm [z] is not surjective since polynomials of
degree m are not in the range of T .
(2) The identity map I : V → V is surjective.
(3) The linear map T : F[z] → F[z] given by T (p(z)) = z 2 p(z) is not surjective
since, for example, there are no linear polynomials in the range of T .
(4) The linear map T (x, y) = (x − 2y, 3x + y) is surjective since range (T ) = R2 , as
we calculated in Example 6.3.3.
6.4 Homomorphisms
It should be mentioned that linear maps between vector spaces are also called
vector space homomorphisms. Instead of the notation L(V, W ), one often sees
the convention
HomF (V, W ) = {T : V → W | T is linear}.
A homomorphism T : V → W is also often called
• Monomorphism iff T is injective;
• Epimorphism iff T is surjective;
• Isomorphism iff T is bijective;
• Endomorphism iff V = W ;
• Automorphism iff V = W and T is bijective.
6.5 The dimension formula
The next theorem is the key result of this chapter. It relates the dimension of the
kernel and range of a linear map.
Theorem 6.5.1. Let V be a finite-dimensional vector space and T : V → W be a
linear map. Then range (T ) is a finite-dimensional subspace of W and
dim(V ) = dim(null (T )) + dim(range (T )). (6.4)
The proof proceeds by choosing a basis (u1, . . . , um) of null(T) and extending it to a basis (u1, . . . , um, v1, . . . , vn) of V; one then shows that (Tv1, . . . , Tvn) is a basis of range(T). For the linear independence of this list, note that if
    T(c1 v1 + · · · + cn vn) = 0,
then c1 v1 + · · · + cn vn ∈ null(T), so that
    c1 v1 + · · · + cn vn = d1 u1 + · · · + dm um
for some scalars d1, . . . , dm; the linear independence of the combined basis of V then forces all of the ci (and di) to be zero.
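The dimension formula is easy to illustrate numerically. In the Python sketch below (ours; the matrix A is an arbitrary hypothetical example), the map T : R⁴ → R³ is given by T(v) = Av, the rank of A equals dim(range(T)), and the two explicitly exhibited null vectors span null(T).

```python
import numpy as np

A = np.array([[1, 2, 0, 1],
              [0, 1, 1, 1],
              [1, 3, 1, 2]], dtype=float)

rank = np.linalg.matrix_rank(A)                      # dim(range(T)) = 2 for this A
null_basis = np.array([[2, -1, 1, 0],                # two linearly independent vectors v
                       [1, -1, 0, 1]], dtype=float)  # with A v = 0, found by hand
assert np.allclose(A @ null_basis.T, 0)
print(rank + len(null_basis))                        # 2 + 2 = 4 = dim(R^4), as Equation (6.4) predicts
```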
6.6 The matrix of a linear map
Now we will see that every linear map T ∈ L(V, W), with V and W finite-
dimensional vector spaces, can be encoded by a matrix, and, vice versa, every
matrix defines such a linear map.
Let V and W be finite-dimensional vector spaces, and let T : V → W be a linear
map. Suppose that (v1 , . . . , vn ) is a basis of V and that (w1 , . . . , wm ) is a basis for
W . We have seen in Theorem 6.1.3 that T is uniquely determined by specifying the
vectors T v1 , . . . , T vn ∈ W . Since (w1 , . . . , wm ) is a basis of W , there exist unique
scalars aij ∈ F such that
    T vj = a1j w1 + · · · + amj wm   for 1 ≤ j ≤ n.    (6.5)
We call the m × n matrix M(T) = (aij) the matrix of T with respect to the bases (v1, . . . , vn) and (w1, . . . , wm).
For example, consider the linear map T : R² → R³ given by T(x, y) = (y, x + 2y, x + y). With respect to the canonical bases of R² and R³, the entries of its matrix are aij = (T ej)i, where ej denotes the j-th canonical basis vector.
However, if alternatively we take the bases ((1, 2), (0, 1)) for R2 and
((1, 0, 0), (0, 1, 0), (0, 0, 1)) for R3 , then T (1, 2) = (2, 5, 3) and T (0, 1) = (1, 2, 1)
so that
    M(T) = ⎡ 2  1 ⎤
           ⎢ 5  2 ⎥ .
           ⎣ 3  1 ⎦
Example 6.6.4. Let S : R2 → R2 be the linear map S(x, y) = (y, x). With respect
to the basis ((1, 2), (0, 1)) for R2 , we have
S(1, 2) = (2, 1) = 2(1, 2) − 3(0, 1) and S(0, 1) = (1, 0) = 1(1, 2) − 2(0, 1),
and so
    M(S) = ⎡  2   1 ⎤ .
           ⎣ −3  −2 ⎦
Given vector spaces V and W of dimensions n and m, respectively, and given a
fixed choice of bases, note that there is a one-to-one correspondence between linear
maps in L(V, W ) and matrices in Fm×n . If we start with the linear map T , then
the matrix M (T ) = A = (aij ) is defined via Equation (6.5). Conversely, given the
matrix A = (aij ) ∈ Fm×n , we can define a linear map T : V → W by setting
    T vj = ∑_{i=1}^{m} aij wi.
Recall that the set of linear maps L(V, W ) is a vector space. Since we have
a one-to-one correspondence between linear maps and matrices, we can also make
the set of matrices Fm×n into a vector space. Given two matrices A = (aij ) and
B = (bij ) in Fm×n and given a scalar α ∈ F, we define the matrix addition and
scalar multiplication component-wise:
A + B = (aij + bij ),
αA = (αaij ).
Next, we show that the composition of linear maps imposes a product on
matrices, also called matrix multiplication. Suppose U, V, W are vector spaces
over F with bases (u1 , . . . , up ), (v1 , . . . , vn ) and (w1 , . . . , wm ), respectively. Let
S : U → V and T : V → W be linear maps. Then the product is a linear map
T ◦ S : U → W.
Each linear map has its corresponding matrix M (T ) = A, M (S) = B and
M (T S) = C. The question is whether C is determined by A and B. We have,
for each j ∈ {1, 2, . . . p}, that
(T ◦ S)uj = T (b1j v1 + · · · + bnj vn ) = b1j T v1 + · · · + bnj T vn
n n
m
= bkj T vk = bkj aik wi
k=1 k=1 i=1
m
n
= aik bkj wi .
i=1 k=1
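The correspondence between composition and matrix multiplication can be seen concretely; in the following Python sketch (ours, with arbitrarily chosen hypothetical matrices), applying S and then T to a vector gives the same result as applying the single matrix AB.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])          # matrix of T : R^2 -> R^3 (canonical bases)
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])     # matrix of S : R^3 -> R^2 (canonical bases)

u = np.array([1.0, -2.0, 0.5])
print(A @ (B @ u))                  # apply S first, then T
print((A @ B) @ u)                  # apply the product matrix AB once; the outputs agree
```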
Example 6.6.9. Take the linear map S from Example 6.6.4 with basis ((1, 2), (0, 1))
of R2 . To determine the action on the vector v = (1, 4) ∈ R2 , note that v = (1, 4) =
1(1, 2) + 2(0, 1). Hence,
    M(Sv) = M(S)M(v) = ⎡  2   1 ⎤ ⎡ 1 ⎤ = ⎡  4 ⎤ .
                        ⎣ −3  −2 ⎦ ⎣ 2 ⎦   ⎣ −7 ⎦
This means that
Sv = 4(1, 2) − 7(0, 1) = (4, 1),
which is indeed true.
6.7 Invertibility
Definition 6.7.1. A linear map T ∈ L(V, W) is called invertible if there exists a linear map S ∈ L(W, V) such that ST = IV and TS = IW, where IV and IW denote the identity maps on V and W, respectively. Such a map S is called the inverse of T and is denoted by T⁻¹.
Proposition 6.7.2. A linear map T ∈ L(V, W) is invertible if and only if T is injective and surjective.
Proof.
(“=⇒”) Suppose T is invertible.
To show that T is injective, suppose that u, v ∈ V are such that T u = T v.
Apply the inverse T −1 of T to obtain T −1 T u = T −1 T v so that u = v. Hence T is
injective.
To show that T is surjective, we need to show that, for every w ∈ W , there is a
v ∈ V such that T v = w. Take v = T −1 w ∈ V . Then T (T −1 w) = w. Hence T is
surjective.
(“⇐=”) Suppose that T is injective and surjective. We need to show that T is
invertible. We define a map S ∈ L(W, V ) as follows. Since T is surjective, we know
that, for every w ∈ W , there exists a v ∈ V such that T v = w. Moreover, since T
is injective, this v is uniquely determined. Hence, define Sw = v.
We claim that S is the inverse of T . Note that, for all w ∈ W , we have T Sw =
T v = w so that T S = IW . Similarly, for all v ∈ V , we have ST v = Sw = v so that
ST = IV .
Example 6.7.4. The linear map T (x, y) = (x − 2y, 3x + y) is both injective, since
null (T ) = {0}, and surjective, since range (T ) = R2 . Hence, T is invertible by
Proposition 6.7.2.
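As a purely illustrative aside (not part of the formal development), one can confirm this numerically: the canonical matrix of T has rank 2 and non-zero determinant, so T is injective and surjective, hence invertible.

```python
import numpy as np

# Canonical matrix of T(x, y) = (x - 2y, 3x + y).
A = np.array([[1.0, -2.0],
              [3.0,  1.0]])

print(np.linalg.matrix_rank(A))   # 2, so null(T) = {0} and range(T) = R^2
print(np.linalg.det(A))           # 7.0, non-zero
print(np.linalg.inv(A) @ A)       # identity matrix, confirming invertibility
```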
Definition 6.7.5. Two vector spaces V and W are called isomorphic if there
exists an invertible linear map T ∈ L(V, W ).
Theorem 6.7.6. Two finite-dimensional vector spaces V and W over F are iso-
morphic if and only if dim(V ) = dim(W ).
Proof.
(“=⇒”) Suppose V and W are isomorphic. Then there exists an invertible linear
map T ∈ L(V, W ). Since T is invertible, it is injective and surjective, and so
null (T ) = {0} and range (T ) = W . Using the Dimension Formula, this implies that
dim(V ) = dim(null (T )) + dim(range (T )) = dim(W ).
(“⇐=”) Suppose that dim(V ) = dim(W ). Let (v1 , . . . , vn ) be a basis of V and
(w1 , . . . , wn ) be a basis of W . Define the linear map T : V → W as
T (a1 v1 + · · · + an vn ) = a1 w1 + · · · + an wn .
Since the scalars a1 , . . . , an ∈ F are arbitrary and (w1 , . . . , wn ) spans W , this means
that range (T ) = W and T is surjective. Also, since (w1 , . . . , wn ) is linearly independent, T is injective (since a1 w1 + · · · + an wn = 0 implies that a1 = · · · = an = 0, and hence only the zero vector is mapped to zero). It follows that T is both injective
and surjective; hence, by Proposition 6.7.2, T is invertible. Therefore, V and W
are isomorphic.
We close this chapter by considering the case of linear maps having equal domain
and codomain. As in Definition 6.1.1, a linear map T ∈ L(V, V ) is called a linear
operator on V . As the following remarkable theorem shows, the notions of injec-
tivity, surjectivity, and invertibility of a linear operator T are the same — as long
as V is finite-dimensional. A similar result does not hold for infinite-dimensional
vector spaces. For example, the set of all polynomials F[z] is an infinite-dimensional
vector space, and we saw that the differentiation map on F[z] is surjective but not
injective.
Theorem 6.7.7. Let V be a finite-dimensional vector space and T : V → V be a
linear map. Then the following are equivalent:
(1) T is invertible.
(2) T is injective.
(3) T is surjective.
Calculational Exercises
(1) Define the map T : R2 → R2 by T (x, y) = (x + y, x).
(a) Show that T is linear.
(b) Show that T is surjective.
(c) Find dim (null (T )).
(d) Find the matrix for T with respect to the canonical basis of R2 .
(e) Find the matrix for T with respect to the canonical basis for the domain
R2 and the basis ((1, 1), (1, −1)) for the target space R2 .
(f) Show that the map F : R2 → R2 given by F (x, y) = (x + y, x + 1) is not
linear.
(2) Let T ∈ L(R2 ) be defined by
T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ -x \end{pmatrix} , \quad for all \begin{pmatrix} x \\ y \end{pmatrix} ∈ R2 .
(a) Show that T is surjective.
(b) Find dim (null (T )).
(c) Find the matrix for T with respect to the canonical basis of R2 .
(d) Show that the map F : R2 → R2 given by F (x, y) = (x + y, x + 1) is not
linear.
(3) Consider the complex vector spaces C2 and C3 with their canonical bases, and
let S ∈ L(C3 , C2 ) be the linear map defined by S(v) = Av, ∀v ∈ C3 , where A is
the matrix
A = M (S) = \begin{bmatrix} i & 1 & 1 \\ 2i & -1 & -1 \end{bmatrix} .
Find a basis for null(S).
(4) Give an example of a function f : R2 → R having the property that
∀ a ∈ R, ∀ v ∈ R2 , f (av) = af (v)
but such that f is not a linear map.
(5) Show that the linear map T : F4 → F2 is surjective if
null(T ) = {(x1 , x2 , x3 , x4 ) ∈ F4 | x1 = 5x2 , x3 = 7x4 }.
(6) Show that no linear map T : F5 → F2 can have as its null space the set
{(x1 , x2 , x3 , x4 , x5 ) ∈ F5 | x1 = 3x2 , x3 = x4 = x5 }.
(7) Describe the set of solutions x = (x1 , x2 , x3 ) ∈ R3 of the system of equations
x1 − x2 + x3 = 0
x1 + 2x2 + x3 = 0
2x1 + x2 + 2x3 = 0
Proof-Writing Exercises
(1) Let V and W be vector spaces over F with V finite-dimensional, and let U be
any subspace of V . Given a linear map S ∈ L(U, W ), prove that there exists a
linear map T ∈ L(V, W ) such that, for every u ∈ U , S(u) = T (u).
(2) Let V and W be vector spaces over F, and suppose that T ∈ L(V, W ) is injective.
Given a linearly independent list (v1 , . . . , vn ) of vectors in V , prove that the list
(T (v1 ), . . . , T (vn )) is linearly independent in W .
(3) Let U , V , and W be vector spaces over F, and suppose that the linear maps
S ∈ L(U, V ) and T ∈ L(V, W ) are both injective. Prove that the composition
map T ◦ S is injective.
(4) Let V and W be vector spaces over F, and suppose that T ∈ L(V, W ) is surjec-
tive. Given a spanning list (v1 , . . . , vn ) for V , prove that
span(T (v1 ), . . . , T (vn )) = W.
(5) Let V and W be vector spaces over F with V finite-dimensional. Given T ∈
L(V, W ), prove that there is a subspace U of V such that
U ∩ null(T ) = {0} and range(T ) = {T (u) | u ∈ U }.
(6) Let V be a vector space over F, and suppose that there is a linear map T ∈
L(V, V ) such that both null(T ) and range(T ) are finite-dimensional subspaces
of V . Prove that V must also be finite-dimensional.
(7) Let U , V , and W be finite-dimensional vector spaces over F with S ∈ L(U, V )
and T ∈ L(V, W ). Prove that
dim(null(T ◦ S)) ≤ dim(null(T )) + dim(null(S)).
(8) Let V be a finite-dimensional vector space over F with S, T ∈ L(V, V ). Prove
that T ◦ S is invertible if and only if both S and T are invertible.
(9) Let V be a finite-dimensional vector space over F with S, T ∈ L(V, V ), and
denote by I the identity map on V . Prove that T ◦ S = I if and only if
S ◦ T = I.
Chapter 7

To begin our study, we will look at subspaces U of V that have special properties under an operator T ∈ L(V, V ).

Definition 7.1.1. Let V be a vector space over F and T ∈ L(V, V ). A subspace U ⊂ V is called an invariant subspace under T if
T u ∈ U for all u ∈ U .
That is, U is invariant under T if the image of every vector in U under T remains
within U . We denote this as T U = {T u | u ∈ U } ⊂ U .
Example 7.1.2. The subspaces null (T ) and range (T ) are invariant subspaces
under T . To see this, let u ∈ null (T ). This means that T u = 0. But, since
0 ∈ null (T ), this implies that T u = 0 ∈ null (T ). Similarly, let u ∈ range (T ). Since
T v ∈ range (T ) for all v ∈ V , in particular we have T u ∈ range (T ).
with respect to the basis (e1 , e2 , e3 ). Then span(e1 , e2 ) and span(e3 ) are both
invariant subspaces under T .
An important special case of Definition 7.1.1 is that of one-dimensional invariant
subspaces under an operator T ∈ L(V, V ). If dim(U ) = 1, then there exists a non-
zero vector u ∈ V such that
U = {au | a ∈ F}.
In this case, we must have
T u = λu for some λ ∈ F.
This motivates the definitions of eigenvectors and eigenvalues of a linear operator,
as given in the next section.
7.2 Eigenvalues

Definition 7.2.1. Let T ∈ L(V, V ). Then λ ∈ F is called an eigenvalue of T if there exists a non-zero vector u ∈ V such that T u = λu. The vector u is called an eigenvector of T corresponding to the eigenvalue λ.

Example 7.2.2. Consider the counter-clockwise rotation by 90 degrees, R(x, y) = (−y, x). If (x, y) is an eigenvector of R with eigenvalue λ, then (−y, x) = λ(x, y), so that y = −λx and x = λy. This implies that y = −λ2 y, i.e., λ2 = −1. Viewed as an operator on R2 , R therefore has no eigenvalues; over C, however, the
solutions are hence λ = ±i. One can check that (1, −i) is an eigenvector with
eigenvalue i and that (1, i) is an eigenvector with eigenvalue −i.
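For readers who wish to experiment, the following short Python/NumPy sketch (an illustration only) verifies these eigenvalues and eigenvectors numerically for the rotation R(x, y) = (−y, x), regarded as an operator on C2 .

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # matrix of R(x, y) = (-y, x)

eigenvalues, eigenvectors = np.linalg.eig(R)
print(eigenvalues)                     # [0.+1.j  0.-1.j], i.e. lambda = +i and -i

# The columns of `eigenvectors` are proportional to (1, -i) and (1, i).
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(R @ v, lam * v)
```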
Eigenspaces are important examples of invariant subspaces. Let T ∈ L(V, V ),
and let λ ∈ F be an eigenvalue of T . Then
Vλ = {v ∈ V | T v = λv}
is called an eigenspace of T . Equivalently,
Vλ = null (T − λI).
Note that Vλ ≠ {0} since λ is an eigenvalue if and only if there exists a non-zero
vector u ∈ V such that T u = λu. We can reformulate this as follows:
• λ ∈ F is an eigenvalue of T if and only if the operator T − λI is not injective.
Since the notion of injectivity, surjectivity, and invertibility are equivalent for op-
erators on a finite-dimensional vector space, we can equivalently say either of the
following:
• λ ∈ F is an eigenvalue of T if and only if the operator T − λI is not surjective.
• λ ∈ F is an eigenvalue of T if and only if the operator T − λI is not invertible.
We close this section with two fundamental facts about eigenvalues and eigenvectors.
Theorem 7.2.3. Let T ∈ L(V, V ), and let λ1 , . . . , λm ∈ F be m distinct eigenvalues
of T with corresponding non-zero eigenvectors v1 , . . . , vm . Then (v1 , . . . , vm ) is
linearly independent.
Proof. Suppose that (v1 , . . . , vm ) is linearly dependent. Then, by the Linear De-
pendence Lemma, there exists an index k ∈ {2, . . . , m} such that
vk ∈ span(v1 , . . . , vk−1 )
and such that (v1 , . . . , vk−1 ) is linearly independent. This means that there exist
scalars a1 , . . . , ak−1 ∈ F such that
vk = a1 v1 + · · · + ak−1 vk−1 . (7.1)
Applying T to both sides yields, using the fact that vj is an eigenvector with eigen-
value λj ,
λk vk = a1 λ1 v1 + · · · + ak−1 λk−1 vk−1 .
Subtracting λk times Equation (7.1) from this, we obtain
0 = (λk − λ1 )a1 v1 + · · · + (λk − λk−1 )ak−1 vk−1 .
Since (v1 , . . . , vk−1 ) is linearly independent, we must have (λk − λj )aj = 0 for all
j = 1, 2, . . . , k − 1. By assumption, all eigenvalues are distinct, so λk − λj ≠ 0,
which implies that aj = 0 for all j = 1, 2, . . . , k − 1. But then, by Equation (7.1),
vk = 0, which contradicts the assumption that all eigenvectors are non-zero. Hence
(v1 , . . . , vm ) is linearly independent.
Corollary 7.2.4. Any operator T ∈ L(V, V ) has at most dim(V ) distinct eigenval-
ues.
Note that if T has n = dim(V ) distinct eigenvalues, then there exists a basis
(v1 , . . . , vn ) of V such that
T vj = λj vj , for all j = 1, 2, . . . , n.
Moreover, any vector v = a1 v1 + · · · + an vn ∈ V is then mapped to
T v = λ1 a1 v1 + · · · + λn an vn ,
so that its coordinate vector with respect to the basis (v1 , . . . , vn ) satisfies
M (T v) = \begin{bmatrix} λ_1 a_1 \\ \vdots \\ λ_n a_n \end{bmatrix} .
This means that the matrix M (T ) for T with respect to the basis of eigenvectors
(v1 , . . . , vn ) is diagonal, and so we call T diagonalizable:
M (T ) = \begin{bmatrix} λ_1 & & 0 \\ & \ddots & \\ 0 & & λ_n \end{bmatrix} .
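The following Python/NumPy sketch illustrates this numerically for an arbitrarily chosen 2 × 2 matrix with distinct eigenvalues (the matrix is not taken from the text): conjugating the canonical matrix by the matrix whose columns are eigenvectors produces a diagonal matrix with the eigenvalues on the diagonal.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                    # canonical matrix of some operator T

eigenvalues, V = np.linalg.eig(A)             # columns of V are eigenvectors of T
D = np.linalg.inv(V) @ A @ V                  # matrix of T with respect to the eigenbasis

print(np.round(D, 10))                        # diagonal, eigenvalues on the diagonal
assert np.allclose(D, np.diag(eigenvalues))
```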
In what follows, we want to study the question of when eigenvalues exist for a given
operator T . To answer this question, we will use polynomials p(z) ∈ F[z] evaluated
on operators T ∈ L(V, V ) (or, equivalently, on square matrices A ∈ Fn×n ). More
explicitly, given a polynomial
p(z) = a0 + a1 z + · · · + ak z k ,
we define the operator
p(T ) = a0 IV + a1 T + · · · + ak T k .
The results of this section will be for complex vector spaces. This is because
the proof of the existence of eigenvalues relies on the Fundamental Theorem of
Algebra from Chapter 3, which makes a statement about the existence of zeroes of
polynomials over C.
Theorem 7.4.1. Let V ≠ {0} be a finite-dimensional vector space over C, and let
T ∈ L(V, V ). Then T has at least one eigenvalue.
Proof. Let n = dim(V ), and choose any non-zero vector v ∈ V . Then the list of n + 1 vectors
(v, T v, T 2 v, . . . , T n v)
must be linearly dependent. Hence there exist scalars a0 , a1 , . . . , an ∈ F, not all zero, such that
0 = a0 v + a1 T v + a2 T 2 v + · · · + an T n v.
Let m be the largest index for which am ≠ 0. Since v ≠ 0, we must have m > 0
(but possibly m = n). Consider the polynomial
p(z) = a0 + a1 z + · · · + am z m .
By the Fundamental Theorem of Algebra (see Chapter 3), p(z) factors completely over C as
p(z) = c(z − λ1 ) · · · (z − λm ),
where c, λ1 , . . . , λm ∈ C and c ≠ 0.
Therefore,
0 = a0 v + a1 T v + a2 T 2 v + · · · + an T n v = p(T )v
= c(T − λ1 I)(T − λ2 I) · · · (T − λm I)v,
and so at least one of the factors T − λj I must be non-injective. In other words,
this λj is an eigenvalue of T .
Note that the proof of Theorem 7.4.1 only uses basic concepts about linear
maps, which is the same approach as in a popular textbook called Linear Algebra
Done Right by Sheldon Axler. Many other textbooks rely on significantly more
difficult proofs using concepts like the determinant and characteristic polynomial of
a matrix. At the same time, it is often preferable to use the characteristic polynomial
of a matrix in order to compute eigen-information of an operator; we discuss this
approach in Chapter 8.
Note also that Theorem 7.4.1 does not hold for real vector spaces. E.g., as we
saw in Example 7.2.2, the rotation operator R on R2 has no eigenvalues.
What we will show next is that we can find a basis of V such that the matrix M (T ) is upper triangular, i.e., of the form
M (T ) = \begin{bmatrix} * & & * \\ & \ddots & \\ 0 & & * \end{bmatrix} ,
where the entries ∗ can be anything and every entry below the main diagonal is zero.
Here are two reasons why having an operator T represented by an upper trian-
gular matrix can be quite convenient:
(1) the eigenvalues are on the diagonal (as we will see later);
(2) it is easy to solve the corresponding system of linear equations by back substi-
tution (as discussed in Section A.3).
The next proposition tells us what upper triangularity means in terms of linear operators and invariant subspaces.

Proposition 7.5.2. Suppose T ∈ L(V, V ) and that (v1 , . . . , vn ) is a basis of V . Then the following statements are equivalent:

(1) the matrix M (T ) with respect to the basis (v1 , . . . , vn ) is upper triangular;
(2) T vk ∈ span(v1 , . . . , vk ) for each k = 1, 2, . . . , n;
(3) span(v1 , . . . , vk ) is invariant under T for each k = 1, 2, . . . , n.
Proof. The equivalence of Condition 1 and Condition 2 follows easily from the
definition since Condition 2 implies that the matrix elements below the diagonal
are zero.
Clearly, Condition 3 implies Condition 2. To show that Condition 2 implies
Condition 3, note that any vector v ∈ span(v1 , . . . , vk ) can be written as v =
a1 v1 + · · · + ak vk . Applying T , we obtain
T v = a1 T v1 + · · · + ak T vk ∈ span(v1 , . . . , vk ),
since, by Condition 2, each T vj ∈ span(v1 , . . . , vj ) ⊂ span(v1 , . . . , vk ).
The next theorem shows that complex vector spaces indeed have some basis for which the matrix of a given operator is upper triangular.

Theorem 7.5.3. Let V be a finite-dimensional vector space over C and T ∈ L(V, V ). Then there exists a basis of V with respect to which the matrix M (T ) is upper triangular.

The proof proceeds by induction on dim(V ), applying the induction hypothesis to the invariant subspace
U = range (T − λI),
where λ is an eigenvalue of T , and using the decomposition
T u = (T − λI)u + λu,
which shows that T u ∈ U + span(u) for every u ∈ V .
The following are two very important facts about upper triangular matrices and
their associated operators.
Proposition 7.5.4. Suppose T ∈ L(V, V ) is a linear operator and that M (T ) is
upper triangular with respect to some basis of V . Then
(1) T is invertible if and only if all entries on the diagonal of M (T ) are non-zero.
(2) The eigenvalues of T are precisely the diagonal elements of M (T ).
Proof of (1), one direction: we show that if T is not invertible, then at least one of the diagonal entries λ1 , . . . , λn of M (T ) is zero.
The linear map T not being invertible implies that T is not injective. Hence, there
exists a vector 0 ≠ v ∈ V such that T v = 0, and we can write
v = a1 v1 + · · · + ak vk
for some k, where ak ≠ 0. Then
0 = T v = (a1 T v1 + · · · + ak−1 T vk−1 ) + ak T vk . (7.2)
Since T is upper triangular with respect to the basis (v1 , . . . , vn ), we know that
a1 T v1 + · · · + ak−1 T vk−1 ∈ span(v1 , . . . , vk−1 ). Hence, Equation (7.2) shows that
T vk ∈ span(v1 , . . . , vk−1 ), which implies that λk = 0.
Example 7.6.1. Let A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} . Then p(λ) = (−2 − λ)(2 − λ) − (−1)(5) = λ2 + 1, which is equal to zero exactly when λ = ±i. Moreover, if λ = i, then the System (7.3) becomes
(−2 − i)v1 − v2 = 0
5v1 + (2 − i)v2 = 0,
which is satisfied by any vector v = (v1 , v2 ) ∈ C2 such that v2 = (−2 − i)v1 . Similarly, if λ = −i, then the System (7.3) becomes
(−2 + i)v1 − v2 = 0
5v1 + (2 + i)v2 = 0,
which is satisfied by any vector v = (v1 , v2 ) ∈ C2 such that v2 = (−2 + i)v1 .
It follows that, given A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} , the linear operator on C2 defined by T (v) = Av has eigenvalues λ = ±i, with associated eigenvectors as described above.
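As an illustrative numerical check (not part of the original example), the eigenvalues and an eigenvector of this matrix can be verified with Python/NumPy:

```python
import numpy as np

A = np.array([[-2.0, -1.0],
              [ 5.0,  2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                          # approximately [0.+1.j, 0.-1.j]

# Verify that v = (v1, v2) with v2 = (-2 - i) v1 is an eigenvector for lambda = i.
v = np.array([1.0, -2.0 - 1.0j])
assert np.allclose(A @ v, 1j * v)
```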
Example 7.6.2. Take the rotation Rθ : R2 → R2 by an angle θ ∈ [0, 2π) given by
the matrix
R_θ = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} .
Then we obtain the eigenvalues by solving the polynomial equation
p(λ) = (cos θ − λ)2 + sin2 θ
= λ2 − 2λ cos θ + 1 = 0,
where we have used the fact that sin2 θ + cos2 θ = 1. Solving for λ in C, we obtain λ = cos θ ± i sin θ = e±iθ .
Calculational Exercises
(1) Let T ∈ L(F2 , F2 ) be defined by
T (u, v) = (v, u)
for every u, v ∈ F. Compute the eigenvalues and associated eigenvectors for T .
(6) For each matrix A below, describe the invariant subspaces for the induced linear
operator T on F2 that maps each v ∈ F2 to T (v) = Av.
(a) \begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix} , (b) \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} , (c) \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix} , (d) \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
(a) Find the matrix of T with respect to the canonical basis for R2 (both as
the domain and the codomain of T ; call this matrix A).
(b) Verify that λ+ and λ− are eigenvalues of T by showing that v+ and v− are
eigenvectors, where
v_+ = \begin{bmatrix} 1 \\ λ_+ \end{bmatrix} , \quad v_- = \begin{bmatrix} 1 \\ λ_- \end{bmatrix} .
Proof-Writing Exercises
(1) Let V be a finite-dimensional vector space over F with T ∈ L(V, V ), and let
U1 , . . . , Um be subspaces of V that are invariant under T . Prove that U1 + · · · +
Um must then also be an invariant subspace of V under T .
(2) Let V be a finite-dimensional vector space over F with T ∈ L(V, V ), and suppose
that U1 and U2 are subspaces of V that are invariant under T . Prove that U1 ∩U2
is also an invariant subspace of V under T .
(3) Let V be a finite-dimensional vector space over F with T ∈ L(V, V ) invertible
and λ ∈ F \ {0}. Prove that λ is an eigenvalue for T if and only if λ−1 is an
eigenvalue for T −1 .
(4) Let V be a finite-dimensional vector space over F, and suppose that T ∈ L(V, V )
has the property that every v ∈ V is an eigenvector for T . Prove that T must
then be a scalar multiple of the identity function on V .
(5) Let V be a finite-dimensional vector space over F, and let S, T ∈ L(V ) be linear
operators on V with S invertible. Given any polynomial p(z) ∈ F[z], prove that
p(S ◦ T ◦ S −1 ) = S ◦ p(T ) ◦ S −1 .
Chapter 8
8.1 Permutations
Example 8.1.3. Suppose that we have a set of five distinct objects and that we
wish to describe the permutation that places the first item into the second position,
the second item into the fifth position, the third item into the first position, the
fourth item into the third position, and the fifth item into the fourth position. Then,
using the notation developed above, we have the permutation π ∈ S5 such that
π = (3, 1, 4, 5, 2);
that is, position i of the reordered list holds the original item πi . In general, the number of permutations of n distinct objects, i.e., the number of elements in the symmetric group Sn , is
|Sn | = n · (n − 1) · (n − 2) · · · 3 · 2 · 1 = n!
Example 8.1.5.
Keep in mind the fact that each element in S3 is simultaneously both a function
and a reordering operation. E.g., the permutation
π = \begin{pmatrix} 1 & 2 & 3 \\ π_1 & π_2 & π_3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}
can be read as defining the reordering that, with respect to the original list,
places the second element in the first position, the third element in the second
position, and the first element in the third position. This permutation could
equally well have been identified by describing its action on the (ordered) list
of letters a, b, c. In other words,
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} = \begin{pmatrix} a & b & c \\ b & c & a \end{pmatrix} ,
• π = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix} has the two inversion pairs (1, 2) and (1, 3) since we have that both π(1) = 3 > 1 = π(2) and π(1) = 3 > 2 = π(3).
• π = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} has the three inversion pairs (1, 2), (1, 3), and (2, 3), as you can check.
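The inversion pairs of a permutation, and hence its sign, are easy to compute mechanically. The following short Python sketch (an illustration only; the function names are ours) uses the same convention as above: (i, j) with i < j is an inversion pair whenever π(i) > π(j).

```python
from itertools import permutations

def inversions(pi):
    """Inversion pairs (i, j), i < j, with pi[i] > pi[j] (positions are 1-based)."""
    n = len(pi)
    return [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n) if pi[i] > pi[j]]

def sign(pi):
    """+1 for an even number of inversions, -1 for an odd number."""
    return 1 if len(inversions(pi)) % 2 == 0 else -1

print(inversions((3, 1, 2)))    # [(1, 2), (1, 3)]
print(inversions((3, 2, 1)))    # [(1, 2), (1, 3), (2, 3)]
print([(p, sign(p)) for p in permutations((1, 2, 3))])
```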
8.2 Determinants

Given a square matrix A = (aij ) ∈ Fn×n , the determinant of A is defined by
det(A) = \sum_{π ∈ S_n} sign(π) \, a_{1,π(1)} a_{2,π(2)} \cdots a_{n,π(n)} , (8.4)
where the sum is over all permutations of n elements (i.e., over the symmetric group Sn ).
Note that each permutation in the summand of (8.4) permutes the n columns
of the n × n matrix.
Example 8.2.2. Suppose that A ∈ F2×2 is the 2 × 2 matrix
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} .
To calculate the determinant of A, we first list the two permutations in S2 :
id = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \quad and \quad σ = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} .
The permutation id has sign 1, and the permutation σ has sign −1. Thus, the
determinant of A is given by
det(A) = a11 a22 − a12 a21 .
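Equation (8.4) can be implemented directly as a sum over all permutations. The following Python sketch (purely illustrative, and far too slow for large n) reproduces the 2 × 2 formula above and also evaluates a 3 × 3 example.

```python
from itertools import permutations

def sign(pi):
    """Sign of a permutation given in one-line notation (0-based images)."""
    inv = sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])
    return -1 if inv % 2 else 1

def det(A):
    """Determinant via the permutation-sum formula (8.4)."""
    n = len(A)
    total = 0
    for pi in permutations(range(n)):
        term = sign(pi)
        for i in range(n):
            term *= A[i][pi[i]]      # pick entry in row i, column pi(i)
        total += term
    return total

print(det([[1, 2], [3, 4]]))                          # a11*a22 - a12*a21 = -2
print(det([[-4, 1, 3], [3, 0, -3], [2, -2, 3]]))      # -9
```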
Since the set Sn is finite, we are free to reorder the summands. In other words, the sum is independent of the order in which the terms are added, and so we are free to permute the term order without affecting the value of the sum. Given any fixed σ ∈ Sn and any function T defined on Sn , some commonly used reorderings of such sums are the following:
\sum_{π ∈ S_n} T(π) = \sum_{π ∈ S_n} T(σ ◦ π) (8.5)
= \sum_{π ∈ S_n} T(π ◦ σ) (8.6)
= \sum_{π ∈ S_n} T(π^{-1}). (8.7)
The proofs below use Equation (8.4), the sign function on permutations, and properties of sums over the symmetric group as discussed in Section 8.2.1 above. In thinking about these properties, it is useful to
keep in mind that, using Equation (8.4), the determinant of an n × n matrix A is
the sum over all possible ways of selecting n entries of A, where exactly one element
is selected from each row and from each column of A.
Theorem 8.2.3 (Properties of the Determinant). Let n ∈ Z+ be a positive integer,
and suppose that A = (aij ) ∈ Fn×n is an n × n matrix. Then
(1) det(0n×n ) = 0 and det(In ) = 1, where 0n×n denotes the n × n zero matrix and
In denotes the n × n identity matrix.
(2) det(AT ) = det(A), where AT denotes the transpose of A.
(3) denoting by A(·,1) , A(·,2) , . . . , A(·,n) ∈ Fn the columns of A, det(A) is a linear
function of column A(·,i) , for each i ∈ {1, . . . , n}. In other words, if we denote
A = [ A(·,1) | A(·,2) | · · · | A(·,n) ],
then, given any scalar z ∈ F and any vectors a1 , a2 , . . . , an , c, b ∈ Fn ,
det [a1 | · · · | ai−1 | zai | · · · | an ] = z det [a1 | · · · | ai−1 | ai | · · · | an ] ,
det [a1 | · · · | ai−1 | b + c | · · · | an ] = det [a1 | · · · | b | · · · | an ]
+ det [a1 | · · · | c | · · · | an ] .
(4) det(A) is an antisymmetric function of the columns of A. In other words, given any positive integers 1 ≤ i < j ≤ n and denoting A = [ A(·,1) | A(·,2) | · · · | A(·,n) ],
det(A) = − det [ A(·,1) | · · · | A(·,j) | · · · | A(·,i) | · · · | A(·,n) ],
where the columns in positions i and j have been interchanged.
Proof. First, note that Properties (1), (3), (6), and (9) follow directly from the sum
given in Equation (8.4). Moreover, Property (5) follows directly from Property (4),
and Property (7) follows directly from Property (2). Thus, we only need to prove
Properties (2), (4), and (8).
Proof of (2). Since the entries of AT are obtained from those of A by inter-
changing the row and column indices, it follows that det(AT ) is given by
det(AT ) = \sum_{π ∈ S_n} sign(π) \, a_{π(1),1} a_{π(2),2} \cdots a_{π(n),n} .
Using the commutativity of the product in F and Equation (8.3), we see that
det(AT ) = \sum_{π ∈ S_n} sign(π^{-1}) \, a_{1,π^{-1}(1)} a_{2,π^{-1}(2)} \cdots a_{n,π^{-1}(n)} ,
which equals det(A) by Equation (8.7).
Proof of (4). Let B denote the matrix obtained from A by interchanging two columns, say those in positions i and j, and let tij ∈ Sn denote the transposition that interchanges i and j.
Define π̃ = π ◦ tij , and note that π = π̃ ◦ tij . In particular, π(i) = π̃(j) and
π(j) = π̃(i), from which
det(B) = \sum_{π ∈ S_n} sign(π̃ ◦ t_{ij}) \, a_{1,π̃(1)} \cdots a_{i,π̃(i)} \cdots a_{j,π̃(j)} \cdots a_{n,π̃(n)} .
It follows from Equations (8.2) and (8.1) that sign(π̃ ◦ tij ) = −sign (π̃). Thus, using
Equation (8.6), we obtain det(B) = − det(A).
Proof of (8). Using the standard expression for the matrix entries of the
product AB in terms of the matrix entries of A = (aij ) and B = (bij ), we have that
det(AB) = \sum_{π ∈ S_n} sign(π) \sum_{k_1=1}^{n} \cdots \sum_{k_n=1}^{n} a_{1,k_1} b_{k_1,π(1)} \cdots a_{n,k_n} b_{k_n,π(n)}
= \sum_{k_1=1}^{n} \cdots \sum_{k_n=1}^{n} a_{1,k_1} \cdots a_{n,k_n} \sum_{π ∈ S_n} sign(π) \, b_{k_1,π(1)} \cdots b_{k_n,π(n)} .
Now, proceeding with the same arguments as in the proof of Property (4) but with
the role of tij replaced by an arbitrary permutation σ, we obtain
det(AB) = \sum_{σ ∈ S_n} sign(σ) \, a_{1,σ(1)} \cdots a_{n,σ(n)} \sum_{π ∈ S_n} sign(π ◦ σ^{-1}) \, b_{1,π◦σ^{-1}(1)} \cdots b_{n,π◦σ^{-1}(n)} .
By Equation (8.6), the inner sum equals det(B) for every σ, and hence det(AB) = det(A) det(B).
Note that Properties (3) and (4) of Theorem 8.2.3 effectively summarize how
multiplication by an Elementary Matrix interacts with the determinant operation.
These Properties together with Property (9) facilitate numerical computation of
determinants of larger matrices.
Moreover, should A be invertible, then det(A−1 ) = 1/ det(A).
Definition 8.2.6. Let n ∈ Z+ and A ∈ Fn×n . Then, for each i, j ∈ {1, 2, . . . , n},
the i-j minor of A, denoted Mij , is defined to be the determinant of the matrix
obtained by removing the ith row and j th column from A. Moreover, the i-j cofactor
of A is defined to be
Aij = (−1)i+j Mij .
Cofactors themselves, though, are not terribly useful unless put together in the right
way.
Theorem 8.2.8. Let n ∈ Z+ and A ∈ Fn×n . Then every row and column cofactor expansion of A is equal to the determinant of A.
Then, each of the resulting 3 × 3 determinants can be computed by further expansion:
\begin{vmatrix} -4 & 1 & 3 \\ 3 & 0 & -3 \\ 2 & -2 & 3 \end{vmatrix} = (-1)^{1+2}(1) \begin{vmatrix} 3 & -3 \\ 2 & 3 \end{vmatrix} + (-1)^{3+2}(-2) \begin{vmatrix} -4 & 3 \\ 3 & -3 \end{vmatrix} = -15 + 6 = -9.
\begin{vmatrix} 1 & -3 & 4 \\ 3 & 0 & -3 \\ 2 & -2 & 3 \end{vmatrix} = (-1)^{2+1}(3) \begin{vmatrix} -3 & 4 \\ -2 & 3 \end{vmatrix} + (-1)^{2+3}(-3) \begin{vmatrix} 1 & -3 \\ 2 & -2 \end{vmatrix} = 3 + 12 = 15.
It follows that the original determinant is then equal to −2(−9) + 2(15) = 48.
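Cofactor expansion translates directly into a short recursive procedure. The following Python sketch (an illustration, always expanding along the first row) reproduces the two 3 × 3 values computed above.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det_cofactor(minor(A, 0, j)) for j in range(len(A)))

print(det_cofactor([[-4, 1, 3], [3, 0, -3], [2, -2, 3]]))   # -9
print(det_cofactor([[ 1, -3, 4], [3, 0, -3], [2, -2, 3]]))  # 15
```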
Calculational Exercises
(1) Let A ∈ C3×3 be given by
A = \begin{bmatrix} 1 & 0 & i \\ 0 & 1 & 0 \\ -i & 0 & -1 \end{bmatrix} .
(a) Calculate det(A).
(b) Find det(A4 ).
(2) (a) For each permutation π ∈ S3 , compute the number of inversions in π, and
classify π as being either an even or an odd permutation.
(b) Use your result from Part (a) to construct a formula for the determinant
of a 3 × 3 matrix.
(3) (a) For each permutation π ∈ S4 , compute the number of inversions in π, and
classify π as being either an even or an odd permutation.
(b) Use your result from Part (a) to construct a formula for the determinant
of a 4 × 4 matrix.
(4) Solve for the variable x in the following expression:
det \begin{bmatrix} x & -1 \\ 3 & 1 - x \end{bmatrix} = det \begin{bmatrix} 1 & 0 & -3 \\ 2 & x & -6 \\ 1 & 3 & x - 5 \end{bmatrix} .
(5) Prove that the following determinant does not depend upon the value of θ:
det \begin{bmatrix} \sin(θ) & \cos(θ) & 0 \\ -\cos(θ) & \sin(θ) & 0 \\ \sin(θ) - \cos(θ) & \sin(θ) + \cos(θ) & 1 \end{bmatrix} .
(6) Given scalars α, β, γ ∈ F, prove that the following matrix is not invertible:
\begin{bmatrix} \sin^2(α) & \sin^2(β) & \sin^2(γ) \\ \cos^2(α) & \cos^2(β) & \cos^2(γ) \\ 1 & 1 & 1 \end{bmatrix} .
Hint: Compute the determinant.
Proof-Writing Exercises
(1) Let a, b, c, d, e, f ∈ F be scalars, and suppose that A and B are the following
matrices:
A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \quad and \quad B = \begin{bmatrix} d & e \\ 0 & f \end{bmatrix} .
Prove that AB = BA if and only if det \begin{bmatrix} b & a - c \\ e & d - f \end{bmatrix} = 0.
Chapter 9
The abstract definition of a vector space only takes into account algebraic properties
for the addition and scalar multiplication of vectors. For vectors in Rn , for example,
we also have geometric intuition involving the length of a vector or the angle formed
by two vectors. In this chapter we discuss inner product spaces, which are vector
spaces with an inner product defined upon them. Using the inner product, we will
define notions such as the length of a vector, orthogonality, and the angle between
non-zero vectors.
Definition 9.1.1. Let V be a vector space over F. An inner product on V is a map
⟨·, ·⟩ : V × V → F
(u, v) → ⟨u, v⟩
with the following four properties:
(1) Linearity in first slot: ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ and ⟨au, v⟩ = a⟨u, v⟩ for all u, v, w ∈ V and a ∈ F;
(2) Positivity: v, v ≥ 0 for all v ∈ V ;
(3) Positive definiteness: v, v = 0 if and only if v = 0;
(4) Conjugate symmetry: u, v = v, u for all u, v ∈ V .
Remark 9.1.2. Recall that every real number x ∈ R equals its complex conjugate.
Hence, for real vector spaces, conjugate symmetry of an inner product becomes
actual symmetry.
Definition 9.1.3. An inner product space is a vector space over F together with
an inner product ·, ·.
We close this section by noting that the convention in physics is often the exact
opposite of what we have defined above. In other words, an inner product in physics
is traditionally linear in the second slot and anti-linear in the first slot.
9.2 Norms
The norm of a vector in an arbitrary inner product space is the analog of the length
or magnitude of a vector in Rn . We formally define this concept as follows.
Definition 9.2.1. Let V be a vector space over F. A map
‖·‖ : V → R
v → ‖v‖
is a norm on V if the following three conditions are satisfied:
(1) Positive definiteness: ‖v‖ = 0 if and only if v = 0;
(2) Positive homogeneity: ‖av‖ = |a| ‖v‖ for all a ∈ F and v ∈ V ;
(3) Triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ V .
[Figure: a vector v ∈ R3 with components x1 , x2 , x3 .]
While it is always possible to start with an inner product and use it to define
a norm, the converse does not hold in general. One can prove that a norm can be
written in terms of an inner product as in Equation (9.1) if and only if the norm
satisfies the Parallelogram Law (Theorem 9.3.6).
November 2, 2015 14:50 ws-book961x669 Linear Algebra: As an Introduction to Abstract Mathematics 9808-main page 98
9.3 Orthogonality
Using the inner product, we can now define the notion of orthogonality, prove that
the Pythagorean theorem holds in any inner product space, and use the Cauchy-Schwarz inequality to prove the triangle inequality. In particular, this will show that ‖v‖ = √⟨v, v⟩ does indeed define a norm.

Definition 9.3.1. Two vectors u, v ∈ V are orthogonal (denoted u⊥v) if ⟨u, v⟩ = 0.
Note that the zero vector is the only vector that is orthogonal to itself. In fact,
the zero vector is orthogonal to every vector v ∈ V .
Theorem 9.3.2 (Pythagorean Theorem). If u, v ∈ V , an inner product space, with u⊥v, then ‖·‖ defined by ‖v‖ := √⟨v, v⟩ obeys
‖u + v‖2 = ‖u‖2 + ‖v‖2 .
Note that the converse of the Pythagorean Theorem holds for real vector spaces since, in that case, ⟨u, v⟩ + ⟨v, u⟩ = 2 Re⟨u, v⟩ = 0.
Given two vectors u, v ∈ V with v ≠ 0, we can uniquely decompose u into two
pieces: one piece parallel to v and one piece orthogonal to v. This is called an
orthogonal decomposition. More precisely, we have
u = u1 + u2 ,
where u1 = av and u2 ⊥v for some scalar a ∈ F. To obtain such a decomposition,
write u2 = u − u1 = u − av. Then, for u2 to be orthogonal to v, we need
0 = ⟨u − av, v⟩ = ⟨u, v⟩ − a‖v‖2 .
Solving for a yields a = ⟨u, v⟩/‖v‖2 so that
u = \frac{⟨u, v⟩}{‖v‖^2} v + \left( u − \frac{⟨u, v⟩}{‖v‖^2} v \right). (9.3)
This decomposition is particularly useful since it allows us to provide a simple
proof for the Cauchy-Schwarz inequality.
Theorem 9.3.3 (Cauchy-Schwarz Inequality). Given any u, v ∈ V , we have
|u, v| ≤ uv.
Furthermore, equality holds if and only if u and v are linearly dependent, i.e., are
scalar multiples of each other.
Proof. If v = 0, then both sides of the inequality are zero. Hence, assume that v ≠ 0, and consider the orthogonal decomposition
u = \frac{⟨u, v⟩}{‖v‖^2} v + w,
where w⊥v. By the Pythagorean theorem, we have
‖u‖2 = \left\| \frac{⟨u, v⟩}{‖v‖^2} v \right\|^2 + ‖w‖2 = \frac{|⟨u, v⟩|^2}{‖v‖^2} + ‖w‖2 ≥ \frac{|⟨u, v⟩|^2}{‖v‖^2} .
Multiplying both sides by ‖v‖2 and taking the square root then yields the Cauchy-Schwarz inequality.
Note that we get equality in the above arguments if and only if w = 0. But, by
Equation (9.3), this means that u and v are linearly dependent.
The Cauchy-Schwarz inequality has many different proofs. Here is another one.
Alternate proof of Theorem 9.3.3. Given u, v ∈ V , consider the norm square of the vector u + re^{iθ} v:
0 ≤ ‖u + re^{iθ} v‖2 = ‖u‖2 + r 2 ‖v‖2 + 2 Re(re^{iθ} ⟨u, v⟩).
Since ⟨u, v⟩ is a complex number, one can choose θ so that e^{iθ} ⟨u, v⟩ is real. Hence, the right-hand side is a parabola ar 2 + br + c with real coefficients. It will lie above the real axis, i.e., ar 2 + br + c ≥ 0, if it does not have any real solutions for r. This is the case when the discriminant satisfies b2 − 4ac ≤ 0. In our case this means
4|⟨u, v⟩|2 − 4‖u‖2 ‖v‖2 ≤ 0.
Moreover, equality only holds if r can be chosen such that u + reiθ v = 0, which
means that u and v are scalar multiples.
Theorem 9.3.4 (Triangle Inequality). For all u, v ∈ V we have
‖u + v‖ ≤ ‖u‖ + ‖v‖.
[Figure: vectors u, v, v′ and the sums u + v, u + v′ , illustrating the triangle inequality.]
Remark 9.3.5. Note that equality holds for the triangle inequality if and only if v = ru or u = rv for some r ≥ 0. Namely, equality in the proof happens only if ⟨u, v⟩ = ‖u‖ ‖v‖, which is equivalent to u and v being scalar multiples of one another.

Theorem 9.3.6 (Parallelogram Law). For all u, v ∈ V ,
‖u + v‖2 + ‖u − v‖2 = 2(‖u‖2 + ‖v‖2 ).
Proof.
‖u + v‖2 + ‖u − v‖2 = ⟨u + v, u + v⟩ + ⟨u − v, u − v⟩
= ‖u‖2 + ‖v‖2 + ⟨u, v⟩ + ⟨v, u⟩ + ‖u‖2 + ‖v‖2 − ⟨u, v⟩ − ⟨v, u⟩
= 2(‖u‖2 + ‖v‖2 ).
We now define the notions of orthogonal basis and orthonormal basis for an inner
product space. As we will see later, orthonormal bases have special properties that
lead to useful simplifications in common linear algebra calculations.
[Figure: the parallelogram spanned by vectors u and v, with diagonals u + v and u − v.]
Definition 9.4.1. Let V be an inner product space with inner product ⟨·, ·⟩. A list of non-zero vectors (e1 , . . . , em ) in V is called orthogonal if
⟨ei , ej ⟩ = 0, for all 1 ≤ i ≠ j ≤ m.
The list (e1 , . . . , em ) is called orthonormal if
⟨ei , ej ⟩ = δij , for all i, j = 1, . . . , m,
where δij is the Kronecker delta symbol, i.e., δij = 1 if i = j and δij = 0 otherwise.
Proof. Let v ∈ V . Since (e1 , . . . , en ) is a basis for V , there exist unique scalars
a1 , . . . , an ∈ F such that
v = a1 e1 + · · · + an en .
Taking the inner product of both sides with respect to ek then yields ⟨v, ek ⟩ = ak .
Proof. The proof is constructive, that is, we will actually construct vectors
e1 , . . . , em having the desired properties. Since (v1 , . . . , vm ) is linearly independent, vk ≠ 0 for each k = 1, 2, . . . , m. Set e1 = v1 /‖v1 ‖. Then e1 is a vector of norm 1 and satisfies Equation (9.4) for k = 1. Next, set
e2 = \frac{v_2 − ⟨v_2 , e_1 ⟩ e_1}{‖v_2 − ⟨v_2 , e_1 ⟩ e_1 ‖} .
This is, in fact, the normalized version of the orthogonal decomposition Equation (9.3). I.e.,
w = v2 − ⟨v2 , e1 ⟩e1 ,
where w⊥e1 . Note that ‖e2 ‖ = 1 and span(e1 , e2 ) = span(v1 , v2 ).
Now, suppose that e1 , . . . , ek−1 have been constructed such that (e1 , . . . , ek−1 )
is an orthonormal list and span(v1 , . . . , vk−1 ) = span(e1 , . . . , ek−1 ). Then define
e_k = \frac{v_k − ⟨v_k , e_1 ⟩ e_1 − ⟨v_k , e_2 ⟩ e_2 − \cdots − ⟨v_k , e_{k−1} ⟩ e_{k−1}}{‖v_k − ⟨v_k , e_1 ⟩ e_1 − ⟨v_k , e_2 ⟩ e_2 − \cdots − ⟨v_k , e_{k−1} ⟩ e_{k−1} ‖} .
Since (v1 , . . . , vk ) is linearly independent, we know that vk ∉ span(v1 , . . . , vk−1 ). Hence, we also know that vk ∉ span(e1 , . . . , ek−1 ). It follows that the norm in the definition of ek is not zero, and so ek is well-defined (i.e., we are not dividing by zero). Note that a vector divided by its norm has norm 1 so that ‖ek ‖ = 1. Furthermore,
⟨e_k , e_i ⟩ = \left\langle \frac{v_k − ⟨v_k , e_1 ⟩ e_1 − \cdots − ⟨v_k , e_{k−1} ⟩ e_{k−1}}{‖v_k − ⟨v_k , e_1 ⟩ e_1 − \cdots − ⟨v_k , e_{k−1} ⟩ e_{k−1} ‖} , e_i \right\rangle
= \frac{⟨v_k , e_i ⟩ − ⟨v_k , e_i ⟩}{‖v_k − ⟨v_k , e_1 ⟩ e_1 − \cdots − ⟨v_k , e_{k−1} ⟩ e_{k−1} ‖} = 0,
for each 1 ≤ i < k. Hence, (e1 , . . . , ek ) is orthonormal.
From the definition of ek , we see that vk ∈ span(e1 , . . . , ek ) so that
span(v1 , . . . , vk ) ⊂ span(e1 , . . . , ek ). Since both lists (e1 , . . . , ek ) and (v1 , . . . , vk )
are linearly independent, they must span subspaces of the same dimension and
therefore are the same subspace. Hence Equation (9.4) holds.
u2 = v2 − ⟨v2 , e1 ⟩e1 = (2, 1, 1) − \frac{3}{2} (1, 1, 0) = \frac{1}{2} (1, −1, 2).
Calculating the norm of u2 , we obtain ‖u2 ‖ = \sqrt{\tfrac{1}{4}(1 + 1 + 4)} = \frac{\sqrt{6}}{2} . Hence, normalizing this vector, we obtain
e2 = \frac{u_2}{‖u_2 ‖} = \frac{1}{\sqrt{6}} (1, −1, 2).
The list (e1 , e2 ) is therefore orthonormal and has the same span as (v1 , v2 ).
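The Gram-Schmidt procedure is equally easy to carry out numerically. The following Python/NumPy sketch (an illustration for the real case; the vectors v1 = (1, 1, 0) and v2 = (2, 1, 1) are chosen to match the computation above) reproduces e1 and e2 .

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of real vectors (Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, e) * e for e in basis)   # subtract projections onto earlier e_i
        basis.append(w / np.linalg.norm(w))            # normalize
    return basis

e1, e2 = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([2.0, 1.0, 1.0])])
print(e1)   # [0.7071..., 0.7071..., 0.]          i.e. (1, 1, 0)/sqrt(2)
print(e2)   # [0.4082..., -0.4082..., 0.8164...]  i.e. (1, -1, 2)/sqrt(6)
```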
Proof. Let (v1 , . . . , vn ) be any basis for V . This list is linearly independent and
spans V . Apply the Gram-Schmidt procedure to this list to obtain an orthonormal
list (e1 , . . . , en ), which still spans V by construction. By Proposition 9.4.2, this list
is linearly independent and hence a basis of V .
Let U ⊂ V be a subspace of a finite-dimensional inner product space V , and let
U ⊥ = {v ∈ V | ⟨v, u⟩ = 0 for all u ∈ U }
denote its orthogonal complement. Then V = U ⊕ U ⊥ ; equivalently,
(1) V = U + U ⊥ .
(2) U ∩ U ⊥ = {0}.
Example 9.6.3. R2 is the direct sum of any two orthogonal lines, and R3 is the
direct sum of any plane and any line orthogonal to the plane (as illustrated in
Figure 9.4). For example,
R2 = {(x, 0) | x ∈ R} ⊕ {(0, y) | y ∈ R},
R3 = {(x, y, 0) | x, y ∈ R} ⊕ {(0, 0, z) | z ∈ R}.
Another fundamental fact about the orthogonal complement of a subspace is as
follows.
The next proposition shows that PU v is the closest point in U to the vector v and that this minimum is, in fact, unique:
‖v − PU v‖ ≤ ‖v − u‖ for every u ∈ U .
The inequality follows from the Pythagorean Theorem 9.3.2 since v − P v ∈ U ⊥ and P v − u ∈ U . Furthermore, equality holds only if ‖P v − u‖2 = 0, which is equivalent to P v = u.
Calculational Exercises
(1) Let (e1 , e2 , e3 ) be the canonical basis of R3 , and define
f1 = e 1 + e2 + e3
f2 = e2 + e3
f3 = e3 .
Then, given any positive integer n ∈ Z+ , verify that the set of vectors
\left( \frac{1}{\sqrt{2π}} , \frac{\sin(x)}{\sqrt{π}} , \frac{\sin(2x)}{\sqrt{π}} , \ldots , \frac{\sin(nx)}{\sqrt{π}} , \frac{\cos(x)}{\sqrt{π}} , \frac{\cos(2x)}{\sqrt{π}} , \ldots , \frac{\cos(nx)}{\sqrt{π}} \right)
is orthonormal.
(3) Let R2 [x] denote the inner product space of polynomials over R having degree
at most two, with inner product given by
⟨f, g⟩ = \int_0^1 f (x)g(x) \, dx, \quad for every f, g ∈ R2 [x].
Apply the Gram-Schmidt procedure to the standard basis {1, x, x2 } for R2 [x]
in order to produce an orthonormal basis for R2 [x].
(4) Let v1 , v2 , v3 ∈ R3 be given by v1 = (1, 2, 1), v2 = (1, −2, 1), and v3 = (1, 2, −1).
Apply the Gram-Schmidt procedure to the basis (v1 , v2 , v3 ) of R3 , and call the
resulting orthonormal basis (u1 , u2 , u3 ).
(5) Let P ⊂ R3 be the plane containing 0 perpendicular to the vector (1, 1, 1).
Using the standard norm, calculate the distance of the point (1, 2, 3) to P .
(6) Give an orthonormal basis for null(T ), where T ∈ L(C4 ) is the map with canon-
ical matrix
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} .
Proof-Writing Exercises
(1) Let V be a finite-dimensional inner product space over F. Given any vectors
u, v ∈ V , prove that the following two statements are equivalent:
(a) ⟨u, v⟩ = 0
(b) ‖u‖ ≤ ‖u + αv‖ for every α ∈ F.
(2) Let n ∈ Z+ be a positive integer, and let a1 , . . . , an , b1 , . . . , bn ∈ R be any
collection of 2n real numbers. Prove that
\left( \sum_{k=1}^{n} a_k b_k \right)^2 ≤ \left( \sum_{k=1}^{n} k \, a_k^2 \right) \left( \sum_{k=1}^{n} \frac{b_k^2}{k} \right).
Chapter 10
Change of Bases
In Section 6.6, we saw that linear operators on an n-dimensional vector space are
in one-to-one correspondence with n × n matrices. This correspondence, however,
depends upon the choice of basis for the vector space. In this chapter we address
the question of how the matrix for a linear operator changes if we change from one
orthonormal basis to another.
Let V be a finite-dimensional inner product space with inner product ·, · and
dimension dim(V ) = n. Then V has an orthonormal basis e = (e1 , . . . , en ), and,
according to Theorem 9.4.6, every v ∈ V can be written as
v = \sum_{i=1}^{n} ⟨v, e_i ⟩ e_i .
This induces a map
[ · ]_e : V → F^n , \quad v \mapsto \begin{bmatrix} ⟨v, e_1 ⟩ \\ \vdots \\ ⟨v, e_n ⟩ \end{bmatrix} ,
which maps the vector v ∈ V to the n × 1 column vector of its coordinates with
respect to the basis e. The column vector [v]e is called the coordinate vector of
v with respect to the basis e.
Example 10.1.1. Recall that the vector space R1 [x] of polynomials over R of
degree at most 1 is an inner product space with inner product defined by
⟨f, g⟩ = \int_0^1 f (x)g(x) \, dx.
Then e = (1, \sqrt{3}(−1 + 2x)) forms an orthonormal basis for R1 [x]. The coordinate vector of the polynomial p(x) = 3x + 2 ∈ R1 [x] is, e.g.,
[p(x)]_e = \begin{bmatrix} 7/2 \\ \sqrt{3}/2 \end{bmatrix} .
Then, for all v, w ∈ V ,
⟨v, w⟩_V = ⟨[v]_e , [w]_e ⟩_{F^n} ,
since
⟨v, w⟩_V = \left\langle \sum_{i=1}^{n} ⟨v, e_i ⟩ e_i , \sum_{j=1}^{n} ⟨w, e_j ⟩ e_j \right\rangle = \sum_{i,j=1}^{n} ⟨v, e_i ⟩ \overline{⟨w, e_j ⟩} ⟨e_i , e_j ⟩
= \sum_{i,j=1}^{n} ⟨v, e_i ⟩ \overline{⟨w, e_j ⟩} δ_{ij} = \sum_{i=1}^{n} ⟨v, e_i ⟩ \overline{⟨w, e_i ⟩} = ⟨[v]_e , [w]_e ⟩_{F^n} .
Similarly, the matrix of an operator T ∈ L(V, V ) with respect to the orthonormal basis e is
M (T ) = (⟨T e_j , e_i ⟩)_{1≤i,j≤n} ,
where i is the row index and j is the column index of the matrix.
Conversely, if A ∈ Fn×n is a matrix, then we can associate a linear operator
T ∈ L(V, V ) to A by setting
T v = \sum_{j=1}^{n} ⟨v, e_j ⟩ T e_j = \sum_{j=1}^{n} \sum_{i=1}^{n} ⟨T e_j , e_i ⟩ ⟨v, e_j ⟩ e_i
= \sum_{i=1}^{n} \left( \sum_{j=1}^{n} a_{ij} ⟨v, e_j ⟩ \right) e_i = \sum_{i=1}^{n} (A[v]_e )_i \, e_i ,
where (A[v]e )i denotes the ith component of the column vector A[v]e . With this
construction, we have M (T ) = A. The coefficients of T v in the basis (e1 , . . . , en )
are recorded by the column vector obtained by multiplying the n × n matrix A with
the n × 1 column vector [v]e whose components ([v]e )j = v, ej .
Hence,
[v]_f = S[v]_e ,
where
S = (s_{ij} )_{i,j=1}^{n} \quad with \quad s_{ij} = ⟨e_j , f_i ⟩.
The j th column of S is given by the coefficients of the expansion of ej in terms of the basis f = (f1 , . . . , fn ). The matrix S describes a linear map in L(Fn ), which is called the change of basis transformation.
We may also interchange the role of bases e and f . In this case, we obtain the
matrix R = (r_{ij} )_{i,j=1}^{n} , where
r_{ij} = ⟨f_j , e_i ⟩.
Then, by the uniqueness of the expansion in a basis, we obtain
[v]e = R[v]f
so that
RS[v]_e = [v]_e , \quad for all v ∈ V .
Since this equation is true for all [v]_e ∈ F^n , it follows that RS = I or, equivalently, R = S −1 . In particular, S and R are invertible. We can also check this explicitly by using the properties of orthonormal bases. Namely,
(RS)_{ij} = \sum_{k=1}^{n} r_{ik} s_{kj} = \sum_{k=1}^{n} ⟨f_k , e_i ⟩ ⟨e_j , f_k ⟩
= \sum_{k=1}^{n} ⟨e_j , f_k ⟩ \overline{⟨e_i , f_k ⟩} = ⟨[e_j ]_f , [e_i ]_f ⟩_{F^n} = δ_{ij} .
Matrix S (and similarly also R) has the interesting property that its columns are
orthonormal to one another. This follows from the fact that the columns are the
coordinates of orthonormal vectors with respect to another orthonormal basis. A
similar statement holds for the rows of S (and similarly also R). More information
about orthogonal matrices can be found in Appendix A.5.1, in particular Defini-
tion A.5.3.
Example 10.2.2. Let V = C2 , and choose the orthonormal bases e = (e1 , e2 ) and
f = (f1 , f2 ) with
e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} , \quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} ,
f_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} , \quad f_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} .
Then
S = \begin{bmatrix} ⟨e_1 , f_1 ⟩ & ⟨e_2 , f_1 ⟩ \\ ⟨e_1 , f_2 ⟩ & ⟨e_2 , f_2 ⟩ \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
and
R = \begin{bmatrix} ⟨f_1 , e_1 ⟩ & ⟨f_2 , e_1 ⟩ \\ ⟨f_1 , e_2 ⟩ & ⟨f_2 , e_2 ⟩ \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} .
One can then check explicitly that indeed
RS = \frac{1}{2} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I.
So far we have only discussed how the coordinate vector of a given vector v ∈ V
changes under the change of basis from e to f . The next question we can ask is
how the matrix M (T ) of an operator T ∈ L(V ) changes if we change the basis. Let
A be the matrix of T with respect to the basis e = (e1 , . . . , en ), and let B be the
matrix for T with respect to the basis f = (f1 , . . . , fn ). How do we determine B
from A? Note that
[T v]_e = A[v]_e and [T v]_f = B[v]_f . Hence,
B[v]_f = [T v]_f = S[T v]_e = SA[v]_e = SAS^{-1} [v]_f \quad for all v ∈ V ,
so that
B = SAS^{-1} .
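The following Python/NumPy sketch (purely illustrative; the operator matrix A is an arbitrary choice) builds the change of basis matrix S for the bases of Example 10.2.2 and checks that A and B = SAS −1 represent the same operator.

```python
import numpy as np

# Orthonormal bases of R^2: the canonical basis e and the rotated basis f of Example 10.2.2.
f1 = np.array([1.0, 1.0]) / np.sqrt(2)
f2 = np.array([-1.0, 1.0]) / np.sqrt(2)

# Change of basis matrix S with entries s_ij = <e_j, f_i>.
S = np.array([[np.dot(e, fi) for e in np.eye(2)] for fi in (f1, f2)])

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # matrix of some operator T with respect to e
B = S @ A @ np.linalg.inv(S)               # matrix of the same operator with respect to f

v = np.array([1.0, 2.0])                   # a vector in canonical coordinates
assert np.allclose(B @ (S @ v), S @ (A @ v))   # [Tv]_f computed two ways agrees
print(B)
```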
Calculational Exercises
(1) Consider R3 with two orthonormal bases: the canonical basis e = (e1 , e2 , e3 )
and the basis f = (f1 , f2 , f3 ), where
f_1 = \frac{1}{\sqrt{3}} (1, 1, 1), \quad f_2 = \frac{1}{\sqrt{6}} (1, −2, 1), \quad f_3 = \frac{1}{\sqrt{2}} (1, 0, −1).
Find the matrix, S, of the change of basis transformation such that
[v]f = S[v]e , for all v ∈ R3 ,
where [v]b denotes the column vector of v with respect to the basis b.
(2) Let v ∈ C4 be the vector given by v = (1, i, −1, −i). Find the matrix (with
respect to the canonical basis on C4 ) of the orthogonal projection P ∈ L(C4 )
such that
null(P ) = {v}⊥ .
(3) Let U be the subspace of R3 that coincides with the plane through the origin
that is perpendicular to the vector n = (1, 1, 1) ∈ R3 .
(a) Find an orthonormal basis for U .
(b) Find the matrix (with respect to the canonical basis on R3 ) of the orthog-
onal projection P ∈ L(R3 ) onto U , i.e., such that range(P ) = U .
(4) Let V = C4 with its standard inner product. For θ ∈ R, let
v_θ = \begin{pmatrix} 1 \\ e^{iθ} \\ e^{2iθ} \\ e^{3iθ} \end{pmatrix} ∈ C^4 .
Find the canonical matrix of the orthogonal projection onto the subspace
{vθ }⊥ .
Proof-Writing Exercises
(1) Let V be a finite-dimensional vector space over F with dimension n ∈ Z+ , and
suppose that b = (v1 , v2 , . . . , vn ) is a basis for V . Prove that the coordinate
vectors [v1 ]b , [v2 ]b , . . . , [vn ]b with respect to b form a basis for Fn .
(2) Let V be a finite-dimensional vector space over F, and suppose that T ∈ L(V )
is a linear operator having the following property: Given any two bases b and
c for V , the matrix M (T, b) for T with respect to b is the same as the matrix
M (T, c) for T with respect to c. Prove that there exists a scalar α ∈ F such
that T = αIV , where IV denotes the identity map on V .
Chapter 11
In this chapter we come back to the question of when a linear operator on an inner
product space V is diagonalizable. We first introduce the notion of the adjoint
(a.k.a. hermitian conjugate) of an operator, and we then use this to define so-called
normal operators. The main result of this chapter is the Spectral Theorem, which
states that normal operators are diagonal with respect to an orthonormal basis. We
use this to show that normal operators are “unitarily diagonalizable” and generalize
this notion to finding the singular-value decomposition of an operator. In this
chapter, we will always assume F = C.
Let V be a finite-dimensional inner product space over C with inner product ·, ·.
A linear operator T ∈ L(V ) is uniquely determined by the values of
⟨T v, w⟩, \quad for all v, w ∈ V .
This means, in particular, that if T, S ∈ L(V ) and
⟨T v, w⟩ = ⟨Sv, w⟩ \quad for all v, w ∈ V ,
then T = S. To see this, take w to be the elements of an orthonormal basis of V .
Definition 11.1.1. Given T ∈ L(V ), the adjoint (a.k.a. hermitian conjugate) of T is defined to be the operator T^* ∈ L(V ) for which
⟨T v, w⟩ = ⟨v, T^* w⟩, \quad for all v, w ∈ V .
Moreover, we call T self-adjoint (a.k.a. hermitian) if T = T^* .
The uniqueness of T ∗ is clear by the previous observation.
Example 11.1.2. Let V = C3 , and let T ∈ L(C3 ) be defined by T (z1 , z2 , z3 ) =
(2z2 + iz3 , iz1 , z2 ). Then
⟨(y_1 , y_2 , y_3 ), T^* (z_1 , z_2 , z_3 )⟩ = ⟨T (y_1 , y_2 , y_3 ), (z_1 , z_2 , z_3 )⟩
= ⟨(2y_2 + iy_3 , iy_1 , y_2 ), (z_1 , z_2 , z_3 )⟩
= 2y_2 \overline{z_1} + iy_3 \overline{z_1} + iy_1 \overline{z_2} + y_2 \overline{z_3}
= ⟨(y_1 , y_2 , y_3 ), (−iz_2 , 2z_1 + z_3 , −iz_1 )⟩
so that T ∗ (z1 , z2 , z3 ) = (−iz2 , 2z1 + z3 , −iz1 ). Writing the matrix for T in terms of
the canonical basis, we see that
M (T ) = \begin{bmatrix} 0 & 2 & i \\ i & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \quad and \quad M (T^* ) = \begin{bmatrix} 0 & -i & 0 \\ 2 & 0 & 1 \\ -i & 0 & 0 \end{bmatrix} .
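In coordinates, taking the adjoint amounts to taking the conjugate transpose of the matrix, the property M (T ∗ ) = M (T )∗ listed below. The following Python/NumPy sketch (an illustration; the test vectors are arbitrary) checks this for the operator of Example 11.1.2.

```python
import numpy as np

# Matrix of T(z1, z2, z3) = (2 z2 + i z3, i z1, z2) with respect to the canonical basis.
M_T = np.array([[0, 2, 1j],
                [1j, 0, 0],
                [0, 1, 0]])

M_T_star = M_T.conj().T        # conjugate transpose, the matrix of the adjoint T*
print(M_T_star)

# Inner product on C^3 that is linear in the first slot: <x, y> = sum_i x_i * conj(y_i).
inner = lambda x, y: np.sum(x * np.conj(y))

v = np.array([1.0, 2.0 + 1j, -1j])
w = np.array([0.5, -1j, 3.0])
assert np.isclose(inner(M_T @ v, w), inner(v, M_T_star @ w))   # <Tv, w> = <v, T*w>
```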
We collect several elementary properties of the adjoint operation into the follow-
ing proposition. You should provide a proof of these results for your own practice.
(1) (S + T )∗ = S ∗ + T ∗ .
(2) (aT )∗ = aT ∗ .
(3) (T ∗ )∗ = T .
(4) I ∗ = I.
(5) (ST )∗ = T ∗ S ∗ .
(6) M (T ∗ ) = M (T )∗ .
Example 11.1.5. The operator T ∈ L(V ) defined by T (v) = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} v is
self-adjoint, and it can be checked (e.g., using the characteristic polynomial) that
the eigenvalues of T are λ = 1, 4.
Normal operators are those that commute with their own adjoint. As we will see,
this includes many important examples of operations.
Proposition 11.2.2. Let V be a complex inner product space, and suppose that
T ∈ L(V ) satisfies
⟨T v, v⟩ = 0, \quad for all v ∈ V .
Then T = 0.
Proof. Note that Part 1 follows from Proposition 11.2.3 and the positive definiteness
of the norm.
To prove Part 2, first verify that if T is normal, then T − λI is also normal with (T − λI)^* = T^* − \overline{λ} I. Therefore, by Proposition 11.2.3, we have
0 = ‖(T − λI)v‖ = ‖(T − λI)^* v‖ = ‖(T^* − \overline{λ} I)v‖,
and so v is an eigenvector of T^* with eigenvalue \overline{λ}.
Using Part 2, note that
(λ − μ)⟨v, w⟩ = ⟨λv, w⟩ − ⟨v, \overline{μ} w⟩ = ⟨T v, w⟩ − ⟨v, T^* w⟩ = 0.
Since λ − μ ≠ 0, it follows that ⟨v, w⟩ = 0, proving Part 3.
Proof.
(“=⇒”) Suppose that T is normal. Combining Theorem 7.5.3 and Corollary 9.5.5,
there exists an orthonormal basis e = (e1 , . . . , en ) for which the matrix M (T ) is
upper triangular, i.e.,
M (T ) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{bmatrix} .
We will show that M (T ) is, in fact, diagonal, which implies that the basis elements
e1 , . . . , en are eigenvectors of T .
Since M (T ) = (a_{ij} )_{i,j=1}^{n} with a_{ij} = 0 for i > j, we have T e_1 = a_{11} e_1 and T^* e_1 = \sum_{k=1}^{n} \overline{a_{1k}} e_k . Thus, by the Pythagorean Theorem and Proposition 11.2.3,
|a_{11} |^2 = ‖a_{11} e_1 ‖^2 = ‖T e_1 ‖^2 = ‖T^* e_1 ‖^2 = \left\| \sum_{k=1}^{n} \overline{a_{1k}} e_k \right\|^2 = \sum_{k=1}^{n} |a_{1k} |^2 ,
and hence a_{1k} = 0 for all k = 2, . . . , n.
For example, returning to the self-adjoint matrix A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} of Example 11.1.5, with unitary U such that A = U DU^{-1} and D = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}, one computes
A^n = (U DU^{-1} )^n = U D^n U^{-1} = U \begin{bmatrix} 1 & 0 \\ 0 & 2^{2n} \end{bmatrix} U^* = \begin{bmatrix} \frac{2}{3} (1 + 2^{2n-1} ) & \frac{1+i}{3} (-1 + 2^{2n} ) \\ \frac{1-i}{3} (-1 + 2^{2n} ) & \frac{1}{3} (1 + 2^{2n+1} ) \end{bmatrix} ,
exp(A) = U exp(D)U^{-1} = U \begin{bmatrix} e & 0 \\ 0 & e^4 \end{bmatrix} U^{-1} = \frac{1}{3} \begin{bmatrix} 2e + e^4 & e^4 - e + i(e^4 - e) \\ e^4 - e + i(e - e^4 ) & e + 2e^4 \end{bmatrix} .
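These formulas are straightforward to confirm numerically. The following Python/NumPy sketch (an illustration only) diagonalizes the same Hermitian matrix and computes a power and the exponential through its eigenvalues.

```python
import numpy as np

A = np.array([[2.0, 1.0 + 1.0j],
              [1.0 - 1.0j, 3.0]])           # self-adjoint, eigenvalues 1 and 4

eigenvalues, U = np.linalg.eigh(A)           # eigh: for Hermitian matrices, U is unitary

# Functions of A are computed by applying them to the eigenvalues:
A_cubed = U @ np.diag(eigenvalues ** 3) @ U.conj().T
exp_A = U @ np.diag(np.exp(eigenvalues)) @ U.conj().T

assert np.allclose(A_cubed, A @ A @ A)
print(np.round(exp_A, 4))   # matches (1/3) [[2e + e^4, (1+i)(e^4 - e)], [(1-i)(e^4 - e), e + 2e^4]]
```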
Recall that self-adjoint operators are the operator analog for real numbers. Let us
now define the operator analog for positive (or, more precisely, non-negative) real
numbers.
Definition 11.5.1. An operator T ∈ L(V ) is called positive (denoted T ≥ 0) if T = T^* and ⟨T v, v⟩ ≥ 0 for all v ∈ V .
(If V is a complex vector space, then the condition of self-adjointness follows from the condition ⟨T v, v⟩ ≥ 0 and hence can be dropped.)
T ei = λi ei ,
where λi are the eigenvalues of T with respect to the orthonormal basis e =
(e1 , . . . , en ). We know that these exist by the Spectral Theorem.
Continuing the analogy between C and L(V ), recall the polar form of a complex
number z = |z|eiθ , where |z| is the absolute value or modulus of z and eiθ lies on
the unit circle in R2 . In terms of an operator T ∈ L(V ), where V is a complex inner
product space, a unitary operator U takes the role of e^{iθ} , and |T | takes the role of the modulus. As in Section 11.5, T^* T ≥ 0, so that |T | := \sqrt{T^* T} exists and satisfies |T | ≥ 0 as well.
Theorem 11.6.1. For each T ∈ L(V ), there exists a unitary U such that
T = U |T |.
This is called the polar decomposition of T .
Note that null (|T |)⊥range (|T |); i.e., for v ∈ null (|T |) and w = |T |u ∈ range (|T |),
⟨v, w⟩ = ⟨v, |T |u⟩ = ⟨|T |v, u⟩ = ⟨0, u⟩ = 0,
since |T | is self-adjoint.
Pick an orthonormal basis e = (e1 , . . . , em ) of null (|T |) and an orthonormal basis
f = (f1 , . . . , fm ) of (range (T ))⊥ . Set S̃ei = fi , and extend S̃ to all of null (|T |)
by linearity. Since null (|T |)⊥range (|T |), any v ∈ V can be uniquely written as
v = v1 + v2 , where v1 ∈ null (|T |) and v2 ∈ range (|T |). Now define U : V → V by
setting U v = S̃v1 + Sv2 . Then U is an isometry. Moreover, U is also unitary, as shown by the following calculation, which is an application of the Pythagorean theorem:
where the notation M (T ; e, e) indicates that the basis e is used both for the domain
and codomain of T . The Spectral Theorem tells us that unitary diagonalization can
only be done for normal operators. In general, we can find two orthonormal bases
e and f such that
M (T ; e, f ) = \begin{bmatrix} s_1 & & 0 \\ & \ddots & \\ 0 & & s_n \end{bmatrix} ,
which means that T ei = si fi even if T is not normal. The scalars si are called
singular values of T . If T is diagonalizable, then these are the absolute values of
the eigenvalues.
and hence
T v = U |T |v = s_1 ⟨v, e_1 ⟩ U e_1 + \cdots + s_n ⟨v, e_n ⟩ U e_n .
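Numerically, the singular values and the bases e and f are produced by the singular value decomposition. The following Python/NumPy sketch (an illustration with an arbitrarily chosen non-normal matrix) verifies the relation T ei = si fi .

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])                  # a non-normal operator on R^2

U, s, Vt = np.linalg.svd(A)                 # A = U diag(s) V^T with U, V orthogonal
print(s)                                    # the singular values of A

# Rows of Vt play the role of the basis e, columns of U the basis f:
for si, ei, fi in zip(s, Vt, U.T):
    assert np.allclose(A @ ei, si * fi)     # T e_i = s_i f_i
```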
Calculational Exercises
(1) Consider R3 with two orthonormal bases: the canonical basis e = (e1 , e2 , e3 )
and the basis f = (f1 , f2 , f3 ), where
f_1 = \frac{1}{\sqrt{3}} (1, 1, 1), \quad f_2 = \frac{1}{\sqrt{6}} (1, −2, 1), \quad f_3 = \frac{1}{\sqrt{2}} (1, 0, −1).
Find the canonical matrix, A, of the linear map T ∈ L(R3 ) with eigenvectors
f1 , f2 , f3 and eigenvalues 1, 1/2, −1/2, respectively.
(2) For each of the following matrices, verify that A is Hermitian by showing that
A = A∗ , find a unitary matrix U such that U −1 AU is a diagonal matrix, and
compute exp(A).
(a) A = \begin{bmatrix} 4 & 1-i \\ 1+i & 5 \end{bmatrix} \quad (b) A = \begin{bmatrix} 3 & -i \\ i & 3 \end{bmatrix} \quad (c) A = \begin{bmatrix} 6 & 2+2i \\ 2-2i & 4 \end{bmatrix}
(d) A = \begin{bmatrix} 0 & 3+i \\ 3-i & -3 \end{bmatrix} \quad (e) A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -1 & -1+i \\ 0 & -1-i & 0 \end{bmatrix} \quad (f) A = \begin{bmatrix} 2 & i/\sqrt{2} & -i/\sqrt{2} \\ -i/\sqrt{2} & 2 & 0 \\ i/\sqrt{2} & 0 & 2 \end{bmatrix}
(3) For each of the following matrices, either find a matrix P (not necessarily uni-
tary) such that P −1 AP is a diagonal matrix, or show why no such matrix exists.
(a) A = \begin{bmatrix} 19 & -9 & -6 \\ 25 & -11 & -9 \\ 17 & -9 & -4 \end{bmatrix} \quad (b) A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix} \quad (c) A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \end{bmatrix}
(d) A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix} \quad (e) A = \begin{bmatrix} -i & 1 & 1 \\ -i & 1 & 1 \\ -i & 1 & 1 \end{bmatrix} \quad (f) A = \begin{bmatrix} 0 & 0 & i \\ 4 & 0 & i \\ 0 & 0 & i \end{bmatrix}
(4) Let r ∈ R and let T ∈ L(C2 ) be the linear map with canonical matrix
T = \begin{bmatrix} 1 & -1 \\ -1 & r \end{bmatrix} .
(a) Find the eigenvalues of T .
(b) Find an orthonormal basis of C2 consisting of eigenvectors of T .
(c) Find a unitary matrix U such that U T U ∗ is diagonal.
(5) Let A be the complex matrix given by:
A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -1 & -1+i \\ 0 & -1-i & 0 \end{bmatrix}
(a) Find the eigenvalues of A.
(b) Find an orthonormal basis of eigenvectors of A.
(c) Calculate |A| = \sqrt{A^* A}.
(d) Calculate eA .
(6) Let θ ∈ R, and let T ∈ L(C2 ) have canonical matrix
M (T ) = \begin{bmatrix} 1 & e^{iθ} \\ e^{-iθ} & -1 \end{bmatrix} .
(a) Find the eigenvalues of T .
(b) Find an orthonormal basis for C2 that consists of eigenvectors for T .
Proof-Writing Exercises
(1) Prove or give a counterexample: The product of any two self-adjoint operators
on a finite-dimensional vector space is self-adjoint.
(2) Prove or give a counterexample: Every unitary matrix is invertible.
(3) Let V be a finite-dimensional vector space over F, and suppose that T ∈ L(V )
satisfies T 2 = T . Prove that T is an orthogonal projection if and only if T is
self-adjoint.
(4) Let V be a finite-dimensional inner product space over C, and suppose that
T ∈ L(V ) has the property that T ∗ = −T . (We call T a skew Hermitian
operator on V .)
(a) Prove that the operator iT ∈ L(V ) defined by (iT )(v) = i(T (v)), for each
v ∈ V , is Hermitian.
(b) Prove that the canonical matrix for T can be unitarily diagonalized.
(c) Prove that T has purely imaginary eigenvalues.
(5) Let V be a finite-dimensional vector space over F, and suppose that S, T ∈ L(V )
are positive operators on V . Prove that S + T is also a positive operator on V .
(6) Let V be a finite-dimensional vector space over F, and let T ∈ L(V ) be any
operator on V . Prove that T is invertible if and only if 0 is not a singular value
of T .
Appendix A
As discussed in Chapter 1, there are many ways in which you might try to solve a system of linear equations involving a finite number of variables. These supplementary
notes are intended to illustrate the use of Linear Algebra in solving such systems.
In particular, any arbitrary number of equations in any number of unknowns — as
long as both are finite — can be encoded as a single matrix equation. As you
will see, this has many computational advantages, but, perhaps more importantly,
it also allows us to better understand linear systems abstractly. Specifically, by
exploiting the deep connection between matrices and so-called linear maps, one can
completely determine all possible solutions to any linear system.
These notes are also intended to provide a self-contained introduction to matrices
and important matrix operations. As you read the sections below, remember that
a matrix is, in general, nothing more than a rectangular array of real or complex
numbers. Matrices are not linear maps. Instead, a matrix can (and will often) be
used to define a linear map.
We begin this section by reviewing the definition of and notation for matrices. We
then review several different conventions for denoting and studying systems of linear
equations. This point of view has a long history of exploration, and numerous
computational devices — including several computer programming languages —
have been developed and optimized specifically for analyzing matrix equations.
Virtually every usage also involves the notion of vector, by which we mean here either an m × 1 matrix (a.k.a. a column vector) or a 1 × n matrix (a.k.a. a row vector).
Example A.1.3. Suppose that A = (aij ), B = (bij ), C = (cij ), D = (dij ), and
E = (eij ) are the following matrices over F:
A = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} , \quad B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix} , \quad C = \begin{bmatrix} 1, 4, 2 \end{bmatrix} , \quad D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix} , \quad E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix} .
Then we say that A is a 3 × 1 matrix (a.k.a. a column vector), B is a 2 × 2 square
matrix, C is a 1 × 3 matrix (a.k.a. a row vector), and both D and E are square
3 × 3 matrices. Moreover, only B is an upper-triangular matrix (as defined below),
and none of the matrices in this example are diagonal matrices.
We can discuss individual entries in each matrix. E.g.,
• the 2nd row of D is d21 = −1, d22 = 0, and d23 = 1.
• the main diagonal of D is the sequence d11 = 1, d22 = 0, d33 = 4.
• the skew main diagonal of D is the sequence d13 = 2, d22 = 0, d31 = 3.
• the off-diagonal entries of D are (by row) d12 , d13 , d21 , d23 , d31 , and d32 .
• the 2nd column of E is e12 = e22 = e32 = 1.
• the superdiagonal of E is the sequence e12 = 1, e23 = 2.
• the subdiagonal of E is the sequence e21 = −1, e32 = 1.
A square matrix A = (aij ) ∈ Fn×n is called upper triangular (resp. lower
triangular) if aij = 0 for each pair of integers i, j ∈ {1, . . . , n} such that i > j
(resp. i < j). In other words, A is triangular if it has the form
\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix} \quad or \quad \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ a_{31} & a_{32} & a_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} .
Note that a diagonal matrix is simultaneously both an upper triangular matrix and
a lower triangular matrix.
Two particularly important examples of diagonal matrices are defined as follows:
Given any positive integer n ∈ Z+ , we can construct the identity matrix In and
the zero matrix 0n×n by setting
I_n = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \quad and \quad 0_{n×n} = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix} ,
$$0_{m \times n} = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix}
\quad (m \text{ rows and } n \text{ columns}).$$
Then the left-hand side of the ith equation in System (A.1) can be recovered by
taking the dot product (a.k.a. Euclidean inner product) of x with the ith row
in A:
$$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \cdot x
= \sum_{j=1}^{n} a_{ij} x_j
= a_{i1}x_1 + a_{i2}x_2 + a_{i3}x_3 + \cdots + a_{in}x_n.$$
In general, we can extend the dot product between two vectors in order to form
the product of any two matrices (as in Section A.2.2). For the purposes of this
section, though, it suffices to define the product of the matrix A ∈ Fm×n and the
vector x ∈ Fn to be
$$Ax = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}. \tag{A.2}$$
Then, since each entry in the resulting m × 1 column vector Ax ∈ Fm corresponds
exactly to the left-hand side of each equation in System (A.1), we have effectively
encoded System (A.1) as the single matrix equation
$$Ax = \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = b. \tag{A.3}$$
Example A.1.4. The linear system
$$\left.\begin{array}{r}
x_1 + 6x_2 + 4x_5 - 2x_6 = 14 \\
x_3 + 3x_5 + x_6 = -3 \\
x_4 + 5x_5 + 2x_6 = 11
\end{array}\right\}$$
has three equations and involves the six variables x1 , x2 , . . . , x6 . One can check that
possible solutions to this system include
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 14 \\ 0 \\ -3 \\ 11 \\ 0 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 6 \\ 1 \\ -12 \\ -5 \\ 2 \\ 3 \end{bmatrix}.$$
Note that, in describing these solutions, we have used the six unknowns
x1 , x2 , . . . , x6 to form the 6 × 1 column vector x = (xi ) ∈ F6 . We can similarly
form the coefficient matrix A ∈ F3×6 and the 3 × 1 column vector b ∈ F3 , where
$$A = \begin{bmatrix}
1 & 6 & 0 & 0 & 4 & -2 \\
0 & 0 & 1 & 0 & 3 & 1 \\
0 & 0 & 0 & 1 & 5 & 2
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
= \begin{bmatrix} 14 \\ -3 \\ 11 \end{bmatrix}.$$
You should check that, given these matrices, each of the solutions given above
satisfies Equation (A.3).
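One convenient way to carry out such a check is numerically. The following short Python sketch (using the NumPy library; the variable names are our own) encodes A, b, and the two solution vectors listed above and confirms that each satisfies Ax = b.

import numpy as np

# Encode Example A.1.4 as the single matrix equation Ax = b and check
# both of the solution vectors listed above.
A = np.array([[1, 6, 0, 0, 4, -2],
              [0, 0, 1, 0, 3,  1],
              [0, 0, 0, 1, 5,  2]])
b = np.array([14, -3, 11])

for x in (np.array([14, 0, -3, 11, 0, 0]),
          np.array([6, 1, -12, -5, 2, 3])):
    print(A @ x, np.array_equal(A @ x, b))   # prints b and True in both cases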
Of course, it is not enough to just assert that Fm×n is a vector space since
we have yet to verify that the above defined operations of addition and scalar
multiplication satisfy the vector space axioms. The proof of the following theorem
is straightforward and something that you should work through for practice with
matrix notation.
Theorem A.2.3. Given positive integers m, n ∈ Z+ and the operations of matrix
addition and scalar multiplication as defined above, the set Fm×n of all m × n
matrices satisfies each of the following properties.
In other words, Fm×n forms a vector space under the operations of matrix addition
and scalar multiplication.
As a consequence of Theorem A.2.3, every property that holds for an arbitrary
vector space can be taken as a property of Fm×n specifically. We highlight some of
these properties in the following corollary to Theorem A.2.3.
Corollary A.2.4. Given positive integers m, n ∈ Z+ and the operations of matrix
addition and scalar multiplication as defined above, the set Fm×n of all m × n
matrices satisfies each of the following properties:
(1) Given any matrix A ∈ Fm×n , given any scalar α ∈ F, and denoting by 0 the
additive identity of F,
0A = 0m×n and α0m×n = 0m×n .
(2) Given any matrix A ∈ Fm×n and any scalar α ∈ F,
αA = 0 =⇒ either α = 0 or A = 0m×n .
(3) Given any matrix A ∈ Fm×n and any scalar α ∈ F,
−(αA) = (−α)A = α(−A).
In particular, the additive inverse −A of A is given by −A = (−1)A, where −1
denotes the additive inverse of the multiplicative identity of F.
While one could prove Corollary A.2.4 directly from definitions, the point of recog-
nizing Fm×n as a vector space is that you get to use these results without worrying
about their proof. Moreover, there is no need for separate proofs for F = R and
F = C.
In particular, note that the “i, j entry” of the matrix product AB involves a sum-
mation over the positive integer k = 1, . . . , s, where s is both the number of columns
in A and the number of rows in B. Thus, this multiplication is only defined when
the “middle” dimension of each matrix is the same:
$$(a_{ij})_{r \times s}\,(b_{ij})_{s \times t}
= \begin{bmatrix}
a_{11} & \cdots & a_{1s} \\
\vdots & \ddots & \vdots \\
a_{r1} & \cdots & a_{rs}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \cdots & b_{1t} \\
\vdots & \ddots & \vdots \\
b_{s1} & \cdots & b_{st}
\end{bmatrix}
= \begin{bmatrix}
\sum_{k=1}^{s} a_{1k}b_{k1} & \cdots & \sum_{k=1}^{s} a_{1k}b_{kt} \\
\vdots & \ddots & \vdots \\
\sum_{k=1}^{s} a_{rk}b_{k1} & \cdots & \sum_{k=1}^{s} a_{rk}b_{kt}
\end{bmatrix},$$
where the first factor has r rows and s columns, the second factor has s rows and t columns, and the resulting product is an r × t matrix.
Alternatively, if we let n ∈ Z+ be a positive integer, then another way of viewing
matrix multiplication is through the use of the standard inner product on Fn =
F1×n = Fn×1 . In particular, we define the dot product (a.k.a. Euclidean inner
product) of the row vector x = (x1j ) ∈ F1×n and the column vector y = (yi1 ) ∈
Fn×1 to be
$$x \cdot y = \begin{bmatrix} x_{11}, & \cdots, & x_{1n} \end{bmatrix} \cdot
\begin{bmatrix} y_{11} \\ \vdots \\ y_{n1} \end{bmatrix}
= \sum_{k=1}^{n} x_{1k} y_{k1} \in \mathbb{F}.$$
We can then decompose matrices A = (aij )r×s and B = (bij )s×t into their con-
stituent row vectors by fixing a positive integer k ∈ Z+ and setting
A(k,·) = ak1 , · · · , aks ∈ F1×s and B (k,·) = bk1 , · · · , bkt ∈ F1×t .
Similarly, fixing a positive integer ℓ ∈ Z+ , we can also decompose A and B into the column vectors
$$A^{(\cdot,\ell)} = \begin{bmatrix} a_{1\ell} \\ \vdots \\ a_{r\ell} \end{bmatrix} \in \mathbb{F}^{r \times 1}
\quad \text{and} \quad
B^{(\cdot,\ell)} = \begin{bmatrix} b_{1\ell} \\ \vdots \\ b_{s\ell} \end{bmatrix} \in \mathbb{F}^{s \times 1}.$$
It follows that the product AB is the following matrix of dot products:
$$AB = \begin{bmatrix}
A^{(1,\cdot)} \cdot B^{(\cdot,1)} & \cdots & A^{(1,\cdot)} \cdot B^{(\cdot,t)} \\
\vdots & \ddots & \vdots \\
A^{(r,\cdot)} \cdot B^{(\cdot,1)} & \cdots & A^{(r,\cdot)} \cdot B^{(\cdot,t)}
\end{bmatrix} \in \mathbb{F}^{r \times t}.$$
Example A.2.5. With the notation as in Example A.1.3, the reader is advised to
use the above definitions to verify that the following matrix products hold.
$$AC = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} \begin{bmatrix} 1, & 4, & 2 \end{bmatrix}
= \begin{bmatrix} 3 & 12 & 6 \\ -1 & -4 & -2 \\ 1 & 4 & 2 \end{bmatrix} \in \mathbb{F}^{3 \times 3},$$

$$CA = \begin{bmatrix} 1, & 4, & 2 \end{bmatrix} \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}
= 3 - 4 + 2 = 1 \in \mathbb{F},$$

$$B^2 = BB = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}
= \begin{bmatrix} 16 & -6 \\ 0 & 4 \end{bmatrix} \in \mathbb{F}^{2 \times 2},$$

$$CE = \begin{bmatrix} 1, & 4, & 2 \end{bmatrix}
\begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}
= \begin{bmatrix} 10, & 7, & 17 \end{bmatrix} \in \mathbb{F}^{1 \times 3}, \quad \text{and}$$

$$DA = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}
\begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ -2 \\ 11 \end{bmatrix} \in \mathbb{F}^{3 \times 1}.$$
Note, though, that B cannot be multiplied by any of the other matrices, nor does it
make sense to try to form the products AD, AE, DC, and EC due to the inherent
size mismatches.
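The row-times-column description of the product can also be phrased algorithmically. The Python sketch below (using NumPy; the helper name is our own choice) builds AB entry by entry as dot products of the rows of A with the columns of B and checks the product CE computed above.

import numpy as np

def matmul_by_dots(A, B):
    # Form AB entrywise as A^(i,.) . B^(.,j), as in the displayed formula.
    r, s = A.shape
    s2, t = B.shape
    assert s == s2, "the 'middle' dimensions must agree"
    AB = np.zeros((r, t))
    for i in range(r):
        for j in range(t):
            AB[i, j] = A[i, :] @ B[:, j]
    return AB

C = np.array([[1, 4, 2]])
E = np.array([[6, 1, 3], [-1, 1, 2], [4, 1, 3]])
print(matmul_by_dots(C, E))                      # [[10.  7. 17.]]
print(np.allclose(matmul_by_dots(C, E), C @ E))  # True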
Among the basic properties of matrix multiplication collected in Theorem A.2.6 is associativity: A(BC) = (AB)C.
As with Theorem A.2.3, you should work through a proof of each part of Theo-
rem A.2.6 (and especially of the first part) in order to practice manipulating the
indices of entries correctly. We state and prove a useful followup to Theorems A.2.3
and A.2.6 as an illustration.
In other words, the set of all n × n upper triangular matrices forms an algebra over
F.
Moreover, each of the above statements still holds when upper triangular is
replaced by lower triangular.
Proof. The proofs of Parts 1 and 2 are straightforward and follow directly from
the appropriate definitions. Moreover, the proof of the case for lower triangular
matrices follows from the fact that a matrix A is upper triangular if and only if AT
is lower triangular, where AT denotes the transpose of A. (See Section A.5.1 for
the definition of transpose.)
To prove Part 3, we start from the definition of the matrix product. Denoting
A = (aij ) and B = (bij ), note that AB = ((ab)ij ) is an n × n matrix having “i-j
entry” given by
$$(ab)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
Since A and B are upper triangular, we have that aik = 0 when i > k and that
bkj = 0 when k > j. Thus, to obtain a non-zero summand aik bkj ≠ 0, we must
have both aik ≠ 0, which implies that i ≤ k, and bkj ≠ 0, which implies that k ≤ j.
In particular, these two conditions are simultaneously satisfiable only when i ≤ j.
Therefore, (ab)ij = 0 when i > j, from which it follows that AB is upper triangular.
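A quick numerical illustration of this closure property can be reassuring. The sketch below (Python with NumPy; the two upper triangular matrices are arbitrarily chosen examples) confirms that every entry of the product below the main diagonal vanishes.

import numpy as np

A = np.array([[2.0, 3.0, 4.0],
              [0.0, -1.0, 3.0],
              [0.0, 0.0, -2.0]])
B = np.triu(np.arange(1.0, 10.0).reshape(3, 3))  # [[1,2,3],[0,5,6],[0,0,9]]
AB = A @ B
print(AB)
print(np.allclose(AB, np.triu(AB)))              # True: AB is upper triangular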
At the same time, you should be careful not to blithely perform operations on
matrices as you would with numbers. The fact that matrix multiplication is not a
commutative operation should make it clear that significantly more care is required
with matrix arithmetic. As another example, given a positive integer n ∈ Z+ , the
set Fn×n has what are called zero divisors. That is, there exist non-zero matrices
A, B ∈ Fn×n such that AB = 0n×n :
$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2
= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0_{2 \times 2}.$$
Moreover, note that there exist matrices A, B, C ∈ Fn×n such that AB = AC but B ≠ C:
$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
= 0_{2 \times 2}
= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
As a result, we say that the set Fn×n fails to have the so-called cancellation
property. This failure is a direct result of the fact that there are non-zero matrices
in Fn×n that have no multiplicative inverse. We discuss matrix invertibility at length
in the next section and define a special subset GL(n, F) ⊂ Fn×n upon which the
cancellation property does hold.
One can prove that, if the multiplicative inverse of a matrix exists, then the
inverse is unique. As such, we will usually denote the so-called inverse matrix of
A ∈ GL(n, F) by A−1 . Note that the zero matrix 0n×n ∉ GL(n, F). This means
that GL(n, F) is not a vector subspace of Fn×n .
Since matrix multiplication is not a commutative operation, care must be taken
when working with the multiplicative inverses of invertible matrices. In particular,
many of the algebraic properties for multiplicative inverses of scalars, when properly
modified, continue to hold. We summarize the most basic of these properties in the
following theorem.
Theorem A.2.9. Let n ∈ Z+ be a positive integer and A, B ∈ GL(n, F). Then
(1) the inverse matrix A−1 ∈ GL(n, F) and satisfies (A−1 )−1 = A.
(2) the matrix power Am ∈ GL(n, F) and satisfies (Am )−1 = (A−1 )m , where m ∈
Z+ is any positive integer.
(3) the matrix αA ∈ GL(n, F) and satisfies (αA)−1 = α−1 A−1 , where α ∈ F is any
non-zero scalar.
(4) the product AB ∈ GL(n, F) and has inverse given by the formula
(AB)−1 = B −1 A−1 .
Moreover, GL(n, F) has the cancellation property. In other words, given any
three matrices A, B, C ∈ GL(n, F), if AB = AC, then B = C.
At the same time, it is important to note that the zero matrix is not the only
non-invertible matrix. As an illustration of the subtlety involved in understanding
invertibility, we give the following theorem for the 2 × 2 case.
Theorem A.2.10. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \in \mathbb{F}^{2 \times 2}$. Then A is invertible if and only if A satisfies
$$a_{11}a_{22} - a_{12}a_{21} \neq 0.$$
Moreover, if A is invertible, then
$$A^{-1} = \begin{bmatrix}
\dfrac{a_{22}}{a_{11}a_{22} - a_{12}a_{21}} & \dfrac{-a_{12}}{a_{11}a_{22} - a_{12}a_{21}} \\[2ex]
\dfrac{-a_{21}}{a_{11}a_{22} - a_{12}a_{21}} & \dfrac{a_{11}}{a_{11}a_{22} - a_{12}a_{21}}
\end{bmatrix}.$$
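Theorem A.2.10 translates directly into a small procedure. The sketch below (plain Python; the function name is our own) tests the condition a11 a22 − a12 a21 ≠ 0 and, when it holds, returns the inverse given by the displayed formula.

def inverse_2x2(A):
    # Invert a 2-by-2 matrix given as nested lists, per Theorem A.2.10.
    (a11, a12), (a21, a22) = A
    d = a11 * a22 - a12 * a21
    if d == 0:
        raise ValueError("A is not invertible")
    return [[a22 / d, -a12 / d],
            [-a21 / d, a11 / d]]

B = [[4, -1], [0, 2]]            # the matrix B from Example A.1.3
print(inverse_2x2(B))            # [[0.25, 0.125], [0.0, 0.5]]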
A more general theorem holds for larger matrices. Its statement requires the no-
tion of determinant and we refer the reader to Chapter 8 for the definition of the
determinant. For completeness, we state the result here.
Theorem A.2.11. Let n ∈ Z+ be a positive integer, and let A = (aij ) ∈ Fn×n be
an n × n matrix. Then A is invertible if and only if det(A) ≠ 0. Moreover, if A
is invertible, then the “i, j entry” of A−1 is Aji / det(A). Here, Aij = (−1)i+j Mij ,
and Mij is the determinant of the matrix obtained when both the ith row and j th
column are removed from A.
We close this section by noting that the set GL(n, F) of all invertible n × n
matrices over F is often called the general linear group. This set has many
important uses in mathematics and there are several equivalent notations for it,
including GLn (F) and GL(Fn ), and sometimes simply GL(n) or GLn if it is not
important to emphasize the dependence on F. Note that the usage of the term
“group” in the name “general linear group” has a technical meaning: GL(n, F)
forms a group under matrix multiplication, which is non-abelian if n ≥ 2. (See
Section C.2 for the definition of a group.)
There are many ways in which one might try to solve a given system of linear
equations. This section is primarily devoted to describing two particularly popular
techniques, both of which involve factoring the coefficient matrix for the system
into a product of simpler matrices. These techniques are also at the heart of many
frequently used numerical (i.e., computer-assisted) applications of Linear Algebra.
Note that the factorization of complicated objects into simpler components is
an extremely common problem solving technique in mathematics. E.g., we will
often factor a given polynomial into several polynomials of lower degree, and one
can similarly use the prime factorization for an integer in order to simplify certain
numerical computations.
Definition A.3.1. Let A = (aij ) ∈ Fm×n be an m × n matrix over F. Then we say that A is in row-echelon form (abbreviated REF) if
(1) either A(1,·) is the zero vector or the first non-zero entry in A(1,·) (when read
from left to right) is a one.
(2) for i = 1, . . . , m, if any row vector A(i,·) is the zero vector, then each subsequent
row vector A(i+1,·) , . . . , A(m,·) is also the zero vector.
(3) for i = 2, . . . , m, if some A(i,·) is not the zero vector, then the first non-zero
entry (when read from left to right) is a one and occurs to the right of the initial
one in A(i−1,·) .
The initial leading one in each non-zero row is called a pivot. We furthermore say
that A is in reduced row-echelon form (abbreviated RREF) if
(4) for each column vector A(·,j) containing a pivot (j = 2, . . . , n), the pivot is the
only non-zero element in A(·,j) .
The motivation behind Definition A.3.1 is that matrix equations having their
coefficient matrix in RREF (and, in some sense, also REF) are particularly easy to
solve. Note, in particular, that the only square matrix in RREF without zero rows
is the identity matrix.
Example A.3.2. The following matrices are all in REF:
$$A_1 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
A_4 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

$$A_5 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
A_6 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
A_7 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
A_8 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
However, only A4 through A8 are in RREF, as you should verify. Moreover, if we
take the transpose of each of these matrices (as defined in Section A.5.1), then only
AT6 , AT7 , and AT8 are in RREF.
Example A.3.3.
(1) Consider the following matrix in RREF:
$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Given any vector b = (bi ) ∈ F4 , the matrix equation Ax = b corresponds to the
system of equations
$$\left.\begin{array}{l}
x_1 = b_1 \\
x_2 = b_2 \\
x_3 = b_3 \\
x_4 = b_4
\end{array}\right\}.$$
must satisfy the matrix equation Ax = b. One can also verify that every solution
to the matrix equation must be of this form. It then follows that the set of all
solutions should somehow be “three dimensional”.
(1) (row exchange, a.k.a. “row swap”, matrix) E is obtained from the identity matrix Im by interchanging the row vectors $I_m^{(r,\cdot)}$ and $I_m^{(s,\cdot)}$ for some particular choice of positive integers r, s ∈ {1, 2, . . . , m}. I.e., in the case that r < s,
$$E = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}
\begin{matrix} \\ \\ \leftarrow r\text{th row} \\ \\ \leftarrow s\text{th row} \\ \\ \\ \end{matrix}$$
(2) (row scaling matrix) E is obtained from the identity matrix Im by replacing the row vector $I_m^{(r,\cdot)}$ with $\alpha I_m^{(r,\cdot)}$ for some choice of non-zero scalar α ∈ F and some choice of positive integer r ∈ {1, 2, . . . , m}. I.e.,
$$E = I_m + (\alpha - 1)E_{rr} = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & \alpha & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 1
\end{bmatrix}
\begin{matrix} \\ \\ \leftarrow r\text{th row} \\ \\ \\ \end{matrix}$$
where Err is the matrix having “r, r entry” equal to one and all other entries
equal to zero. (Recall that Err was defined in Section A.2.1 as a standard basis
vector for the vector space Fm×m .)
(3) (row combination, a.k.a. “row sum”, matrix) E is obtained from the identity matrix Im by replacing the row vector $I_m^{(r,\cdot)}$ with $I_m^{(r,\cdot)} + \alpha I_m^{(s,\cdot)}$ for some choice of scalar α ∈ F and some choice of positive integers r, s ∈ {1, 2, . . . , m}. I.e., in the case that r < s,
$$E = I_m + \alpha E_{rs} = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & 1 & \cdots & \alpha & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}
\begin{matrix} \\ \\ \leftarrow r\text{th row} \\ \\ \\ \\ \\ \end{matrix}$$
with the entry α appearing in the rth row and sth column,
where Ers is the matrix having “r, s entry” equal to one and all other entries
equal to zero. (Ers was also defined in Section A.2.1 as a standard basis vector
for Fm×m .)
The “elementary” in the name “elementary matrix” comes from the correspon-
dence between these matrices and so-called “elementary operations” on systems of
equations. In particular, each of the elementary matrices is clearly invertible (in
the sense defined in Section A.2.3), just as each “elementary operation” is itself
completely reversible. We illustrate this correspondence in the following example.
Example A.3.5. Define A, x, and b by
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 1 & 2 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \text{and} \quad
b = \begin{bmatrix} 4 \\ 5 \\ 9 \end{bmatrix}.$$
We illustrate the correspondence between elementary matrices and “elementary”
operations on the system of linear equations corresponding to the matrix equation
Ax = b, as follows.
To begin solving this system, one might want to either multiply the first equa-
tion through by 1/2 or interchange the first equation with one of the other equa-
tions. From a computational perspective, it is preferable to perform an interchange
since multiplying through by 1/2 would unnecessarily introduce fractions. Thus,
we choose to interchange the first and second equations, which amounts to multiplying both sides of the matrix equation Ax = b on the left by a row-exchange matrix.
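Concretely, that row-exchange matrix can be built from I3 and applied numerically. The short Python sketch below (using NumPy; the helper name is our own choice) carries out the interchange on A and b.

import numpy as np

def row_swap(m, r, s):
    # Elementary matrix obtained from I_m by interchanging rows r and s (0-indexed).
    E = np.eye(m)
    E[[r, s]] = E[[s, r]]
    return E

A = np.array([[2.0, 5.0, 3.0],
              [1.0, 2.0, 3.0],
              [1.0, 0.0, 8.0]])
b = np.array([4.0, 5.0, 9.0])
E0 = row_swap(3, 0, 1)
print(E0 @ A)   # the first and second rows of A interchanged
print(E0 @ b)   # [5. 4. 9.]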
Definition A.3.6. The system of linear equations, System (A.1), is called a ho-
mogeneous system if the right-hand side of each equation is zero. In other words,
a homogeneous system corresponds to a matrix equation of the form
Ax = 0,
where A ∈ Fm×n is an m × n matrix and x is an n-tuple of unknowns. We also call
the set
N = {v ∈ Fn | Av = 0}
the solution space for the homogeneous system corresponding to Ax = 0.
When describing the solution space for a homogeneous linear system, there are
three important cases to keep in mind:
Definition A.3.7. The system of linear equations System (A.1) is called
(1) overdetermined if m > n.
(2) square if m = n.
(3) underdetermined if m < n.
In particular, we can say a great deal about underdetermined homogeneous systems,
which we state as a corollary to the following more general result.
Theorem A.3.8. Let N be the solution space for the homogeneous linear system
corresponding to the matrix equation Ax = 0, where A ∈ Fm×n . Then
(1) the zero vector 0 ∈ N .
(2) N is a subspace of the vector space Fn .
This is an amazing theorem. Since N is a subspace of Fn , we know that either
N will contain exactly one element (namely, the zero vector) or N will contain
infinitely many elements.
Corollary A.3.9. Every homogeneous system of linear equations is solved by the
zero vector. Moreover, every underdetermined homogeneous system has infinitely
many solutions.
We call the zero vector the trivial solution for a homogeneous linear system. The
fact that every homogeneous linear system has the trivial solution thus reduces
solving such a system to determining if solutions other than the trivial solution
exist.
One method for finding the solution space of a homogeneous system is to first
use Gaussian elimination (as demonstrated in Example A.3.5) in order to factor
the coefficient matrix of the system. Then, because the original linear system is
homogeneous, the homogeneous system corresponding to the resulting RREF matrix
will have the same solutions as the original system. In other words, if a given matrix
A satisfies
Ek Ek−1 · · · E0 A = A0 ,
can solve for the leading variable in order to obtain x1 = −x2 − x3 . It follows
that, given any scalars α, β ∈ F, every vector of the form
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -\alpha - \beta \\ \alpha \\ \beta \end{bmatrix}
= \alpha \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
is a solution to Ax = 0. Therefore,
$$N = \left\{ (x_1, x_2, x_3) \in \mathbb{F}^3 \mid x_1 + x_2 + x_3 = 0 \right\}
= \operatorname{span}\left( (-1, 1, 0), (-1, 0, 1) \right).$$
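This description of N is easy to check numerically. The sketch below (Python with NumPy; the sample coefficients α and β are arbitrary) confirms that every vector of the form α(−1, 1, 0) + β(−1, 0, 1) satisfies x1 + x2 + x3 = 0.

import numpy as np

A = np.array([[1.0, 1.0, 1.0]])          # coefficient matrix of x1 + x2 + x3 = 0
v1 = np.array([-1.0, 1.0, 0.0])
v2 = np.array([-1.0, 0.0, 1.0])
for alpha, beta in [(1, 0), (0, 1), (2, -3), (0.5, 7)]:
    x = alpha * v1 + beta * v2
    print(A @ x)                         # always [0.]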
We are again assuming that the diagonal entries of A are all non-zero. Then, acting
similarly to back substitution, we can substitute the solution for x1 into the second
equation in order to obtain
$$x_2 = \frac{b_2 - a_{21}x_1}{a_{22}}.$$
Continuing this process, we have created a forward substitution procedure. In
particular,
$$x_n = \frac{b_n - \sum_{k=1}^{n-1} a_{nk}x_k}{a_{nn}}.$$
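Written out as a procedure, forward substitution solves a lower triangular system in a single pass from the first equation to the last. The sketch below (plain Python; the 3 × 3 system is an assumed illustration rather than an example from the text) implements the formulas above.

def forward_substitution(L, b):
    # Solve L x = b for lower triangular L with non-zero diagonal entries.
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x

L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, -1.0, 5.0]]
print(forward_substitution(L, [2.0, 5.0, 8.0]))   # [1.0, 1.333..., 1.066...]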
More generally, suppose that A ∈ Fn×n is an arbitrary square matrix for which
there exists a lower triangular matrix L ∈ Fn×n and an upper triangular matrix
U ∈ Fn×n such that A = LU . When such matrices exist, we call A = LU an
LU-factorization (a.k.a. LU-decomposition) of A. The benefit of such a fac-
torization is that it allows us to exploit the triangularity of L and U when solving
linear systems having coefficient matrix A.
To see this, suppose that A = LU is an LU-factorization for the matrix A ∈ Fn×n
and that b ∈ Fn is a column vector. (As above, we also assume that none of the
diagonal entries in either L or U is zero.) Furthermore, set y = U x, where x is the
as yet unknown solution of Ax = b. Then, by substitution, y must satisfy
Ly = b.
Then, since L is lower triangular, we can immediately solve for y via forward sub-
stitution. In other words, we are using the associativity of matrix multiplication
(cf. Theorem A.2.6) in order to conclude that
U x = y.
In general, one can only obtain an LU-factorization for a matrix A ∈ Fn×n when
there exist elementary “row combination” matrices E1 , E2 , . . . , Ek ∈ Fn×n and an
upper triangular matrix U such that
Ek Ek−1 · · · E1 A = U.
There are various generalizations of LU-factorization that allow for more than just
elementary “row combinations” matrices in this product, but we do not mention
them here. Instead, we provide a detailed example that illustrates how to obtain an
LU-factorization and then how to use such a factorization in solving linear systems.
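Putting the two substitution procedures together gives the standard way of exploiting an LU-factorization: solve Ly = b by forward substitution and then Ux = y by back substitution. The sketch below (plain Python; the factors L and U are assumed illustrative data, not taken from the text) carries this out for A = LU.

def forward_sub(L, b):
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def back_sub(U, y):
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

def solve_via_lu(L, U, b):
    # Solve (LU) x = b by solving L y = b and then U x = y.
    return back_sub(U, forward_sub(L, b))

L = [[1.0, 0.0], [3.0, 1.0]]
U = [[2.0, 1.0], [0.0, -1.0]]            # here A = LU = [[2, 1], [6, 2]]
print(solve_via_lu(L, U, [4.0, 10.0]))   # [1.0, 2.0]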
must satisfy
$$\begin{bmatrix}
2u_{11} + 3u_{21} + 4u_{31} & 2u_{12} + 3u_{22} + 4u_{32} & 2u_{13} + 3u_{23} + 4u_{33} \\
-u_{21} + 3u_{31} & -u_{22} + 3u_{32} & -u_{23} + 3u_{33} \\
-2u_{31} & -2u_{32} & -2u_{33}
\end{bmatrix}
= U^{-1}U = I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
in the nine variables u11 , u12 , . . . , u33 . Since this linear system has an upper trian-
gular coefficient matrix, we can apply back substitution in order to directly solve
for the entries in U .
The only condition we imposed upon the triangular matrices above was that
all diagonal entries were non-zero. Since the determinant of a triangular matrix is
given by the product of its diagonal entries, this condition is necessary and sufficient
for a triangular matrix to be non-singular. Moreover, once the inverses of both L
and U in an LU-factorization have been obtained, we can immediately calculate the
inverse for A = LU by applying Theorem A.2.9(4):
This section is devoted to illustrating how linear maps are a fundamental tool for
gaining insight into the solutions to systems of linear equations with n unknowns.
Using the tools of Linear Algebra, many familiar facts about systems with two
unknowns can be generalized to an arbitrary number of unknowns without much
effort.
The matrix A ∈ Fm×n associated to the linear map T ∈ L(Fn , Fm ) with respect to
the standard bases is called the canonical matrix for T . With this choice of bases, we have
T (x) = Ax, ∀ x ∈ Fn . (A.5)
In other words, one can compute the action of the linear map upon any vector in
Fn by simply multiplying the vector by the associated canonical matrix A. There
are many circumstances in which one might wish to use non-canonical bases for
either Fn or Fm , but the trade-off is that Equation (A.5) will no longer hold as
stated. (To modify Equation (A.5) for use with non-standard bases, one needs to
use coordinate vectors as described in Chapter 10.)
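Equation (A.5) also tells us how to recover the canonical matrix from the map itself: the jth column of A is T(ej), where e1, . . . , en denotes the standard basis of Fn. The sketch below (Python with NumPy; the particular map T is a hypothetical example) builds A in this way and then checks Equation (A.5).

import numpy as np

def canonical_matrix(T, n, m):
    # The m-by-n matrix whose j-th column is T applied to the j-th standard basis vector.
    A = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        A[:, j] = T(e)
    return A

T = lambda x: np.array([2 * x[0] + x[1], x[0] - x[1], 3 * x[1]])   # a linear map F^2 -> F^3
A = canonical_matrix(T, n=2, m=3)
x = np.array([4.0, -1.0])
print(np.allclose(T(x), A @ x))    # True: T(x) = Ax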
The utility of Equation (A.5) cannot be over-emphasized. To get a sense of this,
consider once again the generic matrix equation (Equation (A.3))
Ax = b,
which involves a given matrix A = (aij ) ∈ Fm×n , a given vector b ∈ Fm , and the
n-tuple of unknowns x. To provide a solution to this equation means to provide a
vector x ∈ Fn for which the matrix product Ax is exactly the vector b. In light of
Equation (A.5), the question of whether such a vector x ∈ Fn exists is equivalent
to asking whether or not the vector b is in the range of the linear map T .
The encoding of System (A.1) into Equation (A.3) is more than a mere change
of notation. The reinterpretation of Equation (A.3) using linear maps is a genuine
change of viewpoint. Solving System (A.1) (and thus Equation (A.3)) essentially
amounts to understanding how m distinct objects interact in an ambient space of
dimension n. (In particular, solutions to System (A.1) correspond to the points of
intersection of m hyperplanes in Fn .) On the other hand, questions about a linear map
involve understanding a single object, i.e., the linear map itself. Such a point of
view is both extremely flexible and fruitful, as we illustrate in the next section.
In addition, note that T is a bijective function. (This can be proven, for example,
by noting that the canonical matrix A for T is invertible.) Since T is bijective, this
means that
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ -2/3 \end{bmatrix}$$
is the only possible input vector that can result in the output vector b, and so we
have verified that x is the unique solution to the original linear system. Moreover,
this technique can be trivially generalized to any number of equations.
Example A.4.2. Consider the matrix A and the column vectors x and b from
Example A.3.5:
$$A = \begin{bmatrix} 2 & 5 & 3 \\ 1 & 2 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \text{and} \quad
b = \begin{bmatrix} 4 \\ 5 \\ 9 \end{bmatrix}.$$
Here, asking if the equation Ax = b has a solution is equivalent to asking if b is an
element of the range of the linear map T : F3 → F3 defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)
= \begin{bmatrix} 2x_1 + 5x_2 + 3x_3 \\ x_1 + 2x_2 + 3x_3 \\ x_1 + 8x_3 \end{bmatrix}.$$
In order to answer this corresponding question regarding the range of T , we take a
closer look at the following expression obtained in Example A.3.5:
A = E0−1 E1−1 · · · E7−1 .
Here, we have factored A into the product of eight elementary matrices. From the
linear map point of view, this means that we can apply the results of Section 6.6 in
order to obtain the factorization
T = S0 ◦ S1 ◦ · · · ◦ S7 ,
where Si is the (invertible) linear map having canonical matrix Ei−1 for i = 0, . . . , 7.
In the above examples, we used the bijectivity of a linear map in order to prove
the uniqueness of solutions to linear systems. As discussed in Section A.3, many
linear systems do not have unique solutions. Instead, there are exactly two other
possibilities: if a linear system does not have a unique solution, then it will either
have no solution or it will have infinitely many solutions. Fundamentally, this is
In this section, we define three important operations on matrices called the trans-
pose, conjugate transpose, and the trace. These will then be seen to interact with
matrix multiplication and invertibility in order to form special classes of matrices
that are extremely important to applications of Linear Algebra.
Another motivation for defining the transpose and conjugate transpose opera-
tions is that they allow us to define several very special classes of matrices.
Definition A.5.3. Given a positive integer n ∈ Z+ , we say that the square matrix
A ∈ Fn×n
(1) is symmetric if A = AT .
(2) is Hermitian if A = A∗ .
(3) is orthogonal if A ∈ GL(n, R) and A−1 = AT . Moreover, we define the (real)
orthogonal group to be the set O(n) = {A ∈ GL(n, R) | A−1 = AT }.
(4) is unitary if A ∈ GL(n, C) and A−1 = A∗ . Moreover, we define the (complex)
unitary group to be the set U (n) = {A ∈ GL(n, C) | A−1 = A∗ }.
A lot can be said about these classes of matrices. Both O(n) and U (n), for exam-
ple, form a group under matrix multiplication. Additionally, real symmetric and
complex Hermitian matrices always have real eigenvalues. Moreover, given any ma-
trix A ∈ Rm×n , AAT is a symmetric matrix with real, non-negative eigenvalues.
Similarly, for A ∈ Cm×n , AA∗ is Hermitian with real, non-negative eigenvalues.
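These defining conditions are easy to test in practice. The sketch below (Python with NumPy; all of the matrices are assumed examples of our own) checks a symmetric, a Hermitian, and an orthogonal matrix against Definition A.5.3, and verifies the remark above that AAT has real, non-negative eigenvalues.

import numpy as np

S = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.allclose(S, S.T))                       # symmetric: S = S^T

H = np.array([[2.0, 1.0 - 1.0j], [1.0 + 1.0j, 3.0]])
print(np.allclose(H, H.conj().T))                # Hermitian: H = H^*

t = 0.3
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])          # a rotation matrix
print(np.allclose(Q.T @ Q, np.eye(2)))           # orthogonal: Q^{-1} = Q^T

M = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
print(np.linalg.eigvalsh(M @ M.T))               # real, non-negative eigenvalues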
Calculational Exercises
(1) In each of the following, find matrices A, x, and b such that the given system
of linear equations can be expressed as the single matrix equation Ax = b.
$$\text{(a)}\ \left.\begin{array}{r}
2x_1 - 3x_2 + 5x_3 = 7 \\
9x_1 - x_2 + x_3 = -1 \\
x_1 + 5x_2 + 4x_3 = 0
\end{array}\right\}
\qquad
\text{(b)}\ \left.\begin{array}{r}
4x_1 - 3x_3 + x_4 = 1 \\
5x_1 + x_2 - 8x_4 = 3 \\
2x_1 - 5x_2 + 9x_3 - x_4 = 0 \\
3x_2 - x_3 + 7x_4 = 2
\end{array}\right\}$$
(2) In each of the following, express the matrix equation as a system of linear
equations.
$$\text{(a)}\ \begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix}
\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
(3) Suppose that A, B, C, D, and E are matrices over F having the following sizes:
A is 4 × 5, B is 4 × 5, C is 5 × 2, D is 4 × 2, E is 5 × 4.
Determine whether the following matrix expressions are defined, and, for those
that are defined, determine the size of the resulting matrix.
(a) BA (b) AC + D (c) AE + B (d) AB + B (e) E(A + B) (f) E(AC)
$$D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad \text{and} \quad
E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}.$$
Determine whether the following matrix expressions are defined, and, for those
that are defined, compute the resulting matrix.
(a) 2A^T + C   (b) D^T − E^T   (c) (D − E)^T
(d) B^T + 5C^T   (e) (1/2)C^T − (1/4)A   (f) B − B^T
(g) 3E^T − 3D^T   (h) (2E^T − 3D^T)^T   (i) CC^T
(j) (DA)^T   (k) (C^T B)A^T   (l) (2D^T − E)A
(m) (BA^T − 2C)^T   (n) B^T(CC^T − A^T A)   (o) D^T E^T − (ED)^T
(p) trace(DD^T)   (q) trace(4E^T − D)   (r) trace(C^T A^T + 2E^T)
Proof-Writing Exercises
(1) Let n ∈ Z+ be a positive integer and ai,j ∈ F be scalars for i, j = 1, . . . , n.
Prove that the following two statements are equivalent:
(a) The trivial solution x1 = · · · = xn = 0 is the only solution to the homoge-
neous system of equations
$$\left.\begin{array}{c}
\displaystyle\sum_{k=1}^{n} a_{1,k} x_k = 0 \\
\vdots \\
\displaystyle\sum_{k=1}^{n} a_{n,k} x_k = 0
\end{array}\right\}.$$
Appendix B
B.1 Sets
(1) The empty set (a.k.a. the null set) is what it sounds like: the set with no
elements. We usually denote it by ∅ or sometimes by { }. The empty set, ∅,
is uniquely determined by the property that for all x we have x ∉ ∅. Clearly,
there is exactly one empty set.
(2) Next up are the singletons. A singleton is a set with exactly one element. If
When introducing a new set (new for the purpose of the discussion at hand) it is
crucial to define it unambiguously. It is not required that from a given definition of
a set A, it is easy to determine what the elements of A are, or even how many there
are, but it should be clear that, in principle, there is a unique and unambiguous
answer to each question of the form “is x an element of A?”. There are several
common ways to define sets. Here are a few examples.
Example B.1.2.
(1) The simplest way is a generalization of the list notation to infinite lists that can
be described by a pattern. E.g., the set of positive integers N = {1, 2, 3, . . .}.
The list can be allowed to be bi-directional, as in the set of all integers
Z = {. . . , −2, −1, 0, 1, 2, . . .}. Note the use of triple dots . . . to indicate the
continuation of the list.
(2) The so-called set builder notation gives more options to describe the mem-
bership of a set. E.g., the set of all even integers, often denoted by 2Z, is defined
by
2Z = {2a | a ∈ Z} .
Instead of the vertical bar, |, a colon, :, is also commonly used. For example,
the open interval of the real numbers strictly between 0 and 1 is defined by
(0, 1) = {x ∈ R : 0 < x < 1}.
In addition to constructing sets directly, sets can also be obtained from other
sets by a number of standard operations. The following definition introduces the
basic operations of taking the union, intersection, and difference of sets.
Definition B.2.3. Let A and B be sets. Then
(1) The union of A and B, denoted by A ∪ B, is defined by
A ∪ B = {x | x ∈ A or x ∈ B}.
(2) The intersection of A and B, denoted by A ∩ B, is defined by
A ∩ B = {x | x ∈ A and x ∈ B}.
(3) The set difference of B from A, denoted by A \ B, is defined by
A \ B = {x | x ∈ A and x ∉ B}.
Often, the context provides a ‘universe’ of all possible elements pertinent to a
given discussion. Suppose we are given such a set of ‘all’ elements, and let us call
it U . Then, the complement of a set A, denoted by Ac , is defined as Ac = U \ A.
In the following theorem the existence of a universe U is tacitly assumed.
Theorem B.2.4. Let A, B, and C be sets. Then
To familiarize yourself with the basic properties of sets and the basic operations
of sets, it is a good exercise to write proofs for the three properties stated in the
theorem.
The so-called Cartesian product of sets is a powerful and ubiquitous method
to construct new sets out of old ones.
Definition B.2.5. Let A and B be sets. Then the Cartesian product of A and
B, denoted by A × B, is the set of all ordered pairs (a, b), with a ∈ A and b ∈ B.
In other words,
A × B = {(a, b) | a ∈ A, b ∈ B} .
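For finite sets, all of these constructions can be explored directly. The short Python sketch below (the sets A, B, and the universe U are assumed examples) forms the union, intersection, difference, complement, and Cartesian product just defined.

A = {1, 2, 3, 4}
B = {3, 4, 5}
U = set(range(1, 10))            # an assumed 'universe' of elements

print(A | B)                     # union A ∪ B:        {1, 2, 3, 4, 5}
print(A & B)                     # intersection A ∩ B: {3, 4}
print(A - B)                     # difference A \ B:   {1, 2}
print(U - A)                     # complement A^c relative to U

product = {(a, b) for a in A for b in B}   # the Cartesian product A × B
print(len(product))              # 4 * 3 = 12 ordered pairs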
B.3 Relations
In this section we introduce two important types of relations: order relations and
equivalence relations. A relation R between elements of a set A and elements of
a set B is a subset of their Cartesian product: R ⊂ A × B. When A = B, we also
call R simply a relation on A.
Let A be a set and R a relation on A. Then,
The notion of subset is an example of an order relation. To see this, first define
the power set of a set A as the set of all its subsets. It is often denoted by P(A).
So, for any set A, P(A) = {B : B ⊂ A}. Then, the inclusion relation is defined as
the relation R by setting
R = {(B, C) ∈ P(A) × P(A) | B ⊂ C}.
Important relations, such as the subset relation, are given a convenient notation of
the form a <symbol> b, to denote (a, b) ∈ R. The symbol for the inclusion relation
is ⊂.
B.4 Functions
The pre-image of b ∈ B is the subset of all a ∈ A that have b as their image. This
subset is often denoted by f −1 (b).
Appendix C
When we are presented with the set of real numbers, say S = R, we expect a great
deal of “structure” given on S. E.g., given any two real numbers r1 , r2 ∈ R, one
can form the sum r1 + r2 , the difference r1 − r2 , the product r1 r2 , the quotient
r1 /r2 (assuming r2 ≠ 0), the maximum max{r1 , r2 }, the minimum min{r1 , r2 }, the
average (r1 + r2 )/2, and so on. Each of these operations follows the same pattern:
take two real numbers and “combine” (or “compare”) them in order to form a new
real number. Such operations are called binary operations. In general, a binary
operation on an arbitrary non-empty set is defined as follows.
Example C.1.2.
(1) Addition, subtraction, and multiplication are all examples of familiar binary operations on R.
Even though one could define any number of binary operations upon a given
non-empty set, we are generally only interested in operations that satisfy additional
“arithmetic-like” conditions. In other words, the most interesting binary operations
are those that share the salient properties of common binary operations like addition
and multiplication on R. We make this precise with the definition of a “group” in
Section C.2.
In addition to binary operations defined on pairs of elements in the set S, one
can also define operations that involve elements from two different sets. Here is an
important example.
Definition C.1.3. A scaling operation (a.k.a. external binary operation) on
a non-empty set S is any function that has as its domain F × S and as its codomain
S, where F denotes an arbitrary field. (As usual, you should just think of F as being
either R or C.)
In other words, a scaling operation on S is any rule f : F × S → S that assigns
exactly one element f (α, s) ∈ S to each pair of elements α ∈ F and s ∈ S. As such,
f (α, s) is often written simply as αs. We illustrate this definition in the following
examples.
Example C.1.4.
In other words, given any α ∈ R and any n-tuple (x1 , . . . , xn ) ∈ Rn , their scalar
multiplication results in a new n-tuple denoted by α(x1 , . . . , xn ). This new
n-tuple is virtually identical to the original, each component having just been
“rescaled” by α.
(2) Scalar multiplication of continuous functions is another familiar scaling oper-
ation. Given any real number α ∈ R and any function f ∈ C(R), their scalar
multiplication results in a new function that is denoted by αf , where αf is
defined by the rule
(4) Strictly speaking, there is nothing in the definition that precludes S from
equalling F. Consequently, addition, subtraction, and multiplication can all
be seen as examples of scaling operations on R.
We begin this section with the following definition, which is one of the most funda-
mental and ubiquitous algebraic structures in all of mathematics.
Definition C.2.1. Let G be a non-empty set, and let ∗ be a binary operation on
G. (In other words, ∗ : G × G → G is a function with ∗(a, b) denoted by a ∗ b, for
each a, b ∈ G.) Then G is said to form a group under ∗ if the following three
conditions are satisfied:
(1) (associativity) Given any three elements a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c).
(2) (existence of an identity element) There is an element e ∈ G such that, given any a ∈ G, a ∗ e = e ∗ a = a.
(3) (existence of inverse elements) Given any a ∈ G, there is an element b ∈ G such that a ∗ b = b ∗ a = e.
You should recognize these three conditions (which are sometimes collectively
referred to as the group axioms) as properties that are satisfied by the operation of
addition on R. This is not an accident. In particular, given real numbers α, β ∈ R,
the group axioms form the minimal set of assumptions needed in order to solve the
equation x + α = β for the variable x, and it is in this sense that the group axioms
are an abstraction of the most fundamental properties of addition of real numbers.
A similar remark holds regarding multiplication on R \ {0} and solving the
equation αx = β for the variable x. Note, however, that this cannot be extended
to all of R.
The familiar property of addition of real numbers, namely that a + b = b + a, is not part
of the group axioms. When it holds in a given group G, the following definition
applies.
Definition C.2.2. Let G be a group under binary operation ∗. Then G is called an
abelian group (a.k.a. commutative group) if, given any two elements a, b ∈ G,
a ∗ b = b ∗ a.
We now give some of the more important examples of groups that occur in Linear
Algebra, but note that these examples far from exhaust the variety of groups studied
in other branches of mathematics.
Example C.2.3.
(1) If G ∈ {Z, Q, R, C}, then G forms an abelian group under the usual definition
of addition.
Note, though, that the set Z+ of positive integers does not form a group under
addition since, e.g., it does not contain an additive identity element.
(2) Similarly, if G ∈ { Q \ {0}, R \ {0}, C \ {0}}, then G forms an abelian group
under the usual definition of multiplication.
Note, though, that Z \ {0} does not form a group under multiplication since
only ±1 have multiplicative inverses.
(3) If m, n ∈ Z+ are positive integers and F denotes either R or C, then the set
Fm×n of all m × n matrices forms an abelian group under matrix addition.
Note, though, that Fm×n does not form a group under matrix multiplication
unless m = n = 1, in which case F1×1 = F.
(4) Similarly, if n ∈ Z+ is a positive integer and F denotes either R or C, then the set
GL(n, F) of invertible n×n matrices forms a group under matrix multiplication.
This group, which is often called the general linear group, is non-abelian
when n ≥ 2.
Note, though, that GL(n, F) does not form a group under matrix addition for
any choice of n since, e.g., the zero matrix 0n×n ∉ GL(n, F).
In the above examples, you should notice two things. First of all, it is important
to specify the operation under which a set might or might not be a group. Second,
and perhaps more importantly, all but one example is an abelian group. Most
of the important sets in Linear Algebra possess some type of algebraic structure,
and abelian groups are the principal building block of virtually every one of these
algebraic structures. In particular, fields and vector spaces (as defined below) and
rings and algebras (as defined in Section C.3) can all be described as “abelian groups
plus additional structure”.
Given an abelian group G, adding “additional structure” amounts to imposing
one or more additional operations on G such that each new operation is “compat-
ible” with the preexisting binary operation on G. As our first example of this, we
add another binary operation to G in order to obtain the definition of a field:
Definition C.2.4. Let F be a non-empty set, and let + and ∗ be binary operations
on F . Then F forms a field under + and ∗ if the following three conditions are
satisfied:
(1) F forms an abelian group under +.
(2) Denoting the identity element for + by 0, F \ {0} forms an abelian group under ∗.
(3) (distributivity of ∗ over +) Given any three elements a, b, c ∈ F ,
a ∗ (b + c) = a ∗ b + a ∗ c.
You should recognize these three conditions (which are sometimes collectively
referred to as the field axioms) as properties that are satisfied when the operations
of addition and multiplication are taken together on R. This is not an accident. As
with the group axioms, the field axioms form the minimal set of assumptions needed
in order to abstract fundamental properties of these familiar arithmetic operations.
Specifically, the field axioms guarantee that, given any field F , three conditions are
always satisfied:
(1) Given any a, b ∈ F , the equation x + a = b can be solved for the variable x.
(2) Given any a ∈ F \ {0} and b ∈ F , the equation a ∗ x = b can be solved for x.
(3) The binary operation ∗ (which is like multiplication on R) can be distributed
over (i.e., is “compatible” with) the binary operation + (which is like addition
on R).
Example C.2.5. It should be clear that, if F ∈ {Q, R, C}, then F forms a field
under the usual definitions of addition and multiplication.
Note, though, that the set Z of integers does not form a field under these oper-
ations since Z \ {0} fails to form a group under multiplication. Similarly, none of
the other sets from Example C.2.3 can be made into a field.
The fields Q, R, and C are familiar as commonly used number systems. There
are many other interesting and useful examples of fields, but those will not be used
in this book.
We close this section by introducing a special type of scaling operation called
scalar multiplication. Recall that F can be replaced with either R or C.
Definition C.2.6. Let S be a non-empty set, and let ∗ be a scaling operation on
S. (In other words, ∗ : F × S → S is a function with ∗(α, s) denoted by α ∗ s or
even just αs, for every α ∈ F and s ∈ S.) Then ∗ is called scalar multiplication
if it satisfies the following two conditions:
Note that we choose to have the multiplicative part of F “act” upon S because
we are abstracting scalar multiplication as it is intuitively defined in Example C.1.4
on both Rn and C(R). This is because, by also requiring a “compatible” additive
structure (called vector addition), we obtain the following alternate formulation
for the definition of a vector space.
Definition C.2.7. Let V be an abelian group under the binary operation +, and
let ∗ be a scalar multiplication operation on V with respect to F. Then V forms
a vector space over F with respect to + and ∗ if the following two conditions are
satisfied:
(1) Given any α ∈ F and any u, v ∈ V ,
α ∗ (u + v) = α ∗ u + α ∗ v.
(2) Given any α, β ∈ F and any v ∈ V ,
(α + β) ∗ v = α ∗ v + β ∗ v.
In this section, we briefly mention two other common algebraic structures. Specif-
ically, we first “relax” the definition of a field in order to define a ring, and we
then combine the definitions of ring and vector space in order to define an alge-
bra. Groups, rings, and fields are the most fundamental algebraic structures, with
vector spaces and algebras being particularly important within the study of Linear
Algebra and its applications.
Definition C.3.1. Let R be a non-empty set, and let + and ∗ be binary operations
on R. Then R forms an (associative) ring under + and ∗ if the following three
conditions are satisfied:
a ∗ (b + c) = a ∗ b + a ∗ c and (a + b) ∗ c = a ∗ c + b ∗ c.
As with the definition of group, there are many additional properties that can be
added to a ring; here, each additional property makes a ring more field-like in some
way.
Definition C.3.2. Let R be a ring under the binary operations + and ∗. Then we
call R
In particular, note that a commutative ring with identity is almost a field; the
only thing missing is the assumption that every element has a multiplicative inverse.
It is this one difference that results in many familiar sets being CRIs (or at least
unital rings) but not fields. E.g., Z is a CRI under the usual operations of addition
and multiplication, yet, because of the lack of multiplicative inverses for all elements
except ±1, Z is not a field.
In some sense, Z is the prototypical example of a ring, but there are many other
familiar examples. E.g., if F is any field, then the set of polynomials F [z] with
coefficients from F is a CRI under the usual operations of polynomial addition and
multiplication, but again, because of the lack of multiplicative inverses for every
element, F [z] is itself not a field. Another important example of a ring comes from
Linear Algebra. Given any vector space V , the set L(V ) of all linear maps from V
into V is a unital ring under the operations of function addition and composition.
However, L(V ) is not a CRI unless dim(V ) ∈ {0, 1}.
Alternatively, if a ring R forms a group under ∗ (but not necessarily an abelian
group), then R is sometimes called a skew field (a.k.a. division ring). Note that
a skew field is also almost a field; the only thing missing is the assumption that
multiplication is commutative. Unlike CRIs, though, there are no simple examples
of skew fields that are not also fields.
We close this section by defining the concept of an algebra over a field. In
essence, an algebra is a vector space together with a “compatible” ring structure.
Consequently, anything that can be done with either a ring or a vector space can
also be done with an algebra.
Definition C.3.3. Let A be a non-empty set, let + and × be binary operations
on A, and let ∗ be scalar multiplication on A with respect to F. Then A forms an
(associative) algebra over F with respect to +, ×, and ∗ if the following three
conditions are satisfied:
Appendix D
Binary Relations
= (the equals sign) means “is the same as” and was first introduced in the 1557
book The Whetstone of Witte by physician and mathematician Robert Recorde
(c. 1510–1558). He wrote, “I will sette as I doe often in woorke use, a paire of
parralles, or Gemowe lines of one lengthe, thus: =====, bicause noe 2 thynges
can be moare equalle.” (Recorde’s equals sign was significantly longer than the
one in modern usage and is based upon the idea of “Gemowe” or “identical”
lines, where “Gemowe” means “twin” and comes from the same root as the
name of the constellation “Gemini”.)
Robert Recorde also introduced the plus sign, “+”, and the minus sign, “−”,
in The Whetstone of Witte.
< (the less than sign) means “is strictly less than”, and > (the greater than
sign) means “is strictly greater than”. These first appeared in the book Ar-
tis Analyticae Praxis ad Aequationes Algebraicas Resolvendas (“The Analyti-
cal Arts Applied to Solving Algebraic Equations”) by mathematician and as-
tronomer Thomas Harriot (1560–1621), which was published posthumously in
1631.
Pierre Bouguer (1698–1758) later refined these to ≤ (“is less than or equals”)
and ≥ (“is greater than or equals”) in 1734. Bouguer is sometimes called “the
father of naval architecture” due to his foundational work in the theory of naval
navigation.
:= (the equal by definition sign) means “is equal by definition to”. This is a
common alternate form of the symbol “=Def ”, the latter having first appeared in
the 1894 book Logica Matematica by logician Cesare Burali-Forti (1861–1931).
Other common alternate forms of the symbol “=Def ” include an equals sign with “def” written above it and “≡”,
with “≡” being especially common in applied mathematics.
≈ (the approximately equals sign) means “is approximately equal to” and
was first introduced in the 1892 book Applications of Elliptic Functions by
mathematician Alfred Greenhill (1847–1927).
Other modern symbols for “approximately equals” include “≐” (read as “is
nearly equal to”), “≅” (read as “is congruent to”), “∼” (read as “is similar to”),
“≃” (read as “is asymptotically equal to”), and “∝” (read as “is proportional
to”). Usage varies, and these are sometimes used to denote varying degrees of
“approximate equality” within a given context.
∴ (three dots) means “therefore” and first appeared in print in the 1659 book
Teusche Algebra (“Teach Yourself Algebra”) by mathematician Johann Rahn
(1622–1676).
Teusche Algebra also contains the first use of the obelus, “÷”, to denote
division.
∵ (upside-down dots) means “because” and seems to have first appeared in
the 1805 book The Gentleman’s Mathematical Companion. However, it is
much more common (and less ambiguous) to just abbreviate “because” as
“b/c”.
∋ (the such that sign) means “under the condition that” and first appeared
in the 1906 edition of Formulaire de mathématiques by the logician Giuseppe
Peano (1858–1932). However, it is much more common (and less ambiguous)
to just abbreviate “such that” as “s.t.”.
There are two good reasons to avoid using “∋” in place of “such that”. First
of all, the abbreviation “s.t.” is significantly more suggestive of its meaning
than is “∋”. More importantly, the symbol “∋” is now commonly used to
mean “contains as an element”, which is a logical extension of the usage of
the standard symbol “∈” to mean “is contained as an element in”.
⇒ (the implies sign) means “logically implies that”, and ⇐ (the is implied
by sign) means “is logically implied by”. Both have an unclear historical
origin. (E.g., “if it’s raining, then it’s pouring” is equivalent to saying “it’s
raining ⇒ it’s pouring.”)
⇐⇒ (the iff symbol) means “if and only if” (abbreviated “iff”) and is used to
connect two logically equivalent mathematical statements. (E.g., “it’s raining
iff it’s pouring” means simultaneously that “if it’s raining, then it’s pouring”
and that “if it’s pouring, then it’s raining”. In other words, the statement
“it’s raining ⇐⇒ it’s pouring” means simultaneously that “it’s raining ⇒
it’s pouring” and “it’s raining ⇐ it’s pouring”.)
The abbreviation “iff” is attributed to the mathematician Paul Halmos
(1916–2006).
∀ (the universal quantifier) means “for all” and was first used in the 1935
publication Untersuchungen über das logische Schliessen (“Investigations on
Logical Reasoning”) by logician Gerhard Gentzen (1909–1945). He called it
the All-Zeichen (“all character”) by analogy to the symbol “∃”, which means
“there exists”.
∃ (the existential quantifier) means “there exists” and was first used in the
1897 edition of Formulaire de mathématiques by the logician Giuseppe Peano
(1858–1932).
∎ (the Halmos tombstone or Halmos symbol) means “Q.E.D.”, which is
an abbreviation of the Latin phrase quod erat demonstrandum (“which was
to be proven”). “Q.E.D.” has been the most common way to symbolize the
end of a logical argument for many centuries, but the modern convention of
the “tombstone” is now generally preferred both because it is easier to write
and because it is visually more compact.
The symbol “∎” was first made popular by mathematician Paul Halmos
(1916–2006).
⊂ (the is included in sign) means “is a subset of” and ⊃ (the includes sign)
means “has as a subset”. Both symbols were introduced in the 1890 book
Vorlesungen über die Algebra der Logik (“Lectures on the Algebra of the Logic”)
by logician Ernst Schröder (1841–1902).
∈ (the is in sign) means “is an element of” and first appeared in the 1895 edition
of Formulaire de mathématiques by the logician Giuseppe Peano (1858–1932).
Peano originally used the Greek letter “ε” (viz. the first letter of the Latin word
est for “is”). The modern stylized version of this symbol was later introduced in
the 1903 book Principles of Mathematics by logician and philosopher Bertrand
Russell (1872–1970).
It is also common to use the symbol “∋” to mean “contains as an element”,
which is not to be confused with the more archaic usage of “∋” to mean “such
that”.
∪ (the union sign) means “take the elements that are in either set”, and ∩ (the
intersection sign) means “take the elements that the two sets have in com-
mon”. These were both introduced in the 1888 book Calcolo geometrico sec-
ondo l’Ausdehnungslehre di H. Grassmann preceduto dalle operazioni della log-
ica deduttiva (“Geometric Calculus based upon the teachings of H. Grassman,
preceded by the operations of deductive logic”) by logician Giuseppe Peano
(1858–1932).
∅ (the null set or empty set) means “the set without any elements in it” and
was first used in the 1939 book Éléments de mathématique by Nicolas Bourbaki.
The number γ is widely considered to be the sixth most important number in
mathematics due to its frequent appearance in formulas from number theory
and applied mathematics. However, as of this writing, it is still not even known
whether or not γ is an irrational number.
i.e. (id est) means “that is” or “in other words”. (It is used to paraphrase
a statement that was just made, not to mean “for example”, and is
always followed by a comma.)
e.g. (exempli gratia) means “for example”. (It is usually used to give an
example of a statement that was just made and is always followed by
a comma.)
viz. (videlicet) means “namely” or “more specifically”. (It is used to clarify
a statement that was just made by providing more information and is
never followed by a comma.)
etc. (et cetera) means “and so forth” or “and so on”. (It is used to sug-
gest that the reader should infer further examples from a list that has
already been started and is usually not followed by a comma.)
et al. (et alii ) means “and others”. (It is used in place of listing multiple
authors past the first. The abbreviation “et al.” can also be used in
place of et alibi, which means “and elsewhere”.)
cf. (conferre) means “compare to” or “see also”. (It is used either to draw
a comparison or to refer the reader to somewhere else that they can
find more information, and it is never followed by a comma.)
q.v. (quod vide) means “which see” or “go look it up if you’re interested”.
(It is used to cross-reference a different written work or a different part
of the same written work, and it is never followed by a comma.) The
plural form of “q.v.” is “q.q.”
v.s. (vide supra) means “see above”. (It is used to imply that more infor-
mation can be found before the current point in a written work and is
never followed by a comma.)
N.B. (Nota Bene) means “note well” or “pay attention to the following”.
(It is used to imply that the wise reader will pay especially careful
attention to what follows and is never followed by a comma. Cf. the
abbreviation “verb. sap.”)
verb. sap. (verbum sapienti sat est) means “a word to the wise is enough” or
“enough has already been said”. (It is used to imply that, while some-
thing may still be left unsaid, enough has been said for the reader to
infer the entire meaning.)
vs. (versus) means “against” or “in contrast to”. (It is used to contrast two