Understanding Linear Algebra Concepts
Figure 2.1 Different types of vectors. Vectors can be surprising objects, including (a) geometric vectors and (b) polynomials.
This material is published by Cambridge University Press as Mathematics for Machine Learning by
Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (2020). This version is free to view
and download for personal use only. Not for re-distribution, re-sale, or use in derivative works.
©by M. P. Deisenroth, A. A. Faisal, and C. S. Ong, 2021. [Link]
Figure 2.2 A mind map of the concepts introduced in this chapter (vector, matrix, vector space, Abelian group, linear independence, basis, system of linear equations, Gaussian elimination, matrix inverse, linear/affine mappings), along with where they are used in other parts of the book: Chapter 3 (Analytic geometry), Chapter 5 (Vector calculus), Chapter 10 (Dimensionality reduction), and Chapter 12 (Classification).
resources are Gilbert Strang’s Linear Algebra course at MIT and the Linear
Algebra Series by 3Blue1Brown.
Linear algebra plays an important role in machine learning and general mathematics. The concepts introduced in this chapter are further expanded to include the idea of geometry in Chapter 3. In Chapter 5, we will discuss vector calculus, where a principled knowledge of matrix operations is essential. In Chapter 10, we will use projections (to be introduced in Section 3.8) for dimensionality reduction with principal component analysis (PCA). In Chapter 9, we will discuss linear regression, where linear algebra plays a central role for solving least-squares problems.
Example 2.1
A company produces products N_1, ..., N_n for which resources R_1, ..., R_m are required. To produce a unit of product N_j, a_ij units of resource R_i are needed, where i = 1, ..., m and j = 1, ..., n.
The objective is to find an optimal production plan, i.e., a plan of how many units x_j of product N_j should be produced if a total of b_i units of resource R_i are available and (ideally) no resources are left over.
If we produce x_1, ..., x_n units of the corresponding products, we need a total of
$$a_{i1}x_1 + \cdots + a_{in}x_n \qquad (2.2)$$
many units of resource R_i. An optimal production plan (x_1, ..., x_n) ∈ R^n, therefore, has to satisfy the following system of equations:
$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ &\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \,, \end{aligned} \qquad (2.3)$$
where a_ij ∈ R and b_i ∈ R.
Equation (2.3) is the general form of a system of linear equations, and x_1, ..., x_n are the unknowns of this system. Every n-tuple (x_1, ..., x_n) ∈ R^n that satisfies (2.3) is a solution of the linear equation system.
Example 2.2
The system of linear equations
$$\begin{aligned} x_1 + x_2 + x_3 &= 3 \quad (1) \\ x_1 - x_2 + 2x_3 &= 2 \quad (2) \\ 2x_1 + 3x_3 &= 1 \quad (3) \end{aligned} \qquad (2.4)$$
has no solution: Adding the first two equations yields 2x_1 + 3x_3 = 5, which contradicts the third equation (3).
Let us have a look at the system of linear equations
$$\begin{aligned} x_1 + x_2 + x_3 &= 3 \quad (1) \\ x_1 - x_2 + 2x_3 &= 2 \quad (2) \\ x_2 + x_3 &= 2 \quad (3) \end{aligned} \qquad (2.5)$$
From the first and third equation, it follows that x_1 = 1. From (1)+(2), we get 2x_1 + 3x_3 = 5, i.e., x_3 = 1. From (3), we then get that x_2 = 1. Therefore, (1, 1, 1) is the only possible and unique solution (verify that (1, 1, 1) is a solution by plugging in).
As a third example, we consider
$$\begin{aligned} x_1 + x_2 + x_3 &= 3 \quad (1) \\ x_1 - x_2 + 2x_3 &= 2 \quad (2) \\ 2x_1 + 3x_3 &= 5 \quad (3) \end{aligned} \qquad (2.6)$$
Since (1)+(2) = (3), we can omit the third equation (redundancy). From (1) and (2), we get 2x_1 = 5 − 3x_3 and 2x_2 = 1 + x_3. We define x_3 = a ∈ R as a free variable, such that any triplet
$$\left( \tfrac{5}{2} - \tfrac{3}{2}a,\; \tfrac{1}{2} + \tfrac{1}{2}a,\; a \right), \quad a \in \mathbb{R} \qquad (2.7)$$
$$\begin{aligned} 4x_1 + 4x_2 &= 5 \\ 2x_1 - 4x_2 &= 1 \end{aligned} \qquad (2.8)$$
where the solution space is the point (x_1, x_2) = (1, 1/4). Similarly, for three variables, each linear equation determines a plane in three-dimensional space. When we intersect these planes, i.e., satisfy all linear equations at the same time, we can obtain a solution set that is a plane, a line, a point or empty (when the planes have no common intersection). ♦
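The two lines of (2.8) intersect in a single point, which can be checked numerically:

```python
import numpy as np

# Each equation of (2.8) is a line in R^2; the solution set is their
# intersection point.
A = np.array([[4.0, 4.0],
              [2.0, -4.0]])
b = np.array([5.0, 1.0])

x = np.linalg.solve(A, b)
print(x)  # (x1, x2) = (1, 1/4)
```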
For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation. We collect the coefficients a_ij into vectors and collect the vectors into matrices. In other words, we write the system from (2.3) in the following form:
$$\begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} x_1 + \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} x_2 + \cdots + \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} x_n = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} \qquad (2.9)$$
and, more compactly, as
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} \,. \qquad (2.10)$$
In the following, we will have a close look at these matrices and define computation rules. We will return to solving linear equations in Section 2.3.
2.2 Matrices
Matrices play a central role in linear algebra. They can be used to com-
pactly represent systems of linear equations, but they also represent linear
functions (linear mappings) as we will see later in Section 2.7. Before we
discuss some of these interesting topics, let us first define what a matrix
is and what kind of operations we can do with matrices. We will see more
properties of matrices in Chapter 4.
Definition 2.1 (Matrix). With m, n ∈ N, a real-valued (m, n) matrix A is an m·n-tuple of elements a_ij, i = 1, ..., m, j = 1, ..., n, which is ordered according to a rectangular scheme consisting of m rows and n columns:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad a_{ij} \in \mathbb{R} \,. \qquad (2.11)$$
By convention, (1, n)-matrices are called rows and (m, 1)-matrices are called columns. These special matrices are also called row/column vectors.
R^{m×n} is the set of all real-valued (m, n)-matrices. A ∈ R^{m×n} can be equivalently represented as a ∈ R^{mn} by stacking all n columns of the matrix into a long vector; see Figure 2.4.
Figure 2.4 By re-shaping and stacking its columns, a matrix A ∈ R^{4×2} can be represented as a long vector a ∈ R^8.
2.2.1 Matrix Addition and Multiplication
The sum of two matrices A ∈ R^{m×n}, B ∈ R^{m×n} is defined as the element-wise sum, i.e.,
$$A + B := \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n} \,. \qquad (2.12)$$
For matrices A ∈ R^{m×n}, B ∈ R^{n×k}, the elements c_ij of the product C = AB ∈ R^{m×k} are computed as
$$c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj} \,, \quad i = 1, \dots, m, \quad j = 1, \dots, k. \qquad (2.13)$$
Note the size of the matrices: there are n columns in A and n rows in B so that we can compute a_il b_lj for l = 1, ..., n. This means, to compute element c_ij we multiply the elements of the ith row of A with the jth column of B and sum them up. Later in Section 3.2, we will call this the dot product of the corresponding row and column; commonly, the dot product between two vectors a, b is denoted by a^⊤b or ⟨a, b⟩. In NumPy, this product can be computed as C = np.einsum('il, lj', A, B). In cases where we need to be explicit that we are performing multiplication, we use the notation A · B to denote multiplication (explicitly showing "·").
Remark. Matrices can only be multiplied if their "neighboring" dimensions match. For instance, an n×k-matrix A can be multiplied with a k×m-matrix B, but only from the left side:
$$\underbrace{A}_{n \times k} \, \underbrace{B}_{k \times m} = \underbrace{C}_{n \times m} \qquad (2.14)$$
Example 2.3
For
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{2 \times 3}, \quad B = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{3 \times 2},$$
we obtain
$$AB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix} \in \mathbb{R}^{2 \times 2}, \qquad (2.15)$$
$$BA = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ -2 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{3 \times 3} \,. \qquad (2.16)$$
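The products (2.15) and (2.16) can be reproduced directly, and the elementwise rule (2.13) corresponds to an einsum over the shared index l:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1]])
B = np.array([[0, 2],
              [1, -1],
              [0, 1]])

print(A @ B)  # the 2x2 product AB from (2.15)
print(B @ A)  # the 3x3 product BA from (2.16); note that AB != BA

# The elementwise rule (2.13): c_ij = sum_l a_il * b_lj.
print(np.array_equal(np.einsum('il,lj->ij', A, B), A @ B))  # True
```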
Associativity:
$$(\lambda\psi)C = \lambda(\psi C) \,, \quad C \in \mathbb{R}^{m \times n}$$
$$\lambda(BC) = (\lambda B)C = B(\lambda C) = (BC)\lambda \,, \quad B \in \mathbb{R}^{m \times n}, \; C \in \mathbb{R}^{n \times k} \,.$$
Note that this allows us to move scalar values around.
$$(\lambda C)^\top = C^\top \lambda^\top = C^\top \lambda = \lambda C^\top \quad \text{since } \lambda = \lambda^\top \text{ for all } \lambda \in \mathbb{R} \,.$$
Distributivity:
$$(\lambda + \psi)C = \lambda C + \psi C \,, \quad C \in \mathbb{R}^{m \times n}$$
$$\lambda(B + C) = \lambda B + \lambda C \,, \quad B, C \in \mathbb{R}^{m \times n}$$
and use the rules for matrix multiplication, we can write this equation system in a more compact form as
$$\begin{bmatrix} 2 & 3 & 5 \\ 4 & -2 & -7 \\ 9 & 5 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ 2 \end{bmatrix} \,. \qquad (2.36)$$
Note that x_1 scales the first column, x_2 the second one, and x_3 the third one.
Generally, a system of linear equations can be compactly represented in
their matrix form as Ax = b; see (2.3), and the product Ax is a (linear)
combination of the columns of A. We will discuss linear combinations in
more detail in Section 2.5.
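The fact that Ax is a linear combination of the columns of A can be verified numerically with the matrix from (2.36); the vector x below is an arbitrary example, not part of the text:

```python
import numpy as np

A = np.array([[2.0, 3.0, 5.0],
              [4.0, -2.0, -7.0],
              [9.0, 5.0, -3.0]])
x = np.array([1.0, -2.0, 3.0])  # an arbitrary example vector

# A @ x is exactly the linear combination x1*a1 + x2*a2 + x3*a3
# of the columns a1, a2, a3 of A.
combo = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]
print(np.allclose(A @ x, combo))  # True
```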
so that 0 = 8c1 + 2c2 − 1c3 + 0c4 and (x1 , x2 , x3 , x4 ) = (8, 2, −1, 0). In
fact, any scaling of this solution by λ1 ∈ R produces the 0 vector, i.e.,
$$\lambda_1 \begin{bmatrix} 1 & 0 & 8 & -4 \\ 0 & 1 & 2 & 12 \end{bmatrix} \begin{bmatrix} 8 \\ 2 \\ -1 \\ 0 \end{bmatrix} = \lambda_1 (8c_1 + 2c_2 - c_3) = 0 \,. \qquad (2.41)$$
Following the same line of reasoning, we express the fourth column of the
matrix in (2.38) using the first two columns and generate another set of
non-trivial versions of 0 as
$$\lambda_2 \begin{bmatrix} 1 & 0 & 8 & -4 \\ 0 & 1 & 2 & 12 \end{bmatrix} \begin{bmatrix} -4 \\ 12 \\ 0 \\ -1 \end{bmatrix} = \lambda_2 (-4c_1 + 12c_2 - c_4) = 0 \qquad (2.42)$$
for any λ_2 ∈ R. Putting everything together, we obtain all solutions of the equation system in (2.38), which is called the general solution, as the set
$$\left\{ x \in \mathbb{R}^4 : x = \begin{bmatrix} 42 \\ 8 \\ 0 \\ 0 \end{bmatrix} + \lambda_1 \begin{bmatrix} 8 \\ 2 \\ -1 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} -4 \\ 12 \\ 0 \\ -1 \end{bmatrix}, \; \lambda_1, \lambda_2 \in \mathbb{R} \right\} \,. \qquad (2.43)$$
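A quick numerical sanity check of this structure, using the coefficient matrix from (2.41)/(2.42) with the right-hand side implied by the particular solution in (2.43): every member of the set solves Ax = b, regardless of the values of λ_1, λ_2.

```python
import numpy as np

# The system behind (2.41)-(2.43): Ax = b.
A = np.array([[1.0, 0.0, 8.0, -4.0],
              [0.0, 1.0, 2.0, 12.0]])
b = np.array([42.0, 8.0])

particular = np.array([42.0, 8.0, 0.0, 0.0])
n1 = np.array([8.0, 2.0, -1.0, 0.0])    # 8c1 + 2c2 - c3 = 0
n2 = np.array([-4.0, 12.0, 0.0, -1.0])  # -4c1 + 12c2 - c4 = 0

rng = np.random.default_rng(0)
for _ in range(5):
    lam1, lam2 = rng.normal(size=2)
    x = particular + lam1 * n1 + lam2 * n2
    assert np.allclose(A @ x, b)
print("every sampled member of the set (2.43) solves Ax = b")
```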
Example 2.6
For a ∈ R, we seek all solutions of the following system of equations:
$$\begin{aligned} -2x_1 + 4x_2 - 2x_3 - x_4 + 4x_5 &= -3 \\ 4x_1 - 8x_2 + 3x_3 - 3x_4 + x_5 &= 2 \\ x_1 - 2x_2 + x_3 - x_4 + x_5 &= 0 \\ x_1 - 2x_2 - 3x_4 + 4x_5 &= a \,. \end{aligned} \qquad (2.44)$$
We start by converting this system of equations into the compact matrix notation Ax = b. We no longer mention the variables x explicitly and build the augmented matrix (in the form [A | b])
$$\left[\begin{array}{ccccc|c} -2 & 4 & -2 & -1 & 4 & -3 \\ 4 & -8 & 3 & -3 & 1 & 2 \\ 1 & -2 & 1 & -1 & 1 & 0 \\ 1 & -2 & 0 & -3 & 4 & a \end{array}\right] \begin{matrix} \text{Swap with } R_3 \\ \\ \text{Swap with } R_1 \\ \\ \end{matrix}$$
where we used the vertical line to separate the left-hand side from the right-hand side in (2.44). We use ⇝ to indicate a transformation of the augmented matrix using elementary transformations. The augmented matrix [A | b] compactly represents the system of linear equations Ax = b.
Swapping Rows 1 and 3 leads to
$$\left[\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 4 & -8 & 3 & -3 & 1 & 2 \\ -2 & 4 & -2 & -1 & 4 & -3 \\ 1 & -2 & 0 & -3 & 4 & a \end{array}\right] \begin{matrix} \\ -4R_1 \\ +2R_1 \\ -R_1 \end{matrix}$$
When we now apply the indicated transformations (e.g., subtract Row 1 four times from Row 2), we obtain
$$\leadsto \left[\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 & -3 & 2 \\ 0 & 0 & 0 & -3 & 6 & -3 \\ 0 & 0 & -1 & -2 & 3 & a \end{array}\right] \begin{matrix} \\ \\ \\ -R_2 - R_3 \end{matrix}$$
$$\leadsto \left[\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 & -3 & 2 \\ 0 & 0 & 0 & -3 & 6 & -3 \\ 0 & 0 & 0 & 0 & 0 & a+1 \end{array}\right] \begin{matrix} \\ \cdot(-1) \\ \cdot(-\tfrac{1}{3}) \\ \\ \end{matrix}$$
$$\leadsto \left[\begin{array}{ccccc|c} 1 & -2 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 3 & -2 \\ 0 & 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 & a+1 \end{array}\right]$$
This (augmented) matrix is in a convenient form, the row-echelon form (REF). Reverting this compact notation back into the explicit notation with the variables we seek, we obtain
$$\begin{aligned} x_1 - 2x_2 + x_3 - x_4 + x_5 &= 0 \\ x_3 - x_4 + 3x_5 &= -2 \\ x_4 - 2x_5 &= 1 \\ 0 &= a + 1 \,. \end{aligned} \qquad (2.45)$$
Only for a = −1 can this system be solved. A particular solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} \,. \qquad (2.46)$$
The general solution, which captures the set of all possible solutions, is
$$\left\{ x \in \mathbb{R}^5 : x = \begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \lambda_1 \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 \\ 0 \\ -1 \\ 2 \\ 1 \end{bmatrix}, \; \lambda_1, \lambda_2 \in \mathbb{R} \right\} \,. \qquad (2.47)$$
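The particular solution (2.46) and the two spanning vectors of the homogeneous part in (2.47) can be checked directly against the original system (2.44) with a = −1:

```python
import numpy as np

A = np.array([[-2.0, 4.0, -2.0, -1.0, 4.0],
              [ 4.0, -8.0, 3.0, -3.0, 1.0],
              [ 1.0, -2.0, 1.0, -1.0, 1.0],
              [ 1.0, -2.0, 0.0, -3.0, 4.0]])
b = np.array([-3.0, 2.0, 0.0, -1.0])  # right-hand side of (2.44) with a = -1

particular = np.array([2.0, 0.0, -1.0, 1.0, 0.0])  # (2.46)
n1 = np.array([2.0, 1.0, 0.0, 0.0, 0.0])           # spanning vectors of
n2 = np.array([2.0, 0.0, -1.0, 2.0, 1.0])          # the homogeneous part

print(np.allclose(A @ particular, b))                  # True
print(np.allclose(A @ n1, 0), np.allclose(A @ n2, 0))  # True True
```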
– All rows that contain only zeros are at the bottom of the matrix; correspondingly, all rows that contain at least one nonzero element are on top of rows that contain only zeros.
– Looking at nonzero rows only, the first nonzero number from the left (also called the pivot or the leading coefficient) is always strictly to the right of the pivot of the row above it. (In other texts, it is sometimes required that the pivot is 1.)
Remark (Basic and Free Variables). The variables corresponding to the pivots in the row-echelon form are called basic variables and the other variables are free variables. For example, in (2.45), x_1, x_3, x_4 are basic variables, whereas x_2, x_5 are free variables. ♦
Remark (Obtaining a Particular Solution). The row-echelon form makes it straightforward to determine a particular solution: we express the right-hand side of the equation system as a linear combination of the pivot columns. For (2.45), we solve
$$\begin{bmatrix} 0 \\ -2 \\ 1 \\ 0 \end{bmatrix} = \lambda_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \lambda_3 \begin{bmatrix} -1 \\ -1 \\ 1 \\ 0 \end{bmatrix} \,.$$
From here, we find relatively directly that λ_3 = 1, λ_2 = −1, λ_1 = 2. When we put everything together, we must not forget the non-pivot columns for which we set the coefficients implicitly to 0. Therefore, we get the particular solution x = [2, 0, −1, 1, 0]^⊤. ♦
Remark (Reduced Row-Echelon Form). An equation system is in reduced row-echelon form (also: row-reduced echelon form or row canonical form) if
– It is in row-echelon form.
– Every pivot is 1.
– The pivot is the only nonzero entry in its column. ♦
The reduced row-echelon form will play an important role later in Section 2.3.3 because it allows us to determine the general solution of a system of linear equations in a straightforward way.
Remark (Gaussian Elimination). Gaussian elimination is an algorithm that performs elementary transformations to bring a system of linear equations into reduced row-echelon form. ♦
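The elimination steps above can be automated. The following is a minimal textbook-style sketch of Gaussian elimination to reduced row-echelon form (with partial pivoting, but not a numerically robust production implementation), applied to the augmented matrix of (2.44) with a = −1:

```python
import numpy as np

def rref(M, tol=1e-12):
    """Bring M into reduced row-echelon form by Gaussian elimination."""
    M = M.astype(float).copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = r + np.argmax(np.abs(M[r:, c]))  # partial pivoting
        if abs(M[pivot, c]) < tol:
            continue                             # no pivot in this column
        M[[r, pivot]] = M[[pivot, r]]            # swap rows
        M[r] /= M[r, c]                          # scale the pivot to 1
        for i in range(rows):
            if i != r:
                M[i] -= M[i, c] * M[r]           # eliminate the column
        r += 1
    return M

# Augmented matrix [A | b] of (2.44) with a = -1:
Ab = np.array([[-2.0, 4.0, -2.0, -1.0, 4.0, -3.0],
               [ 4.0, -8.0, 3.0, -3.0, 1.0, 2.0],
               [ 1.0, -2.0, 1.0, -1.0, 1.0, 0.0],
               [ 1.0, -2.0, 0.0, -3.0, 4.0, -1.0]])
print(rref(Ab))
```

Since the reduced row-echelon form of a matrix is unique, the output is the same regardless of the order of elementary transformations.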
the second column from three times the first column. Now, we look at the
fifth column, which is our second non-pivot column. The fifth column can
be expressed as 3 times the first pivot column, 9 times the second pivot
column, and −4 times the third pivot column. We need to keep track of
the indices of the pivot columns and translate this into 3 times the first col-
umn, 0 times the second column (which is a non-pivot column), 9 times
the third column (which is our second pivot column), and −4 times the
fourth column (which is the third pivot column). Then we need to subtract
the fifth column to obtain 0. In the end, we are still solving a homogeneous
equation system.
To summarize, all solutions of Ax = 0, x ∈ R^5 are given by
$$\left\{ x \in \mathbb{R}^5 : x = \lambda_1 \begin{bmatrix} 3 \\ -1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{bmatrix}, \; \lambda_1, \lambda_2 \in \mathbb{R} \right\} \,. \qquad (2.50)$$
$$[A \,|\, I_n] \leadsto \cdots \leadsto [I_n \,|\, A^{-1}] \,. \qquad (2.56)$$
This means that if we bring the augmented equation system into reduced
row-echelon form, we can read out the inverse on the right-hand side of
the equation system. Hence, determining the inverse of a matrix is equiv-
alent to solving systems of linear equations.
and use the Moore-Penrose pseudo-inverse (A^⊤A)^{−1}A^⊤ to determine the solution (2.59) that solves Ax = b, which also corresponds to the minimum norm least-squares solution. A disadvantage of this approach is that it requires many computations for the matrix-matrix product and computing the inverse of A^⊤A. Moreover, for reasons of numerical precision it is generally not recommended to compute the inverse or pseudo-inverse. In the following, we therefore briefly discuss alternative approaches to solving systems of linear equations.
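For a small, well-conditioned problem the pseudo-inverse formula and a dedicated least-squares routine agree; the data below is an invented toy example used only to illustrate the comparison:

```python
import numpy as np

# Overdetermined toy system: three equations, two unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Moore-Penrose pseudo-inverse solution (A^T A)^{-1} A^T b ...
x_pinv = np.linalg.inv(A.T @ A) @ A.T @ b
# ... versus the numerically preferred least-squares routine,
# which avoids forming (A^T A)^{-1} explicitly.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))  # True
```

For ill-conditioned A the two can differ substantially, which is the numerical-precision concern raised above.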
Gaussian elimination plays an important role when computing determinants (Section 4.1), checking whether a set of vectors is linearly independent (Section 2.5), computing the inverse of a matrix (Section 2.2.2), computing the rank of a matrix (Section 2.6.2), and determining a basis of a vector space (Section 2.6.1). Gaussian elimination is an intuitive and constructive way to solve a system of linear equations with thousands of variables. However, for systems with millions of variables, it is impractical as the required number of arithmetic operations scales cubically in the number of simultaneous equations.
In practice, systems of many linear equations are solved indirectly, by either stationary iterative methods, such as the Richardson method, the Jacobi method, the Gauß-Seidel method, and the successive over-relaxation method, or Krylov subspace methods, such as conjugate gradients, generalized minimal residual, or biconjugate gradients. We refer to the books by Stoer and Bulirsch (2002), Strang (2003), and Liesen and Mehrmann (2015) for further details.
Let x* be a solution of Ax = b. The key idea of these iterative methods is to set up an iteration of the form
$$x^{(k+1)} = C x^{(k)} + d \qquad (2.60)$$
for suitable C and d that reduces the residual error ‖x^{(k+1)} − x*‖ in every iteration and converges to x*. We will introduce norms ‖·‖, which allow us to compute similarities between vectors, in Section 3.1.
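As a concrete instance of (2.60), the Richardson method uses C = I − ωA and d = ωb for a step size ω. A minimal sketch with an invented 2×2 example (the matrix, right-hand side, and ω are illustrative choices, not from the text):

```python
import numpy as np

# Richardson iteration: x_{k+1} = x_k + omega * (b - A @ x_k),
# which is (2.60) with C = I - omega*A and d = omega*b.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
omega = 0.3  # step size, chosen small enough for convergence here

x = np.zeros(2)
for _ in range(200):
    x = x + omega * (b - A @ x)

print(np.allclose(A @ x, b, atol=1e-6))  # True: the iterates converged to x*
```

Convergence depends on the spectral radius of C = I − ωA being below 1, which holds for this example.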
2.4.1 Groups
Groups play an important role in computer science. Besides providing a
fundamental framework for operations on sets, they are heavily used in
cryptography, coding theory, and graphics.
Definition 2.7 (Group). Consider a set G and an operation ⊗ : G × G → G defined on G. Then G := (G, ⊗) is called a group if the following hold:
1. Closure of G under ⊗: ∀x, y ∈ G : x ⊗ y ∈ G
2. Associativity: ∀x, y, z ∈ G : (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)
3. Neutral element: ∃e ∈ G ∀x ∈ G : x ⊗ e = x and e ⊗ x = x
4. Inverse element: ∀x ∈ G ∃y ∈ G : x ⊗ y = e and y ⊗ x = e, where e is the neutral element. We often write x^{−1} to denote the inverse element of x.
Remark. The inverse element is defined with respect to the operation ⊗ and does not necessarily mean 1/x. ♦
If additionally ∀x, y ∈ G : x ⊗ y = y ⊗ x, then G = (G, ⊗) is an Abelian group (commutative).
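For a finite set, all four group axioms (plus commutativity) can be checked by brute force. As a sketch, take Z_5 with addition modulo 5 (an illustrative choice; a similar structure appears in the exercises):

```python
import itertools

# Brute-force check of the group axioms for (Z_5, addition modulo 5).
n = 5
G = range(n)
op = lambda a, b: (a + b) % n

closure = all(op(a, b) in G for a, b in itertools.product(G, G))
assoc = all(op(op(a, b), c) == op(a, op(b, c))
            for a, b, c in itertools.product(G, G, G))
neutral = all(op(a, 0) == a and op(0, a) == a for a in G)  # e = 0
inverse = all(any(op(a, y) == 0 and op(y, a) == 0 for y in G) for a in G)
abelian = all(op(a, b) == op(b, a) for a, b in itertools.product(G, G))

print(closure, assoc, neutral, inverse, abelian)  # True True True True True
```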
Remark. The following properties are useful to find out whether vectors are linearly independent:
– The pivot columns indicate the vectors, which are linearly independent of the vectors on the left. Note that there is an ordering of vectors when the matrix is built.
– The non-pivot columns can be expressed as linear combinations of the pivot columns on their left. For instance, the row-echelon form
$$\begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad (2.66)$$
tells us that the first and third columns are pivot columns. The second column is a non-pivot column because it is three times the first column.
All column vectors are linearly independent if and only if all columns are pivot columns. If there is at least one non-pivot column, the columns (and, therefore, the corresponding vectors) are linearly dependent.
Example 2.14
Consider R^4 with
$$x_1 = \begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix}, \quad x_3 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 1 \end{bmatrix} \,. \qquad (2.67)$$
To check whether they are linearly dependent, we follow the general approach and solve
$$\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 = \lambda_1 \begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix} + \lambda_2 \begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix} + \lambda_3 \begin{bmatrix} -1 \\ -2 \\ 1 \\ 1 \end{bmatrix} = 0 \qquad (2.68)$$
for λ_1, ..., λ_3. We write the vectors x_i, i = 1, 2, 3, as the columns of a matrix and apply elementary row operations until we identify the pivot columns:
$$\begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & -2 \\ -3 & 0 & 1 \\ 4 & 2 & 1 \end{bmatrix} \leadsto \cdots \leadsto \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \,. \qquad (2.69)$$
Here, every column of the matrix is a pivot column. Therefore, there is no non-trivial solution, and we require λ_1 = 0, λ_2 = 0, λ_3 = 0 to solve the equation system. Hence, the vectors x_1, x_2, x_3 are linearly independent.
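The same conclusion can be reached numerically: the vectors are linearly independent exactly when the rank of the matrix with the x_i as columns equals the number of vectors.

```python
import numpy as np

x1 = np.array([1.0, 2.0, -3.0, 4.0])
x2 = np.array([1.0, 1.0, 0.0, 2.0])
x3 = np.array([-1.0, -2.0, 1.0, 1.0])

A = np.column_stack([x1, x2, x3])
# Linearly independent iff every column is a pivot column,
# i.e., iff rank(A) equals the number of vectors.
print(np.linalg.matrix_rank(A) == 3)  # True: x1, x2, x3 are independent
```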
This means that {x1 , . . . , xm } are linearly independent if and only if the
column vectors {λ1 , . . . , λm } are linearly independent.
♦
Remark. In a vector space V , m linear combinations of k vectors x1 , . . . , xk
are linearly dependent if m > k . ♦
Example 2.15
Consider a set of linearly independent vectors b_1, b_2, b_3, b_4 ∈ R^n and
$$\begin{aligned} x_1 &= b_1 - 2b_2 + b_3 - b_4 \\ x_2 &= -4b_1 - 2b_2 + 4b_4 \\ x_3 &= 2b_1 + 3b_2 - b_3 - 3b_4 \\ x_4 &= 17b_1 - 10b_2 + 11b_3 + b_4 \,. \end{aligned} \qquad (2.73)$$
Are the vectors x_1, ..., x_4 ∈ R^n linearly independent? To answer this question, we investigate whether the column vectors
$$\begin{bmatrix} 1 \\ -2 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} -4 \\ -2 \\ 0 \\ 4 \end{bmatrix}, \quad \begin{bmatrix} 2 \\ 3 \\ -1 \\ -3 \end{bmatrix}, \quad \begin{bmatrix} 17 \\ -10 \\ 11 \\ 1 \end{bmatrix} \qquad (2.74)$$
Generating sets are sets of vectors that span vector (sub)spaces, i.e.,
every vector can be represented as a linear combination of the vectors
in the generating set. Now, we will be more specific and characterize the
smallest generating set that spans a vector (sub)space.
Example 2.16
The set
$$\mathcal{A} = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ -4 \end{bmatrix} \right\} \qquad (2.80)$$
$$[x_1, x_2, x_3, x_4] = \begin{bmatrix} 1 & 2 & 3 & -1 \\ 2 & -1 & -4 & 8 \\ -1 & 1 & 3 & -5 \\ -1 & 2 & 5 & -6 \\ -1 & -2 & -3 & 1 \end{bmatrix} \,. \qquad (2.83)$$
With the basic transformation rules for systems of linear equations, we obtain the row-echelon form
$$\begin{bmatrix} 1 & 2 & 3 & -1 \\ 2 & -1 & -4 & 8 \\ -1 & 1 & 3 & -5 \\ -1 & 2 & 5 & -6 \\ -1 & -2 & -3 & 1 \end{bmatrix} \leadsto \cdots \leadsto \begin{bmatrix} 1 & 2 & 3 & -1 \\ 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \,.$$
Since the pivot columns indicate which set of vectors is linearly independent, we see from the row-echelon form that x_1, x_2, x_4 are linearly independent (because the system of linear equations λ_1 x_1 + λ_2 x_2 + λ_4 x_4 = 0 can only be solved with λ_1 = λ_2 = λ_4 = 0). Therefore, {x_1, x_2, x_4} is a basis of U.
2.6.2 Rank
The number of linearly independent columns of a matrix A ∈ R^{m×n} equals the number of linearly independent rows and is called the rank of A and is denoted by rk(A).
Remark. The rank of a matrix has some important properties:
– rk(A) = rk(A^⊤), i.e., the column rank equals the row rank.
– The columns of A ∈ R^{m×n} span a subspace U ⊆ R^m with dim(U) = rk(A). Later we will call this subspace the image or range. A basis of U can be found by applying Gaussian elimination to A to identify the pivot columns.
– The rows of A ∈ R^{m×n} span a subspace W ⊆ R^n with dim(W) = rk(A). A basis of W can be found by applying Gaussian elimination to A^⊤.
– For all A ∈ R^{n×n} it holds that A is regular (invertible) if and only if rk(A) = n.
– For all A ∈ R^{m×n} and all b ∈ R^m it holds that the linear equation system Ax = b can be solved if and only if rk(A) = rk(A|b), where A|b denotes the augmented system.
– For A ∈ R^{m×n} the subspace of solutions for Ax = 0 possesses dimension n − rk(A). Later, we will call this subspace the kernel or the null space.
– A matrix A ∈ R^{m×n} has full rank if its rank equals the largest possible rank for a matrix of the same dimensions. This means that the rank of a full-rank matrix is the lesser of the number of rows and columns, i.e., rk(A) = min(m, n). A matrix is said to be rank deficient if it does not have full rank. ♦
$$A = \begin{bmatrix} 1 & 2 & 1 \\ -2 & -3 & 1 \\ 3 & 5 & 0 \end{bmatrix} \,.$$
We use Gaussian elimination to determine the rank:
$$\begin{bmatrix} 1 & 2 & 1 \\ -2 & -3 & 1 \\ 3 & 5 & 0 \end{bmatrix} \leadsto \cdots \leadsto \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix} \,. \qquad (2.84)$$
Here, we see that the number of linearly independent rows and columns is 2, such that rk(A) = 2.
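The same rank computation, including the property rk(A) = rk(A^⊤), can be verified numerically:

```python
import numpy as np

A = np.array([[ 1.0,  2.0, 1.0],
              [-2.0, -3.0, 1.0],
              [ 3.0,  5.0, 0.0]])

# Column rank equals row rank: rk(A) = rk(A^T).
print(np.linalg.matrix_rank(A))    # 2
print(np.linalg.matrix_rank(A.T))  # 2
```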
$$B = (b_1, \dots, b_n) \qquad (2.89)$$
$$x = \alpha_1 b_1 + \dots + \alpha_n b_n \qquad (2.90)$$
Example 2.20
Let us have a look at a geometric vector x ∈ R^2 with coordinates [2, 3]^⊤ with respect to the standard basis (e_1, e_2) of R^2. This means, we can write x = 2e_1 + 3e_2. However, we do not have to choose the standard basis to represent this vector. If we use the basis vectors b_1 = [1, −1]^⊤, b_2 = [1, 1]^⊤ we will obtain the coordinates ½[−1, 5]^⊤ to represent the same vector with respect to (b_1, b_2) (see Figure 2.9).
Figure 2.9 Different coordinate representations of a vector x, depending on the choice of basis: x = 2e_1 + 3e_2 = −½ b_1 + 5⁄2 b_2.
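Finding the coordinates of x with respect to the basis (b_1, b_2) amounts to solving a small linear system, with the basis vectors as the columns of a matrix:

```python
import numpy as np

x = np.array([2.0, 3.0])     # coordinates of x w.r.t. the standard basis
B = np.array([[1.0, 1.0],
              [-1.0, 1.0]])  # columns are b1 = [1, -1] and b2 = [1, 1]

# The coordinates alpha w.r.t. (b1, b2) solve B @ alpha = x.
alpha = np.linalg.solve(B, x)
print(alpha)  # [-0.5, 2.5], i.e., x = -1/2 b1 + 5/2 b2
```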
ŷ = AΦ x̂ . (2.94)
This means that the transformation matrix can be used to map coordinates
with respect to an ordered basis in V to coordinates with respect to an
ordered basis in W .
where we first expressed the new basis vectors c̃k ∈ W as linear com-
binations of the basis vectors cl ∈ W and then swapped the order of
summation.
Alternatively, when we express the b̃j ∈ V as linear combinations of
bj ∈ V , we arrive at
$$\Phi(\tilde{b}_j) = \Phi\!\left( \sum_{i=1}^{n} s_{ij} b_i \right) = \sum_{i=1}^{n} s_{ij} \Phi(b_i) = \sum_{i=1}^{n} s_{ij} \sum_{l=1}^{m} a_{li} c_l \qquad (2.109a)$$
$$= \sum_{l=1}^{m} \left( \sum_{i=1}^{n} a_{li} s_{ij} \right) c_l \,, \quad j = 1, \dots, n \,, \qquad (2.109b)$$
and, therefore,
$$T \tilde{A}_\Phi = A_\Phi S \,, \qquad (2.112)$$
such that
$$\tilde{A}_\Phi = T^{-1} A_\Phi S \,. \qquad (2.113)$$
$$\tilde{B} = \left( \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right) \in \mathbb{R}^3, \quad \tilde{C} = \left( \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right) . \qquad (2.119)$$
Then,
$$S = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad T = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad (2.120)$$
where the ith column of S is the coordinate representation of b̃_i in terms of the basis vectors of B. Since B is the standard basis, the coordinate representation is straightforward to find. For a general basis B, we would need to solve a linear equation system to find the λ_i such that Σ_{i=1}^{3} λ_i b_i = b̃_j, j = 1, ..., 3. Similarly, the jth column of T is the coordinate representation of c̃_j in terms of the basis vectors of C.
Therefore, we obtain
$$\tilde{A}_\Phi = T^{-1} A_\Phi S = \frac{1}{2} \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 3 & 2 & 1 \\ 0 & 4 & 2 \\ 10 & 8 & 4 \\ 1 & 6 & 3 \end{bmatrix} \qquad (2.121a)$$
$$= \begin{bmatrix} -4 & -4 & -2 \\ 6 & 0 & 0 \\ 4 & 8 & 4 \\ 1 & 6 & 3 \end{bmatrix} \,. \qquad (2.121b)$$
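The arithmetic of (2.121) can be checked numerically. Below, the second factor is the product A_Φ S as displayed in (2.121a), and T^{−1} is applied via a linear solve rather than by forming the inverse explicitly:

```python
import numpy as np

T = np.array([[1.0, 1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
# The right-hand factor A_Phi @ S shown in (2.121a):
APhiS = np.array([[ 3.0, 2.0, 1.0],
                  [ 0.0, 4.0, 2.0],
                  [10.0, 8.0, 4.0],
                  [ 1.0, 6.0, 3.0]])

# T^{-1} (A_Phi S), computed without forming the inverse explicitly:
A_tilde = np.linalg.solve(T, APhiS)
print(np.allclose(A_tilde, [[-4, -4, -2],
                            [ 6,  0,  0],
                            [ 4,  8,  4],
                            [ 1,  6,  3]]))  # True, matching (2.121b)
```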
Figure: illustration of the kernel ker(Φ) ⊆ V (containing 0_V) and the image Im(Φ) ⊆ W (containing 0_W) of a linear mapping Φ : V → W.
$$= x_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + x_2 \begin{bmatrix} 2 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad (2.125b)$$
is linear. To determine Im(Φ), we can take the span of the columns of the transformation matrix and obtain
$$\mathrm{Im}(\Phi) = \mathrm{span}\!\left[ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right] . \qquad (2.126)$$
To compute the kernel (null space) of Φ, we need to solve Ax = 0, i.e., we need to solve a homogeneous equation system. To do this, we use Gaussian elimination to transform A into reduced row-echelon form:
$$\begin{bmatrix} 1 & 2 & -1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \leadsto \cdots \leadsto \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix} \,. \qquad (2.127)$$
This matrix is in reduced row-echelon form, and we can use the Minus-1 Trick to compute a basis of the kernel (see Section 2.3.3). Alternatively, we can express the non-pivot columns (columns 3 and 4) as linear combinations of the pivot columns (columns 1 and 2). The third column a_3 is equivalent to −½ times the second column a_2. Therefore, 0 = a_3 + ½a_2. In the same way, we see that a_4 = a_1 − ½a_2 and, therefore, 0 = a_1 − ½a_2 − a_4.
Overall, this gives us the kernel (null space) as
$$\ker(\Phi) = \mathrm{span}\!\left[ \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ \tfrac{1}{2} \\ 0 \\ 1 \end{bmatrix} \right] . \qquad (2.128)$$
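A numerical alternative to the hand computation: the right-singular vectors of A belonging to zero singular values span the null space, so a kernel basis can be read off from the SVD.

```python
import numpy as np

A = np.array([[1.0, 2.0, -1.0, 0.0],
              [1.0, 0.0,  0.0, 1.0]])

# The right-singular vectors of A for zero singular values span
# ker(Phi) = {x : Ax = 0}.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T  # 4x2 matrix: the kernel is 2-dimensional

print(np.allclose(A @ null_basis, 0))  # True: basis vectors map to 0
v = np.array([0.0, 0.5, 1.0, 0.0])     # first spanning vector from (2.128)
print(np.allclose(A @ v, 0))           # True: it lies in the kernel
```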
Theorem 2.24 (Rank-Nullity Theorem). For vector spaces V, W and a linear mapping Φ : V → W it holds that
$$\dim(\ker(\Phi)) + \dim(\mathrm{Im}(\Phi)) = \dim(V) \,. \qquad (2.129)$$
The rank-nullity theorem is also referred to as the fundamental theorem of linear mappings (Axler, 2015, theorem 3.22). The following are direct consequences of Theorem 2.24:
– If dim(Im(Φ)) < dim(V), then ker(Φ) is non-trivial, i.e., the kernel contains more than 0_V and dim(ker(Φ)) ⩾ 1.
– If A_Φ is the transformation matrix of Φ with respect to an ordered basis and dim(Im(Φ)) < dim(V), then the system of linear equations A_Φ x = 0 has infinitely many solutions.
– If dim(V) = dim(W), then the following three-way equivalence holds: Φ is injective ⟺ Φ is surjective ⟺ Φ is bijective, since Im(Φ) ⊆ W.
One-dimensional affine subspaces are called lines and can be written as y = x_0 + λb_1, where λ ∈ R and U = span[b_1] ⊆ R^n is a one-dimensional subspace of R^n. This means that a line is defined by a support point x_0 and a vector b_1 that defines the direction. See Figure 2.13 for an illustration.
Exercises
2.1 We consider (R \ {−1}, ⋆), where
$$a \star b := ab + a + b \,, \quad a, b \in \mathbb{R} \setminus \{-1\} \,. \qquad (2.134)$$
Solve
$$3 \star x \star x = 15 \,.$$
$$\bar{k} = \{ x \in \mathbb{Z} \mid x - k \equiv 0 \ (\mathrm{mod}\ n) \} = \{ x \in \mathbb{Z} \mid \exists a \in \mathbb{Z} : x - k = n \cdot a \} \,.$$
$$\mathbb{Z}_n = \{ \bar{0}, \bar{1}, \dots, \overline{n-1} \}$$
$$\bar{a} \oplus \bar{b} := \overline{a + b} \,, \qquad \bar{a} \otimes \bar{b} := \overline{a \times b} \,, \qquad (2.135)$$
2.4 Compute the following matrix products, if possible:
a.
$$\begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 7 & 8 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$
b.
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$
c.
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
d.
$$\begin{bmatrix} 1 & 2 & 1 & 2 \\ 4 & 1 & -1 & -4 \end{bmatrix} \begin{bmatrix} 0 & 3 \\ 1 & -1 \\ 2 & 1 \\ 5 & 2 \end{bmatrix}$$
e.
$$\begin{bmatrix} 0 & 3 \\ 1 & -1 \\ 2 & 1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 2 \\ 4 & 1 & -1 & -4 \end{bmatrix}$$
2.5 Find the set S of all solutions in x of the following inhomogeneous linear systems Ax = b, where A and b are defined as follows:
a.
$$A = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 2 & 5 & -7 & -5 \\ 2 & -1 & 1 & 3 \\ 5 & 2 & -4 & 2 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ -2 \\ 4 \\ 6 \end{bmatrix}$$
b.
$$A = \begin{bmatrix} 1 & -1 & 0 & 0 & 1 \\ 1 & 1 & 0 & -3 & 0 \\ 2 & -1 & 0 & 1 & -1 \\ -1 & 2 & 0 & -2 & -1 \end{bmatrix}, \quad b = \begin{bmatrix} 3 \\ 6 \\ 5 \\ -1 \end{bmatrix}$$
2.6 Using Gaussian elimination, find all solutions of the inhomogeneous equation system Ax = b with
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \,.$$
and $\sum_{i=1}^{3} x_i = 1$.
2.8 Determine the inverses of the following matrices if possible:
a.
$$A = \begin{bmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$$
b.
$$A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$
1 −1 1 1 0 −1
Determine a basis of U1 ∩ U2 .
2.13 Consider two subspaces U_1 and U_2, where U_1 is the solution space of the homogeneous equation system A_1 x = 0 and U_2 is the solution space of the homogeneous equation system A_2 x = 0 with
$$A_1 = \begin{bmatrix} 1 & 0 & 1 \\ 1 & -2 & -1 \\ 2 & 1 & 3 \\ 1 & 0 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 3 & -3 & 0 \\ 1 & 2 & 3 \\ 7 & -5 & 2 \\ 3 & -1 & 2 \end{bmatrix} \,.$$
where L^1([a, b]) denotes the set of integrable functions on [a, b].
b.
$$\Phi : C^1 \to C^0 \,, \quad f \mapsto \Phi(f) = f'$$
c.
$$\Phi : \mathbb{R} \to \mathbb{R} \,, \quad x \mapsto \Phi(x) = \cos(x)$$
d.
$$\Phi : \mathbb{R}^3 \to \mathbb{R}^2 \,, \quad x \mapsto \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 3 \end{bmatrix} x$$
and let us define two ordered bases B = (b_1, b_2) and B′ = (b′_1, b′_2) of R^2.
a. Show that B and B′ are two bases of R^2 and draw those basis vectors.
b. Compute the matrix P_1 that performs a basis change from B′ to B.
c. We consider c_1, c_2, c_3, three vectors of R^3 defined in the standard basis of R^3 as
$$c_1 = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \quad c_2 = \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix}, \quad c_3 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$