CUST 2023-2024
Maths for Physics 2
Session 7
Square matrices
Summary
Here we study a few important aspects of square matrices. First, we discuss the
notions of eigenvalues, eigenvectors and diagonalization of a square matrix. We also
discuss a useful way of writing a square matrix, namely in the form of a so-called
LU decomposition.
Contents
1 Eigenvalues and eigenvectors
2 Diagonalization
3 The LU decomposition
Chapter 1
Eigenvalues and eigenvectors
Let M be a real or complex n × n square matrix, with n an integer, n ⩾ 2. Let λ be
a scalar (i.e. a real or a complex number), and v be a nonzero vector (to be more
precise, a column vector here and in the sequel). Then λ is said to be an eigenvalue
of M , and v is said to be an eigenvector of M corresponding to, or associated with,
the eigenvalue λ, if we have
M v = λv , (1.1)
that is in matrix form
\begin{pmatrix} M_{11} & \cdots & M_{1n} \\ \vdots & \ddots & \vdots \\ M_{n1} & \cdots & M_{nn} \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} \lambda v_1 \\ \vdots \\ \lambda v_n \end{pmatrix} .   (1.2)
Note that the zero vector 0 is always a trivial solution of the eigenvalue equa-
tion (1.1). For this reason, the zero vector 0 is never considered to be an eigenvector
of M : an eigenvector of M is a nontrivial (i.e. nonzero) solution of (1.1).
Let’s call I the corresponding identity matrix, i.e. the n × n matrix with only 1
on the diagonal and 0 elsewhere, that is
I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} .   (1.3)
Note that we can thus write (1.1) as
(M − λI) v = 0 . (1.4)
Now, let's imagine that the matrix M − λI is invertible: we would then get from (1.4), upon multiplying both sides by (M − λI)^{-1} on the left, that
v = (M − λI)^{-1} 0 = 0 ,
which contradicts our assumption that v is nonzero. Therefore, the matrix M − λI cannot be invertible and must thus be singular, so that its determinant must be zero, that is
det (M − λI) = 0 . (1.5)
This equation is referred to as the characteristic equation of the matrix M . By
definition, det (M − λI) is an n-th order polynomial in λ, whose coefficients are
expressed in terms of the matrix coefficients Mij of M : the polynomial det (M − λI)
is then referred to as the characteristic polynomial of M . Since the latter is an n-th
order polynomial, the fundamental theorem of algebra ensures that det (M − λI)
has n (not necessarily distinct) roots λ1 , . . . , λn , so that¹
det (M − λI) = (λ − λ1 ) · · · (λ − λn ) . (1.6)
In view of (1.5), these roots λ1 , . . . , λn of the characteristic polynomial hence precisely correspond to the eigenvalues of M . The set of numbers
S = {λ1 , . . . , λn } , (1.7)
formed by the eigenvalues of M , is called the spectrum of M .

¹ Since we'll only be interested in the roots λ1 , . . . , λn , the actual value of the coefficient of the highest-order term λ^n is irrelevant, so we fix it to be 1.
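As a quick numerical illustration, here is a minimal Python sketch (assuming NumPy is available; the 3 × 3 matrix below is just an arbitrary example) that obtains the spectrum of a matrix both from the roots of its characteristic polynomial and directly from a standard eigenvalue routine:

import numpy as np

# An arbitrary 3x3 example matrix (any real or complex square matrix would do).
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Coefficients of the characteristic polynomial, normalized so that the
# leading coefficient is 1 (cf. the footnote above).
coeffs = np.poly(M)

# The eigenvalues are the roots of the characteristic polynomial.
spectrum_from_roots = np.sort(np.roots(coeffs))

# Direct computation of the spectrum.
spectrum_direct = np.sort(np.linalg.eigvals(M))

print(spectrum_from_roots)  # both printouts give approximately 1, 2 and 4,
print(spectrum_direct)      # the exact spectrum of this particular M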
REMARK: the eigenvalue equation (1.1) actually only defines a family of eigenvectors αv, with α any nonzero scalar (real or complex): indeed, if v is a solution of (1.1), then so is αv. Therefore, very often, within this whole family of eigenvectors we'll pick the one that is normalized, i.e. the one that satisfies
v^T v = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = v_1^2 + \ldots + v_n^2 = 1 .   (1.8)
REMARK: it may occur that some of the roots λj in (1.6) are actually equal. In this case, we say that this eigenvalue is degenerate, and the number of λj in (1.6) that are equal to this eigenvalue gives the degeneracy of this eigenvalue. Regarding the corresponding eigenvectors, we then keep the ones that are linearly independent:
actually, it often proves to be convenient to keep the ones that are orthogonal. It
may however be the case that an eigenvalue that has degeneracy g ⩾ 2 has fewer than
g linearly independent eigenvectors. We recall that two vectors v and w are said to
be linearly independent if we have the equivalence, for two scalars α and β,
αv + βw = 0 ⇐⇒ α = β = 0 . (1.9)
EXAMPLE: let’s consider the matrix
M = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} .   (1.10)
It admits the single eigenvalue λ = 2, of degeneracy g = 2, since its characteristic polynomial is det (M − λI) = (2 − λ)^2 . The eigenvalue equation M v = 2v then yields eigenvectors v that are of the form
v = \begin{pmatrix} x \\ 0 \end{pmatrix} ,  x ∈ R .   (1.11)
Note that x must be nonzero in (1.11), otherwise v would be the zero vector. But
of course, choosing two different values of x in (1.11), say x and x′ , does not yield
two linearly independent vectors v and v′. Therefore, (1.10) is an example of a 2 × 2 matrix that has an eigenvalue of degeneracy 2 but only one linearly independent eigenvector.
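A minimal numerical check of this example, assuming NumPy and SciPy are available: the eigenvectors associated with λ = 2 are the nonzero elements of the null space of M − 2I, and this null space turns out to be one-dimensional.

import numpy as np
from scipy.linalg import null_space

M = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# Eigenvectors for lambda = 2 are the nonzero solutions of (M - 2I)v = 0,
# i.e. the nonzero elements of the null space of M - 2I.
eigvecs = null_space(M - 2.0 * np.eye(2))

print(eigvecs.shape[1])  # 1: only one linearly independent eigenvector
print(eigvecs[:, 0])     # proportional to (1, 0)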
Chapter 2
Diagonalization
Here we discuss an important notion, deeply connected to eigenvalues and eigenvec-
tors: the notion of diagonalization of a square matrix.
To introduce this notion, we consider an n × n matrix M , with eigenvalues
λ1 , . . . , λn (not necessarily distinct) and the corresponding eigenvectors V1 , . . . , Vn ,
that is
M Vj = λj Vj , ∀j = 1, . . . , n . (2.1)
We then construct two other n × n matrices, which we denote by D and V . First, D is
the diagonal matrix whose diagonal elements are the n eigenvalues λj (which, again,
are not necessarily distinct) of M , that is
D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} .   (2.2)
Then, the matrix V is defined so that its columns are the eigenvectors Vj of M ,
which we can compactly write as
V = ( V1 , · · · , Vn ) . (2.3)
In other words, if we write the eigenvector Vj as the column vector
V_j = \begin{pmatrix} V_{1j} \\ \vdots \\ V_{nj} \end{pmatrix} ,   (2.4)
we have
V = \begin{pmatrix} V_{11} & \cdots & V_{1n} \\ \vdots & \ddots & \vdots \\ V_{n1} & \cdots & V_{nn} \end{pmatrix} .   (2.5)
Let’s now compute the product M V : from the definition (2.3) of V we can write,
in a compact form,
M V = ( M V1 , · · · , M Vn ) , (2.6)
and thus in view of (2.1)
M V = ( λ1 V1 , · · · , λn Vn ) . (2.7)
Then we compute the product V D. Computing it explicitly from (2.2)-(2.5), we
also get
V D = ( λ1 V1 , · · · , λn Vn ) . (2.8)
Therefore, we readily see upon comparing (2.7) and (2.8) that we have the matrix
equality
MV = V D . (2.9)
Now, let’s suppose that the matrix V is invertible: we can thus multiply (2.9)
by V^{-1} on the left to get
V^{-1} M V = D . (2.10)
Since the matrix D is by definition diagonal, the result (2.10) is at the heart of the
notion of diagonalization, which is defined as follows:
An n × n matrix M is said to be diagonalizable if there exists an
n × n invertible matrix V such that the matrix
V^{-1} M V = D (2.11)
is diagonal. The matrix V is then said to diagonalize M .
In view of our construction above [see equations (2.1)-(2.10)] we hence see that
the matrix V that diagonalizes M (if it is invertible) is such that its columns are
the eigenvectors of M . The resulting diagonal matrix has thus the eigenvalues of
M as its diagonal elements. We can actually show (which we won’t do here) that
such a matrix V is invertible if and only if the n eigenvectors V1 , . . . , Vn of M are
linearly independent¹. That is, M is diagonalizable if and only if it has n linearly independent eigenvectors.

¹ Which, we emphasize, does not require the n eigenvalues λ1 , . . . , λn to be distinct: it may still be the case that some eigenvalues of M are degenerate while the corresponding eigenvectors are linearly independent.
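As a concrete sanity check, here is a short Python sketch (assuming NumPy; the matrix M is just an arbitrary diagonalizable example) that builds V from the eigenvectors returned by a standard routine and verifies that V^{-1} M V is indeed the diagonal matrix of eigenvalues:

import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the
# corresponding eigenvectors, i.e. exactly the matrices D (as a vector) and V above.
eigenvalues, V = np.linalg.eig(M)
D = np.diag(eigenvalues)

# Check that V is invertible (n linearly independent eigenvectors) ...
assert np.linalg.matrix_rank(V) == M.shape[0]

# ... and that V^{-1} M V is the diagonal matrix of eigenvalues.
assert np.allclose(np.linalg.inv(V) @ M @ V, D)
print(np.round(np.linalg.inv(V) @ M @ V, 10))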
Diagonalization is very useful in practice. For instance, it allows us to compute powers of a diagonalizable matrix in a quite simple way. Indeed, let's rewrite (2.11) so as to express M : multiplying (2.11) on the left by V and on the right by V^{-1} yields
M = V D V^{-1} . (2.12)
Now, let's first compute M^2 : we have in view of (2.12)
M^2 = (V D V^{-1})(V D V^{-1}) = V D (V^{-1} V ) D V^{-1} = V D^2 V^{-1} . (2.13)
But since D is by construction diagonal, it's immediate to compute its square: D^2 is merely the diagonal matrix formed by the squares of the diagonal elements of D, i.e. in view of (2.2)
D^2 = \begin{pmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{pmatrix} .
This can then be repeated for an arbitrary power k: computing M^k , k ∈ N, yields in view of (2.12)
M^k = V D^k V^{-1} , (2.14)
with in view of (2.2)
D^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} .
The expression (2.14) has a clear practical advantage regarding, for instance, the computational cost: computing M^k by brute force requires k − 1 matrix multiplications, while computing V D^k V^{-1} requires only two matrix multiplications, independently of the value of k.
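A minimal sketch of this idea in Python (assuming NumPy, with an arbitrary 2 × 2 diagonalizable matrix as example): the k-th power computed as V D^k V^{-1} agrees with the result of repeated multiplication, while D^k itself only costs n scalar powers.

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
k = 10

eigenvalues, V = np.linalg.eig(M)

# D^k is diagonal with entries lambda_j^k, so it costs only n scalar powers.
Dk = np.diag(eigenvalues ** k)

# M^k via diagonalization: two matrix multiplications, independently of k.
Mk_diag = V @ Dk @ np.linalg.inv(V)

# M^k by repeated multiplication (k - 1 matrix products), for comparison.
Mk_brute = np.linalg.matrix_power(M, k)

print(np.allclose(Mk_diag, Mk_brute))  # True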
This is especially important in view of computing functions of matrices. Indeed,
if f (x) is a function of a real variable x, we can define the matrix f (M ) that results
from replacing x by the matrix M in the Taylor series of f (x) around 0. That is,
writing the latter as
f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k ,
and replacing x by the matrix M , we define a matrix that we denote by f (M ) and
that we call the function f of M , namely
f(M) = \sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} M^k .   (2.15)
For instance, we hence have the matrix exponential e^M , which is then defined through the series
e^M = \sum_{k=0}^{\infty} \frac{1}{k!} M^k .   (2.16)
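As an illustration of (2.16), here is a minimal sketch (assuming NumPy and SciPy are available) that compares a truncated version of the series with SciPy's built-in matrix exponential routine:

import numpy as np
from scipy.linalg import expm

M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

# Truncated version of the series (2.16): sum over k = 0, ..., K of M^k / k!
K = 30
term = np.eye(2)          # k = 0 term
series = np.eye(2)
for k in range(1, K + 1):
    term = term @ M / k   # builds up M^k / k! iteratively
    series += term

# SciPy's built-in matrix exponential, for comparison.
print(np.allclose(series, expm(M)))  # True (the truncation error is tiny here)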
Chapter 3
The LU decomposition
Let’s first discuss the general idea that underlies the notion of LU decomposition.
Let’s assume that we want to solve a system of n ⩾ 2 linear algebraic equations of
the form
A11 x1 + . . . + A1n xn = B1 , (3.1a)
⋮
An1 x1 + . . . + Ann xn = Bn , (3.1b)
where the coefficients Aij are supposed to be known, as well as the numbers Bj ,
and the numbers xj are the unknowns. We can combine these n equations in the
single matrix equation
Ax = B , (3.2)
where A is the n × n matrix
A = \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn} \end{pmatrix} ,   (3.3)
while x and B are the column vectors
x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix} .   (3.4)
The so-called LU decomposition of the matrix A is then defined as follows:
Suppose that we can find two n × n matrices L and U such that L is lower triangular (i.e., all the elements of L above the diagonal are zero) and U is upper triangular (i.e., all the elements of U below the diagonal are zero), namely
L = \begin{pmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} U_{11} & U_{12} & \cdots & U_{1n} \\ 0 & U_{22} & \cdots & U_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & U_{nn} \end{pmatrix} ,   (3.5)
such that A can be written as the product
A = LU . (3.6)
The expression (3.6) of A is then called the LU decomposition of A.
The advantage of the LU decomposition of A arises from the triangular nature
of the two matrices L and U . To see this, let's substitute (3.6) into (3.2): we have
Ax = LU x = L (U x) = B . (3.7)
This allows us to break the equation Ax = B that we want to solve into two equations:
i) first, we can solve the equation
Ly = B , (3.8)
and thus obtain the corresponding vector y;
ii) and then, once we have y, we can solve the equation
Ux = y , (3.9)
which hence yields the solution x to our original equation Ax = B.
The advantage of separately solving (3.8) and then (3.9) instead of straight away
solving (3.7) is that (3.8) and (3.9) are actually very easy to solve because L and U
are triangular: indeed, (3.8) for instance reads
\begin{pmatrix} L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{pmatrix} ,
from which we readily see from the first equation that
y_1 = \frac{B_1}{L_{11}} .
We can then substitute this expression of y_1 into the second equation to immediately get y_2 , etc. Similarly, (3.9) reads
\begin{pmatrix} U_{11} & U_{12} & \cdots & U_{1n} \\ 0 & U_{22} & \cdots & U_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & U_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} ,
from which we readily see from the last equation that
x_n = \frac{y_n}{U_{nn}} .
We can then substitute this expression of x_n into the second-to-last equation to get x_{n−1} , etc. Therefore, the linear equations that must be solved in (3.8) and (3.9) are already ordered in a suitable way that allows solving them without too much effort.
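This forward and back substitution is easy to translate into code. Here is a minimal Python sketch (assuming NumPy, and assuming that the diagonal elements L_jj and U_jj are nonzero), applied to a small example where L and U are chosen by hand:

import numpy as np

def forward_substitution(L, B):
    """Solve L y = B for a lower triangular L, cf. (3.8)."""
    n = len(B)
    y = np.zeros(n)
    for i in range(n):
        # y_1, ..., y_{i-1} are already known at this point.
        y[i] = (B[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for an upper triangular U, cf. (3.9)."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x_{i+1}, ..., x_n are already known at this point.
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Small example: A = L U, with unit diagonal on L, cf. (3.12).
L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0]])
U = np.array([[2.0, 1.0, 1.0],
              [0.0, 4.0, 2.0],
              [0.0, 0.0, 3.0]])
A = L @ U
B = np.array([1.0, 2.0, 3.0])

y = forward_substitution(L, B)
x = back_substitution(U, y)
print(np.allclose(A @ x, B))  # True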
Another advantage of triangular matrices is that it is very easy to compute their
determinant: indeed, the determinant of a triangular matrix is merely the product
of its diagonal elements. We hence have
\det L = \prod_{j=1}^{n} L_{jj} \quad \text{and} \quad \det U = \prod_{j=1}^{n} U_{jj} .   (3.10)
Therefore, if we have the LU decomposition (3.6) of the matrix A, then it’s really
easy to compute the determinant of A, and we have
\det A = \det L \, \det U = \left( \prod_{j=1}^{n} L_{jj} \right) \left( \prod_{j=1}^{n} U_{jj} \right) .   (3.11)
Even more, since in practice it always proves possible to choose¹
L11 = L22 = . . . = Lnn = 1 , (3.12)
we then have the even simpler expression of det A:
\det A = \det U = \prod_{j=1}^{n} U_{jj} .   (3.13)

¹ Indeed, note that the equation (3.6) only yields n × n = n^2 equations for the n^2 + n unknowns Lij and Uij . In order to get a unique solution, i.e. in order to uniquely determine the matrices L and U , we must thus impose n additional constraints on the unknowns Lij and Uij : (3.12) is just a (convenient) example of such additional constraints.
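To illustrate (3.12) and (3.13), here is a small sketch (assuming NumPy) of the classical Doolittle-type construction of L and U with unit diagonal on L; it performs no pivoting, so it assumes that no zero pivot is encountered along the way. The determinant of A is then simply the product of the diagonal elements of U.

import numpy as np

def lu_doolittle(A):
    """LU decomposition with unit diagonal on L (no pivoting, assumes nonzero pivots)."""
    n = A.shape[0]
    L = np.eye(n)            # enforces L_jj = 1, cf. (3.12)
    U = A.astype(float)      # will be reduced to upper triangular form
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]        # multiplier, stored in L
            U[i, j:] -= L[i, j] * U[j, j:]     # eliminate the entry (i, j)
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 6.0, 4.0],
              [1.0, 5.0, 6.0]])

L, U = lu_doolittle(A)
print(np.allclose(L @ U, A))                   # True
print(np.prod(np.diag(U)), np.linalg.det(A))   # both approximately 26 = det A, cf. (3.13)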
While things are nice once we have the LU decomposition (3.6) of the matrix A, the difficult part is of course to actually obtain this decomposition for a given matrix A; and the larger n is, the more difficult it gets. There are however algorithms that have been designed to accomplish this task, so in practice we can simply use them whenever we need to find an LU decomposition.
As a final remark, note that the matrix A only contains the coefficients of the
linear equations (3.1), and in particular makes no reference to the vector B: this
means that once we have the LU decomposition of A, we can then efficiently solve
the equations (3.1), or equivalently (3.2), for many different vectors B, which is a
distinct advantage of the LU decomposition over some other methods that can be
used to solve the linear equations (3.1).
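In practice one would rely on a library routine rather than building the factorization by hand. Here is a minimal sketch using SciPy (assuming scipy.linalg is available; note that its routines actually compute a slightly more general factorization that also involves row permutations, i.e. partial pivoting): A is factorized once, and the factorization is then reused to solve Ax = B for several different vectors B.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 6.0, 4.0],
              [1.0, 5.0, 6.0]])

# Factorize A once (this is the expensive step).
lu, piv = lu_factor(A)

# Reuse the factorization for many different right-hand sides B.
for B in (np.array([1.0, 0.0, 0.0]),
          np.array([0.0, 1.0, 0.0]),
          np.array([2.0, -1.0, 3.0])):
    x = lu_solve((lu, piv), B)
    print(np.allclose(A @ x, B))  # True for each B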