
Module 3: Supplementary Slides

(These additional materials are optional and intended for students who are interested)
Vectors
More Vector Arithmetic
m %% n   Modulo operator (gives the remainder of m/n)
m %/% n  Integer division (gives the integer part of m/n)
A %*% B  Matrix multiplication (to be studied later)
m %in% v Returns TRUE if the left operand occurs in its right operand; FALSE otherwise

> 14%%5
[1] 4
> 14%/%5
[1] 2
> 5%in%14
[1] FALSE
> 5%in%c(5,4)
[1] TRUE
General Norm of a Vector
Norm of a Vector
Definition: a norm for a vector $\boldsymbol{x} \in \mathbb{R}^n$ is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}_+$ that satisfies the following properties:

1. $\|\boldsymbol{x}\| \ge 0$ and $\|\boldsymbol{x}\| = 0 \iff \boldsymbol{x} = \boldsymbol{0}$ (Positive definiteness)

2. $\|\lambda\boldsymbol{x}\| = |\lambda|\,\|\boldsymbol{x}\|,\ \forall\lambda \in \mathbb{R}$ (Homogeneity)

3. $\|\boldsymbol{x}+\boldsymbol{y}\| \le \|\boldsymbol{x}\| + \|\boldsymbol{y}\|$ (Triangle inequality)

A norm is a function that assigns a length to a vector. To compute the distance between two vectors, we calculate the norm of the difference between those two vectors. For example, the distance between two column vectors $\boldsymbol{x} \in \mathbb{R}^n$ and $\boldsymbol{y} \in \mathbb{R}^n$ using the Euclidean norm is

$$\|\boldsymbol{x}-\boldsymbol{y}\| = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \cdots + (x_n-y_n)^2} = \sqrt{(\boldsymbol{x}-\boldsymbol{y})^T(\boldsymbol{x}-\boldsymbol{y})}$$
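As a quick check in R (a minimal sketch; the vectors x and y below are made up for illustration):

x = c(1, 2, 3); y = c(4, 6, 3)
sqrt(sum((x - y)^2))          # Euclidean distance: 5
sqrt(t(x - y) %*% (x - y))    # same value via the matrix form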
Common Norms
The $L_p$ norm is a family of commonly used norms for vectors $\boldsymbol{x} \in \mathbb{R}^n$, determined by a scalar $p \ge 1$:

$$\|\boldsymbol{x}\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}$$

Examples (see the R sketch after this list):

• $L_1$ norm: $\|\boldsymbol{x}\|_1 = |x_1| + |x_2| + \cdots + |x_n|$ (Manhattan/city-block norm)

• $L_2$ norm: $\|\boldsymbol{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{\boldsymbol{x}^T\boldsymbol{x}}$ (Euclidean norm: we use only this)

• $L_\infty$ norm: $\|\boldsymbol{x}\|_\infty = \lim_{p\to\infty}\left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p} = \max(|x_1|, |x_2|, \ldots, |x_n|)$ (Maximum norm)
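These norms are straightforward to compute in R with base functions (a minimal sketch for an illustrative vector):

x = c(3, -4, 1)
sum(abs(x))       # L1 norm: 8
sqrt(sum(x^2))    # L2 norm: sqrt(26), about 5.10
max(abs(x))       # L-infinity norm: 4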
Angle between Two Vectors
(i) If $\boldsymbol{u}$ and $\boldsymbol{U}$ are two unit vectors, then $\boldsymbol{u}\cdot\boldsymbol{U} = \cos\theta$

(ii) Cosine Formula: If $\boldsymbol{a}$ and $\boldsymbol{b}$ are two nonzero vectors, then $\dfrac{\boldsymbol{a}\cdot\boldsymbol{b}}{\|\boldsymbol{a}\|\,\|\boldsymbol{b}\|} = \cos\theta$

(iii) Schwarz Inequality: If $\boldsymbol{a}$ and $\boldsymbol{b}$ are two nonzero vectors, then $|\boldsymbol{a}\cdot\boldsymbol{b}| \le \|\boldsymbol{a}\|\,\|\boldsymbol{b}\|$


Angle between Two Vectors
Part (i): First, consider $\boldsymbol{u} = (\cos\theta, \sin\theta)$ and $\boldsymbol{U} = \boldsymbol{i} = (1,0)$. Then, clearly $\boldsymbol{u}\cdot\boldsymbol{U} = \cos\theta$. After a rotation through any angle $\alpha$ these are still unit vectors. Call the rotated vectors $\boldsymbol{u} = (\cos\beta, \sin\beta)$ and $\boldsymbol{U} = (\cos\alpha, \sin\alpha)$. Their dot product is

$$\cos\alpha\cos\beta + \sin\alpha\sin\beta = \cos(\beta - \alpha)$$

Since $\beta - \alpha$ equals $\theta$, we have reached the formula $\boldsymbol{u}\cdot\boldsymbol{U} = \cos\theta$.

Parts (ii) and (iii) follow immediately from Part (i).
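The cosine formula can be checked numerically in R (a minimal sketch with illustrative vectors):

a = c(1, 0); b = c(1, 1)
costheta = sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
acos(costheta)    # angle in radians: pi/4, about 0.7854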


Other Properties of Matrices
Matrix Multiplication
Block Matrices and Block Multiplication

The elements of $\boldsymbol{A}$ can be cut into blocks, which are smaller matrices. If the cuts between the columns of $\boldsymbol{A}$ match the cuts between the rows of $\boldsymbol{B}$, then block multiplication is allowed:

$$\begin{bmatrix} \boldsymbol{A}_{11} & \boldsymbol{A}_{12} \\ \boldsymbol{A}_{21} & \boldsymbol{A}_{22} \end{bmatrix} \begin{bmatrix} \boldsymbol{B}_{11} & \boldsymbol{B}_{12} \\ \boldsymbol{B}_{21} & \boldsymbol{B}_{22} \end{bmatrix} = \begin{bmatrix} \boldsymbol{A}_{11}\boldsymbol{B}_{11} + \boldsymbol{A}_{12}\boldsymbol{B}_{21} & \boldsymbol{A}_{11}\boldsymbol{B}_{12} + \boldsymbol{A}_{12}\boldsymbol{B}_{22} \\ \boldsymbol{A}_{21}\boldsymbol{B}_{11} + \boldsymbol{A}_{22}\boldsymbol{B}_{21} & \boldsymbol{A}_{21}\boldsymbol{B}_{12} + \boldsymbol{A}_{22}\boldsymbol{B}_{22} \end{bmatrix}$$

Example:
$$\begin{bmatrix} 1 & 3 & 0 & 0 \\ 2 & 3 & 1 & 6 \\ -3 & -1 & 1 & 0 \\ 0 & 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 0 & 0 \\ 2 & 3 & 1 & 6 \\ -3 & -1 & 1 & 0 \\ 0 & 2 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 7 & 12 & 3 & 18 \\ 5 & 26 & 4 & 24 \\ -8 & -13 & 0 & -6 \\ 4 & 8 & 2 & 13 \end{bmatrix}$$

The top-left block of the product is obtained by block multiplication:

$$\begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} -3 & -1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 12 \\ 5 & 26 \end{bmatrix}$$
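The block computation can be verified in R (a minimal sketch; block names follow the notation above, and here B = A so B11 = A11, B21 = A21):

A = matrix(c(1, 3, 0, 0,  2, 3, 1, 6,  -3, -1, 1, 0,  0, 2, 0, 1), 4, 4, byrow = TRUE)
A %*% A                        # full product
A11 = A[1:2, 1:2]; A12 = A[1:2, 3:4]
A21 = A[3:4, 1:2]; A22 = A[3:4, 3:4]
A11 %*% A11 + A12 %*% A21      # top-left block: matches (A %*% A)[1:2, 1:2]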
Linear Equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \iff \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$$

• $\boldsymbol{A} = [a_{ij}]_{m\times n}$ = Coefficient matrix

• $\boldsymbol{x} = [x_j]_{n\times 1}$ = Variable vector

• $\boldsymbol{b} = [b_i]_{m\times 1}$ = Right-hand-side vector

The product $\boldsymbol{A}\boldsymbol{x}$ is a linear combination of the columns of $\boldsymbol{A}$. Hence, the system has a solution if and only if $\boldsymbol{b}$ lies in the span of the columns of $\boldsymbol{A}$ (see the R sketch below):

$$x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2\begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$
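A minimal R sketch of this column view (the matrix and vector are made up for illustration):

A = matrix(c(1, 2, 3, 4), 2, 2)    # columns (1,2) and (3,4)
x = c(2, -1)
A %*% x                            # gives (-1, 0)
x[1] * A[, 1] + x[2] * A[, 2]      # the same combination of columns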
Homogeneous systems
Homogeneous System: The system
$$\boldsymbol{A}_{m\times n}\boldsymbol{x}_{n\times 1} = \boldsymbol{b}_{m\times 1} \iff \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

is called homogeneous if $b_1 = b_2 = \cdots = b_m = 0$. The system is non-homogeneous if at least one of the $b_i$'s is not 0.

Let $k = \operatorname{rank}(\boldsymbol{A})$ denote the number of linearly independent columns of $\boldsymbol{A}$. Then (see the R sketch after this list):

1. if $k < n$, then the columns of $\boldsymbol{A}$ are linearly dependent, i.e., $k$ columns are independent and the other $n - k$ columns can be written as linear combinations of those $k$ columns.
2. if $k = n$, then the columns of $\boldsymbol{A}$ are linearly independent, i.e., no column can be written as a linear combination of the other columns.
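In R, the rank k can be read off the QR decomposition (a minimal sketch; the matrix below is constructed so that column 2 is twice column 1):

A = matrix(c(1, 2, 3,  2, 4, 6,  0, 1, 1), 3, 3)
qr(A)$rank     # k = 2 < n = 3, so the columns are linearly dependent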
Moore-Penrose Pseudo Inverse
Moore-Penrose Pseudo Inverse: When $k = n \le m$, the system of linear equations $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ can have no solution. In that case, we can resort to an approximation by using least squares, in which we determine the best vector $\boldsymbol{x}$ that minimizes the sum of squared errors $\|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\|^2 = (\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})^T(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$. The best fit $\boldsymbol{x}$ is obtained as

$$\boldsymbol{x} = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{b}$$

Note that $\boldsymbol{A}^T\boldsymbol{A}$ is invertible because the columns of $\boldsymbol{A}$ are linearly independent ($k = n$). $(\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T$ is sometimes called the Moore-Penrose Pseudo Inverse.

The minimum Euclidean norm $\|\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}\|$ (i.e., the minimum sum of squared errors) occurs at a point $\boldsymbol{x}$ for which $\boldsymbol{A}\boldsymbol{x} \perp (\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b})$; more precisely, at the minimum the residual $\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}$ is orthogonal to the entire column space of $\boldsymbol{A}$, which gives the normal equations:

$$\boldsymbol{A}\boldsymbol{x}\cdot(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}) = 0 \ \Rightarrow\ (\boldsymbol{A}\boldsymbol{x})^T(\boldsymbol{A}\boldsymbol{x} - \boldsymbol{b}) = 0 \ \Rightarrow\ \boldsymbol{x}^T(\boldsymbol{A}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{A}^T\boldsymbol{b}) = 0$$

$$\Rightarrow\ \boldsymbol{A}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{A}^T\boldsymbol{b} = \boldsymbol{0} \ \Rightarrow\ \boldsymbol{x} = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{b}$$
Example
Moore-Penrose Pseudo Inverse

Solve the system by using the Moore-Penrose pseudo inverse of the coefficient matrix.


$$\begin{cases} x + y = 2 \\ x - y = 0 \\ x + 2y = 1 \end{cases}$$

$$\boldsymbol{A} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \end{bmatrix},\ \boldsymbol{b} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \ \Rightarrow\ \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \ \Rightarrow\ x\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + y\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \ \Rightarrow\ \text{no solution}$$

$$\boldsymbol{x} = \begin{bmatrix} x \\ y \end{bmatrix} = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{b} = \left(\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.71 \\ 0.43 \end{bmatrix}$$

$$\boldsymbol{A}\boldsymbol{x} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + y\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1.14 \\ 0.29 \\ 1.57 \end{bmatrix}$$
Example
Moore-Penrose Pseudo Inverse
library(pracma)   # for Norm()

A = matrix(c(1, 1, 1, -1, 1, 2), 3, 2, byrow = TRUE)   # coefficient matrix
b = matrix(c(2, 0, 1), 3, 1)                           # right-hand side
x = solve(t(A) %*% A) %*% t(A) %*% b                   # least-squares solution
Norm(A %*% x - b)                                      # Euclidean norm of the residual

# Cross-check: minimize the residual norm numerically
f = function(x) {
  Norm(A %*% x - b)
}

optim(x, f)

$par
[,1]
[1,] 0.7142857
[2,] 0.4285714

$value
[1] 1.069045
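Alternatively, pracma also provides pinv(), which computes the Moore-Penrose pseudoinverse directly; with the same A and b as above, this is a one-line cross-check:

pinv(A) %*% b     # same solution: 0.7142857, 0.4285714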
Basis of a Vector Space
Basis: A set of linearly independent vectors $\boldsymbol{b}_i,\ i = 1, 2, \ldots, k$, in the vector space $\boldsymbol{V}$ such that every vector $\boldsymbol{x} \in \boldsymbol{V}$ is a linear combination of vectors from the basis, and every such linear combination is unique:

$$\boldsymbol{x} = \sum_{i=1}^{k}\lambda_i\boldsymbol{b}_i = \sum_{i=1}^{k}\beta_i\boldsymbol{b}_i \ \Rightarrow\ \lambda_i = \beta_i$$
Determinants
Determinants
Determinant: The determinant of a square matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is a recursive function that maps $\boldsymbol{A}$ into a real number by using the Laplace Expansion:

$$\det\boldsymbol{A} = |\boldsymbol{A}| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$

In R use: det(A)

Laplace Expansion: For all $j = 1, \ldots, n$

• $|\boldsymbol{A}| = \sum_{k=1}^{n}(-1)^{k+j}a_{kj}\,|\boldsymbol{A}_{k,j}|$ (expansion along column $j$)

• $|\boldsymbol{A}| = \sum_{k=1}^{n}(-1)^{k+j}a_{jk}\,|\boldsymbol{A}_{j,k}|$ (expansion along row $j$)

$\boldsymbol{A}_{k,j} \in \mathbb{R}^{(n-1)\times(n-1)}$ is the submatrix of $\boldsymbol{A}$ that we obtain by deleting row $k$ and column $j$.

Remark: Using Laplace expansion along either the first row or the first column, it is not too difficult to verify:

• If $\boldsymbol{A} \in \mathbb{R}^{1\times 1}$ then $|\boldsymbol{A}| = |a_{11}| = a_{11}$

• If $\boldsymbol{A} \in \mathbb{R}^{2\times 2}$ then $|\boldsymbol{A}| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}$

• The determinant of a diagonal matrix is the product of the elements on its main diagonal.
Determinants
Example
Compute the determinant of $\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$.

Solution. Using Laplace expansion along the first row, we have

$$|\boldsymbol{A}| = (-1)^{1+1}\cdot 1\begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} + (-1)^{1+2}\cdot 2\begin{vmatrix} 3 & 2 \\ 0 & 1 \end{vmatrix} + (-1)^{1+3}\cdot 3\begin{vmatrix} 3 & 1 \\ 0 & 0 \end{vmatrix} = 1(1-0) - 2(3-0) + 3(0-0) = -5$$

Remark: $|\boldsymbol{A}|$ gives the signed $n$-dimensional volume of the $n$-dimensional parallelepiped formed by the column vectors of $\boldsymbol{A}$. If $|\boldsymbol{A}| = 0$, then this parallelepiped has zero volume in $n$ dimensions, i.e., it is not $n$-dimensional, which indicates that the dimension of the image of $\boldsymbol{A}$ is less than $n$ (we say the rank of $\boldsymbol{A}$ is less than $n$).
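The result above is easy to confirm in R:

A = matrix(c(1, 2, 3,  3, 1, 2,  0, 0, 1), 3, 3, byrow = TRUE)
det(A)     # -5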
Determinants
Properties of Determinant
1. $|\boldsymbol{A}\boldsymbol{B}| = |\boldsymbol{A}|\,|\boldsymbol{B}|$

2. $|\boldsymbol{A}| = |\boldsymbol{A}^T|$

3. $|\boldsymbol{A}^{-1}| = \dfrac{1}{|\boldsymbol{A}|}$

4. Adding a multiple of a column/row to another does not change $|\boldsymbol{A}|$

5. Multiplying a column/row by $\lambda \in \mathbb{R}$ scales $|\boldsymbol{A}|$ by $\lambda$. In particular, $|\lambda\boldsymbol{A}| = \lambda^n|\boldsymbol{A}|$

6. Swapping two rows/columns changes the sign of $|\boldsymbol{A}|$

7. The determinant of a diagonal matrix is the product of the elements on its main diagonal

8. Similar matrices have the same determinant

o Two matrices $\boldsymbol{A}, \boldsymbol{D} \in \mathbb{R}^{n\times n}$ are similar if there exists an invertible matrix $\boldsymbol{P} \in \mathbb{R}^{n\times n}$ with $\boldsymbol{D} = \boldsymbol{P}^{-1}\boldsymbol{A}\boldsymbol{P}$

o Using the definition: $|\boldsymbol{D}| = |\boldsymbol{P}^{-1}\boldsymbol{A}\boldsymbol{P}| = |\boldsymbol{P}^{-1}|\,|\boldsymbol{A}|\,|\boldsymbol{P}| = \dfrac{1}{|\boldsymbol{P}|}|\boldsymbol{A}|\,|\boldsymbol{P}| = |\boldsymbol{A}|$

Theorem: $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is invertible and full-rank, i.e., $\operatorname{rank}(\boldsymbol{A}) = n$, if and only if $|\boldsymbol{A}| \ne 0$


Determinants
Example
Compute the determinant of $\boldsymbol{A} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$ (this time by using determinant properties).

Solution. Our strategy is to use determinant properties to change the first column to $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$.

To do so, adding $-3$ times row 1 to row 2 gives:

$$\begin{vmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{vmatrix} \xrightarrow{-3R_1 + R_2} \begin{vmatrix} 1 & 2 & 3 \\ 0 & -5 & -7 \\ 0 & 0 & 1 \end{vmatrix}$$

Now expanding along column 1 is very easy:

$$|\boldsymbol{A}| = (-1)^{1+1}\cdot 1\begin{vmatrix} -5 & -7 \\ 0 & 1 \end{vmatrix} + 0 + 0 = -5$$

This approach is especially helpful for obtaining the determinants for higher dimensional matrices.
Eigenvalues and Eigenvectors
Eigenvalues and Eigenvectors
Definition: 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝ𝑛×𝑛 and 𝒙 ∈ ℝ𝑛 \{𝟎} is the corresponding eigenvector of 𝑨 if:

𝑨𝒙 = 𝜆𝒙 In R use: eigen(A)
The above equation is known as the eigenvalue equation.

Remark: The following statements are equivalent:

• 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝ𝑛×𝑛

• There exists 𝒙 ∈ ℝn \{𝟎} with 𝑨𝒙 = 𝜆𝒙, or equivalently (𝑨 − 𝜆𝑰)𝒙 = 𝟎, can be solved non-trivially, i.e., 𝒙 ≠ 𝟎.

• rank 𝑨 − 𝜆𝑰 < 𝑛

• $\det(\boldsymbol{A} - \lambda\boldsymbol{I}) = 0$, i.e., $(\boldsymbol{A} - \lambda\boldsymbol{I})$ is singular, meaning that it is not invertible.

• Remark: 𝒑𝑨 𝜆 ≡ det 𝑨 − 𝜆𝑰 is also known as the Characteristic Polynomial


Eigenvalues and Eigenvectors
Properties of Eigenvalues and Eigenvectors

Theorem (non-uniqueness of eigenvector): If 𝒙 is an eigenvector of 𝑨 associated with the eigenvalue 𝜆, then for any
𝑐 ≠ 0, 𝑐𝒙 is also an eigenvector of 𝑨 with the same eigenvalue.
$$\boldsymbol{A}(c\boldsymbol{x}) = c\boldsymbol{A}\boldsymbol{x} = c\lambda\boldsymbol{x} = \lambda(c\boldsymbol{x})$$

Theorem: 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝ𝑛×𝑛 if and only if 𝜆 is a root of the characteristic polynomial of 𝑨


$$p_{\boldsymbol{A}}(\lambda) \equiv \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = 0$$

Other properties:

• 𝑨 and 𝑨𝑻 have the same eigenvalues but not necessarily the same eigenvectors.

• Similar matrices have the same eigenvalues.

• Symmetric positive definite matrices always have positive eigenvalues.

• Determinant of a matrix is equal to the product of its eigenvalues.


Eigenvalues and Eigenvectors
Example
Find the eigenvalues and the eigenvectors of $\boldsymbol{A} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$.

Solution.

Step 1: eigenvalues

$$p_{\boldsymbol{A}}(\lambda) \equiv \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \det\left(\begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}\right) = \begin{vmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{vmatrix} = (4-\lambda)(3-\lambda) - 2 = 0 \ \Rightarrow\ \lambda = 2,\ \lambda = 5$$

Step 2: eigenvectors corresponding to each eigenvalue: $\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix}\boldsymbol{x} = \boldsymbol{0}$

If $\lambda = 5$: $\begin{bmatrix} 4-5 & 2 \\ 1 & 3-5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \boldsymbol{0} \ \Rightarrow\ \begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \boldsymbol{0} \ \Rightarrow\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$

If $\lambda = 2$: $\begin{bmatrix} 4-2 & 2 \\ 1 & 3-2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \boldsymbol{0} \ \Rightarrow\ \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \boldsymbol{0} \ \Rightarrow\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
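The same answer is returned by R's eigen(); note that R scales each eigenvector to unit length, so the columns are proportional to (2, 1) and (1, -1), possibly with flipped signs:

A = matrix(c(4, 2, 1, 3), 2, 2, byrow = TRUE)
eigen(A)
# $values: 5 2
# $vectors: unit-length eigenvectors, proportional to (2, 1) and (1, -1)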
Matrix Decomposition
(Important tool for complex computations, e.g., $\sqrt{\boldsymbol{A}}$, $\boldsymbol{A}^{-3.456}$, $e^{\boldsymbol{A}}$, and many other results)
Matrix Decomposition
Eigendecomposition and Diagonalization

Similar matrices: Two matrices 𝑨, 𝑫 ∈ ℝ𝑛×𝑛 are similar if there exists an invertible matrix 𝑷 ∈ ℝ𝑛×𝑛 with 𝑫 = 𝑷−𝟏 𝑨𝑷

Diagonal Matrix: A matrix 𝑫 ∈ ℝ𝑛×𝑛 is diagonal if 𝑑𝑖𝑗 = 0, ∀ 𝑖 ≠ 𝑗

Diagonalizable Matrix: A matrix 𝑨 ∈ ℝ𝑛×𝑛 is diagonalizable if it is similar to a diagonal matrix, i.e., if there exists a
diagonal matrix 𝑫 and an invertible matrix 𝑷 ∈ ℝ𝑛×𝑛 such that 𝑫 = 𝑷−𝟏 𝑨𝑷.

Theorem (Eigendecomposition): A diagonalizable matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ can be factored into

$$\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1}$$

where $\boldsymbol{P} \in \mathbb{R}^{n\times n}$ is a matrix whose columns are the eigenvectors of $\boldsymbol{A}$ and $\boldsymbol{D}$ is a diagonal matrix whose diagonal entries are the eigenvalues of $\boldsymbol{A}$.
Matrix Decomposition
Eigendecomposition

Proof.

$\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is diagonalizable if it is similar to a diagonal matrix $\boldsymbol{D}$, i.e., if there exists $\boldsymbol{P} \in \mathbb{R}^{n\times n}$ such that $\boldsymbol{D} = \boldsymbol{P}^{-1}\boldsymbol{A}\boldsymbol{P}$, which is the same as $\boldsymbol{A}\boldsymbol{P} = \boldsymbol{P}\boldsymbol{D}$. Let $\boldsymbol{D}$ be the diagonal matrix with the eigenvalues $\lambda_j,\ j = 1, \ldots, n$ on its main diagonal and $\boldsymbol{P} = [\boldsymbol{p}_1, \ldots, \boldsymbol{p}_n]$. Then:

$$\boldsymbol{A}\boldsymbol{P} = \boldsymbol{A}[\boldsymbol{p}_1, \ldots, \boldsymbol{p}_n] = [\boldsymbol{A}\boldsymbol{p}_1, \ldots, \boldsymbol{A}\boldsymbol{p}_n]$$

$$\boldsymbol{P}\boldsymbol{D} = [\boldsymbol{p}_1, \ldots, \boldsymbol{p}_n]\begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix} = [\lambda_1\boldsymbol{p}_1, \ldots, \lambda_n\boldsymbol{p}_n]$$

This implies that $[\boldsymbol{A}\boldsymbol{p}_1, \ldots, \boldsymbol{A}\boldsymbol{p}_n] = [\lambda_1\boldsymbol{p}_1, \ldots, \lambda_n\boldsymbol{p}_n]$, or $\boldsymbol{A}\boldsymbol{p}_j = \lambda_j\boldsymbol{p}_j$. Therefore, $\boldsymbol{p}_j$ must be an eigenvector corresponding to $\lambda_j$.
Matrix Decomposition
Real Powers of a Matrix
Remark: For a diagonalizable $\boldsymbol{A} \in \mathbb{R}^{n\times n}$, we can see:

$$\boldsymbol{A}^2 = \boldsymbol{A}\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1}\boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}(\boldsymbol{P}^{-1}\boldsymbol{P})\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{I}\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}^2\boldsymbol{P}^{-1}$$

$$\boldsymbol{A}^3 = \boldsymbol{A}^2\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}^2\boldsymbol{P}^{-1}\boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}^2(\boldsymbol{P}^{-1}\boldsymbol{P})\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}^3\boldsymbol{P}^{-1}$$

Continuing this way, we can verify that

$$\boldsymbol{A}^k = \boldsymbol{P}\boldsymbol{D}^k\boldsymbol{P}^{-1} = \boldsymbol{P}\begin{bmatrix} \lambda_1^k & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n^k \end{bmatrix}\boldsymbol{P}^{-1}$$

It can be shown that the above result holds for any $k \in \mathbb{R}$, not just integer values. This result, which is based on matrix decomposition, is extremely important for finding $\boldsymbol{A}^k$ when $k$ is a very large number or when it is a real number (e.g., $\sqrt{\boldsymbol{A}}$, $\boldsymbol{A}^{-3.21}$, ...), in which case the direct approach is not applicable.
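A minimal R sketch of this idea (matpow is a hypothetical helper name; it assumes A is diagonalizable, with positive eigenvalues when k is not an integer):

matpow = function(A, k) {
  e = eigen(A)                                         # eigendecomposition A = P D P^-1
  e$vectors %*% diag(e$values^k) %*% solve(e$vectors)  # P D^k P^-1
}
A = matrix(c(4, 2, 1, 3), 2, 2, byrow = TRUE)
matpow(A, 0.5)                      # square root of A
matpow(A, 0.5) %*% matpow(A, 0.5)   # recovers A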
Matrix Decomposition
Exponential and Logarithm of a Matrix
Definition: For a matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$, the exponential of $\boldsymbol{A}$ is defined by the Taylor expansion of $e^x$ applied to $\boldsymbol{A}$:

$$e^{\boldsymbol{A}} = \boldsymbol{I} + \boldsymbol{A} + \frac{\boldsymbol{A}^2}{2!} + \frac{\boldsymbol{A}^3}{3!} + \cdots$$

Theorem: For a diagonalizable matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$, we have

$$e^{\boldsymbol{A}} = \boldsymbol{P}e^{\boldsymbol{D}}\boldsymbol{P}^{-1} = \boldsymbol{P}\begin{bmatrix} e^{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e^{\lambda_n} \end{bmatrix}\boldsymbol{P}^{-1}$$

Proof.

$$e^{\boldsymbol{A}} = \boldsymbol{I} + \boldsymbol{A} + \frac{\boldsymbol{A}^2}{2!} + \frac{\boldsymbol{A}^3}{3!} + \cdots = \boldsymbol{I} + \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1} + \frac{\boldsymbol{P}\boldsymbol{D}^2\boldsymbol{P}^{-1}}{2!} + \cdots = \boldsymbol{P}\left(\boldsymbol{I} + \boldsymbol{D} + \frac{\boldsymbol{D}^2}{2!} + \frac{\boldsymbol{D}^3}{3!} + \cdots\right)\boldsymbol{P}^{-1}$$

$$= \boldsymbol{P}\begin{bmatrix} 1 + \lambda_1 + \frac{\lambda_1^2}{2!} + \cdots & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 + \lambda_n + \frac{\lambda_n^2}{2!} + \cdots \end{bmatrix}\boldsymbol{P}^{-1} = \boldsymbol{P}\begin{bmatrix} e^{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e^{\lambda_n} \end{bmatrix}\boldsymbol{P}^{-1}$$
Matrix Decomposition
Exponential and Logarithm of a Matrix
Theorem: For a diagonalizable matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ (with positive eigenvalues), we have

$$\ln\boldsymbol{A} = \boldsymbol{P}\begin{bmatrix} \ln\lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \ln\lambda_n \end{bmatrix}\boldsymbol{P}^{-1}$$

Proof. It is enough to show that the above formula satisfies $e^{\ln\boldsymbol{A}} = \ln(e^{\boldsymbol{A}}) = \boldsymbol{A}$. We show only $e^{\ln\boldsymbol{A}} = \boldsymbol{A}$, as showing the other is very similar.

$$e^{\ln\boldsymbol{A}} = \boldsymbol{I} + \ln\boldsymbol{A} + \frac{(\ln\boldsymbol{A})^2}{2!} + \frac{(\ln\boldsymbol{A})^3}{3!} + \cdots = \boldsymbol{I} + \boldsymbol{P}\begin{bmatrix} \ln\lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \ln\lambda_n \end{bmatrix}\boldsymbol{P}^{-1} + \frac{1}{2!}\boldsymbol{P}\begin{bmatrix} (\ln\lambda_1)^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & (\ln\lambda_n)^2 \end{bmatrix}\boldsymbol{P}^{-1} + \cdots$$

$$= \boldsymbol{P}\begin{bmatrix} 1 + \ln\lambda_1 + \frac{(\ln\lambda_1)^2}{2!} + \cdots & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 + \ln\lambda_n + \frac{(\ln\lambda_n)^2}{2!} + \cdots \end{bmatrix}\boldsymbol{P}^{-1} = \boldsymbol{P}\begin{bmatrix} e^{\ln\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e^{\ln\lambda_n} \end{bmatrix}\boldsymbol{P}^{-1} = \boldsymbol{P}\begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}\boldsymbol{P}^{-1} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1} = \boldsymbol{A}$$
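A minimal R sketch of ln A via the eigendecomposition, checked by exponentiating back with pracma's expm() (assuming pracma is installed; the eigenvalues must be positive):

library(pracma)
A = matrix(c(4, 2, 1, 3), 2, 2, byrow = TRUE)
e = eigen(A)
lnA = e$vectors %*% diag(log(e$values)) %*% solve(e$vectors)
expm(lnA)     # recovers A (up to rounding)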
Matrix Decomposition
Example
If $\boldsymbol{A} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$, find the following values:
a. $\sqrt{\boldsymbol{A}}$
b. $\boldsymbol{A}^e$ ($e \approx 2.718$: Euler's number)
c. $e^{\boldsymbol{A}}$

Solution. From the previous example's solution, pairing each eigenvalue with its eigenvector, we have

$$\boldsymbol{D} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix},\ \boldsymbol{P} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix} \ \Rightarrow\ \boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}\cdot\frac{1}{3}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}$$

a. $\sqrt{\boldsymbol{A}} = \boldsymbol{A}^{1/2} = \boldsymbol{P}\boldsymbol{D}^{1/2}\boldsymbol{P}^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} \sqrt{5} & 0 \\ 0 & \sqrt{2} \end{bmatrix}\cdot\frac{1}{3}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 1.96 & 0.55 \\ 0.27 & 1.69 \end{bmatrix}$

b. $\boldsymbol{A}^e = \boldsymbol{P}\boldsymbol{D}^e\boldsymbol{P}^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5^e & 0 \\ 0 & 2^e \end{bmatrix}\cdot\frac{1}{3}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 55.15 & 48.57 \\ 24.27 & 30.86 \end{bmatrix}$

c. $e^{\boldsymbol{A}} = \boldsymbol{P}e^{\boldsymbol{D}}\boldsymbol{P}^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} e^5 & 0 \\ 0 & e^2 \end{bmatrix}\cdot\frac{1}{3}\begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} = \begin{bmatrix} 101.41 & 94.02 \\ 47.01 & 54.40 \end{bmatrix}$
In R, part (c) can be reproduced with eigen() and base solve() for the inverse:

y = eigen(A)   # A as defined in this example
y$vectors %*% diag(exp(y$values)) %*% solve(y$vectors)   # e^A
Matrix Decomposition
Relationship between Eigenvalues and Determinant

Theorem: Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of the matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$. Then

$$|\boldsymbol{A}| = \prod_{i=1}^{n}\lambda_i = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$

In addition, if $\boldsymbol{A}$ is singular (i.e., $|\boldsymbol{A}| = 0$), then it has at least one eigenvalue equal to zero.

Proof.
From the eigendecomposition of $\boldsymbol{A}$, we know that $\boldsymbol{A} = \boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1}$, where $\boldsymbol{P}$ is the matrix of eigenvectors and $\boldsymbol{D}$ is the diagonal matrix whose main diagonal entries are the eigenvalues. Taking the determinant gives:

$$|\boldsymbol{A}| = |\boldsymbol{P}\boldsymbol{D}\boldsymbol{P}^{-1}| = |\boldsymbol{P}|\,|\boldsymbol{D}|\,|\boldsymbol{P}^{-1}| = |\boldsymbol{P}|\,|\boldsymbol{D}|\,|\boldsymbol{P}|^{-1} = |\boldsymbol{D}| = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n$$

If $\boldsymbol{A}$ is singular, then $|\boldsymbol{A}| = \prod_{i=1}^{n}\lambda_i = 0$. Hence, at least one of the eigenvalues is zero.
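A quick numerical check in R, reusing the 2x2 matrix from the earlier eigenvalue example:

A = matrix(c(4, 2, 1, 3), 2, 2, byrow = TRUE)
det(A)                   # 10
prod(eigen(A)$values)    # 10 = 5 * 2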
Matrix Norms
Norm of a matrix: The corresponding (operator) norm of a matrix $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is defined as

$$\|\boldsymbol{A}\| = \max_{\boldsymbol{x}\ne\boldsymbol{0}}\frac{\|\boldsymbol{A}\boldsymbol{x}\|}{\|\boldsymbol{x}\|} = \max_{\|\boldsymbol{x}\|=1}\|\boldsymbol{A}\boldsymbol{x}\|,\qquad \boldsymbol{x} \in \mathbb{R}^n,$$

where $\|\cdot\|$ is the Euclidean norm.

Remark: From the above, it follows that:

$$\|\boldsymbol{A}\boldsymbol{x}\| \le \|\boldsymbol{A}\|\,\|\boldsymbol{x}\|$$

Remark: it can be shown that

$$\|\boldsymbol{A}\| = \sqrt{\lambda_{\max}}$$

where $\lambda_{\max}$ is the maximum eigenvalue of $\boldsymbol{A}^T\boldsymbol{A}$; for a symmetric matrix this reduces to the largest absolute eigenvalue of $\boldsymbol{A}$.

(The proof needs additional discussion about orthonormal eigenvector bases...)
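Numerically, base R's norm() with type = "2" returns this spectral norm, which can be compared with the eigenvalue formula (a minimal sketch):

A = matrix(c(4, 2, 1, 3), 2, 2, byrow = TRUE)
norm(A, type = "2")                    # spectral norm
sqrt(max(eigen(t(A) %*% A)$values))    # same value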
Positive Definite Matrices
Definition: a symmetric $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is called positive semidefinite if and only if

$$\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} \ge 0 \quad \forall\boldsymbol{x} \in \mathbb{R}^n$$

and positive definite if and only if

$$\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} > 0 \quad \forall\boldsymbol{x} \ne \boldsymbol{0} \in \mathbb{R}^n$$

Theorem: 𝑨 ∈ ℝ𝑛×𝑛 is positive semidefinite if and only if all its eigenvalues are greater than or equal to zero.

Proof (one direction).
By definition, we have $\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} \ge 0\ \forall\boldsymbol{x} \in \mathbb{R}^n$. Choose $\boldsymbol{x}$ to be any eigenvector of $\boldsymbol{A}$, with $\lambda$ the corresponding eigenvalue. Hence, we have

$$\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} = \boldsymbol{x}^T\lambda\boldsymbol{x} = \lambda\boldsymbol{x}^T\boldsymbol{x} = \lambda\|\boldsymbol{x}\|_2^2 \ge 0$$

Since $\boldsymbol{x}^T\boldsymbol{x} = \|\boldsymbol{x}\|_2^2 > 0$ for any $\boldsymbol{x} \ne \boldsymbol{0}$ (it is the squared Euclidean or $L_2$ norm), we must have $\lambda \ge 0$.

Remark: From the above, it follows that if $\boldsymbol{A} \in \mathbb{R}^{n\times n}$ is positive semidefinite, then $|\boldsymbol{A}| \ge 0$ because $|\boldsymbol{A}| = \prod_{i=1}^{n}\lambda_i$.
Positive Definite Matrices
Theorem: For a matrix 𝑨 ∈ ℝ𝑚×𝑛 we can always obtain a symmetric positive semidefinite matrix 𝑺 ∈ ℝ𝑛×𝑛 by defining
𝑺 = 𝑨𝑻 𝑨.

Proof.
Symmetry requires that $\boldsymbol{S} = \boldsymbol{S}^T$. We have $\boldsymbol{S}^T = (\boldsymbol{A}^T\boldsymbol{A})^T = \boldsymbol{A}^T(\boldsymbol{A}^T)^T = \boldsymbol{A}^T\boldsymbol{A} = \boldsymbol{S}$.

For positive semidefiniteness, for any $\boldsymbol{x} \in \mathbb{R}^n$ we have $\boldsymbol{x}^T\boldsymbol{S}\boldsymbol{x} = \boldsymbol{x}^T\boldsymbol{A}^T\boldsymbol{A}\boldsymbol{x} = (\boldsymbol{A}\boldsymbol{x})^T(\boldsymbol{A}\boldsymbol{x}) = \|\boldsymbol{A}\boldsymbol{x}\|_2^2 \ge 0$.
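A quick numerical check, reusing the 3x2 matrix from the least-squares example:

A = matrix(c(1, 1,  1, -1,  1, 2), 3, 2, byrow = TRUE)
S = t(A) %*% A
all.equal(S, t(S))     # TRUE: S is symmetric
eigen(S)$values        # 7 2: all nonnegative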
