Module 3: Supplementary Slides
(These additional materials are optional and intended for students who are interested)
Vectors
More Vector Arithmetic
m %% n    Modulo operator (gives the remainder of m/n)
m %/% n   Integer division (gives the integer part of m/n)
%*%       Matrix multiplication (to be studied later)
%in%      Returns TRUE if the left operand occurs in the right operand; FALSE otherwise
> 14%%5
[1] 4
> 14%/%5
[1] 2
> 5%in%14
[1] FALSE
> 5%in%c(5,4)
[1] TRUE
General Norm of a Vector
Norm of a Vector
Definition: a norm for vectors 𝒙 ∈ ℝⁿ is a function ‖·‖ : ℝⁿ → ℝ₊ that satisfies the following properties:
1. ‖𝒙‖ ≥ 0 and ‖𝒙‖ = 0 ⟺ 𝒙 = 𝟎 (Positive definiteness)
2. ‖𝜆𝒙‖ = |𝜆| ‖𝒙‖, ∀𝜆 ∈ ℝ (Homogeneity)
3. ‖𝒙 + 𝒚‖ ≤ ‖𝒙‖ + ‖𝒚‖ (Triangle inequality)
A norm is a function that assigns a length to a vector. To compute the distance between two vectors, we calculate the
norm of their difference. For example, the distance between two column vectors 𝒙 ∈ ℝⁿ and 𝒚 ∈ ℝⁿ under the
Euclidean norm is
‖𝒙 − 𝒚‖ = √((x₁ − y₁)² + (x₂ − y₂)² + ⋯ + (xₙ − yₙ)²) = √((𝒙 − 𝒚)ᵀ(𝒙 − 𝒚))
Common Norms
The 𝑳ₚ norm is a family of commonly used norms for vectors 𝒙 ∈ ℝⁿ, determined by a scalar 𝑝 ≥ 1:
‖𝒙‖ₚ = (|x₁|ᵖ + |x₂|ᵖ + ⋯ + |xₙ|ᵖ)^(1/p)
Examples:
• 𝑳₁ norm: ‖𝒙‖₁ = |x₁| + |x₂| + ⋯ + |xₙ| (Manhattan/city-block norm)
• 𝑳₂ norm: ‖𝒙‖₂ = √(x₁² + x₂² + ⋯ + xₙ²) = √(𝒙ᵀ𝒙) (Euclidean norm: we use only this one)
• 𝑳∞ norm: ‖𝒙‖∞ = lim_{p→∞} (|x₁|ᵖ + |x₂|ᵖ + ⋯ + |xₙ|ᵖ)^(1/p) = max{|x₁|, |x₂|, …, |xₙ|} (Maximum norm)
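These norms are simple to compute in base R; a minimal sketch (the vector x is purely illustrative):
x <- c(3, -4, 2)
sum(abs(x))      # L1 (Manhattan) norm: 9
sqrt(sum(x^2))   # L2 (Euclidean) norm: sqrt(29) = 5.385
max(abs(x))      # L-infinity (maximum) norm: 4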
Angle between Two Vectors
(i) If 𝒖 and 𝑼 are two unit vectors, then 𝒖 ⋅ 𝑼 = cos θ
(ii) Cosine Formula: If 𝒂 and 𝒃 are two nonzero vectors, then (𝒂 ⋅ 𝒃)/(‖𝒂‖ ‖𝒃‖) = cos θ
(iii) Schwarz Inequality: If 𝒂 and 𝒃 are two nonzero vectors, then |𝒂 ⋅ 𝒃| ≤ ‖𝒂‖ ‖𝒃‖
Angle between Two Vectors
Part (i): First, consider 𝒖 = (cos θ, sin θ) and 𝑼 = 𝒊 = (1, 0). Then, clearly 𝒖 ⋅ 𝑼 = cos θ. After a rotation through any
angle 𝛼 these are still unit vectors. Call the vectors 𝒖 = (cos 𝛽, sin 𝛽) and 𝑼 = (cos 𝛼, sin 𝛼). Their dot product is
cos 𝛼 cos 𝛽 + sin 𝛼 sin 𝛽 = cos(𝛽 − 𝛼). Since 𝛽 − 𝛼 equals 𝜃, we have reached the formula 𝒖 ⋅ 𝑼 = cos θ.
Parts (ii) and (iii) follow immediately from Part (i): apply it to the unit vectors 𝒂/‖𝒂‖ and 𝒃/‖𝒃‖ to get (ii), and use |cos θ| ≤ 1 to get (iii).
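As a quick illustration, the cosine formula is easy to verify in R (the vectors a and b are illustrative):
a <- c(1, 0)
b <- c(1, 1)
cos.theta <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))  # cosine formula
acos(cos.theta) * 180 / pi  # angle in degrees: 45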
Other Properties of Matrices
Matrix Multiplication
Block Matrices and Block Multiplication
The elements of 𝑨 can be cut into blocks, which are smaller matrices. If the cuts between columns of 𝑨 match the cuts
between rows of 𝑩, then block multiplication is allowed:
[𝑨₁₁ 𝑨₁₂; 𝑨₂₁ 𝑨₂₂] [𝑩₁₁ 𝑩₁₂; 𝑩₂₁ 𝑩₂₂] = [𝑨₁₁𝑩₁₁ + 𝑨₁₂𝑩₂₁   𝑨₁₁𝑩₁₂ + 𝑨₁₂𝑩₂₂; 𝑨₂₁𝑩₁₁ + 𝑨₂₂𝑩₂₁   𝑨₂₁𝑩₁₂ + 𝑨₂₂𝑩₂₂]
Example:
[1 3 0 0; 2 3 1 6; −3 −1 1 0; 0 2 0 1] × [1 3 0 0; 2 3 1 6; −3 −1 1 0; 0 2 0 1] = [7 12 3 18; 5 26 4 24; −8 −13 0 −6; 4 8 2 13]
For instance, the top-left 2 × 2 block of the product is
[1 3; 2 3][1 3; 2 3] + [0 0; 1 6][−3 −1; 0 2] = [7 12; 5 26]
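The block computation can be checked in R; a minimal sketch, where M is the 4 × 4 matrix above:
M <- matrix(c(1, 3, 0, 0,
              2, 3, 1, 6,
             -3,-1, 1, 0,
              0, 2, 0, 1), 4, 4, byrow = TRUE)
M %*% M  # full product
M[1:2, 1:2] %*% M[1:2, 1:2] + M[1:2, 3:4] %*% M[3:4, 1:2]  # top-left block: A11 B11 + A12 B21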
Linear Equations
𝑎11 𝑥1 + 𝑎12 𝑥2 + ⋯ + 𝑎1𝑛 𝑥𝑛 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 + ⋯ + 𝑎2𝑛 𝑥𝑛 = 𝑏2
⋮ ⇔ 𝑨𝒙 = 𝒃
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + ⋯ + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚
• 𝑨 = [aᵢⱼ]: the m × n coefficient matrix
• 𝒙 = [xⱼ]: the n × 1 variable vector
• 𝒃 = [bᵢ]: the m × 1 vector of right-hand sides
The product 𝑨𝒙 is a combination of the columns of 𝑨. Hence, the system has a solution if and only if 𝒃 lies in the space
spanned by the columns of 𝑨:
x₁ [a₁₁; a₂₁; ⋮; aₘ₁] + x₂ [a₁₂; a₂₂; ⋮; aₘ₂] + ⋯ + xₙ [a₁ₙ; a₂ₙ; ⋮; aₘₙ] = [b₁; b₂; ⋮; bₘ]
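In R, a square system with an invertible coefficient matrix can be solved directly with solve(); a small sketch with an illustrative system:
A <- matrix(c(2, 1,
              1, 3), 2, 2, byrow = TRUE)
b <- c(5, 10)
solve(A, b)  # x = (1, 3), since b = 1*(2,1) + 3*(1,3)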
Homogeneous systems
Homogeneous System: The system
𝑨ₘ×ₙ 𝒙ₙ×₁ = 𝒃ₘ×₁ ⟺
𝑎11 𝑥1 + 𝑎12 𝑥2 + ⋯ + 𝑎1𝑛 𝑥𝑛 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 + ⋯ + 𝑎2𝑛 𝑥𝑛 = 𝑏2
⋮
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + ⋯ + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚
is called homogeneous if 𝑏1 = 𝑏2 = ⋯ = 𝑏𝑚 = 0. The system is non-homogeneous if at least one of the 𝑏𝑖's is not 0.
Let 𝑘 denote the rank of 𝑨, i.e., the number of linearly independent columns of 𝑨. Then:
1. If 𝑘 < 𝑛, the columns of 𝑨 are linearly dependent, i.e., 𝑘 columns are independent and the remaining 𝑛 − 𝑘 columns
can be written as linear combinations of those 𝑘 columns.
2. If 𝑘 = 𝑛, the columns of 𝑨 are linearly independent, i.e., no column can be written as a linear combination of the
other columns.
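In R, the rank k can be obtained from the QR decomposition; a sketch with an illustrative matrix whose third column is the sum of the first two:
A <- cbind(c(1, 0, 1), c(0, 1, 1), c(1, 1, 2))  # col3 = col1 + col2
qr(A)$rank  # k = 2 < n = 3: the columns are linearly dependent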
Moore-Penrose Pseudo Inverse
Moore-Penrose Pseudo Inverse: When 𝑘 = 𝑛 ≤ 𝑚 (more equations than unknowns), the system of linear equations 𝑨𝒙 = 𝒃 may
have no solution. In that case, we can resort to an approximation by least squares, in which we determine the vector 𝒙 that
minimizes the sum of squared errors ‖𝑨𝒙 − 𝒃‖² = (𝑨𝒙 − 𝒃)ᵀ(𝑨𝒙 − 𝒃). The best fit 𝒙 is obtained as
𝒙 = (𝑨ᵀ𝑨)⁻¹𝑨ᵀ𝒃
Note that 𝑨ᵀ𝑨 is invertible because the columns of 𝑨 are linearly independent (𝑘 = 𝑛). The matrix (𝑨ᵀ𝑨)⁻¹𝑨ᵀ is sometimes called the Moore-Penrose Pseudo Inverse.
The minimum Euclidean norm ‖𝑨𝒙 − 𝒃‖ (i.e., the minimum sum of squared errors) occurs at a point 𝒙 that satisfies 𝑨𝒙 ⊥ (𝑨𝒙 − 𝒃):
𝑨𝒙 ⋅ (𝑨𝒙 − 𝒃) = 0 ⇒ (𝑨𝒙)ᵀ(𝑨𝒙 − 𝒃) = 0 ⇒ 𝒙ᵀ(𝑨ᵀ𝑨𝒙 − 𝑨ᵀ𝒃) = 0
⇒ 𝑨ᵀ𝑨𝒙 − 𝑨ᵀ𝒃 = 𝟎 ⇒ 𝒙 = (𝑨ᵀ𝑨)⁻¹𝑨ᵀ𝒃
Example
Moore-Penrose Pseudo Inverse
Solve the system using the Moore-Penrose pseudo inverse.
x + y = 2
x − y = 0
x + 2y = 1
𝑨 = [1 1; 1 −1; 1 2], 𝒃 = [2; 0; 1] ⇒ [1 1; 1 −1; 1 2][x; y] = [2; 0; 1] ⇒ x[1; 1; 1] + y[1; −1; 2] = [2; 0; 1] ⇒ no solution
𝒙 = [x; y] = (𝑨ᵀ𝑨)⁻¹𝑨ᵀ𝒃, where 𝑨ᵀ = [1 1 1; 1 −1 2], giving 𝒙 = [0.71; 0.43]
𝑨𝒙 = [1 1; 1 −1; 1 2][x; y] = x[1; 1; 1] + y[1; −1; 2] = [1.14; 0.29; 1.57]
Example
Moore-Penrose Pseudo Inverse
library(pracma)                        # for Norm()
A <- matrix(c(1, 1, 1, -1, 1, 2), 3, 2, byrow = TRUE)  # coefficient matrix
b <- matrix(c(2, 0, 1), 3, 1)          # right-hand side
x <- solve(t(A) %*% A) %*% t(A) %*% b  # least-squares fit via the pseudo inverse
Norm(A %*% x - b)                      # Euclidean norm of the residual
f <- function(x) {
  Norm(A %*% x - b)                    # objective: residual norm
}
optim(x, f)                            # numerical minimization confirms the fit
$par
[,1]
[1,] 0.7142857
[2,] 0.4285714
$value
[1] 1.069045
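As a side note, the same least-squares fit can be obtained in base R with qr.solve(), which handles over-determined systems without forming 𝑨ᵀ𝑨 explicitly:
A <- matrix(c(1, 1, 1, -1, 1, 2), 3, 2, byrow = TRUE)
b <- c(2, 0, 1)
qr.solve(A, b)  # least-squares fit: 0.7142857 0.4285714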
Basis of a Vector Space
Basis: A set of linearly independent vectors 𝒃ᵢ, i = 1, 2, …, k, in the vector space 𝑽 such that every vector 𝒙 ∈ 𝑽 is a
linear combination of the basis vectors, and this linear combination is unique:
𝒙 = Σᵢ₌₁ᵏ 𝜆ᵢ𝒃ᵢ = Σᵢ₌₁ᵏ 𝛽ᵢ𝒃ᵢ ⇒ 𝜆ᵢ = 𝛽ᵢ
Determinants
Determinants
Determinant: The determinant of a square matrix 𝑨 ∈ ℝⁿˣⁿ is a recursive function that maps 𝑨 to a real number by using the
Laplace Expansion:
det(𝑨) = |𝑨| = |a₁₁ a₁₂ ⋯ a₁ₙ; a₂₁ a₂₂ ⋯ a₂ₙ; ⋮ ⋮ ⋱ ⋮; aₙ₁ aₙ₂ ⋯ aₙₙ|    In R use: det(A)
Laplace Expansion: For all 𝒋 = 𝟏, …, 𝒏,
• |𝑨| = Σₖ₌₁ⁿ (−1)ᵏ⁺ʲ aₖⱼ |𝑨ₖ,ⱼ| (expansion along column j)
• |𝑨| = Σₖ₌₁ⁿ (−1)ᵏ⁺ʲ aⱼₖ |𝑨ⱼ,ₖ| (expansion along row j)
where 𝑨ₖ,ⱼ ∈ ℝ⁽ⁿ⁻¹⁾ˣ⁽ⁿ⁻¹⁾ is the submatrix of 𝑨 that we obtain by deleting row 𝑘 and column 𝑗.
Remark: Using Laplace expansion along either the first row or the first column, it is not too difficult to verify:
• If 𝑨 ∈ ℝ¹ˣ¹ then |𝑨| = |a₁₁| = a₁₁
• If 𝑨 ∈ ℝ²ˣ² then |𝑨| = |a₁₁ a₁₂; a₂₁ a₂₂| = a₁₁a₂₂ − a₁₂a₂₁
• The determinant of a diagonal matrix is the product of its main diagonal entries.
Determinants
Example
Compute the determinant of 𝑨 = [1 2 3; 3 1 2; 0 0 1].
Solution. Using Laplace expansion along the first row, we have
|𝑨| = (−1)¹⁺¹ · 1 · |1 2; 0 1| + (−1)¹⁺² · 2 · |3 2; 0 1| + (−1)¹⁺³ · 3 · |3 1; 0 0| = 1(1 − 0) − 2(3 − 0) + 3(0 − 0) = −5
Remark: |det 𝑨| gives the n-dimensional volume of the n-dimensional parallelepiped formed by the column vectors of 𝑨. If |𝑨| = 0, this
parallelepiped has zero volume in n dimensions, i.e., it is not n-dimensional, which indicates that the dimension of the image of 𝑨 is
less than n (we say the rank of 𝑨 is less than n).
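The result is easy to confirm in R:
A <- matrix(c(1, 2, 3,
              3, 1, 2,
              0, 0, 1), 3, 3, byrow = TRUE)
det(A)  # -5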
Determinants
Properties of Determinant
1. |𝑨𝑩| = |𝑨| |𝑩|
2. |𝑨| = |𝑨ᵀ|
3. |𝑨⁻¹| = 1/|𝑨|
4. Adding a multiple of a column/row to another does not change |𝑨|
5. Multiplying a column/row by 𝜆 ∈ ℝ scales |𝑨| by 𝜆. In particular, |𝜆𝑨| = 𝜆ⁿ|𝑨|
6. Swapping two rows/columns changes the sign of |𝑨|
7. The determinant of a diagonal matrix is the product of its main diagonal entries
8. Similar matrices have the same determinant
o Two matrices 𝑨, 𝑫 ∈ ℝⁿˣⁿ are similar if there exists an invertible matrix 𝑷 ∈ ℝⁿˣⁿ with 𝑫 = 𝑷⁻¹𝑨𝑷
o Using the definition: |𝑫| = |𝑷⁻¹𝑨𝑷| = |𝑷⁻¹| |𝑨| |𝑷| = (1/|𝑷|) |𝑨| |𝑷| = |𝑨|
Theorem: 𝑨 ∈ ℝⁿˣⁿ is invertible and full-rank, i.e., rank(𝑨) = 𝑛, if and only if |𝑨| ≠ 0
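Several of these properties can be spot-checked numerically in R; a sketch with illustrative matrices:
A <- matrix(c(4, 1, 2, 3), 2, 2)
B <- matrix(c(1, 0, 1, 2), 2, 2)
det(A %*% B) - det(A) * det(B)  # property 1: 0 (up to rounding)
det(A) - det(t(A))              # property 2: 0
det(solve(A)) - 1 / det(A)      # property 3: 0
det(3 * A) - 3^2 * det(A)       # property 5: 0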
Determinants
Example
Compute the determinant of 𝑨 = [1 2 3; 3 1 2; 0 0 1] (this time by using determinant properties).
Solution. Our strategy is to use determinant properties to change the first column to [1; 0; 0].
To do so, adding −3 times row 1 to row 2 gives:
[1 2 3; 3 1 2; 0 0 1] →(−3R₁ + R₂) [1 2 3; 0 −5 −7; 0 0 1]
Now expanding down column 1 is very easy:
|𝑨| = (−1)¹⁺¹ · 1 · |−5 −7; 0 1| + 0 + 0 = −5
This approach is especially helpful for obtaining determinants of higher-dimensional matrices.
Eigenvalues and Eigenvectors
Eigenvalues and Eigenvectors
Definition: 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝⁿˣⁿ and 𝒙 ∈ ℝⁿ\{𝟎} is the corresponding eigenvector of 𝑨 if:
𝑨𝒙 = 𝜆𝒙    In R use: eigen(A)
The above equation is known as the eigenvalue equation.
Remark: The following statements are equivalent:
• 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝⁿˣⁿ
• There exists 𝒙 ∈ ℝⁿ\{𝟎} with 𝑨𝒙 = 𝜆𝒙, or equivalently, (𝑨 − 𝜆𝑰)𝒙 = 𝟎 can be solved non-trivially, i.e., with 𝒙 ≠ 𝟎
• rank(𝑨 − 𝜆𝑰) < 𝑛
• det(𝑨 − 𝜆𝑰) = 0, i.e., (𝑨 − 𝜆𝑰) is singular, meaning that it is not invertible
• Remark: p_𝑨(𝜆) ≡ det(𝑨 − 𝜆𝑰) is also known as the Characteristic Polynomial
Eigenvalues and Eigenvectors
Properties of Eigenvalues and Eigenvectors
Theorem (non-uniqueness of eigenvectors): If 𝒙 is an eigenvector of 𝑨 associated with the eigenvalue 𝜆, then for any
𝑐 ≠ 0, 𝑐𝒙 is also an eigenvector of 𝑨 with the same eigenvalue:
𝑨(𝑐𝒙) = 𝑐𝑨𝒙 = 𝑐𝜆𝒙 = 𝜆(𝑐𝒙)
Theorem: 𝜆 ∈ ℝ is an eigenvalue of 𝑨 ∈ ℝⁿˣⁿ if and only if 𝜆 is a root of the characteristic polynomial of 𝑨:
p_𝑨(𝜆) ≡ det(𝑨 − 𝜆𝑰) = 0
Other properties:
• 𝑨 and 𝑨𝑻 have the same eigenvalues but not necessarily the same eigenvectors.
• Similar matrices have the same eigenvalues.
• Symmetric positive definite matrices always have positive eigenvalues.
• Determinant of a matrix is equal to the product of its eigenvalues.
Eigenvalues and Eigenvectors
Example
Find the eigenvalues and the eigenvectors of 𝑨 = [4 2; 1 3].
Solution.
Step 1: eigenvalues
p_𝑨(𝜆) ≡ det(𝑨 − 𝜆𝑰) = det([4 2; 1 3] − [𝜆 0; 0 𝜆]) = |4−𝜆 2; 1 3−𝜆| = (4 − 𝜆)(3 − 𝜆) − 2 = 0 ⇒ 𝜆 = 2, 𝜆 = 5
Step 2: eigenvectors, obtained by solving [4−𝜆 2; 1 3−𝜆]𝒙 = 𝟎 for each eigenvalue:
If 𝜆 = 5 ⇒ [4−5 2; 1 3−5][x₁; x₂] = 𝟎 ⇒ [−1 2; 1 −2][x₁; x₂] = 𝟎 ⇒ [x₁; x₂] = [2; 1]
If 𝜆 = 2 ⇒ [4−2 2; 1 3−2][x₁; x₂] = 𝟎 ⇒ [2 2; 1 1][x₁; x₂] = 𝟎 ⇒ [x₁; x₂] = [1; −1]
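The same computation in R (eigen() scales each eigenvector to unit length, so the returned columns are proportional to (2, 1) and (1, −1), possibly with flipped signs):
A <- matrix(c(4, 1, 2, 3), 2, 2)  # column-major: rows (4, 2) and (1, 3)
eigen(A)  # $values: 5 2; $vectors: scaled (2, 1) and (1, -1)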
Matrix Decomposition
(An important tool for complex computations, e.g., √𝑨, 𝑨^(−3.456), e^𝑨, and many other results)
Matrix Decomposition
Eigendecomposition and Diagonalization
Similar matrices: Two matrices 𝑨, 𝑫 ∈ ℝ𝑛×𝑛 are similar if there exists an invertible matrix 𝑷 ∈ ℝ𝑛×𝑛 with 𝑫 = 𝑷−𝟏 𝑨𝑷
Diagonal Matrix: A matrix 𝑫 ∈ ℝ𝑛×𝑛 is diagonal if 𝑑𝑖𝑗 = 0, ∀ 𝑖 ≠ 𝑗
Diagonalizable Matrix: A matrix 𝑨 ∈ ℝ𝑛×𝑛 is diagonalizable if it is similar to a diagonal matrix, i.e., if there exists a
diagonal matrix 𝑫 and an invertible matrix 𝑷 ∈ ℝ𝑛×𝑛 such that 𝑫 = 𝑷−𝟏 𝑨𝑷.
Theorem (Eigendecomposition): A square matrix 𝑨 ∈ ℝⁿˣⁿ with 𝑛 linearly independent eigenvectors (i.e., a diagonalizable matrix) can be factored into
𝑨 = 𝑷𝑫𝑷⁻¹
where 𝑷 ∈ ℝⁿˣⁿ is the matrix whose columns are the eigenvectors of 𝑨 and 𝑫 is the diagonal matrix whose diagonal
entries are the eigenvalues of 𝑨.
Matrix Decomposition
Eigendecomposition
Proof.
𝑨 ∈ ℝⁿˣⁿ is diagonalizable if it is similar to a diagonal matrix 𝑫, i.e., if there exists an invertible 𝑷 ∈ ℝⁿˣⁿ such that 𝑫 = 𝑷⁻¹𝑨𝑷, which is
the same as 𝑨𝑷 = 𝑷𝑫. Let 𝑫 be the diagonal matrix with the eigenvalues 𝜆ⱼ, j = 1, …, n, on its main diagonal, and let 𝑷 = [𝒑₁, …, 𝒑ₙ]. Then:
𝑨𝑷 = 𝑨[𝒑₁, …, 𝒑ₙ] = [𝑨𝒑₁, …, 𝑨𝒑ₙ]
𝑷𝑫 = [𝒑₁, …, 𝒑ₙ] diag(𝜆₁, …, 𝜆ₙ) = [𝜆₁𝒑₁, …, 𝜆ₙ𝒑ₙ]
This implies that [𝑨𝒑₁, …, 𝑨𝒑ₙ] = [𝜆₁𝒑₁, …, 𝜆ₙ𝒑ₙ], i.e., 𝑨𝒑ⱼ = 𝜆ⱼ𝒑ⱼ. Therefore, 𝒑ⱼ must be an eigenvector corresponding to 𝜆ⱼ.
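A quick numerical confirmation of the factorization in R, reusing the matrix from the eigenvalue example:
A <- matrix(c(4, 1, 2, 3), 2, 2)
e <- eigen(A)
P <- e$vectors
D <- diag(e$values)
P %*% D %*% solve(P)  # recovers A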
Matrix Decomposition
Real Powers of a Matrix
Remark: For a diagonalizable 𝑨 ∈ ℝⁿˣⁿ, we can see:
𝑨² = 𝑨 × 𝑨 = (𝑷𝑫𝑷⁻¹)(𝑷𝑫𝑷⁻¹) = 𝑷𝑫(𝑷⁻¹𝑷)𝑫𝑷⁻¹ = 𝑷𝑫𝑰𝑫𝑷⁻¹ = 𝑷𝑫𝑫𝑷⁻¹ = 𝑷𝑫²𝑷⁻¹
𝑨³ = 𝑨² × 𝑨 = (𝑷𝑫²𝑷⁻¹)(𝑷𝑫𝑷⁻¹) = 𝑷𝑫²(𝑷⁻¹𝑷)𝑫𝑷⁻¹ = 𝑷𝑫²𝑰𝑫𝑷⁻¹ = 𝑷𝑫²𝑫𝑷⁻¹ = 𝑷𝑫³𝑷⁻¹
⋮
Continuing this way, we can verify that
𝑨ᵏ = 𝑷𝑫ᵏ𝑷⁻¹ = 𝑷 diag(𝜆₁ᵏ, …, 𝜆ₙᵏ) 𝑷⁻¹
It can be shown that the above result holds generally for any 𝑘 ∈ ℝ, not just integer values. This result, which is based
on the matrix decomposition, is extremely important for finding 𝑨ᵏ when 𝑘 is a very large number or when it is a real
number (e.g., √𝑨, 𝑨^(−3.21), …), in which case the direct approach is not applicable.
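A sketch of this in R for a non-integer power (the matrix is the one from the eigenvalue example; the exponent is illustrative):
A <- matrix(c(4, 1, 2, 3), 2, 2)
e <- eigen(A)
k <- -3.21
e$vectors %*% diag(e$values^k) %*% solve(e$vectors)  # A^(-3.21)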
Matrix Decomposition
Exponential and Logarithm of a Matrix
Definition: For a matrix 𝑨 ∈ ℝⁿˣⁿ, the exponential of 𝑨 is defined by the Taylor expansion of eˣ applied to 𝑨:
e^𝑨 = 𝑰 + 𝑨 + 𝑨²/2! + 𝑨³/3! + ⋯
Theorem: For a diagonalizable matrix 𝑨 ∈ ℝⁿˣⁿ, we have
e^𝑨 = 𝑷e^𝑫𝑷⁻¹ = 𝑷 diag(e^𝜆₁, …, e^𝜆ₙ) 𝑷⁻¹
Proof.
e^𝑨 = 𝑰 + 𝑨 + 𝑨²/2! + 𝑨³/3! + ⋯
= 𝑰 + 𝑷𝑫𝑷⁻¹ + 𝑷𝑫²𝑷⁻¹/2! + ⋯ = 𝑷(𝑰 + 𝑫 + 𝑫²/2! + 𝑫³/3! + ⋯)𝑷⁻¹
= 𝑷 diag(1 + 𝜆₁ + 𝜆₁²/2! + ⋯, …, 1 + 𝜆ₙ + 𝜆ₙ²/2! + ⋯) 𝑷⁻¹ = 𝑷 diag(e^𝜆₁, …, e^𝜆ₙ) 𝑷⁻¹
Matrix Decomposition
Exponential and Logarithm of a Matrix
Theorem: For a diagonalizable matrix 𝑨 ∈ ℝⁿˣⁿ (with positive eigenvalues), we have
ln(𝑨) = 𝑷 diag(ln 𝜆₁, …, ln 𝜆ₙ) 𝑷⁻¹
Proof. It is enough to show that the above formula satisfies e^(ln 𝑨) = ln(e^𝑨) = 𝑨. We show only e^(ln 𝑨) = 𝑨, as showing the other is very similar.
e^(ln 𝑨) = 𝑰 + ln 𝑨 + (ln 𝑨)²/2! + (ln 𝑨)³/3! + ⋯
= 𝑰 + 𝑷 diag(ln 𝜆₁, …, ln 𝜆ₙ) 𝑷⁻¹ + 𝑷 diag(ln 𝜆₁, …, ln 𝜆ₙ)² 𝑷⁻¹/2! + ⋯
= 𝑷 diag(1 + ln 𝜆₁ + (ln 𝜆₁)²/2! + ⋯, …, 1 + ln 𝜆ₙ + (ln 𝜆ₙ)²/2! + ⋯) 𝑷⁻¹ = 𝑷 diag(e^(ln 𝜆₁), …, e^(ln 𝜆ₙ)) 𝑷⁻¹ = 𝑷𝑫𝑷⁻¹ = 𝑨
Matrix Decomposition
Example
If 𝑨 = [4 2; 1 3], find the following values:
a. √𝑨
b. 𝑨^e (e ≈ 2.718: Euler's number)
c. e^𝑨
Solution. From the previous example's solution (placing the eigenvector for 𝜆 = 5 in the first column of 𝑷), we have
𝑫 = [5 0; 0 2], 𝑷 = [2 1; 1 −1], 𝑷⁻¹ = (1/3)[1 1; 1 −2] ⇒ 𝑨 = 𝑷𝑫𝑷⁻¹
a. √𝑨 = 𝑨^(1/2) = 𝑷𝑫^(1/2)𝑷⁻¹ = [2 1; 1 −1][√5 0; 0 √2] (1/3)[1 1; 1 −2] = [1.96 0.55; 0.27 1.69]
b. 𝑨^e = 𝑷𝑫^e𝑷⁻¹ = [2 1; 1 −1][5^e 0; 0 2^e] (1/3)[1 1; 1 −2] = [55.15 48.57; 24.27 30.86]
c. e^𝑨 = 𝑷e^𝑫𝑷⁻¹ = [2 1; 1 −1][e⁵ 0; 0 e²] (1/3)[1 1; 1 −2] = [101.41 94.02; 47.01 54.40]
y <- eigen(A)  # eigendecomposition of A from above
y$vectors %*% diag(exp(y$values)) %*% matrix.inverse(y$vectors)  # part c: e^A; matrix.inverse() is from the matrixcalc package
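Parts (a) and (b) can be computed the same way with the eigendecomposition; a sketch (the pracma package also provides sqrtm(), expm(), and logm() for the same purposes):
A <- matrix(c(4, 1, 2, 3), 2, 2)
y <- eigen(A)
y$vectors %*% diag(sqrt(y$values)) %*% solve(y$vectors)   # part a: sqrt(A)
y$vectors %*% diag(y$values^exp(1)) %*% solve(y$vectors)  # part b: A^e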
Matrix Decomposition
Relationship between Eigenvalues and Determinant
Theorem: Let 𝜆₁, …, 𝜆ₙ be the eigenvalues of the matrix 𝑨 ∈ ℝⁿˣⁿ. Then
|𝑨| = ∏ᵢ₌₁ⁿ 𝜆ᵢ = 𝜆₁ × 𝜆₂ × ⋯ × 𝜆ₙ
In addition, if 𝑨 is singular (i.e., |𝑨| = 0), then at least one of its eigenvalues is zero.
Proof.
From the eigendecomposition of 𝑨, we know that 𝑨 = 𝑷𝑫𝑷⁻¹, where 𝑷 is the matrix of eigenvectors and 𝑫 is the
diagonal matrix whose main diagonal entries are the eigenvalues. Taking the determinant gives:
|𝑨| = |𝑷𝑫𝑷⁻¹| = |𝑷| |𝑫| |𝑷⁻¹| = |𝑷| |𝑫| |𝑷|⁻¹ = |𝑫| = 𝜆₁ × 𝜆₂ × ⋯ × 𝜆ₙ
If 𝑨 is singular, then |𝑨| = ∏ᵢ₌₁ⁿ 𝜆ᵢ = 0. Hence, at least one of the eigenvalues is zero.
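A numerical check in R with the matrix from the eigenvalue example:
A <- matrix(c(4, 1, 2, 3), 2, 2)
prod(eigen(A)$values)  # 10
det(A)                 # 10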
Matrix Norms
Norm of a matrix: The corresponding (operator) norm of a matrix 𝑨 ∈ ℝⁿˣⁿ is defined as
‖𝑨‖ = max_{𝒙 ≠ 𝟎} ‖𝑨𝒙‖ / ‖𝒙‖ = max_{‖𝒙‖ = 1} ‖𝑨𝒙‖, 𝒙 ∈ ℝⁿ,
where ‖⋅‖ is the Euclidean norm.
Remark: From the above, it follows that:
‖𝑨𝒙‖ ≤ ‖𝑨‖ ‖𝒙‖.
Remark: It can be shown that
‖𝑨‖ = √𝜆ₘₐₓ
where 𝜆ₘₐₓ is the maximum eigenvalue of 𝑨ᵀ𝑨.
(The proof needs additional discussion about orthonormal eigenvector bases…)
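A small R sketch comparing this formula with base R's spectral norm (the matrix is illustrative):
A <- matrix(c(4, 1, 2, 3), 2, 2)
sqrt(max(eigen(t(A) %*% A)$values))  # ||A|| via the largest eigenvalue of A^T A
norm(A, type = "2")                  # base R spectral norm: same value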
Positive Definite Matrices
Definition: a symmetric matrix 𝑨 ∈ ℝⁿˣⁿ is called positive semidefinite if and only if
𝒙ᵀ𝑨𝒙 ≥ 0 ∀𝒙 ∈ ℝⁿ
and positive definite if and only if
𝒙ᵀ𝑨𝒙 > 0 ∀𝒙 ≠ 𝟎, 𝒙 ∈ ℝⁿ
Theorem: 𝑨 ∈ ℝⁿˣⁿ is positive semidefinite if and only if all its eigenvalues are greater than or equal to zero.
Proof (of the "only if" direction).
By definition, we have 𝒙ᵀ𝑨𝒙 ≥ 0 ∀𝒙 ∈ ℝⁿ. Choose 𝒙 to be any eigenvector of 𝑨, with 𝜆 the corresponding eigenvalue.
Hence, we have
𝒙ᵀ𝑨𝒙 = 𝒙ᵀ(𝜆𝒙) = 𝜆𝒙ᵀ𝒙 = 𝜆‖𝒙‖₂² ≥ 0
Since 𝒙ᵀ𝒙 = ‖𝒙‖₂² > 0 for any 𝒙 ≠ 𝟎 (it is the squared Euclidean or 𝑳₂ norm), we must have 𝜆 ≥ 0.
Remark: From the above, it follows that if 𝑨 ∈ ℝⁿˣⁿ is positive semidefinite then |𝑨| ≥ 0, because |𝑨| = ∏ᵢ₌₁ⁿ 𝜆ᵢ.
Positive Definite Matrices
Theorem: For any matrix 𝑨 ∈ ℝᵐˣⁿ we can always obtain a symmetric positive semidefinite matrix 𝑺 ∈ ℝⁿˣⁿ by defining
𝑺 = 𝑨ᵀ𝑨.
Proof.
Symmetry requires that 𝑺 = 𝑺ᵀ. We have 𝑺ᵀ = (𝑨ᵀ𝑨)ᵀ = 𝑨ᵀ(𝑨ᵀ)ᵀ = 𝑨ᵀ𝑨 = 𝑺.
For positive semidefiniteness, we have 𝒙ᵀ𝑺𝒙 = 𝒙ᵀ𝑨ᵀ𝑨𝒙 = (𝑨𝒙)ᵀ(𝑨𝒙) = ‖𝑨𝒙‖₂² ≥ 0.
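A quick R check that S = AᵀA is symmetric with nonnegative eigenvalues, using the rectangular matrix from the least-squares example:
A <- matrix(c(1, 1, 1, -1, 1, 2), 3, 2, byrow = TRUE)
S <- t(A) %*% A
isSymmetric(S)   # TRUE
eigen(S)$values  # 7 and 2: both >= 0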