Linear Algebra Concepts Overview

Orthogonal Matrix:
Q ∈ R^(n×n) is orthogonal iff Q^(-1) = Q^T. det Q = 1 or -1, and every eigenvalue satisfies |λi| = 1.
Orthogonal transformations preserve the Euclidean length of any vector and the magnitude of the angle between any two vectors.

Complex Numbers:
Complex conjugate: z = a + bi → z~ = a - bi; z = z~ ⇐⇒ z ∈ R.
z·z~ ∈ R, z·z~ = a^2 + b^2 ≥ 0, and z·z~ = 0 ⇐⇒ z = 0.
Standard Inner Product on C^n: <u, v> = Σ_i u_i·v~_i.
Standard Norm: ||u|| = sqrt(<u, u>) = sqrt(Σ_i |u_i|^2).

Symmetric Matrices:
A is symmetric if A^T = A. If A ∈ R^(n×n) is symmetric, all eigenvalues λi of A are real, all λi have algebraic multiplicity = geometric multiplicity, and eigenvectors for distinct eigenvalues are orthogonal.

Spectral Theorem:
For symmetric A ∈ R^(n×n), A can be diagonalised as A = QDQ^T = QDQ^(-1), where Q is orthogonal and D is a diagonal matrix whose diagonal elements are A's eigenvalues. Procedure:
1. The roots of det(A - λI) give the eigenvalues (λi).
2. For each λi find the corresponding eigenspace (sub into A - λI and solve (A - λI)v = 0).
3. Make each eigenspace basis orthonormal (magnitude 1; may need Gram-Schmidt).
4. Combine the bases to form Q.
5. Write the associated eigenvalues for the columns of Q, in order, to form D.
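A minimal numpy sketch of the diagonalisation above; the symmetric matrix is an arbitrary example, and np.linalg.eigh stands in for steps 1-3:

```python
import numpy as np

# Arbitrary symmetric example matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

# eigh is specialised for symmetric matrices: it returns real eigenvalues
# (ascending) and an orthogonal matrix of eigenvectors.
eigvals, Q = np.linalg.eigh(A)
D = np.diag(eigvals)

# Spectral theorem: A = Q D Q^T with Q orthogonal (Q^T Q = I).
assert np.allclose(Q @ D @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(3))
```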
Vector Norms:
lp-norm: ||x||_p = (Σ_i |x_i|^p)^(1/p); in particular ||x||_1 = Σ_i |x_i|, ||x||_2 = sqrt(Σ_i x_i^2), ||x||_∞ = max_i |x_i|.
The l1, l2, and l∞ norms are equivalent.
Equivalence: the l_a and l_b norms are equivalent iff there exist r, s > 0 such that r·||x||_b ≤ ||x||_a ≤ s·||x||_b for all x.

Matrix Norms:
Properties: same as for vector norms, plus submultiplicativity ||AB|| ≤ ||A||·||B||.
Norms: ||A||_1 = largest absolute column sum, ||A||_2 = largest singular value of A, ||A||_∞ = largest absolute row sum.
A matrix norm ||.|| on R^(m×n) is consistent with the vector norms ||.||_a on R^n and ||.||_b on R^m if ∀ A ∈ R^(m×n), x ∈ R^n: ||Ax||_b ≤ ||A||·||x||_a. If a = b, ||.|| is compatible with ||.||_a.
Subordinate Matrix Norm: ||A|| = max_(x ≠ 0) ||Ax||_b / ||x||_a.
Frobenius Norm: ||A||_F = sqrt(Σ_(i,j) a_ij^2).
For p = 1, 2, ∞, the matrix norm ||.||_p is subordinate to and compatible with the vector norm ||.||_p.
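The norms listed map directly onto np.linalg.norm; a small check on an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

norm_1   = np.linalg.norm(A, 1)        # largest absolute column sum
norm_2   = np.linalg.norm(A, 2)        # largest singular value
norm_inf = np.linalg.norm(A, np.inf)   # largest absolute row sum
norm_fro = np.linalg.norm(A, 'fro')    # Frobenius norm

# The 2-norm equals the largest singular value of A.
assert np.isclose(norm_2, np.linalg.svd(A, compute_uv=False)[0])
```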
Orthogonal Projection on a Subspace:
For A ∈ R^(m×n) and any vector b ∈ R^m, there exists a unique b_i ∈ im(A) and a unique b_k ∈ ker(A^T) such that b = b_i + b_k. When A has full column rank, b_i = A(A^T A)^(-1) A^T b.

Least Square Method + Linear Regression:
If Ax = b has no solution we attempt to minimise ||Ax - b||_2^2.
Normal Equation: A^T A x = A^T b gives the solution to the least squares problem.
Finding s_0 ∈ R and s ∈ R^n minimising the sum of errors between the model predictions s_0 + s·a_i and the observations y_i can be done by finding the z = [s_0 ... s_n]^T minimising ||Az - y||_2^2, i.e. by solving the normal equation A^T A z = A^T y.

Gram-Schmidt:
<v, u> = v·u for real vectors (see above for the complex version). Given linearly independent a_1, ..., a_n: set u_1 = a_1, e_1 = u_1/||u_1||, and for k > 1, u_k = a_k - Σ_(i<k) <a_k, e_i>·e_i, e_k = u_k/||u_k||. The e_i form an orthonormal basis of span{a_1, ..., a_n}.
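A short sketch of the normal-equation route, checked against numpy's built-in least-squares solver (the random data is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined system: more samples (rows) than unknowns (columns).
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

# Normal equation: A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library routine for comparison (more numerically stable in practice).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# The residual b - Ax lies in ker(A^T): A^T (b - Ax) ≈ 0.
assert np.allclose(A.T @ (b - A @ x_normal), 0)
```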
Positive (Semi-)Definite Matrices:
Definite iff x^T A x > 0 for all x ≠ 0. For symmetric A:
- all eigenvalues strictly positive
- all diagonal elements strictly positive
- the largest coefficient of A sits on the diagonal (holds for both definite and semi-definite)
Semi-definite iff x^T A x ≥ 0 for all x. For symmetric A:
- all eigenvalues non-negative
- all diagonal elements non-negative
Negative (semi-)definite: the equivalents with flipped signs.
For any matrix A, A^T A and A A^T are both symmetric and positive semi-definite.

Cholesky Decomposition:
A = L L^T, where L is lower triangular (lower triangular iff all entries above the diagonal are 0; upper triangular iff all entries below the diagonal are 0).
A must be positive (semi-)definite and symmetric.
Solve A = L L^T by equating entries column by column.
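A sketch of using the Cholesky factor to solve a system; the SPD matrix is manufactured as B^T B + I purely for illustration:

```python
import numpy as np

# B^T B + I is symmetric positive definite for any B.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B.T @ B + np.eye(4)

# Cholesky factor: lower triangular L with A = L L^T.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.tril(L))        # L is lower triangular

# Solving Ax = b via the factorisation: forward then back substitution.
b = rng.normal(size=4)
y = np.linalg.solve(L, b)                # L y = b
x = np.linalg.solve(L.T, y)              # L^T x = y
assert np.allclose(A @ x, b)
```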
Singular Value Decomposition:
A = U S V^T.
1. Find the eigenvalues of A A^T (these form matrix S, which has the same dimensions as A and the descending square roots of the eigenvalues on its diagonal).
2. Find an orthonormal set of eigenvectors of A^T A (these are the columns of V - remember the final product uses V^T!).
3. Find the columns of U using u_i = (1/σ_i) A v_i for 1 ≤ i ≤ rank(A) - remember the v_i come from V, not V^T! To extend U to enough columns, pick v_j perpendicular to linear combinations of the existing v_i and apply Gram-Schmidt.
Matrix dimensions: A: n×m, U: n×n, S: n×m, V^T: m×m.
This is sometimes used in data compression, PCA, and dimensionality reduction algorithms.
Properties:
- rank(A) = number of positive singular values in S.
- The positive singular values of A are the square roots of the non-zero eigenvalues of A A^T or A^T A.
- Orthonormal basis for im(A): span of the first rank(A) columns of U.
- Orthonormal basis for ker(A): span of the columns of V that remain after taking out the first rank(A) columns.

Principal Component Analysis:
A ∈ R^(m×n) = m samples of n-dimensional data, with A = U S V^T.
- Principal axes of A = columns of V.
- Principal components of A = columns of US.
Both over 1 ≤ i ≤ rank(A). The first principal axis is v_1, the column of V associated with the largest singular value σ_1.
Given A V = U S we see the relation between A and the principal components and axes. If σ_1 >> σ_2, the data in A can be compressed by projecting each sample onto the direction of the first principal component.
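A small numpy sketch tying the (thin) SVD to the PCA vocabulary above; the synthetic data and the rank tolerance of 1e-10 are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# m samples of n-dimensional data, centred so the PCA directions are meaningful.
A = rng.normal(size=(50, 3)) @ np.diag([5.0, 1.0, 0.2])
A = A - A.mean(axis=0)

# Thin SVD: A = U S V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

rank = np.sum(s > 1e-10)                 # rank = number of positive singular values
axes = Vt.T                              # principal axes = columns of V
components = U * s                       # principal components = columns of US

# Rank-1 compression: project every sample onto the first principal axis.
A1 = np.outer(components[:, 0], axes[:, 0])
print("relative error of rank-1 approximation:",
      np.linalg.norm(A - A1) / np.linalg.norm(A))
```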
QR Decomposition (Gram-Schmidt):
A = Q R, with A = [a_1, ..., a_n], assuming a_1, ..., a_n are linearly independent.
1. Use Gram-Schmidt to construct an orthonormal basis (e_1, ..., e_n) s.t. span{e_1, ..., e_n} = span{a_1, ..., a_n}.
2. Q = [e_1, ..., e_n]. Note Q is semi-orthogonal: Q^T Q = I.
3. R = Q^T A, which is upper triangular (r_ij = <a_j, e_i> for i ≤ j).

Householder Map:
For a hyperplane P going through the origin with unit normal u ∈ R^m, i.e. P = {x ∈ R^m : u·x = 0}, the Householder matrix defined by H_u = I - 2 u u^T induces the reflection with respect to P.
Properties:
- H_u is involutory: H_u = H_u^(-1).
- H_u is orthogonal: H_u^T = H_u^(-1).
- H_u preserves the Euclidean length of vectors: ||H_u x|| = ||x||.
- H_u preserves angles between vectors.
- All rotations and reflections are orthogonal operations.
- The orthogonal projection Q onto the hyperplane P is given by Q = I - u u^T, with Q^2 = Q and Q = Q^T.
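A sketch of the Gram-Schmidt QR procedure plus a quick check of the Householder properties; gram_schmidt_qr is a hypothetical helper name (in practice np.linalg.qr does the same job via Householder reflections):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR via modified Gram-Schmidt: A = Q R, columns of Q orthonormal."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    V = A.copy()
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] = V[:, j] - R[k, j] * Q[:, k]
    return Q, R

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))              # columns linearly independent (a.s.)
Q, R = gram_schmidt_qr(A)

assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))   # Q is semi-orthogonal
assert np.allclose(R, np.triu(R))        # R is upper triangular

# Householder map: H_u = I - 2 u u^T reflects across the hyperplane u.x = 0.
u = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
H = np.eye(5) - 2 * np.outer(u, u)
assert np.allclose(H @ H, np.eye(5))     # involutory, hence also orthogonal
```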
Jordan Normal Form:
1. Find the eigenvalues of A (note the algebraic multiplicity a_i of each λ).
2. Find the eigenspace for each (note the geometric multiplicity g_i of each E_λi).
3. If g_i < a_i, find the missing a_i - g_i generalised eigenvectors.
4. Put the eigenvectors, in order, into a matrix (the change of basis matrix B).
5. J = B^(-1) A B.
Note: the algebraic multiplicity of an eigenvalue λ is the sum of the sizes of the Jordan blocks with λ on the diagonal; the geometric multiplicity of λ is the number of blocks with λ on the diagonal.

Generalised Eigenvectors:
Given square A ∈ R^(n×n), a non-zero vector v ∈ C^n is a generalised eigenvector of rank m associated with eigenvalue λ ∈ C for A if (A - λI)^m v = 0 and (A - λI)^(m-1) v ≠ 0. Thus any eigenvector associated with λ is itself a generalised eigenvector of rank 1.
The image under A - λI of:
- a vector of the eigenspace associated to λ is 0;
- a generalised eigenvector of rank 2 (if there are any) is in the eigenspace associated to λ;
- a generalised eigenvector of rank 3 (if there are any) is in the vector space generated by the generalised eigenvectors of rank ≤ 2;
- and so forth.

Generalised EVs Associated with λ:
For A ∈ R^(n×n) with eigenvalue λ ∈ C of algebraic multiplicity k, there are k linearly independent generalised eigenvectors v ∈ C^n associated with λ. This includes the eigenvectors associated with λ, as they are generalised eigenvectors.

Number of Generalised Eigenvectors:
A ∈ R^(n×n) has n linearly independent generalised eigenvectors: there exists a basis of C^n of generalised eigenvectors of A.
Example: for a 3×3 matrix A with det(A - λI) = (1 - λ)^3 we get λ_1 = 1 and 2 linearly independent eigenvectors v_1 = (0, 1, -1)^T, v_2 = (1, 0, 0)^T. Since λ_1 has algebraic multiplicity 3, we find v_3 using (A - λ_1 I)v_3 = v_2, which gives v_3 = (0, 0, 1)^T. We use v_2 here as it is in the row space (a multiple of it is a row of A), so v_3 will be linearly independent.

QR Algorithm:
Used to find the eigenvalues of matrices; works for most matrices. Consider the sequence A_k defined below:
A_0 = A.
For k ∈ N, apply the QR decomposition to A_k: A_k = Q_(k+1) R_(k+1), then set A_(k+1) = R_(k+1) Q_(k+1).
Stop after sufficient iterations.
Properties (Q~_k is Q with a tilde on top, denoting the accumulated orthogonal factor Q_1 ... Q_k):
For k ∈ N, A_k is similar to A: A_k = Q~_k^T A Q~_k, so A_k and A have the same eigenvalues, and v is an eigenvector of A_k iff Q~_k v is an eigenvector of A. The sequence A_k converges to an upper triangular matrix under certain conditions, and the eigenvalues of an upper triangular matrix are its diagonal elements.
Symmetric A: all A_k are symmetric. For large enough k, the columns of Q~_k are in effect the eigenvectors of A.
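A minimal implementation of the unshifted QR iteration described above; the example matrix and iteration count are arbitrary:

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Unshifted QR iteration: A_{k+1} = R_{k+1} Q_{k+1}.
    Q_acc accumulates Q~_k = Q_1 ... Q_k."""
    Ak = np.array(A, dtype=float)
    Q_acc = np.eye(Ak.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
        Q_acc = Q_acc @ Q
    return Ak, Q_acc

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric, so every A_k stays symmetric

Ak, Qk = qr_algorithm(A)
print(np.sort(np.diag(Ak)))              # ≈ eigenvalues of A
print(np.sort(np.linalg.eigvalsh(A)))    # reference values

# A_k is similar to A: A_k = Q~_k^T A Q~_k.
assert np.allclose(Qk.T @ A @ Qk, Ak)
```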
Convergence of QR in the A = LU Case:
A non-singular matrix A can be factorised as A = LU, where L is lower triangular and U upper triangular, iff A can be reduced to row echelon form (REF) without swapping any rows. Transform A into REF to get U; then we know L^(-1) A = U, so L can be found.
Uniqueness: A = LU is only unique if A is non-singular and the diagonal elements of L are all 1.
For the QR algorithm factors, A^n = (Q_1 ... Q_n)(R_n ... R_1).
Convergence: let A ∈ R^(n×n) be a symmetric positive definite matrix with distinct eigenvalues λ_1 > λ_2 > ... > λ_n > 0 and eigendecomposition A = QΛQ^T. Suppose Q^T = LU with unit lower triangular L and the diagonal elements of U positive. Then A_k → Λ.

Convergence of Real Numbers:
lim_(n→∞) a_n = l iff for every ε > 0 there is an N with |a_n - l| < ε for all n > N. To prove a limit:
1. Find the limit l.
2. Take ε > 0.
3. Put |a_n - l| < ε, rearrange into an expression n > ..., and set N = the ceiling of that expression.
Cauchy Sequence: the terms get arbitrarily close to each other; a_n is convergent only if it is Cauchy.

Metric Spaces:
A tuple (S, d) where S is a non-empty set and d is a metric over S (d : S × S → R). Prove the metric axioms hold to show we have a metric space: d(x, y) ≥ 0 with equality iff x = y; d(x, y) = d(y, x); d(x, z) ≤ d(x, y) + d(y, z).
Convergence in a Metric Space: for space (S, d), sequence a_n, and limit l ∈ S, a_n converges to l iff for every ε > 0 there is an N with d(a_n, l) < ε for all n > N. If a_n converges, its limit is unique.
Cauchy Sequences (Metric Spaces): for (S, d) and a sequence a_n in S, a_n is convergent only if it is Cauchy.
Complete Spaces: a metric space (S, d) is complete iff every Cauchy sequence in S also converges in S. For any k > 0, R^k equipped with any of the three metrics induced by the l1, l2, or l∞ norms is complete.

Fixed Point Equations:
For a non-empty set S and f : S → S, p ∈ S is called a fixed point if f(p) = p. E.g. for f(x) = x^2, f(p) = p for p = 0, 1.
Contraction: for a metric space (S, d) and f : S → S, f is called a contraction of S (or a contracting map) if there exists 0 ≤ α < 1, called the contraction constant, such that d(f(x), f(y)) ≤ α·d(x, y) for all x, y ∈ S.
Fixed Point Theorem: let (S, d) be a complete metric space and f a contraction of S. Then f has a unique fixed point. Applications: Newton's Method and the Initial Value Problem for differential equations.
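A tiny fixed-point iteration sketch; cos on [0, 1] is a standard contraction example (|sin x| ≤ sin 1 < 1 there), not one taken from the sheet:

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = f(x_k) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# cos maps [0, 1] into itself and is a contraction there, so by the
# fixed point theorem it has a unique fixed point in [0, 1].
p = fixed_point(math.cos, 0.5)
print(p, math.cos(p))                    # p ≈ 0.739085..., and cos(p) = p
```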
Condition Number:
A measure of the sensitivity of a problem to small fluctuations in its input.
Stability of the system: how the system responds to noise in the input.
Sensitivity of the solution: how small changes in parameters affect the solution of a parametric equation.
Let P = problem of interest, d = input, ε = perturbation in the input, s(d) = desired output, s(d + ε) = perturbed output.
Relative Condition Number: k ≈ (||s(d + ε) - s(d)|| / ||s(d)||) / (||ε|| / ||d||) for small perturbations ε. The value depends on the norms being used; it can also be defined in terms of the relative difference.
Unstable system / ill-conditioned problem: large condition number (cannot always assume a matrix is ill-conditioned though!).
Stable system / well-conditioned problem: small condition number.

Condition Number for Square Non-Singular Matrices:
k(A) = ||A||·||A^(-1)||. Gives a bound on the relative change in A^(-1) produced by a perturbation of A.

Pseudo-Inverse and Condition Number:
Not all matrices can be inverted, so k(A) cannot always be calculated this way. We can instead use the pseudo-inverse A^+ and define the generalisation k(A) = ||A||·||A^+||. If A^T A is square and invertible, then Ax = b → A^T A x = A^T b → x = (A^T A)^(-1) A^T b, i.e. A^+ = (A^T A)^(-1) A^T.

Conditioning of a Problem (t-digit arithmetic):
Rule of thumb: for condition number k(A) you lose about c = log10 k(A) significant figures of accuracy.

Error Bounds and Iterative Refinement:
When solving Ax = b with approximate solution x~, the residual vector r = b - Ax~ (or other geometric measures) may not always be a good way to measure how good an estimate x~ is. The norms of A and A^(-1) provide useful information on error bounds through the following theorem:
Suppose x~ is an approximation to the solution of Ax = b, A is non-singular, and r is the residual vector of x~. Then for any natural norm, ||x - x~|| ≤ ||r||·||A^(-1)||, and if x ≠ 0 and b ≠ 0, ||x - x~|| / ||x|| ≤ k(A)·||r|| / ||b||.
E.g. for a matrix A with ||A||_∞ = 3.0001 and ||A^(-1)||_∞ = 20000, k(A) = ||A||_∞·||A^(-1)||_∞ = 60002 - the size of the condition number should keep us away from making hasty decisions on accuracy.
With t-digit arithmetic and Gaussian elimination one can expect ||r|| ≈ 10^(-t)·||A||·||x~||, and one can show that an approximation to the condition number is k(A) ≈ 10^t·||y~|| / ||x~||, where y~ solves Ay~ = r.

Iterative Refinement:
Solving Ax = b numerically you get x~, with the residual vector r = b - Ax~ being a reliable indicator of accuracy iff A is well-conditioned. Iterative refinement aims to reduce round-off errors: compute the residual r = b - Ax~, solve Ay~ = r for the correction factors y~, set the improved solution x~ ← x~ + y~, and repeat to obtain new correction factors until the correction is small.
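A sketch of the refinement loop, with np.linalg.solve standing in for the t-digit solver; in practice the residual is computed in higher precision than the factorisation:

```python
import numpy as np

def iterative_refinement(A, b, steps=3):
    """Solve Ax = b, then repeatedly solve A y = r for the residual
    r = b - A x~ and apply the correction x~ <- x~ + y."""
    x = np.linalg.solve(A, b)
    for _ in range(steps):
        r = b - A @ x                    # residual of the current approximation
        y = np.linalg.solve(A, r)        # correction factors
        x = x + y
    return x

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
b = rng.normal(size=5)
x = iterative_refinement(A, b)
print(np.linalg.norm(b - A @ x))         # residual norm after refinement
```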
Iterative Techniques for Eigenvalues and Eigenvectors:

Power Method / Power Iteration:
Take an initial vector, e.g. x_0 = [1, 1, 1, 1]^T, and repeatedly multiply by A (normalising each time). Converges to the eigenvalue with the largest modulus and its eigenvector.
Limitations: if not all eigenvalues have distinct modulus, it will return a linear combination of the corresponding eigenvectors. Convergence is slow if the dominant eigenvalue is not very dominant.

Inverse Power Iteration:
Do the above on A^(-1) to get 1/λ, where λ is the eigenvalue of A with the smallest modulus.

Shifts (A - sI):
λ ∈ R is an eigenvalue of matrix A iff (λ - s) is an eigenvalue of matrix A - sI. If {λ, v} is an eigenpair of A and s ≠ λ, then {1/(λ - s), v} is an eigenpair of (A - sI)^(-1). This allows us to focus on other eigenvalues.
Method: choose s such that μ_1 = 1/(λ_j - s) is the dominant eigenvalue of (A - sI)^(-1). The eigenvalue of interest for A is then given by λ_j = 1/μ_1 + s. If the eigenvector oscillates between iterations, 1/μ_1 should be negative.

Rayleigh Quotient:
While using an iterative technique, the Rayleigh quotient R(x) = (x^T A x)/(x^T x) can be used to monitor convergence to the eigenvalue directly, not just its modulus.

Deflation:
Find the second dominant eigenvalue by deflating A to B ∈ R^((n-1)×(n-1)), which has all the eigenvalues of A except the dominant one.
Steps: find a Householder matrix H sending the dominant eigenvector to a multiple of e_1, do H A H (remember H^(-1) = H), then read off the trailing (n-1)×(n-1) block as B. Then B can be used to get λ_2.
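A compact sketch of power iteration with the Rayleigh quotient, plus a shifted inverse iteration; the matrix, shift, and iteration count are arbitrary choices:

```python
import numpy as np

def power_iteration(A, iters=500):
    """Approximate the dominant eigenpair of A. The Rayleigh quotient
    x^T A x / x^T x tracks the eigenvalue itself (sign included),
    not just its modulus."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    lam = x @ A @ x / (x @ x)            # Rayleigh quotient
    return lam, x

A = np.array([[6.0, 2.0, 1.0],
              [2.0, 3.0, 1.0],
              [1.0, 1.0, 1.0]])

lam1, v1 = power_iteration(A)
print(lam1)                              # dominant eigenvalue of A

# Inverse power iteration on (A - s I)^{-1} targets the eigenvalue nearest
# the shift s: if mu is its dominant eigenvalue, lambda_j = 1/mu + s.
s = 0.5
mu, v = power_iteration(np.linalg.inv(A - s * np.eye(3)))
print(1.0 / mu + s)                      # eigenvalue of A closest to s
```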
Iterative Solutions of Linear Equations:
Jacobi: each stage uses the previous stage's results.
Gauss-Seidel: each stage uses the most recent values available.
G-S Convergence: for Ax = b, Gauss-Seidel converges whenever the iteration matrix M below satisfies ||M|| < 1 for some consistent norm (prove G-S convergence using a general 2×2 A and b).
Simultaneous equations u(x_1, x_2) and v(x_1, x_2) converge if the corresponding fixed-point iteration is a contraction (see Contraction above).

Splitting - General Method:
Write A = G + R, then solve x_(k+1) = M x_k + c with M = -G^(-1) R and c = G^(-1) b. For a consistent norm ||.|| on R^(n×n), if ||M|| < 1 the sequence converges for any starting point x_0. Rate of convergence r ∝ -log10 ||M||.
Note about non-diagonal matrices: for a matrix with all diagonal elements 0 we cannot split it this way, so we use a change of basis (with C = C^(-1)): denoting C^(-1) A C = B, C^(-1) x = y, and C^(-1) b = c, we solve By = c and retrieve x through x = Cy.

Jacobi Method:
A = D + R where R = L + U (D = the diagonal, L = the strictly lower triangular part, U = the strictly upper triangular part); M = -D^(-1) R and c = D^(-1) b.
Gauss-Seidel Method:
A = (D + L) + U; M = -(D + L)^(-1) U and c = (D + L)^(-1) b.

Condition Number and Convergence:
For Ax = b, if the condition number of A is large, Jacobi and Gauss-Seidel converge more slowly or not at all.
If A is weakly row diagonally dominant and irreducible, Jacobi and Gauss-Seidel both converge (Gauss-Seidel faster).

Diagonally Dominant Matrix:
A ∈ R^(n×n) is strictly row diagonally dominant if |a_ii| > Σ_(j≠i) |a_ij| for every row i. A strictly row diagonally dominant matrix is non-singular.
Let A ∈ R^(n×n) and ρ(A) = max_(λ ∈ Sp(A)) |λ| be the spectral radius of A. For ε > 0, there exists an induced norm such that ||A|| < ρ(A) + ε.

Irreducible Matrix:
An n×n matrix A is irreducible if it cannot take the block form P^T A P = [A_11 A_12; 0 A_22] through a symmetric permutation of its rows and columns, where A_11 and A_22 are square block matrices and P is a permutation matrix.
Permutation Matrix: a square matrix with all elements 0 except exactly one 1 in each row and column; P^T P = P P^T = I.
Checking Irreducibility with a Graph: if a_ij ≠ 0 for i ≠ j, draw an arrow from point i to point j. A is irreducible iff from any point you can reach any other point by following arrows. (The example graph given is for a 4×4 reducible matrix.)
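A side-by-side sketch of the two splittings on a strictly diagonally dominant example; solving the lower triangular system plays the role of applying (D + L)^(-1):

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Jacobi splitting A = D + R: x_{k+1} = -D^{-1} R x_k + D^{-1} b."""
    D = np.diag(np.diag(A))
    R = A - D
    M = -np.linalg.solve(D, R)
    c = np.linalg.solve(D, b)
    x = np.zeros_like(b)
    for _ in range(iters):
        x = M @ x + c
    return x

def gauss_seidel(A, b, iters=100):
    """Gauss-Seidel splitting A = (D + L) + U: uses the most recent values."""
    DL = np.tril(A)                      # D + L
    U = A - DL
    x = np.zeros_like(b)
    for _ in range(iters):
        x = np.linalg.solve(DL, b - U @ x)
    return x

# Strictly row diagonally dominant matrix, so both methods converge.
A = np.array([[10.0, 2.0, 1.0],
              [ 1.0, 8.0, 2.0],
              [ 2.0, 1.0, 9.0]])
b = np.array([13.0, 11.0, 12.0])

print(jacobi(A, b), gauss_seidel(A, b), np.linalg.solve(A, b))
```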
Functions of Several Variables:
Clairaut's Theorem: if f is defined on a disk containing (a, b) and f_xy and f_yx are both continuous on that disk, then f_xy(a, b) = f_yx(a, b).

Directional Derivatives:
If f is a differentiable function of x and y, then f has a directional derivative in the direction of any unit vector u = <a, b>: D_u f(x, y) = f_x(x, y)·a + f_y(x, y)·b. For a unit vector at angle θ use <a, b> = <cos θ, sin θ> (maximised at θ = 0). D_u f represents the rate of change of z in the direction of u. It can also be written as D_u f = ∇f · u, i.e. the directional derivative is the scalar projection of the gradient vector onto u (make u a unit vector first).

Tangent Plane and Normal Line to a Level Surface:
If ∇F(x_0, y_0, z_0) ≠ 0, it is natural to define the tangent plane to the level surface F(x, y, z) = k at P(x_0, y_0, z_0) as the plane that passes through P and has normal vector ∇F(x_0, y_0, z_0). ∇F(x_0, y_0, z_0) gives the direction of fastest increase of F.

Maxima and Minima:
Local max at (a, b) if f(x, y) ≤ f(a, b) for all points (x, y) in some disk with centre (a, b); similarly for a local min, and for a global max/min if the inequalities hold for all (x, y). At a local extremum, f_x(a, b) = 0 and f_y(a, b) = 0 (∇f(a, b) = 0). (a, b) is a critical point if the previous condition holds or one of the partial derivatives does not exist.

Second Derivatives Test:
With D = f_xx(a, b)·f_yy(a, b) - [f_xy(a, b)]^2 at a critical point (a, b): D > 0 and f_xx(a, b) > 0 gives a local min; D > 0 and f_xx(a, b) < 0 gives a local max; D < 0 gives a saddle point; D = 0 is inconclusive.

Hessian Matrix:
H = [f_xx f_xy; f_yx f_yy]. The second derivative test generalises to a test based on the eigenvalues of the Hessian: all positive = local min, all negative = local max, mixed signs = saddle, singular Hessian = inconclusive.
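A small sketch of the eigenvalue version of the test; the function f(x, y) = x^2 + 3xy + 2y^2 and its critical point (0, 0) are an example chosen here, not one from the sheet:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 3xy + 2y^2 at the critical point (0, 0):
# f_xx = 2, f_xy = f_yx = 3, f_yy = 4.
H = np.array([[2.0, 3.0],
              [3.0, 4.0]])

eigvals = np.linalg.eigvalsh(H)          # Hessian is symmetric
if np.all(eigvals > 0):
    kind = "local minimum"
elif np.all(eigvals < 0):
    kind = "local maximum"
elif np.any(eigvals > 0) and np.any(eigvals < 0):
    kind = "saddle point"
else:
    kind = "inconclusive (singular Hessian)"
print(eigvals, kind)                     # mixed signs here: saddle point
```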
Gradient Based Optimisation:
Quadratic Form: f(x) = x^T A x; for symmetric A, ∇f(x) = 2Ax.

Gradient / Steepest Descent: finds a local minimum.
Gradient step: x_(k+1) = x_k - α∇f(x_k), where α is a fixed step size.
Finding α: exact line search - substitute x_1 = x_0 - h∇f(x_0) into f(x) and solve df/dh = 0 to get the optimal h (= α). Alternatives: choose a constant step size, or a diminishing step size (e.g. 1/k).
Steepest descent moves along the negative gradient direction -∇f(x_k).
For the quadratic function f(x) = 1/2 x^T A x - b^T x, ∇f(x) = Ax - b, and exact line search gives α = (r^T r)/(r^T A r) with r = b - Ax.
Remember it works in orthogonal steps, so (x_2 - x_1)·(x_3 - x_2) = 0.
Gradient Ascent: finds a local maximum (step along +∇f instead).

Common Stopping Criteria:
- Gradient close enough to 0.
- Improvements in the function value are saturating (also as a relative measure, with no dependence on the scale of f).
- Movement between iterates is small enough (also as a relative measure).

Algorithm for Solving a System of Linear Equations:
Note that the update x ← x + αr, where r (the residual) = b - Ax, gives the new residual b - A(x + αr) = r - αAr. Convergence is slow, hence it is not widely used for linear equations, though it is used for non-linear equations.

Conjugate Gradient Method:
For Ax = b with A ∈ R^(n×n) symmetric positive definite, converges in at most n steps. A larger condition number means slower convergence. Check convergence by the size of the norm of the residual.
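A sketch of steepest descent with exact line search on the quadratic f(x) = 1/2 x^T A x - b^T x; the 2×2 SPD matrix is an arbitrary example:

```python
import numpy as np

def steepest_descent(A, b, iters=200):
    """Minimise f(x) = 1/2 x^T A x - b^T x for symmetric positive definite A.
    The gradient is Ax - b, so the descent direction is the residual r = b - Ax,
    and exact line search gives alpha = (r^T r) / (r^T A r)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x
        alpha = (r @ r) / (r @ A @ r)
        x = x + alpha * r
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric positive definite
b = np.array([1.0, 2.0])

x = steepest_descent(A, b)
print(x, np.linalg.solve(A, b))          # the two should agree closely
```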
Linear Programming:
Graphical Method: draw the constraint lines, shade the exclusion zones, rearrange the objective function for y, move the objective line until the objective is maximised, then substitute back to get P.

Simplex Method (stop when there are no more negative entries in the z row):
1. Pick the most negative value in the z row (highlight that column).
2. Compute the ratios (solution / column entry) and highlight the row with the lowest ratio.
3. Replace that row's basic variable with the highlighted column's non-basic variable.
4. Make the pivot equal to 1 (divide the highlighted row by the cross-highlighted value).
5. Use Gaussian elimination to clear the rest of the highlighted column.
Solution for the worked example: x_1 = 4, x_2 = 0, x_3 = 3, z = 38. When x_n is not a basic variable, x_n = 0.

Dual LP Problem:
Every minimisation/maximisation problem has a dual problem which aims to max/min based on the same constraints. When both are optimised they are equal. Sometimes the dual is easier to solve than the primal: solve the dual maximisation problem instead and get the same result.
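If scipy is available, scipy.optimize.linprog solves small LPs directly; the LP below is a hypothetical example (not the tableau behind the x_1 = 4, x_2 = 0, x_3 = 3 solution above), and linprog minimises, so the objective is negated:

```python
from scipy.optimize import linprog

# Maximise z = 3 x1 + 5 x2
# subject to  x1 <= 4, 2 x2 <= 12, 3 x1 + 2 x2 <= 18, x1, x2 >= 0.
c = [-3.0, -5.0]                         # negated: linprog minimises
A_ub = [[1.0, 0.0],
        [0.0, 2.0],
        [3.0, 2.0]]
b_ub = [4.0, 12.0, 18.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                   # optimum x = (2, 6), z = 36
```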
