Principal Component Analysis
CS498
Today's lecture
Adaptive feature extraction: Principal Component Analysis
How, why, when, which
A dual goal
Find a good representation
The features part
Reduce redundancy in the data
A side effect of proper features
Example case
Describe this input
What about now?
A good feature
Simplify the explanation of the input
Represent repeating patterns, which, when well defined, make the input simpler
How do we define these abstract qualities?
On to the math
Linear features
Z = WX
Weight matrix Z (features × samples) = feature matrix W (features × dimensions) × input matrix X (dimensions × samples)
A 2D case
Matrix representation of data:
Z = WX, i.e. [ z1ᵀ ; z2ᵀ ] = [ w1ᵀ ; w2ᵀ ] [ x1ᵀ ; x2ᵀ ]
[Figure: scatter plot of the same data in the (x1, x2) plane]
Defining a goal
Desirable feature properties:
Give simple weights
Avoid feature similarity
How do we define these?
One way to proceed
Simple weights
Minimize relation of the two dimensions
Feature similarity
Same thing!
One way to proceed
Simple weights
Minimize relation of the two dimensions. Decorrelate: z1ᵀz2 = 0
Feature similarity
Same thing! Decorrelate: w1ᵀw2 = 0
Diagonalizing the covariance
Covariance matrix:
Cov(z1, z2) = [ z1ᵀz1  z1ᵀz2 ; z2ᵀz1  z2ᵀz2 ] / N
Diagonalizing the covariance suppresses cross-dimensional co-activity
If z1 is high, z2 won't be, etc.
Cov(z1, z2) = [ z1ᵀz1  z1ᵀz2 ; z2ᵀz1  z2ᵀz2 ] / N = [ 1 0 ; 0 1 ] = I
Problem definition
For a given input X, find a feature matrix W so that the weights decorrelate:
ZZᵀ = (WX)(WX)ᵀ = N I
How do we solve this?
Any ideas?
(WX)(WX)ᵀ = N I
Solving for diagonalization
(WX)(WX)ᵀ = N I
W XXᵀ Wᵀ = N I
W Cov(X) Wᵀ = I
Solving for diagonalization
Covariance matrices are symmetric and positive (semi-)definite
They therefore have orthogonal eigenvectors and real, nonnegative eigenvalues
and are factorizable by:
Uᵀ A U = Λ
Where U has the eigenvectors of A in its columns, and Λ = diag(λi), where λi are the eigenvalues of A
Solving for diagonalization
The solution is a function of the eigenvectors U and eigenvalues λi of Cov(X):
W Cov(X) Wᵀ = I
W = [ 1/√λ1  0 ; 0  1/√λ2 ] Uᵀ = Λ^(-1/2) Uᵀ
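As a quick numerical sketch (numpy; my addition, not from the slides), we can build W from the eigendecomposition of the covariance and check that it whitens the weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated, zero-mean 2-D data: dimensions x samples
N = 10_000
X = np.array([[2.0, 0.0], [1.5, 0.5]]) @ rng.standard_normal((2, N))
X -= X.mean(axis=1, keepdims=True)

# Eigendecompose the covariance: Cov(X) = U diag(l) U^T
C = X @ X.T / N
l, U = np.linalg.eigh(C)

# The whitening solution above: W = diag(1/sqrt(l)) U^T
W = np.diag(1.0 / np.sqrt(l)) @ U.T

# The weights Z = WX then satisfy Z Z^T = N I
Z = W @ X
```

Up to floating-point error, `Z @ Z.T / N` comes out as the identity matrix.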
So what does it do?
Input data covariance:
Cov(X) ≈ [ 14.9 0.05 ; 0.05 1.08 ]
Extracted feature matrix:
W ≈ [ 0.12 30.4 ; 8.17 0.03 ]
Weights covariance:
Cov(WX) = [ 1 0 ; 0 1 ] = I
Another solution
This is not the only solution to the problem. Consider this one:
(WX)(WX)ᵀ = N I
W XXᵀ Wᵀ = N I
W = (XXᵀ/N)^(-1/2) = Cov(X)^(-1/2)
Similar, but out of scope for now:
W = U S^(-1/2) Vᵀ, where [U, S, V] = SVD(Cov(X))
Decorrelation in pictures
An implicit Gaussian assumption
N-D data has N directions of variance
[Figure: input data]
Undoing the variance
The decorrelating matrix W contains two vectors that normalize the input's variance
[Figure: input data]
Resulting transform
Input gets scaled to a well-behaved Gaussian with unit variance in all dimensions
[Figure: input data and transformed data (feature weights)]
A more complex case
Having correlation between two dimensions
We still find the directions of maximal variance, but we also rotate in addition to scaling
[Figure: input data and transformed data (feature weights)]
One more detail
So far we considered zero-mean inputs
The transforming operation was a rotation
If the input mean is not zero, bad things happen!
Make sure that your data is zero-mean!
[Figure: input data and transformed data (feature weights)]
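A minimal numpy sketch (my addition) of what goes wrong: without centering, the product XXᵀ/N is dominated by the squared mean rather than the variance:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5_000
# Unit-variance data shifted to a mean of (10, -3)
X = rng.standard_normal((2, N)) + np.array([[10.0], [-3.0]])

# Wrong: without centering, the "covariance" mixes in the squared mean
raw = X @ X.T / N          # raw[0, 0] is near 1 + 10^2, not 1

# Right: subtract the mean first
Xc = X - X.mean(axis=1, keepdims=True)
cov = Xc @ Xc.T / N        # cov[0, 0] is near the true variance, 1
```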
Principal Component Analysis
This transform is known as PCA
The features are the principal components
They are orthogonal to each other, and produce orthogonal (white) weights
Major tool in statistics
Removes dependencies from multivariate data
Also known as the KLT
Karhunen-Loève transform
A better way to compute PCA
The Singular Value Decomposition way
[U, S, V] = SVD(A), where A = U S Vᵀ
Relationship to eigendecomposition
In our case (covariance input A), U and S will hold the eigenvectors/values of A
Why the SVD?
More stable, more robust, fancy extensions
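As a sketch (numpy; my addition), PCA via the SVD of the data matrix itself, with no explicit covariance. The singular values sᵢ relate to the covariance eigenvalues by λᵢ = sᵢ²/N:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2_000
X = np.array([[3.0, 0.0], [1.0, 1.0]]) @ rng.standard_normal((2, N))
X -= X.mean(axis=1, keepdims=True)

# SVD of the input matrix -- the covariance is never formed explicitly
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Covariance eigenvalues from the singular values
l = s**2 / N

# Same whitening transform as before, now built from the SVD factors
W = np.diag(1.0 / np.sqrt(l)) @ U.T
Z = W @ X
```

Skipping the explicit covariance avoids squaring the condition number of X, which is where the extra numerical stability comes from.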
PCA through the SVD
SVD of the input matrix:
[U, S, V] = SVD(X)
Input matrix X (dimensions × samples) = feature matrix U (dimensions × features) × eigenvalue matrix S (features × features) × weight matrix Vᵀ (features × samples)
SVD of the input covariance:
[U, S, V] = SVD(Cov(X))
Input covariance (dimensions × dimensions) = feature matrix (dimensions × features) × eigenvalue matrix (features × features) × weight matrix (features × dimensions)
Dimensionality reduction
PCA is great for high-dimensional data, and allows us to perform dimensionality reduction
Helps us find relevant structure in data, and throw away things that won't matter
A simple example
Two very correlated dimensions
e.g. the size and weight of fruit: effectively one variable
The PCA matrix here is:
W ≈ [ 0.2 0.13 ; 13.7 28.2 ]
Large difference in variance between the two components
about two orders of magnitude
A simple example
The second principal component needs to be super-boosted to whiten the weights
maybe it is useless?
Keep only high variance
Throw away components with minor contributions
What is the number of dimensions?
If the input was M-dimensional, how many dimensions do we keep?
No solid answer (estimators exist, but they are flaky)
Look at the singular/eigen-values
They will show the variance of each component; at some point it will be small
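One common heuristic, sketched in numpy (my addition): keep the smallest number of components covering a fixed fraction, say 99%, of the total variance:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 5_000, 10
# 10-D data whose variance is dominated by two directions
scales = np.array([10.0, 5.0] + [0.1] * 8)
X = np.diag(scales) @ rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)

# Component variances from the singular values
s = np.linalg.svd(X, compute_uv=False)
var = s**2 / N

# Smallest k whose components explain 99% of the total variance
ratio = np.cumsum(var) / var.sum()
k = int(np.searchsorted(ratio, 0.99)) + 1
print(k)   # 2 -- the remaining 8 dimensions contribute almost nothing
```

The 99% threshold is a convention, not part of the method; the point is to read k off the eigenvalue curve.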
Example
Eigenvalues of 1200 dimensional video data
Little variance after component 30; we don't need to keep the rest of the data
[Figure: eigenvalues of the first 60 components, falling to near zero after about component 30]
So where are the features?
We strayed off-subject
What happened to the features? We only mentioned that they are orthogonal
We talked about the weights so far; let's talk about the principal components
They should encapsulate structure. What do they look like?
Face analysis
Analysis of a face database
What are good features for faces?
Is there anything special there?
What will PCA give us? Any extra insight?
Let's use MATLAB to find out
The Eigenfaces
Low-rank model
Instead of using 780 pixel values we use the PCA weights (here 50 and 5)
[Figure: input face, full reconstruction, low-rank approximations, and the mean face]
Dominant eigenfaces
[Figure: the five dominant eigenfaces, with eigenvalues 985.953, 732.591, 655.408, 229.737 and 227.179, and the cumulative approximation they produce]
PCA for large dimensionalities
Sometimes the data is high-dimensional
e.g. video: 1280×720 pixels = 921,600 dimensions × T frames
You will not do an SVD that big!
Complexity is O(4m2n + 8mn2 + 9n3)
A useful approach is EM-PCA
EM-PCA in a nutshell
Alternate between successive approximations
Start with a random C and loop over:
Z = C⁺X
C = XZ⁺
After convergence C spans the PCA space
If we choose a low-rank C then computations are significantly more efficient than the SVD
More later when we cover EM
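A sketch of the two-step loop in numpy (my addition; ⁺ denotes the pseudoinverse, as above):

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, k = 2_000, 20, 3
# Data that lives (almost) in a 3-D subspace of 20-D space
C_true = rng.standard_normal((D, k))
X = C_true @ rng.standard_normal((k, N)) + 0.01 * rng.standard_normal((D, N))
X -= X.mean(axis=1, keepdims=True)

# EM-PCA: start with a random C and alternate the two updates
C = rng.standard_normal((D, k))
for _ in range(50):
    Z = np.linalg.pinv(C) @ X    # E-step: Z = C+ X
    C = X @ np.linalg.pinv(Z)    # M-step: C = X Z+

# After convergence, C spans the same subspace as the top-k PCA features
U = np.linalg.svd(X, full_matrices=False)[0][:, :k]
P_em = C @ np.linalg.pinv(C)     # projector onto span(C)
P_pca = U @ U.T                  # projector onto the top-k PCA subspace
```

Note that C converges to a basis of the principal subspace, not to the ordered, orthonormal components themselves; a small SVD of C recovers those.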
PCA for online data
Sometimes we have too many data samples
Irrespective of the dimensionality, e.g. long video recordings
Incremental SVD algorithms
Update the U, S, V matrices with only a small set of samples, or a single sample point; very efficient updates
A Video Example
The movie is a series of frames
Each frame is a data point: 126 frames of 80×60 pixels, so the data will be 4800×126
We can do PCA on that
PCA Results
PCA for online data II
Neural net algorithms are naturally online approaches
With each new datum, PCs are updated
Oja's and Sanger's rules
Gradient algorithms that update W
Great when you have minimal resources
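As a sketch (numpy; my addition), Oja's rule for a single unit: a Hebbian update with a decay term that keeps w normalized, so w converges to the leading principal direction:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 20_000
# A 2-D data stream with a dominant direction of variance
A = np.array([[2.0, 0.0], [1.0, 0.3]])
stream = A @ rng.standard_normal((2, N))

# Oja's rule: w <- w + eta * y * (x - y * w), with y = w.x
w = rng.standard_normal(2)
eta = 0.001
for x in stream.T:
    y = w @ x
    w += eta * y * (x - y * w)

# Compare against the leading eigenvector of the true covariance
l, U = np.linalg.eigh(A @ A.T)
v1 = U[:, -1]
```

Each sample is seen once and then discarded, which is why this suits minimal-resource settings; Sanger's rule extends the same idea to extract several components at once.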
PCA and the Fourier transform
Weve seen why sinusoids are important
But can we statistically justify it?
PCA has a deep connection with the DFT
In fact you can derive the DFT from PCA
An example
Let's take a time series which is not white
Each sample is somewhat correlated with the previous one (a Markov process)
x(t), …, x(t+T)
We'll make it multidimensional:
X = [ x(t)  x(t+1)  … ; … ; x(t+N)  x(t+1+N)  … ]
An example
In this context, features will be repeating temporal patterns smaller than N
Z = W [ x(t)  x(t+1)  … ; … ; x(t+N)  x(t+1+N)  … ]
If W is the Fourier matrix then we are performing a frequency analysis
PCA on the time series
By definition there is a correlation between successive samples:
Cov(X) ≈ [ 1  1-ε₀  1-ε₁  … ; 1-ε₀  1  1-ε₀  … ; 1-ε₁  1-ε₀  1  … ]
The resulting covariance matrix will be symmetric Toeplitz, with diagonals tapering towards 0
Solving for PCA
The eigenvectors of Toeplitz matrices like this one are (approximately) sinusoids
[Figure: the first 32 eigenvectors, which look like sinusoids]
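A numpy sketch of this connection (my addition): for a Markov-style Toeplitz covariance, the unitary DFT matrix almost diagonalizes it, which is exactly what the PCA feature matrix does:

```python
import numpy as np

# Toeplitz covariance of a Markov process: 1 on the diagonal,
# bands decaying geometrically away from it
n, rho = 64, 0.9
idx = np.arange(n)
C = rho ** np.abs(np.subtract.outer(idx, idx))

# Unitary DFT matrix (the would-be feature matrix)
F = np.fft.fft(np.eye(n)) / np.sqrt(n)

# In the Fourier basis, C is nearly diagonal
D = F @ C @ F.conj().T
off = D - np.diag(np.diag(D))
frac = (np.abs(off)**2).sum() / (np.abs(D)**2).sum()
print(frac)   # small: most of the energy sits on the diagonal
```

The agreement is only approximate because a Toeplitz matrix is not exactly circulant; the mismatch shrinks as n grows.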
And ditto with images
Analysis of coherent images results in 2D sinusoids
So now you know
The Fourier transform is an optimal decomposition for time series
In fact, you will often skip PCA and do a DFT instead
There is also a loose connection with our perceptual system
We use somewhat similar filters in our ears and eyes (but we'll make that connection later)
Recap
Principal Component Analysis
Get used to it! Decorrelates multivariate data, finds useful components, reduces dimensionality
Many ways to get to it
Knowing what to use with your data helps
Interesting connection to Fourier transform
Check these out for more
Eigenfaces
[Link] [Link] [Link]
Incremental SVD
[Link]
EM-PCA
[Link]