Dimensionality Reduction
Curse of Dimensionality
• A major problem is the curse of dimensionality.
• If the data x lies in a high-dimensional space, then an enormous
amount of data is required to learn distributions or decision rules.
• Example: 50 dimensions, each with 20 levels. This gives a total of
$20^{50}$ cells, but the number of data samples will be far less: there
will not be enough data samples to learn (see the worked arithmetic below).
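To see the magnitude:
$$20^{50} = 10^{50 \log_{10} 20} \approx 10^{65.05} \approx 1.1 \times 10^{65} \text{ cells.}$$
Even a billion samples would leave essentially every cell empty.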
Curse of Dimensionality
• One way to deal with dimensionality is to assume that we know the
form of the probability distribution.
• For example, a Gaussian model in N dimensions has N parameters for
the mean plus N(N+1)/2 for the symmetric covariance matrix.
• Estimating on the order of N² parameters requires on the order of N²
data samples to learn reliably. This may be practical (worked count below).
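A quick worked count for N = 50:
$$50 + \frac{50 \cdot 51}{2} = 50 + 1275 = 1325 \text{ parameters,}$$
which is far more tractable than the $20^{50}$ cells of the histogram approach.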
What is feature reduction?
• Feature reduction refers to the mapping of the original high-dimensional data onto a
lower-dimensional space.
• The criterion for feature reduction differs with the problem setting:
• Unsupervised setting: minimize the information loss
• Supervised setting: maximize the class discrimination
Linear transformation
Given $X \in \mathbb{R}^d$, a matrix $G \in \mathbb{R}^{d \times p}$ (with $p < d$) defines the linear transformation
$$G^T : \mathbb{R}^d \to \mathbb{R}^p, \qquad Y = G^T X \in \mathbb{R}^p.$$
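A minimal NumPy sketch of this mapping (the sizes d = 5, p = 2 and the random G and x are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 5, 2                      # original and reduced dimensions (illustrative)

x = rng.normal(size=(d, 1))      # one data point X in R^d
G = rng.normal(size=(d, p))      # any d x p transformation matrix

y = G.T @ x                      # Y = G^T X lives in R^p
print(y.shape)                   # (2, 1)
```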
Feature reduction versus feature selection
• Feature reduction
• All original features are used
• The transformed features are linear combinations of the original features.
• Feature selection
• Only a subset of the original features is used.
[Figure: leisure-activity variables (Football, Baseball, Evening at home, Go to a party, Home is best place, Plays, Movies) plotted against two underlying factors, Factor 1 and Factor 2.]
Factor Analysis Model
Each variable is expressed as a linear combination of factors: some
common factors plus a unique factor. The factor model is represented as
$$X_i = A_{i1} F_1 + A_{i2} F_2 + \cdots + A_{im} F_m + V_i U_i$$
where
$X_i$ = the i-th standardized variable
$A_{ij}$ = standardized multiple regression coefficient of variable i on common factor j
$F_j$ = common factor j
$V_i$ = standardized regression coefficient of variable i on the unique factor i
$U_i$ = the unique factor for variable i
$m$ = number of common factors
Factor Analysis Model
• The first set of weights (factor score coefficients) is
chosen so that the first factor explains the largest
portion of the total variance:
$$F_i = W_{i1} X_1 + W_{i2} X_2 + \cdots + W_{ik} X_k$$
where
$F_i$ = estimate of the i-th factor
$W_{ij}$ = weight or factor score coefficient
$k$ = number of variables
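A minimal sketch of fitting this model with scikit-learn's FactorAnalysis; the synthetic data and the choice of m = 2 factors are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                  # 200 observations of 6 variables (synthetic)
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize, as the model assumes

fa = FactorAnalysis(n_components=2).fit(X)     # m = 2 common factors

loadings = fa.components_.T                    # A_ij: loading of variable i on factor j
scores = fa.transform(X)                       # estimated factor scores per observation
print(loadings.shape, scores.shape)            # (6, 2) (200, 2)
```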
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity tests the hypothesis that the
variables are uncorrelated in the population, i.e., that the
population correlation matrix is an identity matrix.
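A sketch of the usual chi-square approximation of this test (the function name and synthetic data are mine, not from the slides):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Test H0: the population correlation matrix is the identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)                          # sample correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)                            # statistic, p-value

X = np.random.default_rng(0).normal(size=(100, 5))            # uncorrelated by construction
print(bartlett_sphericity(X))                                 # large p-value: do not reject H0
```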
[Figure: remaining steps in the factor analysis procedure, namely rotation of factors, interpretation of factors, and calculation of factor scores.]
PCA – Maximum Variance Formulation
Goal: find the unit-length direction $u_1$ (i.e., $u_1^T u_1 = 1$) onto which the projection of the data has maximum variance.
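Filling in the standard derivation (a sketch of the usual maximum-variance argument, with $S$ the sample covariance matrix):
$$\max_{u_1} \; u_1^T S u_1 \quad \text{subject to} \quad u_1^T u_1 = 1.$$
Introducing a Lagrange multiplier $\lambda$ and setting the gradient of $u_1^T S u_1 + \lambda(1 - u_1^T u_1)$ to zero gives $S u_1 = \lambda u_1$: the optimal direction is an eigenvector of $S$, and since the projected variance equals $u_1^T S u_1 = \lambda$, it is the eigenvector with the largest eigenvalue.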
PCA – Steps
Suppose $x_1, x_2, \ldots, x_M$ are $N \times 1$ vectors.
Step 1: compute the sample mean $\bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i$.
Step 2: subtract the mean from each vector: $\Phi_i = x_i - \bar{x}$.
Step 3: form the covariance matrix of the centered data and compute its eigenvalues $\lambda_i$ and eigenvectors $u_i$ (see below).
Eigenvalues and Eigenvectors
Given a square matrix A, find C and λ such that
$$AC = \lambda C,$$
where λ is the eigenvalue and C ≠ 0 is the corresponding eigenvector.
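A quick NumPy check of this definition (the matrix A is an illustrative example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # a square matrix

eigvals, eigvecs = np.linalg.eig(A)      # solves A C = lambda C
print(eigvals)                           # e.g. [3. 1.]

# verify the defining equation for the first eigenpair
C, lam = eigvecs[:, 0], eigvals[0]
assert np.allclose(A @ C, lam * C)
```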
PCA – Steps (cont'd)
The eigenvectors $u_1, \ldots, u_N$ form an orthogonal basis, so each centered vector can be expanded as
$$x - \bar{x} = \sum_{i=1}^{N} b_i u_i, \qquad \text{where } b_i = \frac{(x - \bar{x}) \cdot u_i}{u_i \cdot u_i}.$$
PCA – Linear Transformation
$$b_i = \frac{(x - \bar{x}) \cdot u_i}{u_i \cdot u_i}$$
If $u_i$ has unit length: $b_i = (x - \bar{x}) \cdot u_i$.
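Putting the steps together in a short NumPy sketch (the data shape, K = 2, and variable names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))   # M = 100 samples, N = 5 dims

x_bar = X.mean(axis=0)                   # Step 1: sample mean
Phi = X - x_bar                          # Step 2: subtract the mean

C = (Phi.T @ Phi) / len(X)               # Step 3: covariance of the centered data

eigvals, U = np.linalg.eigh(C)           # Step 4: eigendecomposition (C is symmetric)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in decreasing order
eigvals, U = eigvals[order], U[:, order]

K = 2                                    # Step 5: keep the top-K unit eigenvectors
B = Phi @ U[:, :K]                       # b_i = (x - x_bar) . u_i for every sample
print(B.shape)                           # (100, 2)
```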
How to choose K?
A common criterion is to pick the smallest K whose leading eigenvalues capture most of the total variance, e.g.
$$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > 0.9$$
(see the sketch below).
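A small sketch of this criterion (the eigenvalue spectrum is made up for illustration):

```python
import numpy as np

eigvals = np.array([4.0, 2.5, 1.0, 0.3, 0.2])     # eigenvalues, sorted decreasing (illustrative)

ratio = np.cumsum(eigvals) / eigvals.sum()        # variance captured by the top-K components
K = int(np.searchsorted(ratio, 0.9)) + 1          # smallest K with ratio >= 0.9
print(ratio)                                      # [0.5    0.8125 0.9375 0.975  1.    ]
print(K)                                          # 3
```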
Normalization
The principal components depend on the units and ranges of the original variables, so the data should be standardized (zero mean, unit variance per feature) before applying PCA (see the sketch below).
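A minimal standardization sketch (the example matrix is illustrative):

```python
import numpy as np

def standardize(X):
    """Scale each feature to zero mean and unit variance (z-scores)."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)      # guard against constant features
    return (X - mu) / sigma

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
Xn = standardize(X)
print(Xn.mean(axis=0), Xn.std(axis=0))            # ~[0. 0.] [1. 1.]
```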