
Dimensionality Reduction

Curse of Dimensionality
• A major problem is the curse of dimensionality.
• If the data x lies in a high-dimensional space, then an enormous amount of data is required to learn distributions or decision rules.
• Example: 50 dimensions, each with 20 levels. This gives a total of 20^50 cells, but the number of data samples will be far less. There will not be enough data samples to learn.
Curse of Dimensionality
• One way to deal with dimensionality is to assume that we know the form of the probability distribution.
• For example, a Gaussian model in N dimensions has N + N(N+1)/2 parameters to estimate (the mean vector plus the symmetric covariance matrix).
• This requires on the order of N^2 data samples to learn reliably, which may be practical (a numeric sketch follows).
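A quick numeric check of the two counts above, as a minimal Python sketch (the 50-dimension / 20-level figures are the ones quoted in the example):

```python
# Back-of-the-envelope numbers for the two bullets above.
dims, levels = 50, 20
cells = levels ** dims                      # cells in a 50-D grid with 20 levels per dimension
print(f"grid cells: {cells:.3e}")           # ~1.1e+65 -- far beyond any realistic sample size

N = 50
gaussian_params = N + N * (N + 1) // 2      # mean vector plus symmetric covariance matrix
print("Gaussian parameters in 50 dimensions:", gaussian_params)   # 1325
```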
What is feature reduction?
• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
• The criterion for feature reduction can be different based on different problem settings:
  - Unsupervised setting: minimize the information loss
  - Supervised setting: maximize the class discrimination
• Given a set of data points x1, x2, ..., xn, each with d variables, compute the linear transformation (projection)
  G ∈ R^(d×p): x ∈ R^d → y = G^T x ∈ R^p  (p ≪ d)
What is feature reduction?
• Original data: X ∈ R^d.  Reduced data: Y ∈ R^p.
• Linear transformation: Y = G^T X ∈ R^p, where G^T ∈ R^(p×d) (equivalently G ∈ R^(d×p): X → Y = G^T X).
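As an illustration, a minimal NumPy sketch of the projection y = G^T x (the projection matrix G here is random, purely for demonstration; in practice it comes from a criterion such as PCA or LDA):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 10, 3, 100                  # original dim, reduced dim (p << d), number of samples
X = rng.normal(size=(n, d))           # data matrix, one sample per row
G = rng.normal(size=(d, p))           # projection matrix G in R^(d x p)

Y = X @ G                             # row-wise form of y = G^T x; Y has shape (n, p)
print(X.shape, "->", Y.shape)         # (100, 10) -> (100, 3)
```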
Feature reduction versus feature selection
• Feature reduction
• All original features are used
• The transformed features are linear combinations of the original features.

• Feature selection
• Only a subset of the original features are used.

• Continuous (feature reduction) versus discrete (feature selection); see the sketch below.
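A small sketch of the contrast (the column indices and projection matrix are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))

selected = X[:, [0, 2]]               # feature selection: keep a discrete subset of columns
G = rng.normal(size=(5, 2))           # feature reduction: continuous linear combinations
reduced = X @ G                       # every original feature contributes to each new feature

print(selected.shape, reduced.shape)  # (100, 2) (100, 2)
```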


Factor Analysis
Factor Analysis
• Factor analysis is a class of procedures used for data reduction
and summarization.
• It is an interdependence technique: no distinction between
dependent and independent variables.

• Factor analysis is used:


• To identify underlying dimensions, or factors, that explain the
correlations among a set of variables.
• To identify a new, smaller, set of uncorrelated variables to
replace the original set of correlated variables.
Factors Underlying Selected Psychographics and Lifestyles

[Figure: lifestyle variables (Football, Baseball, Evening at home, Home is best place, Go to a party, Plays, Movies) plotted by their loadings on Factor 1 and Factor 2.]
Factor Analysis Model
Each variable is expressed as a linear combination of factors: some common factors plus a unique factor. The factor model is represented as:

Xi = Ai1 F1 + Ai2 F2 + Ai3 F3 + . . . + Aim Fm + Vi Ui

where

Xi = i-th standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
Fj = common factor j
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
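A minimal NumPy sketch that generates synthetic standardized variables from this model (the loading values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 2                                   # observations, common factors
A = np.array([[0.8, 0.1],                        # A[i, j]: loading of variable i on factor j
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
V = np.sqrt(1 - (A ** 2).sum(axis=1))            # unique-factor coefficients so Var(Xi) is about 1

F = rng.normal(size=(n, m))                      # common factors: unit variance, uncorrelated
U = rng.normal(size=(n, A.shape[0]))             # one unique factor per variable
X = F @ A.T + U * V                              # Xi = Ai1 F1 + ... + Aim Fm + Vi Ui

print(np.round(np.corrcoef(X, rowvar=False), 2)) # correlations induced by the shared factors
```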
Factor Analysis Model
• The first set of weights (factor score coefficients) is chosen so that the first factor explains the largest portion of the total variance.
• Then a second set of weights can be selected, so that the second factor explains most of the residual variance, subject to being uncorrelated with the first factor.
• This same principle applies for selecting additional weights for the additional factors.
Factor Analysis Model
The common factors themselves can be expressed as linear combinations of the observed variables.

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk

Where:
Fi = estimate of the i-th factor
Wij = weight or factor score coefficient of variable j on factor i
k = number of variables
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Used to test the hypothesis that the variables are uncorrelated in the population (i.e., the population correlation matrix is an identity matrix).
• Correlation matrix. A lower triangular matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements are all 1.
Statistics Associated with Factor Analysis
• Communality. Amount of variance a variable shares
with all the other variables. This is the proportion of
variance explained by the common factors.
• Eigenvalue. Represents the total variance explained by
each factor.
• Factor loadings. Correlations between the variables and
the factors.
• Factor matrix. A factor matrix contains the factor
loadings of all the variables on all the factors
Statistics Associated with Factor Analysis
• Factor scores. Composite scores estimated for each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate appropriateness; values below 0.5 imply that factor analysis may not be appropriate.
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Scree plot. A plot of the Eigenvalues against the number of factors in order of extraction.
Conducting Factor Analysis
Problem formulation → Construction of the Correlation Matrix → Method of Factor Analysis → Determination of Number of Factors → Rotation of Factors → Interpretation of Factors → Calculation of Factor Scores → Determination of Model Fit

Formulate the Problem
• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should be specified. The variables should be measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.
Construct the Correlation Matrix
• The analytical process is based on a matrix of correlations between the variables.
• If the null hypothesis of Bartlett's test of sphericity (that the population correlation matrix is an identity matrix) cannot be rejected, then factor analysis is not appropriate.
• If the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is small, then the correlations between pairs of variables cannot be explained by the other variables and factor analysis may not be appropriate. A sketch of the correlation matrix and Bartlett's test follows.
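A minimal NumPy/SciPy sketch of these two steps, using the standard chi-square approximation for Bartlett's test of sphericity (dedicated packages such as factor_analyzer provide a ready-made version of this test; the synthetic data here is purely illustrative):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)                          # correlation matrix of the variables
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)                            # test statistic and p-value

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                 # uncorrelated variables
stat, pval = bartlett_sphericity(X)
print(round(stat, 2), round(pval, 3))                         # large p-value: do not reject -> FA not appropriate
```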
Determine the Method of Factor Analysis
• In Principal components analysis, the total variance in the data is considered.
  - Used to determine the minimum number of factors that will account for maximum variance in the data.
• In Common factor analysis, the factors are estimated based only on the common variance.
  - Communalities are inserted in the diagonal of the correlation matrix.
  - Used to identify the underlying dimensions and when the common variance is of interest.
Determine the Number of Factors
• A Priori Determination. Use prior knowledge.
• Determination Based on Eigenvalues. Only factors with Eigenvalues greater than 1.0 are retained.
• Determination Based on Scree Plot. A scree plot is a plot of the Eigenvalues against the number of factors in order of extraction. The point at which the scree begins denotes the true number of factors.
• Determination Based on Percentage of Variance. Factors are extracted until the cumulative percentage of variance explained reaches a satisfactory level. These criteria are sketched below.
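A minimal NumPy sketch of the eigenvalue-based criteria (Kaiser rule, scree values, cumulative percentage of variance) on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=300)        # inject some shared variance
X[:, 2] = X[:, 0] + 0.3 * rng.normal(size=300)

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]        # Eigenvalues in order of extraction

print("scree values:", np.round(eigvals, 2))
print("Kaiser criterion retains", int((eigvals > 1.0).sum()), "factor(s)")
print("cumulative % of variance:", np.round(100 * np.cumsum(eigvals) / eigvals.sum(), 1))
```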


Rotation of Factors
• Through rotation the factor matrix is transformed into a simpler one that is easier to interpret.
• After rotation each factor should have nonzero, or significant, loadings for only some of the variables. Each variable should have nonzero or significant loadings with only a few factors, if possible with only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotation of Factors
• Varimax procedure. Axes maintained at right angles.
  - Most common method for rotation.
  - An orthogonal method of rotation that minimizes the number of variables with high loadings on a factor.
  - Orthogonal rotation results in uncorrelated factors.
• Oblique rotation. Axes not maintained at right angles.
  - Factors are correlated.
  - Oblique rotation should be used when the factors in the population are likely to be strongly correlated.
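For illustration, a sketch comparing unrotated and varimax-rotated loadings with scikit-learn's FactorAnalysis (the rotation argument is available in recent scikit-learn versions; the data and true loadings below are synthetic):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))                                   # two latent factors
A = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])  # true loadings (made up)
X = F @ A.T + 0.3 * rng.normal(size=(500, 4))                   # observed variables

unrotated = FactorAnalysis(n_components=2).fit(X)
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("unrotated loadings:\n", np.round(unrotated.components_.T, 2))
print("varimax loadings:\n", np.round(rotated.components_.T, 2))
```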
Interpret Factors
• A factor can be interpreted in terms of the variables that load high on it.
• Another useful aid in interpretation is to plot the variables, using the factor loadings as coordinates. Variables at the end of an axis are those that have high loadings on only that factor, and hence describe the factor.
Calculate Factor Scores
The factor scores for the i-th factor may be estimated as follows:

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk
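A minimal sketch of estimating factor scores with scikit-learn's FactorAnalysis, whose transform() returns one row of scores per respondent (synthetic, standardized data; the weights Wij are computed internally by the model):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += X[:, 0]                                    # create some shared variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)              # standardize the variables

fa = FactorAnalysis(n_components=2).fit(Z)
scores = fa.transform(Z)                              # estimated factor scores Fi per respondent
print(scores.shape)                                   # (200, 2)
```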


Determine the Model Fit
• The correlations between the variables can be deduced from the estimated correlations between the variables and the factors.
• The differences between the observed correlations (in the input correlation matrix) and the reproduced correlations (estimated from the factor matrix) can be examined to determine model fit. These differences are called residuals.
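A sketch of this check: rebuild the correlation matrix from the estimated loadings and uniquenesses, then inspect the residuals (assumes standardized variables; synthetic data for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))
A = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])
X = F @ A.T + 0.3 * rng.normal(size=(500, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

fa = FactorAnalysis(n_components=2).fit(Z)
loadings = fa.components_.T                                   # variables x factors
reproduced = loadings @ loadings.T + np.diag(fa.noise_variance_)
residuals = np.corrcoef(Z, rowvar=False) - reproduced

print(np.round(residuals, 3))                                 # small residuals suggest an adequate fit
```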
Principal Component Analysis (PCA)
• Dimensionality reduction implies information loss; PCA preserves as much information as possible by minimizing the reconstruction error between the original data and its low-dimensional approximation.
• How should we determine the “best” lower dimensional space?
• The “best” low-dimensional space is spanned by the “best” eigenvectors of the covariance matrix of the data (i.e., the eigenvectors corresponding to the “largest” eigenvalues, also called “principal components”).
PCA - Principal Component Analysis
• Two commonly used definitions of PCA give rise to the same algorithm:
  - maximum variance formulation
  - minimum-error formulation
• Applications of PCA:
  - PCA for high-dimensional data
  - Kernel PCA
  - Probabilistic PCA
PCA - maximum variance formulation
• PCA can be defined as the orthogonal projection of the data onto a lower dimensional linear space (the principal subspace), such that the variance of the projected data is maximized.
PCA - maximum variance formulation

[Figure: red dots are the data points, the purple line is the principal subspace, and green dots are the projected points.]
PCA - maximum variance formulation
• Data set: {xn}, n = 1, 2, ..., N, where each xn has D dimensions.
• Goal: project the data onto a space having dimensionality M < D, and maximize the variance of the projected data.
PCA - Steps
− Suppose x1, x2, ..., xM are N x 1 vectors.
− Step 1: compute the sample mean: x̄ = (1/M) Σ_{i=1}^{M} x_i
− Step 2: subtract the mean from each vector: Φ_i = x_i − x̄ (i.e., center the data at zero)
Eigen value and Eigen vector

Given a square matrix A, find a scalar λ and a nonzero vector C such that A C = λ C.
λ is the eigen value and C is the corresponding eigen vector.
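A minimal NumPy sketch of this definition (the matrix A is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are the eigenvectors C

lam, C = eigvals[0], eigvecs[:, 0]
print(np.allclose(A @ C, lam * C))        # True: A C = lambda C
```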
PCA – Steps (cont’d)
− Represent each centered vector in an orthogonal basis {u_i} (the eigenvectors of the covariance matrix):
  x − x̄ = Σ_i b_i u_i, where b_i = ((x − x̄) · u_i) / (u_i · u_i)
PCA – Linear Transformation
− If u_i has unit length: b_i = ((x − x̄) · u_i) / (u_i · u_i) = (x − x̄) · u_i

• The linear transformation R^N → R^K that performs the dimensionality reduction is:
  y = [b_1, b_2, ..., b_K]^T = U^T (x − x̄), where U = [u_1 u_2 ... u_K] contains the top K eigenvectors.
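Putting the steps together, a minimal NumPy sketch of the full transformation (synthetic data; K is chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated synthetic data, rows = samples

x_bar = X.mean(axis=0)
Xc = X - x_bar                                             # subtract the mean
C = np.cov(Xc, rowvar=False)                               # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)                       # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]                          # sort by decreasing eigenvalue
eigvals, U = eigvals[order], eigvecs[:, order]

K = 2
Y = Xc @ U[:, :K]                                          # coefficients b_i = (x - x_bar) . u_i
print(Y.shape)                                             # (200, 2)
```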
How to choose K?
• Choose the smallest K that satisfies the following criterion:
  (Σ_{i=1}^{K} λ_i) / (Σ_{i=1}^{N} λ_i) > threshold (e.g., 0.9 or 0.95)
• In this case, we say that we “preserve” 90% or 95% of the information (variance) in the data.
• If K = N, then we “preserve” 100% of the information in the data.
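A small sketch of this rule on an example eigenvalue spectrum (the numbers are made up):

```python
import numpy as np

eigvals = np.array([4.1, 2.3, 0.8, 0.4, 0.2, 0.1])         # example spectrum, sorted descending
ratio = np.cumsum(eigvals) / eigvals.sum()
K = int(np.searchsorted(ratio, 0.95) + 1)                  # smallest K preserving >= 95% of the variance
print(ratio.round(3), "-> K =", K)                         # [0.519 0.81 0.911 0.962 0.987 1.] -> K = 4
```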
Error due to dimensionality reduction
• The original vector x can be reconstructed (approximately) from its principal components:
  x̂ = x̄ + Σ_{i=1}^{K} b_i u_i
• PCA minimizes the reconstruction error ‖x − x̂‖.
• It can be shown that the average squared reconstruction error equals the sum of the eigenvalues of the discarded components: Σ_{i=K+1}^{N} λ_i
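A quick numerical sanity check of this result (synthetic data; np.cov uses 1/(n-1), so the two numbers agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6)) @ rng.normal(size=(6, 6))
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)
eigvals, U = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

K = 3
X_hat = Xc @ U[:, :K] @ U[:, :K].T                         # reconstruction from the top-K components
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))           # mean squared reconstruction error
print(round(mse, 3), round(eigvals[K:].sum(), 3))          # should be close to the sum of discarded eigenvalues
```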
Normalization
• The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
• Data should always be normalized prior to using PCA.
• A common normalization method is to transform all the data to have zero mean and unit standard deviation:
  x' = (x − μ) / σ
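A one-line version of this normalization in NumPy (equivalently, scikit-learn's StandardScaler):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # x' = (x - mean) / std, per variable
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))   # ~0 and 1 for every column
```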
