
Dimensionality Reduction

Curse of Dimensionality
• A major problem is the curse of dimensionality.
• If the data x lies in a high-dimensional space, then an enormous amount of data is required to learn distributions or decision rules.
• Example: 50 dimensions, each with 20 levels. This gives a total of 20^50 cells, but the number of data samples will be far less. There will not be enough data samples to learn.
Curse of Dimensionality
• One way to deal with dimensionality is to assume that we know the form of the probability distribution.
• For example, a Gaussian model in N dimensions has N + N(N+1)/2 parameters to estimate (the mean vector plus the symmetric covariance matrix).
• This requires on the order of N^2 data samples to learn reliably, which may be practical (a numeric sketch follows).
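A quick numeric check of the two counts above, as a minimal Python sketch (the 50-dimension / 20-level figures are the ones quoted in the example):

```python
# Back-of-the-envelope numbers for the two bullets above.
dims, levels = 50, 20
cells = levels ** dims                      # cells in a 50-D grid with 20 levels per dimension
print(f"grid cells: {cells:.3e}")           # ~1.1e+65 -- far beyond any realistic sample size

N = 50
gaussian_params = N + N * (N + 1) // 2      # mean vector plus symmetric covariance matrix
print("Gaussian parameters in 50 dimensions:", gaussian_params)   # 1325
```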
What is feature reduction?
• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
• The criterion for feature reduction can be different based on different problem settings:
  - Unsupervised setting: minimize the information loss
  - Supervised setting: maximize the class discrimination
• Given a set of data points x1, x2, ..., xn, each with d variables, compute the linear transformation (projection)
  G ∈ R^(d×p): x ∈ R^d → y = G^T x ∈ R^p  (p ≪ d)
What is feature reduction?
• Original data: X ∈ R^d.  Reduced data: Y ∈ R^p.
• Linear transformation: Y = G^T X ∈ R^p, where G^T ∈ R^(p×d) (equivalently G ∈ R^(d×p): X → Y = G^T X).
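As an illustration, a minimal NumPy sketch of the projection y = G^T x (the projection matrix G here is random, purely for demonstration; in practice it comes from a criterion such as PCA or LDA):

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 10, 3, 100                  # original dim, reduced dim (p << d), number of samples
X = rng.normal(size=(n, d))           # data matrix, one sample per row
G = rng.normal(size=(d, p))           # projection matrix G in R^(d x p)

Y = X @ G                             # row-wise form of y = G^T x; Y has shape (n, p)
print(X.shape, "->", Y.shape)         # (100, 10) -> (100, 3)
```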
Feature reduction versus feature selection
• Feature reduction
• All original features are used
• The transformed features are linear combinations of the original features.

• Feature selection
• Only a subset of the original features are used.

• Continuous (feature reduction) versus discrete (feature selection); see the sketch below.
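A small sketch of the contrast (the column indices and projection matrix are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))

selected = X[:, [0, 2]]               # feature selection: keep a discrete subset of columns
G = rng.normal(size=(5, 2))           # feature reduction: continuous linear combinations
reduced = X @ G                       # every original feature contributes to each new feature

print(selected.shape, reduced.shape)  # (100, 2) (100, 2)
```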


Factor Analysis
Factor Analysis
• Factor analysis is a class of procedures used for data reduction
and summarization.
• It is an interdependence technique: no distinction between
dependent and independent variables.

• Factor analysis is used:


• To identify underlying dimensions, or factors, that explain the
correlations among a set of variables.
• To identify a new, smaller, set of uncorrelated variables to
replace the original set of correlated variables.
Factors Underlying Selected Psychographics and Lifestyles

[Figure: lifestyle variables (Football, Baseball, Evening at home, Home is best place, Go to a party, Plays, Movies) plotted by their loadings on Factor 1 and Factor 2.]
Factor Analysis Model
Each variable is expressed as a linear combination of factors: some common factors plus a unique factor. The factor model is represented as:

Xi = Ai1 F1 + Ai2 F2 + Ai3 F3 + . . . + Aim Fm + Vi Ui

where

Xi = i-th standardized variable
Aij = standardized multiple regression coefficient of variable i on common factor j
Fj = common factor j
Vi = standardized regression coefficient of variable i on unique factor i
Ui = the unique factor for variable i
m = number of common factors
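A minimal NumPy sketch that generates synthetic standardized variables from this model (the loading values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 2                                   # observations, common factors
A = np.array([[0.8, 0.1],                        # A[i, j]: loading of variable i on factor j
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.6]])
V = np.sqrt(1 - (A ** 2).sum(axis=1))            # unique-factor coefficients so Var(Xi) is about 1

F = rng.normal(size=(n, m))                      # common factors: unit variance, uncorrelated
U = rng.normal(size=(n, A.shape[0]))             # one unique factor per variable
X = F @ A.T + U * V                              # Xi = Ai1 F1 + ... + Aim Fm + Vi Ui

print(np.round(np.corrcoef(X, rowvar=False), 2)) # correlations induced by the shared factors
```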
Factor Analysis Model
• The first set of weights (factor score coefficients) is chosen so that the first factor explains the largest portion of the total variance.
• Then a second set of weights can be selected, so that the second factor explains most of the residual variance, subject to being uncorrelated with the first factor.
• This same principle applies for selecting additional weights for the additional factors.
Factor Analysis Model
The common factors themselves can be expressed as linear combinations of the observed variables.

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk

Where:
Fi = estimate of the i-th factor
Wij = weight or factor score coefficient of variable j on factor i
k = number of variables
Statistics Associated with Factor Analysis
• Bartlett's test of sphericity. Used to test the hypothesis that the variables are uncorrelated in the population (i.e., the population correlation matrix is an identity matrix).
• Correlation matrix. A lower triangular matrix showing the simple correlations, r, between all possible pairs of variables included in the analysis. The diagonal elements are all 1.
Statistics Associated with Factor Analysis
• Communality. Amount of variance a variable shares
with all the other variables. This is the proportion of
variance explained by the common factors.
• Eigenvalue. Represents the total variance explained by
each factor.
• Factor loadings. Correlations between the variables and
the factors.
• Factor matrix. A factor matrix contains the factor
loadings of all the variables on all the factors
Statistics Associated with Factor Analysis
• Factor scores. Composite scores estimated for each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. Used to examine the appropriateness of factor analysis. High values (between 0.5 and 1.0) indicate appropriateness; values below 0.5 imply that factor analysis may not be appropriate.
• Percentage of variance. The percentage of the total variance attributed to each factor.
• Scree plot. A plot of the Eigenvalues against the number of factors in order of extraction.
Conducting Factor Analysis
Problem formulation → Construction of the Correlation Matrix → Method of Factor Analysis → Determination of Number of Factors → Rotation of Factors → Interpretation of Factors → Calculation of Factor Scores → Determination of Model Fit

Formulate the Problem
• The objectives of factor analysis should be identified.
• The variables to be included in the factor analysis should be specified. The variables should be measured on an interval or ratio scale.
• An appropriate sample size should be used. As a rough guideline, there should be at least four or five times as many observations (sample size) as there are variables.
Construct the Correlation Matrix
• The analytical process is based on a matrix of correlations between the variables.
• If the null hypothesis of Bartlett's test of sphericity (that the population correlation matrix is an identity matrix) cannot be rejected, then factor analysis is not appropriate.
• If the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is small, then the correlations between pairs of variables cannot be explained by the other variables and factor analysis may not be appropriate. A sketch of the correlation matrix and Bartlett's test follows.
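A minimal NumPy/SciPy sketch of these two steps, using the standard chi-square approximation for Bartlett's test of sphericity (dedicated packages such as factor_analyzer provide a ready-made version of this test; the synthetic data here is purely illustrative):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)                          # correlation matrix of the variables
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)                            # test statistic and p-value

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                 # uncorrelated variables
stat, pval = bartlett_sphericity(X)
print(round(stat, 2), round(pval, 3))                         # large p-value: do not reject -> FA not appropriate
```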
Determine the Method of Factor Analysis
• In Principal components analysis, the total variance in the data is considered.
  - Used to determine the minimum number of factors that will account for maximum variance in the data.
• In Common factor analysis, the factors are estimated based only on the common variance.
  - Communalities are inserted in the diagonal of the correlation matrix.
  - Used to identify the underlying dimensions and when the common variance is of interest.
Determine the Number of Factors
• A Priori Determination. Use prior knowledge.
• Determination Based on Eigenvalues. Only factors with Eigenvalues greater than 1.0 are retained.
• Determination Based on Scree Plot. A scree plot is a plot of the Eigenvalues against the number of factors in order of extraction. The point at which the scree begins denotes the true number of factors.
• Determination Based on Percentage of Variance. Factors are extracted until the cumulative percentage of variance explained reaches a satisfactory level. These criteria are sketched below.
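A minimal NumPy sketch of the eigenvalue-based criteria (Kaiser rule, scree values, cumulative percentage of variance) on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=300)        # inject some shared variance
X[:, 2] = X[:, 0] + 0.3 * rng.normal(size=300)

R = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]        # Eigenvalues in order of extraction

print("scree values:", np.round(eigvals, 2))
print("Kaiser criterion retains", int((eigvals > 1.0).sum()), "factor(s)")
print("cumulative % of variance:", np.round(100 * np.cumsum(eigvals) / eigvals.sum(), 1))
```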


Rotation of Factors
• Through rotation the factor matrix is transformed into a simpler one that is easier to interpret.
• After rotation each factor should have nonzero, or significant, loadings for only some of the variables. Each variable should have nonzero or significant loadings with only a few factors, if possible with only one.
• The rotation is called orthogonal rotation if the axes are maintained at right angles.
Rotation of Factors
• Varimax procedure. Axes maintained at right angles.
  - Most common method for rotation.
  - An orthogonal method of rotation that minimizes the number of variables with high loadings on a factor.
  - Orthogonal rotation results in uncorrelated factors.
• Oblique rotation. Axes not maintained at right angles.
  - Factors are correlated.
  - Oblique rotation should be used when the factors in the population are likely to be strongly correlated.
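For illustration, a sketch comparing unrotated and varimax-rotated loadings with scikit-learn's FactorAnalysis (the rotation argument is available in recent scikit-learn versions; the data and true loadings below are synthetic):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))                                   # two latent factors
A = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])  # true loadings (made up)
X = F @ A.T + 0.3 * rng.normal(size=(500, 4))                   # observed variables

unrotated = FactorAnalysis(n_components=2).fit(X)
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("unrotated loadings:\n", np.round(unrotated.components_.T, 2))
print("varimax loadings:\n", np.round(rotated.components_.T, 2))
```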
Interpret Factors
• A factor can be interpreted in terms of the variables that load high on it.
• Another useful aid in interpretation is to plot the variables, using the factor loadings as coordinates. Variables at the end of an axis are those that have high loadings on only that factor, and hence describe the factor.
Calculate Factor Scores
The factor scores for the i-th factor may be estimated as follows:

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk
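A minimal sketch of estimating factor scores with scikit-learn's FactorAnalysis, whose transform() returns one row of scores per respondent (synthetic, standardized data; the weights Wij are computed internally by the model):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] += X[:, 0]                                    # create some shared variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)              # standardize the variables

fa = FactorAnalysis(n_components=2).fit(Z)
scores = fa.transform(Z)                              # estimated factor scores Fi per respondent
print(scores.shape)                                   # (200, 2)
```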


Determine the Model Fit
• The correlations between the variables can be deduced from the estimated correlations between the variables and the factors.
• The differences between the observed correlations (in the input correlation matrix) and the reproduced correlations (estimated from the factor matrix) can be examined to determine model fit. These differences are called residuals.
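A sketch of this check: rebuild the correlation matrix from the estimated loadings and uniquenesses, then inspect the residuals (assumes standardized variables; synthetic data for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))
A = np.array([[0.9, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 0.8]])
X = F @ A.T + 0.3 * rng.normal(size=(500, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

fa = FactorAnalysis(n_components=2).fit(Z)
loadings = fa.components_.T                                   # variables x factors
reproduced = loadings @ loadings.T + np.diag(fa.noise_variance_)
residuals = np.corrcoef(Z, rowvar=False) - reproduced

print(np.round(residuals, 3))                                 # small residuals suggest an adequate fit
```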
Principal Component Analysis (PCA)
• Dimensionality reduction implies information loss; PCA preserves as much information as possible by minimizing the reconstruction error between the original data and its low-dimensional approximation.
• How should we determine the “best” lower dimensional space?
• The “best” low-dimensional space is spanned by the “best” eigenvectors of the covariance matrix of the data (i.e., the eigenvectors corresponding to the “largest” eigenvalues, also called “principal components”).
PCA - Principal Component Analysis
• Two commonly used definitions of PCA give rise to the same algorithm:
  - maximum variance formulation
  - minimum-error formulation
• Applications of PCA:
  - PCA for high-dimensional data
  - Kernel PCA
  - Probabilistic PCA
PCA - maximum variance formulation
• PCA can be defined as the orthogonal projection of the data onto a lower dimensional linear space (the principal subspace), such that the variance of the projected data is maximized.
PCA - maximum variance formulation

[Figure: red dots are the data points, the purple line is the principal subspace, and green dots are the projected points.]
PCA - maximum variance formulation
• Data set: {xn}, n = 1, 2, ..., N, where each xn has D dimensions.
• Goal: project the data onto a space having dimensionality M < D, and maximize the variance of the projected data.
PCA - Steps
− Suppose x1, x2, ..., xM are N x 1 vectors.
− Step 1: compute the sample mean: x̄ = (1/M) Σ_{i=1}^{M} x_i
− Step 2: subtract the mean from each vector: Φ_i = x_i − x̄ (i.e., center the data at zero)
Eigen value and Eigen vector

Given a square matrix A, find a scalar λ and a nonzero vector C such that A C = λ C.
λ is the eigen value and C is the corresponding eigen vector.
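A minimal NumPy sketch of this definition (the matrix A is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])
eigvals, eigvecs = np.linalg.eig(A)       # columns of eigvecs are the eigenvectors C

lam, C = eigvals[0], eigvecs[:, 0]
print(np.allclose(A @ C, lam * C))        # True: A C = lambda C
```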
PCA – Steps (cont’d)
− Represent each centered vector in an orthogonal basis {u_i} (the eigenvectors of the covariance matrix):
  x − x̄ = Σ_i b_i u_i, where b_i = ((x − x̄) · u_i) / (u_i · u_i)
PCA – Linear Transformation
− If u_i has unit length: b_i = ((x − x̄) · u_i) / (u_i · u_i) = (x − x̄) · u_i

• The linear transformation R^N → R^K that performs the dimensionality reduction is:
  y = [b_1, b_2, ..., b_K]^T = U^T (x − x̄), where U = [u_1 u_2 ... u_K] contains the top K eigenvectors.
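Putting the steps together, a minimal NumPy sketch of the full transformation (synthetic data; K is chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated synthetic data, rows = samples

x_bar = X.mean(axis=0)
Xc = X - x_bar                                             # subtract the mean
C = np.cov(Xc, rowvar=False)                               # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)                       # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]                          # sort by decreasing eigenvalue
eigvals, U = eigvals[order], eigvecs[:, order]

K = 2
Y = Xc @ U[:, :K]                                          # coefficients b_i = (x - x_bar) . u_i
print(Y.shape)                                             # (200, 2)
```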
How to choose K?
• Choose the smallest K that satisfies the following criterion:
  (Σ_{i=1}^{K} λ_i) / (Σ_{i=1}^{N} λ_i) > threshold (e.g., 0.9 or 0.95)
• In this case, we say that we “preserve” 90% or 95% of the information (variance) in the data.
• If K = N, then we “preserve” 100% of the information in the data.
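A small sketch of this rule on an example eigenvalue spectrum (the numbers are made up):

```python
import numpy as np

eigvals = np.array([4.1, 2.3, 0.8, 0.4, 0.2, 0.1])         # example spectrum, sorted descending
ratio = np.cumsum(eigvals) / eigvals.sum()
K = int(np.searchsorted(ratio, 0.95) + 1)                  # smallest K preserving >= 95% of the variance
print(ratio.round(3), "-> K =", K)                         # [0.519 0.81 0.911 0.962 0.987 1.] -> K = 4
```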
Error due to dimensionality reduction
• The original vector x can be reconstructed (approximately) from its principal components:
  x̂ = x̄ + Σ_{i=1}^{K} b_i u_i
• PCA minimizes the reconstruction error ‖x − x̂‖.
• It can be shown that the average squared reconstruction error equals the sum of the eigenvalues of the discarded components: Σ_{i=K+1}^{N} λ_i
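A quick numerical sanity check of this result (synthetic data; np.cov uses 1/(n-1), so the two numbers agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6)) @ rng.normal(size=(6, 6))
Xc = X - X.mean(axis=0)

C = np.cov(Xc, rowvar=False)
eigvals, U = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], U[:, order]

K = 3
X_hat = Xc @ U[:, :K] @ U[:, :K].T                         # reconstruction from the top-K components
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))           # mean squared reconstruction error
print(round(mse, 3), round(eigvals[K:].sum(), 3))          # should be close to the sum of discarded eigenvalues
```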
Normalization
• The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
• Data should always be normalized prior to using PCA.
• A common normalization method is to transform all the data to have zero mean and unit standard deviation:
  x' = (x − μ) / σ
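A one-line version of this normalization in NumPy (equivalently, scikit-learn's StandardScaler):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # x' = (x - mean) / std, per variable
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))   # ~0 and 1 for every column
```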
