Feature Selection and Principal Component Analysis
2 Introduction to Feature Selection
It is not always a good thing to work with feature sets that have thousands of features or even more.
More features tend to make models more complex and difficult to interpret.
Besides this, more features can often lead to models over-fitting on the training data.
The ultimate objective is to select an optimal number of features, so that we can train and build models that generalize well on the data and prevent overfitting.
3 Feature Selection Strategies
Feature selection strategies can be divided into three main areas based on the type of strategy and the techniques employed.
Filter methods: These techniques select features purely based on metrics like correlation, mutual information and so on. They include threshold-based methods and statistical tests.
Wrapper methods: These techniques use a recursive approach to build multiple models from feature subsets and select the best subset of features.
Embedded methods: These techniques try to combine the benefits of the
other two methods by leveraging Machine Learning models themselves to
rank and score feature variables based on their importance.
4 Feature Selection: Threshold-Based
Methods
This is a filter-based feature selection strategy, where some form of cut-off or threshold is used to limit the total number of features during feature selection.
Thresholds can be of various forms. Some of them can be used during the feature
engineering process itself, where you can specify threshold parameters.
A simple example of using thresholds is variance-based thresholding, where features having low variance (below a user-specified threshold) are removed. This signifies that we want to remove features whose values are more or less constant across all the observations in our dataset.
5 Threshold-Based Methods (cont.)
We can apply this to the Pokémon dataset. First, we convert the Generation feature to a categorical feature, as shown in the sketch below.
Next, we want to remove those one-hot encoded features whose variance is less than 0.15.
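A minimal sketch of how this might look with pandas and scikit-learn; the pokemon.csv file name, the Generation column and the 0.15 threshold are assumptions based on the description above, not code from the original slides.

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical Pokémon dataset with a Generation column (file name assumed for illustration)
pokemon = pd.read_csv('pokemon.csv')

# Convert the Generation feature to a categorical feature
pokemon['Generation'] = pokemon['Generation'].astype('category')

# One-hot encode the Generation feature
gen_onehot = pd.get_dummies(pokemon['Generation'], prefix='Gen')

# Remove one-hot encoded features whose variance is below the 0.15 threshold
vt = VarianceThreshold(threshold=0.15)
gen_reduced = vt.fit_transform(gen_onehot)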
6 Threshold-Based Methods (cont.)
To view the variances, as well as which features were finally selected by this algorithm, we can use the variances_ attribute and the get_support(...) method, respectively, as shown below.
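Continuing the sketch above, the fitted VarianceThreshold object vt exposes both of these directly:

# Variance of each one-hot encoded feature
print(vt.variances_)

# Boolean mask of the features that passed the variance threshold
print(vt.get_support())

# Names of the selected features (gen_onehot is the one-hot encoded DataFrame from the sketch above)
print(gen_onehot.columns[vt.get_support()])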
7 Feature Selection: Statistical Methods
This is a slightly more sophisticated filter-based feature selection method that selects features based on univariate statistical tests.
Several statistical tests can be used for regression and classification based models, including mutual information, ANOVA (analysis of variance) and chi-square tests.
Based on the scores obtained from these statistical tests, we can select the best features ranked by their scores.
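A minimal sketch using scikit-learn's SelectKBest; the breast cancer dataset, the chi-square scoring function and k=10 are assumptions chosen for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2

# Example classification dataset (all features are non-negative, as required by chi2)
X, y = load_breast_cancer(return_X_y=True)

# Keep the 10 features with the highest chi-square scores with respect to the class labels
skb = SelectKBest(score_func=chi2, k=10)
X_selected = skb.fit_transform(X, y)

print(X.shape, '->', X_selected.shape)  # (569, 30) -> (569, 10)
print(skb.scores_)                      # chi-square score of each original feature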
8 Feature Selection: Recursive Feature
Elimination
We can also rank and score features with the help of a Machine Learning model estimator, recursively eliminating lower-scored features until we arrive at a specific feature subset size.
This strategy is known as Recursive Feature Elimination (RFE).
The basic idea is to start off with a specific Machine Learning estimator, like the Logistic Regression algorithm, together with the entire set of features and the corresponding response class variable.
RFE aims to assign weights to these features based on the model fit. Features with the smallest
weights are pruned out and then a model is fit again on the remaining features to obtain the new
weights or scores.
This process is recursively carried out multiple times and each time features with the lowest
scores/weights are eliminated, until the pruned feature subset contains the desired number of
features that the user wanted to select (this is taken as an input parameter at the start).
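A minimal sketch using scikit-learn's RFE with a Logistic Regression estimator; the dataset, the scaling step and the choice of 5 features to keep are assumptions chosen for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # scaling helps the logistic regression converge

# Recursively eliminate the lowest-weighted features until 5 remain, refitting the model each round
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X_scaled, y)

print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # rank 1 = selected; higher ranks were eliminated in earlier rounds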
9 Feature Selection: Model-Based
Selection
Tree-based models like decision trees and ensemble models like random forests (ensembles of trees) can be utilized not just for modeling but also for feature selection.
These models can compute feature importances while being built, which can in turn be used for selecting the best features and discarding irrelevant features with lower scores.
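A minimal sketch using a random forest together with scikit-learn's SelectFromModel; the dataset and the 'median' importance threshold are assumptions chosen for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# Fit a random forest and keep only the features whose importance reaches the median importance
forest = RandomForestClassifier(n_estimators=100, random_state=42)
selector = SelectFromModel(forest, threshold='median')
X_selected = selector.fit_transform(X, y)

print(selector.estimator_.feature_importances_)  # importance score computed for each feature
print(X.shape, '->', X_selected.shape)           # roughly half of the features are retained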
10 Dimensionality Reduction
Dealing with a lot of features can lead to issues such as model overfitting and overly complex models.
Dimensionality reduction is the process of reducing the total number of features in our
feature set using strategies like feature selection or feature extraction.
In feature extraction, the basic objective is to derive new features from the existing set of features, such that a higher-dimensional dataset with many features can be reduced to a lower-dimensional dataset of these newly created features.
A very popular technique for linear data transformation from higher to lower dimensions is Principal Component Analysis, also known as PCA.
11 Principal Component Analysis
Principal component analysis (PCA) refers to the process by which principal
components are computed, and the subsequent use of these components in
understanding the data.
PCA is an unsupervised approach, since it involves only a set of features X1,X2,...,Xp, and
no associated response Y.
Apart from producing derived variables for use in supervised learning problems, PCA also serves as a tool for data visualization (visualization of the observations or visualization of the variables).
12 What Are Principal Components?
Suppose that we wish to visualize n observations with measurements on a set of p
features, X1,X2,...,Xp, as part of an exploratory data analysis. We could do this by
examining two-dimensional scatterplots of the data, each of which contains the n
observations’ measurements on two of the features. However, there are p(p−1)/2 such
scatterplots; for example, with p=10 there are 45 plots!
We would like to find a low-dimensional representation of the data that captures as much
of the information as possible.
PCA provides a tool to do just this.
13 What Are Principal Components?
(cont.)
The idea is that each of the n observations lives in p-dimensional space,
but not all of these dimensions are equally interesting.
PCA seeks a small number of dimensions that are as interesting as possible,
where the concept of interesting is measured by the amount that the
observations vary along each dimension.
Each of the dimensions found by PCA is a linear combination of the p
features.
14 The manner of finding principal components
The first principal component of a set of features X1, X2, ..., Xp is the normalized linear combination of the features

$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p$

that has the largest variance. By normalized, we mean that $\sum_{j=1}^{p} \phi_{j1}^2 = 1$.
We refer to the elements $\phi_{11}, \ldots, \phi_{p1}$ as the loadings of the first principal component; together, the loadings make up the principal component loading vector $\phi_1 = (\phi_{11}, \phi_{21}, \ldots, \phi_{p1})^T$.
Given an n x p data set X, how do we compute the first principal component? Since we are only interested in variance, we assume that each of the variables in X has been centered to have mean zero (that is, the column means of X are zero). We then look for the linear combination of the sample feature values of the form

$z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip}$

that has the largest sample variance, subject to the constraint that $\sum_{j=1}^{p} \phi_{j1}^2 = 1$.
15 The manner of finding principal
components (cont.)
The first principal component loading vector solves the optimization problem

$\max_{\phi_{11}, \ldots, \phi_{p1}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Big)^2 \right\} \quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1.$
The problem can be solved via an eigen decomposition, a standard technique
in linear algebra.
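A minimal numerical sketch (not from the original slides) of computing the first principal component loading vector via an eigen decomposition of the sample covariance matrix; the random data matrix X is assumed purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # n = 100 observations, p = 5 features

# Center each column so that the column means are zero
Xc = X - X.mean(axis=0)

# Sample covariance matrix (p x p)
cov = np.cov(Xc, rowvar=False)

# Eigen decomposition; eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)

# The first principal component loading vector is the eigenvector with the largest eigenvalue
phi1 = eigvecs[:, -1]
print(phi1)                     # unit-norm loading vector
print(np.sum(phi1 ** 2))        # ~1.0, satisfying the normalization constraint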
16 Ninety observations simulated in three dimensions (figure).
Left: the first two principal component directions span the plane that best fits the data. It
minimizes the sum of squared distances from each point to the plane.
Right: the first two principal component score vectors give the coordinates of the projection
of the 90 observations on to the plane. The variance in the plane is maximized.
17 Feature Extraction with Principal
Component Analysis
In any PCA transformation, the total number of PCs is always less than or equal to
the initial number of features.
The first principal component tries to capture the maximum variance of the original set of features. Each succeeding component tries to capture as much of the remaining variance as possible, subject to being orthogonal to the preceding components.
An important point to remember is that PCA is sensitive to feature scaling.
18 Feature Extraction with Principal Component Analysis (cont.)
Singular Value Decomposition
The process of singular value decomposition, also known as SVD, is another matrix decomposition or factorization process that lets us break down a matrix to obtain singular vectors and singular values.
Considering a matrix M having dimensions m x n, where m denotes the total rows and n denotes the total columns, the SVD of the matrix can be represented with the following equation:

M(m x n) = U(m x m) ⋅ S(m x n) ⋅ V^T(n x n)

where U holds the left singular vectors, S holds the singular values on its diagonal, and V^T holds the (transposed) right singular vectors.
19 Feature Extraction with Principal Component Analysis (cont.)
Considering we have a data matrix F(n x D), where we have n observations and D dimensions (features), we can depict the SVD of the feature matrix as F(n x D) = U S V^T, such that all the principal components are contained in the component V^T, which can be depicted as follows:

V^T(D x D) = [ PC1(1 x D) ; PC2(1 x D) ; ... ; PCD(1 x D) ]   (each principal component as a row)

The principal components are represented by {PC1, PC2, ..., PCD}, which are all one-dimensional vectors of dimensions (1 x D). For extracting the first d principal components, we can first transpose this matrix to obtain the following representation:

PC(D x D) = (V^T)^T = [ PC1(D x 1) | PC2(D x 1) | ... | PCD(D x 1) ]   (each principal component as a column)
20 Feature Extraction with Principal
Component Analysis (cont.)
Now we can extract out the first d principal components such that d ≤ D and the
reduced principal component set can be depicted as follows.
PC(D x d) = [ PC1(D x 1) | PC2(D x 1) | ... | PCd(D x 1) ]
Finally, to perform dimensionality reduction, we can get the reduced feature set using the
following mathematical transformation
F(n x d) = F(n x D) ⋅PC(D x d)
where the dot product between the original feature matrix and the reduced subset of principal
components gives us a reduced feature set of d features.
A very important point to remember here is that we might need to center the initial feature matrix by removing the mean, because by default PCA assumes that the data is centered around the origin.
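A minimal numpy sketch (not from the original slides) of this SVD-based reduction; the random feature matrix F with n = 100 observations, D = 5 features, and the choice d = 2 are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(100, 5))     # F(n x D) with n = 100, D = 5

# Center the feature matrix by removing the column means
Fc = F - F.mean(axis=0)

# SVD of the centered feature matrix: Fc = U S Vt; the principal components are the rows of Vt
U, S, Vt = np.linalg.svd(Fc, full_matrices=False)

# Take the first d principal components as columns: PC(D x d)
d = 2
PC = Vt[:d].T                     # shape (D, d) = (5, 2)

# Reduced feature set: F(n x d) = F(n x D) . PC(D x d)
F_reduced = Fc @ PC
print(F_reduced.shape)            # (100, 2)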
21 How Principal Component Analysis (PCA) Works
1. Standardize the Data: If the features of your dataset are on different scales, it’s essential to standardize them (subtract the mean and divide by the standard deviation).
2. Compute the Covariance Matrix: Calculate the covariance matrix for the standardized dataset.
3. Compute Eigenvectors and Eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the directions of maximum variance, and the corresponding eigenvalues indicate the magnitude of variance along those directions.
4. Sort Eigenvectors by Eigenvalues: Sort the eigenvectors based on their corresponding eigenvalues in descending order.
5. Choose Principal Components: Select the top k eigenvectors (principal components), where k is the desired dimensionality of the reduced dataset.
6. Transform the Data: Multiply the original standardized data by the selected principal components to obtain the new, lower-dimensional representation of the data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Example Data
np.random.seed(42)
X = np.random.rand(100, 3)  # 100 samples with 3 features

# Step 1: Standardize the Data
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Steps 2-5: PCA
pca = PCA()
X_pca = pca.fit_transform(X_std)

# Plot Explained Variance Ratio
explained_var_ratio = pca.explained_variance_ratio_
cumulative_var_ratio = np.cumsum(explained_var_ratio)
plt.plot(range(1, len(cumulative_var_ratio) + 1), cumulative_var_ratio, marker='o')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.title('Explained Variance Ratio vs. Number of Principal Components')
plt.show()
23 PCA Demo: Iris Data Set
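The demo itself is not reproduced here; below is a minimal sketch of what a PCA demo on the Iris data set might look like with scikit-learn, projecting the four standardized Iris features onto the first two principal components.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris data set (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Standardize, then project onto the first two principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

# Scatter plot of the two-dimensional projection, colored by species
for label, name in enumerate(iris.target_names):
    plt.scatter(X_2d[y == label, 0], X_2d[y == label, 1], label=name)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.title('Iris data projected onto the first two principal components')
plt.show()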
24 End of Lesson