How to extract features using PCA in Python?

This recipe helps you extract features using PCA in Python
Last Updated: 22 Dec 2022


Recipe Objective

Many datasets have a very large number of features, which makes training a model computationally expensive. Principal component analysis (PCA) reduces the number of features by projecting the data onto a smaller set of new axes, called principal components, chosen so that they retain as much of the variance in the data as possible.

This recipe is a short example of how to extract features using PCA in Python.
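To make "keeping the directions with the most variance" concrete, here is a minimal NumPy sketch of the idea behind PCA. It is not part of the recipe itself: the toy data, the random seed, and all variable names are assumptions chosen for illustration, and the recipe below uses scikit-learn rather than this hand-rolled version.

```python
import numpy as np

# Toy data (an assumption for illustration): 200 points in 3-D whose spread
# lies almost entirely along one direction, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
toy = latent @ np.array([[3.0, 2.0, 1.0]]) + 0.1 * rng.normal(size=(200, 3))

# Center the data, then eigendecompose its covariance matrix.
centered = toy - toy.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort directions by decreasing variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
variance_share = eigvals / eigvals.sum()

# Project onto the single direction with the most variance: 3 features become 1.
projected = centered @ eigvecs[:, :1]

print(variance_share)
print(projected.shape)
```

The first entry of variance_share is the fraction of the total variance that survives when only one component is kept.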


Step 1 - Import the library

from sklearn import decomposition, datasets
from sklearn.preprocessing import StandardScaler

Here we import the decomposition and datasets modules from sklearn, and StandardScaler from sklearn.preprocessing. Each of them is used in the steps below.

Step 2 - Setup the Data

Here we use datasets to load the built-in breast cancer dataset and store its feature matrix in X. The target values are not needed for PCA, so we do not use them here.

dataset = datasets.load_breast_cancer()
X = dataset.data
print(X.shape)
print(X)
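Before scaling, it can help to see how differently the raw features are spread. The following is a small sketch, not part of the original recipe, that compares the per-feature standard deviations; the variable names feature_std, largest and smallest are assumptions introduced here.

```python
import numpy as np
from sklearn import datasets

dataset = datasets.load_breast_cancer()
X = dataset.data

# Standard deviation of each of the 30 raw features.
feature_std = X.std(axis=0)
largest = np.argmax(feature_std)
smallest = np.argmin(feature_std)

# Name and spread of the most and least variable feature.
print(dataset.feature_names[largest], feature_std[largest])
print(dataset.feature_names[smallest], feature_std[smallest])
```

Because PCA looks for directions of maximum variance, features measured on a large scale would dominate the components if the data were left unscaled.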


Step 3 - Using StandardScaler and PCA

StandardScaler standardizes each feature so that it has a mean of 0 and a standard deviation of 1. It does not remove outliers; it only rescales the data so that all features are on a comparable scale. We create an object std_slc to use StandardScaler.

std_slc = StandardScaler()
X_std = std_slc.fit_transform(X)
print(X_std.shape)
print(X_std)
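As a quick sanity check that is not part of the original recipe, the sketch below confirms what standardization does to the data. The name X_manual is an assumption introduced here; it recomputes the scaling by hand as (x - mean) / std.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X = datasets.load_breast_cancer().data
X_std = StandardScaler().fit_transform(X)

# Every column should now have mean ~0 and standard deviation ~1.
print(np.allclose(X_std.mean(axis=0), 0.0))
print(np.allclose(X_std.std(axis=0), 1.0))

# The same result by hand, using the population standard deviation (ddof=0),
# which is what StandardScaler uses.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_std, X_manual))

# Extreme values are rescaled, not removed: the largest absolute z-score
# shows how far the most extreme value still sits from its feature mean.
print(np.abs(X_std).max())
```

If outliers are a concern, they need to be handled separately before or instead of this step.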

We then apply Principal Component Analysis (PCA), which reduces the number of features by creating new features that capture most of the variance of the original data. We pass n_components=4, which is the number of features in the final dataset.

pca = decomposition.PCA(n_components=4)
X_std_pca = pca.fit_transform(X_std)
print(X_std_pca.shape)
print(X_std_pca)

As an output we get:

(569, 30)

[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02]
 ...
 [1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02]
 [2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01]
 [7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]]

(569, 30)

[[ 1.09706398 -2.07333501  1.26993369 ...  2.29607613  2.75062224
   1.93701461]
 [ 1.82982061 -0.35363241  1.68595471 ...  1.0870843  -0.24388967
   0.28118999]
 [ 1.57988811  0.45618695  1.56650313 ...  1.95500035  1.152255
   0.20139121]
 ...
 [ 0.70228425  2.0455738   0.67267578 ...  0.41406869 -1.10454895
  -0.31840916]
 [ 1.83834103  2.33645719  1.98252415 ...  2.28998549  1.91908301
   2.21963528]
 [-1.80840125  1.22179204 -1.81438851 ... -1.74506282 -0.04813821
  -0.75120669]]

(569, 4)

[[ 9.19283682  1.94858315 -1.12316659  3.63373524]
 [ 2.3878018  -3.76817178 -0.52929307  1.1182629 ]
 [ 5.73389628 -1.07517381 -0.55174687  0.91208083]
 ...
 [ 1.25617928 -1.90229673  0.56273054 -2.0892281 ]
 [10.37479406  1.67201009 -1.87702907 -2.35603254]
 [-5.4752433  -0.67063675  1.49044361 -2.29915639]]
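The recipe fixes n_components at 4 without checking how much information that keeps. As a hedged follow-up sketch, the explained_variance_ratio_ attribute of a fitted PCA reports the share of variance captured by each component. The 0.95 threshold and the names pca_95 and X_95 are arbitrary choices made here for illustration, not values from the recipe.

```python
from sklearn import datasets, decomposition
from sklearn.preprocessing import StandardScaler

X = datasets.load_breast_cancer().data
X_std = StandardScaler().fit_transform(X)

pca = decomposition.PCA(n_components=4)
X_std_pca = pca.fit_transform(X_std)

# Fraction of the total variance captured by each of the 4 components.
print(pca.explained_variance_ratio_)
# Total fraction retained after reducing 30 features to 4.
print(pca.explained_variance_ratio_.sum())

# A float between 0 and 1 asks PCA to keep as many components as are
# needed to reach that share of the variance.
pca_95 = decomposition.PCA(n_components=0.95)
X_95 = pca_95.fit_transform(X_std)
print(X_95.shape)
```

Choosing n_components from a variance target rather than a fixed number makes the trade-off between dimensionality and retained information explicit.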

