
Data Preprocessing Python

The document discusses the steps for preprocessing data in Python, including importing libraries and datasets, handling missing and categorical data, splitting datasets, feature scaling, and various data visualization techniques. Key steps involve importing pandas and scikit-learn, reading data from files, exploring and cleaning data using methods like dropna() and fillna(), splitting data for training and testing, and illustrating histograms, bar plots, scatter plots and other visualizations using libraries like Matplotlib and seaborn.

Uploaded by Gunjan Suman
Copyright © All Rights Reserved

School of Computer Engineering and Technology

Lab Assignment

Write a Python program to perform preprocessing on a suitable dataset and illustrate various visualization techniques on suitable sample data. Analyze the results.

Index
⚫ Data Preprocessing steps in Python
⚫ Importing the libraries.
⚫ Importing the Dataset.
⚫ Handling of Missing Data.
⚫ Handling of Categorical Data.
⚫ Splitting the dataset into training and testing datasets.
⚫ Feature Scaling.
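The preprocessing steps in the index above can be sketched end to end as follows. This is a minimal example that assumes pandas and scikit-learn are installed. The Country/Age/Salary/Purchased table is invented for illustration. Mean imputation (SimpleImputer), one-hot encoding (pd.get_dummies), train_test_split and StandardScaler are one common choice for each step; they are not necessarily the tools the original slides used.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data: one categorical feature, two numeric features with gaps
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

X = df.drop(columns="Purchased")
y = (df["Purchased"] == "Yes").astype(int)  # encode the target as 0/1

# Missing data: replace NaN in the numeric columns with the column mean
num_cols = ["Age", "Salary"]
X[num_cols] = SimpleImputer(strategy="mean").fit_transform(X[num_cols])

# Categorical data: one-hot encode Country into indicator columns
X = pd.get_dummies(X, columns=["Country"], dtype=int)

# Split: 80% training, 20% testing (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_test = X_train.copy(), X_test.copy()  # independent copies we can safely modify

# Feature scaling: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(X_train.head())
```

This sketch follows the index order, so the imputer is fit on the whole dataset. In practice, fitting it on the training split only (as the scaler is here) avoids leaking test-set statistics into training.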

Step 1: Import Libraries
⚫ The following are the key libraries we will need for this assignment:
⚫ NumPy
⚫ SciPy
⚫ pandas
⚫ scikit-learn
⚫ Matplotlib
⚫ seaborn
⚫ Bokeh
⚫ Altair
⚫ Plotly
⚫ ggplot
⚫ E.g.: import pandas as pd
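A typical import block for this assignment might look like the sketch below, using the conventional community aliases. It assumes NumPy, pandas, Matplotlib, seaborn and scikit-learn are installed. The optional plotting libraries (Bokeh, Altair, Plotly, ggplot) are left out.

```python
import numpy as np               # numerical arrays
import pandas as pd              # loading and cleaning tabular data (DataFrame)
import matplotlib
import matplotlib.pyplot as plt  # base plotting API
import seaborn as sns            # statistical plots built on Matplotlib
import sklearn                   # preprocessing, model selection and models

# Print library versions so the results can be reproduced later
for name, module in [("numpy", np), ("pandas", pd), ("matplotlib", matplotlib),
                     ("seaborn", sns), ("scikit-learn", sklearn)]:
    print(f"{name:13} {module.__version__}")
```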

Step 2: Import the Dataset
⚫ Data is commonly read from files in several formats:
⚫ .csv
⚫ .xls
⚫ .txt

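import pandas as pd
dataset = pd.read_excel('age_salary.xls')   # legacy .xls files need the xlrd package
dataset = pd.read_table('age_salary.txt')   # read_table expects tab-separated values by default
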
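The age_salary files are not included, so the sketch below stands in for them with small in-memory strings. The Age/Salary column names and values are hypothetical. It shows that read_csv and read_table parse the same data and differ only in their default separator (comma versus tab).

```python
import io
import pandas as pd

# Hypothetical stand-in for age_salary.csv; one Salary value is left blank
csv_text = "Age,Salary\n25,40000\n32,52000\n47,\n51,61000\n"
txt_text = csv_text.replace(",", "\t")  # the same table, tab-separated like a .txt export

df_csv = pd.read_csv(io.StringIO(csv_text))    # default sep=","
df_txt = pd.read_table(io.StringIO(txt_text))  # default sep="\t"

print(df_csv.shape)           # (rows, columns)
print(df_csv.dtypes)          # Salary becomes float because of the blank (NaN) value
print(df_csv.equals(df_txt))  # both readers produce the same DataFrame
```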
Methods and attributes for preprocessing data

⚫ .head()
⚫ .tail()
⚫ .columns
⚫ .info()
⚫ .describe()
⚫ .dtypes
⚫ .index
⚫ .fillna()
⚫ .dropna()
⚫ .isnull()
⚫ .isna() (alias of .isnull())
• Demo Program
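A short demo of the listed methods and attributes might look like the sketch below. The Country/Age/Salary table is hypothetical. Filling gaps with the column mean is only one possible strategy for fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing Age and one missing Salary
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "France"],
    "Age": [44, 27, np.nan, 38, 35],
    "Salary": [72000, 48000, 54000, np.nan, 58000],
})

print(df.head(3))           # first 3 rows
print(df.tail(2))           # last 2 rows
print(df.columns.tolist())  # attribute, so no parentheses
print(df.index)             # row labels
print(df.dtypes)            # attribute: one dtype per column
df.info()                   # dtypes and non-null counts (prints directly)
print(df.describe())        # summary statistics of the numeric columns
print(df.isnull().sum())    # missing values per column; isna() gives the same result

filled = df.fillna(df.mean(numeric_only=True))  # mean imputation of numeric gaps
dropped = df.dropna()                           # keep only rows with no missing values
print(filled)
print(dropped)
```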
Attribute and method descriptions
A DataFrame is a 2-dimensional data structure that can store data of different types (including strings, integers, floating-point values, categorical values and more) in columns.

df.attribute   description
dtypes         list the data type of each column
columns        list the column names
axes           list the row labels and column names
ndim           number of dimensions
size           number of elements (rows × columns)
shape          tuple of (number of rows, number of columns)
values         NumPy representation of the data

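The attributes in the table can be inspected on any small DataFrame. In this sketch, the two-column Age/Salary frame is made up for illustration.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame
df = pd.DataFrame({"Age": [44, 27, 30], "Salary": [72000.0, 48000.0, 54000.0]})

print(df.dtypes)   # data type of each column
print(df.columns)  # column names
print(df.axes)     # [row index, column index]
print(df.ndim)     # 2 for a DataFrame
print(df.size)     # rows * columns = 6
print(df.shape)    # (3, 2)
print(df.values)   # NumPy array; df.to_numpy() is the recommended modern equivalent
```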
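df.method()            description
head([n]), tail([n])   first/last n rows (n defaults to 5)
describe()             generate descriptive statistics (numeric columns only, by default)
max(), min()           return the max/min value of each column (numeric_only=True restricts to numeric columns)
mean(), median()       return the mean/median of each numeric column
std()                  sample standard deviation of each numeric column
dropna()               drop all rows that contain a missing value

In pandas 2.x, mean(), median() and std() raise a TypeError on text columns unless numeric_only=True is passed.
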
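The methods in the table can be checked on a small hypothetical frame that has one text column. This sketch passes numeric_only=True so that the reductions skip the Name column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column, two numeric columns, one missing Salary
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cai", "Dev"],
    "Age": [25, 32, 47, 51],
    "Salary": [40000.0, 52000.0, np.nan, 61000.0],
})

print(df.head(2))                  # first 2 rows
print(df.describe())               # count/mean/std/min/quartiles/max of Age and Salary
print(df.max(numeric_only=True))   # per-column maximum
print(df.min(numeric_only=True))   # per-column minimum
print(df.mean(numeric_only=True))  # per-column mean; NaN values are skipped
print(df.median(numeric_only=True))
print(df.std(numeric_only=True))   # sample standard deviation (ddof=1 by default)

clean = df.dropna()                # removes the row with the missing Salary
print(clean)
```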
Introduction to Visualization

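seaborn function   description
distplot           histogram (deprecated since seaborn 0.11; histplot or displot replace it)
barplot            estimate of central tendency for a numeric variable, shown as bar heights
jointplot          scatterplot of two variables with their marginal distributions
regplot            scatterplot with a fitted regression line
pairplot           grid of scatterplots for every pair of numeric variables, with each variable's distribution on the diagonal
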