Data Preprocessing Python
Lab Assignment
Index
⚫ Data Preprocessing steps in Python
⚫ Importing the libraries.
⚫ Importing the Dataset.
⚫ Handling of Missing Data.
⚫ Handling of Categorical Data.
⚫ Splitting the dataset into training and testing datasets.
⚫ Feature Scaling.
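The steps listed above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the data, the column names (country, age, salary, purchased), mean imputation, one-hot encoding and standard scaling are all choices made for this example and are not prescribed by the assignment.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data, invented for illustration only.
CSV_TEXT = """country,age,salary,purchased
France,44,72000,no
Spain,27,48000,yes
Germany,30,54000,no
Spain,38,61000,no
Germany,40,,yes
France,35,58000,yes
Spain,,52000,no
France,48,79000,yes
"""

# Import the dataset (read from an in-memory string instead of a file).
dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Handle missing data: fill numeric gaps with the column mean.
for col in ["age", "salary"]:
    dataset[col] = dataset[col].fillna(dataset[col].mean())

# Handle categorical data: one-hot encode the feature, map the label to 0/1.
features = pd.get_dummies(dataset[["country", "age", "salary"]], columns=["country"])
target = dataset["purchased"].map({"no": 0, "yes": 1})

# Split into training and testing datasets (75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)
X_train, X_test = X_train.copy(), X_test.copy()

# Feature scaling: fit on the training rows only, then apply to both sets.
numeric_cols = ["age", "salary"]
scaler = StandardScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

print(X_train.head())
```

Fitting the scaler on the training rows only keeps information from the test rows out of the preprocessing step.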
Step 1: Import Libraries
⚫ The following are the key libraries needed for this assignment:
⚫ NumPy
⚫ SciPy
⚫ Pandas
⚫ SciKit-Learn
⚫ matplotlib
⚫ Seaborn
⚫ Bokeh
⚫ Altair
⚫ Plotly
⚫ ggplot
⚫ Example: import pandas as pd
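Before starting, it can help to check which of the libraries listed above are installed. The sketch below uses only the standard library. The mapping from library name to import name is an assumption of this example (for instance, SciKit-Learn is imported as sklearn), and installed_libraries is a hypothetical helper, not part of any of these packages. Conventional aliases are import numpy as np, import pandas as pd, import matplotlib.pyplot as plt and import seaborn as sns.

```python
import importlib.util

# Import names for the listed libraries; the import name can differ
# from the project name (SciKit-Learn is imported as "sklearn").
LIBRARIES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "SciKit-Learn": "sklearn",
    "matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
    "Altair": "altair",
    "Plotly": "plotly",
    "ggplot": "ggplot",
}


def installed_libraries(libraries=LIBRARIES):
    """Return {library name: True/False} without importing anything."""
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in libraries.items()
    }


status = installed_libraries()
for label, is_installed in status.items():
    print(f"{label:<14}{'installed' if is_installed else 'missing'}")
```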
Step 2: Import the Dataset
⚫ Data is commonly read from the following file formats:
⚫ .csv
⚫ .xls
⚫ .txt
dataset = pd.read_excel('age_salary.xls')
dataset = pd.read_table('age_salary.txt')
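For the .csv case, pandas provides pd.read_csv. The sketch below reads from in-memory text so that it runs without any files on disk. The sample values and the age and salary column names are assumptions made for illustration. With a real file, the file path would be passed instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample data; a real script would pass a file path instead.
CSV_TEXT = "age,salary\n25,30000\n32,45000\n41,60000\n"
TAB_TEXT = CSV_TEXT.replace(",", "\t")

# Comma-separated text.
dataset_csv = pd.read_csv(io.StringIO(CSV_TEXT))

# read_table uses a tab as its default separator.
dataset_txt = pd.read_table(io.StringIO(TAB_TEXT))

print(dataset_csv)
print(dataset_csv.equals(dataset_txt))
```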
Methods for preprocessing data
⚫ .head()
⚫ .tail()
⚫ .columns (attribute, no parentheses)
⚫ .info()
⚫ .describe()
⚫ .dtypes (attribute, no parentheses)
⚫ .index (attribute, no parentheses)
⚫ .fillna()
⚫ .dropna()
⚫ .isnull()
⚫ .isna()
• Demo Program
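A possible demo program, offered as a sketch rather than the original demo: the names and values in the DataFrame are invented, and filling gaps with the column mean is just one reasonable choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values (NaN).
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dina", "Eli"],
    "age": [25, np.nan, 41, 38, np.nan],
    "salary": [30000.0, 45000.0, np.nan, 52000.0, 61000.0],
})

# Inspecting the data.
print(df.head(2))      # first two rows
print(df.tail(2))      # last two rows
print(df.columns)      # column labels (attribute)
print(df.dtypes)       # data type of each column (attribute)
print(df.index)        # row labels (attribute)
df.info()              # prints a summary: non-null counts, dtypes, memory
print(df.describe())   # summary statistics for numeric columns

# Detecting missing values; isna() and isnull() give the same result.
missing_mask = df.isnull()
missing_per_column = missing_mask.sum()
print(missing_per_column)

# Handling missing values: fill with the column mean, or drop the rows.
filled = df.fillna({"age": df["age"].mean(), "salary": df["salary"].mean()})
complete_rows = df.dropna()
print(filled)
print(complete_rows)
```

Both fillna() and dropna() return a new DataFrame and leave df unchanged unless the result is assigned back.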
Methods description
A DataFrame is a 2-dimensional, labeled data structure whose columns can each hold a different
data type (strings, integers, floating point values, categorical values and more).
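As a small illustration of the description above, the sketch below builds a DataFrame whose columns hold different types. The column names and values are hypothetical.

```python
import pandas as pd

# Each column has its own data type.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],                    # strings
    "age": [25, 32, 41],                                # integers
    "salary": [30000.5, 45000.0, 60000.25],             # floating point
    "department": pd.Categorical(["HR", "IT", "IT"]),   # categorical
})

print(df.dtypes)   # one dtype per column
print(df.shape)    # (number of rows, number of columns)
```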
df.attribute description
Introduction to Visualization
Function     Description
distplot     Histogram
barplot      Estimate of central tendency for a numeric variable
jointplot    Scatterplot
regplot      Regression plot
pairplot     Pairplot