04-Data Manipulation With Pandas

The document provides an introduction to Python's Pandas library, detailing its installation, features, and data structures such as Series and DataFrame. It covers various operations including data retrieval, handling missing data, filtering, addition, deletion, merging, and grouping. Additionally, it discusses statistical functions, transformation functions, categorical data, and working with time series data.

Uploaded by

Nguyễn Lê Vy

Introduction to Python Libraries

Operations on Pandas
➔ Retrieving Data from csv file
➔ Handling of missing data
➔ Data Extraction/Filter
➔ Data Addition/Deletion
➔ Concatenating DataFrames
➔ Merging/Joining DataFrames
➔ Data Grouping
Retrieving Data from CSV
➔ CSV (comma-separated values) files are a common file format for
transferring and storing data.
➔ Use the read_csv() function to load data from a CSV file; the
default delimiter is a comma.
➔ Demo
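A minimal sketch of reading CSV data, in place of the demo. The CSV content and column names below are made up for illustration; an in-memory buffer stands in for a real file path so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice you would pass a path such as "data.csv".
csv_text = "name,age,city\nAn,23,Hanoi\nBinh,31,Hue\nChi,27,Da Nang\n"

# read_csv() accepts a path or a file-like object; the default delimiter is a comma.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows of the DataFrame
```

For files that use another separator, read_csv() takes a sep argument, for example sep=";".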
Handling of missing data
Missing data
➔ a common problem in real-life data sets
➔ a value is missing either because it exists but was not
collected, or because it never existed
➔ represented by None or NaN (Not a Number), both of which
indicate missing or null values
➔ Check for missing values using isnull() and notnull()
➔ Fill missing values using fillna(), replace() and interpolate()
➔ Drop missing values using dropna()
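The checking, filling and dropping steps above can be sketched as follows. The DataFrame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0],
    "city": ["Hanoi", None, "Hue"],
})

# Checking: isnull() marks missing cells; notnull() is its inverse.
missing_per_column = df.isnull().sum()

# Filling: a fixed value per column, or linear interpolation for numeric data.
filled = df.fillna({"temp": 0.0, "city": "unknown"})
interpolated = df["temp"].interpolate()

# Dropping: by default, remove every row that contains a missing value.
dropped = df.dropna()

print(missing_per_column)
print(interpolated.tolist())
print(dropped)
```

Whether to fill or drop depends on the data: dropping loses rows, while filling introduces values that were never observed.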
Data Filter
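➔ Use the filter() method to select rows or columns by their labels
(it does not filter on the values stored in the cells)
DataFrame.filter(items, like, regex, axis)

● items – list of axis labels to keep.

● like – keep the labels that contain this string.
● regex – keep the labels that match this regular expression.
● axis – {0 or ‘index’, 1 or ‘columns’, None}, default None. When
not specified, a DataFrame is filtered on its columns.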
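A short sketch of the four parameters in use. The column and index labels are hypothetical. Note that items, like and regex are mutually exclusive, so each call uses only one of them.

```python
import pandas as pd

# Hypothetical data; "s1" and "s2" are row labels.
df = pd.DataFrame(
    {"name": ["An", "Binh"], "math_score": [8, 9], "art_score": [7, 6]},
    index=["s1", "s2"],
)

by_items = df.filter(items=["name", "math_score"])  # exact column labels
by_like = df.filter(like="score")                   # labels containing "score"
by_regex = df.filter(regex="^a")                    # labels starting with "a"
rows = df.filter(items=["s2"], axis=0)              # filter on the row index

print(by_like)
print(rows)
```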
Data Addition
➔ Adding a new column: assign a list (one value per row) to a new
column name of an existing DataFrame

➔ Adding a new row: concatenate the existing DataFrame with a new one using concat()
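Both kinds of addition can be sketched as below, with made-up names and values. Passing ignore_index=True is a choice made here so the combined rows get a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({"name": ["An", "Binh"], "age": [23, 31]})

# New column: the list must have one value per existing row.
df["city"] = ["Hanoi", "Hue"]

# New row: build a one-row DataFrame and concatenate it with the old one.
new_rows = pd.DataFrame({"name": ["Chi"], "age": [27], "city": ["Da Nang"]})
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
```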
Data Deletion
➔ Use the drop() method
➔ Deleting a column: pass the column label with axis=1 (or columns=...)

➔ Deleting a row: pass the index label with axis=0 (or index=...)
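A minimal sketch of deleting a column and a row with drop(), using invented data. drop() returns a new DataFrame by default, so the original is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi"],
    "age": [23, 31, 27],
    "city": ["Hanoi", "Hue", "Da Nang"],
})

# Delete a column by label.
without_city = df.drop(columns=["city"])

# Delete a row by index label (here the default integer label 0).
without_first_row = df.drop(index=[0])

print(without_city)
print(without_first_row)
```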
Data Merging/Joining
➔ Use the merge() method to combine data on common columns or indices.
➔ Use the join() method to combine the columns of two differently indexed
DataFrames into a single result DataFrame, based on a key column or an
index.

● with one unique key combination: on=[key]
● using multiple join keys: on=[key1, key2, ...]
● join type: how=‘inner’, how=‘left’, how=‘right’, how=‘outer’
Joining Examples
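As an illustration, the sketch below runs the four join types on two small, invented tables that share an "id" column, then merges on two keys and uses join() on the index. All table and column names are hypothetical.

```python
import pandas as pd

students = pd.DataFrame({"id": [1, 2, 3], "name": ["An", "Binh", "Chi"]})
scores = pd.DataFrame({"id": [2, 3, 4], "score": [8, 9, 7]})

inner = students.merge(scores, on="id", how="inner")  # ids present in both
left = students.merge(scores, on="id", how="left")    # every id from students
right = students.merge(scores, on="id", how="right")  # every id from scores
outer = students.merge(scores, on="id", how="outer")  # ids from either table

# Multiple join keys: rows match only when both key columns agree.
sales = pd.DataFrame({"city": ["Hanoi", "Hue"], "year": [2020, 2020], "sales": [10, 5]})
targets = pd.DataFrame({"city": ["Hanoi", "Hue"], "year": [2020, 2021], "target": [12, 6]})
both_keys = sales.merge(targets, on=["city", "year"], how="inner")

# join() aligns on the index, so the key column is moved into the index first.
joined = students.set_index("id").join(scores.set_index("id"), how="left")

print(outer)
print(both_keys)
print(joined)
```

Rows without a match on the other side are kept by the left, right and outer joins, with the unmatched columns filled with NaN.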
Data Grouping
➔ Group the data by category and apply a function to each
group.
➔ often involves 3 operations:
● Splitting the Data Object
● Applying a function
● Combining the results
Examples of Splitting Data Objects
Use groupby() to split the DataFrame into subsets according
to some criteria.

● groupby('key')

● groupby(['key1', 'key2'])
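The split, apply and combine steps can be sketched as follows. The data and column names are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Hanoi", "Hanoi", "Hue", "Hue"],
    "year": [2020, 2021, 2020, 2020],
    "sales": [10, 20, 5, 7],
})

# Split: iterating over a groupby object yields (key, subset) pairs.
group_sizes = {city: len(subset) for city, subset in df.groupby("city")}

# Apply + combine: sum the sales inside each group, one result row per group.
by_city = df.groupby("city")["sales"].sum()

# Grouping on two keys produces one group per (city, year) combination.
by_city_year = df.groupby(["city", "year"])["sales"].sum()

print(group_sizes)
print(by_city)
print(by_city_year)
```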
Statistical functions in Pandas
➔ computing a summary statistic
sum()       Compute sum of column values
count()     Compute count of column values
min()       Compute min of column values
max()       Compute max of column values
mean()      Compute mean of column
std()       Standard deviation of column
var()       Compute variance of column
sem()       Standard error of the mean of column
size()      Compute column sizes
first()     Compute first of group values
last()      Compute last of group values
describe()  Generates descriptive statistics
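A brief sketch of several of these functions, applied first to a whole column and then per group. The data is made up, and the choice of functions passed to agg() is only an example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "points": [1.0, 3.0, 2.0, 6.0],
})

# Column-level statistics.
total = df["points"].sum()
average = df["points"].mean()
summary = df["points"].describe()  # count, mean, std, min, quartiles, max

# Group-level statistics: several functions at once through agg().
per_team = df.groupby("team")["points"].agg(
    ["first", "last", "min", "max", "mean", "size"]
)

print(summary)
print(per_team)
```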
Transformation Functions
➔ Return a new DataFrame with transformed values, produced by
applying the function passed as a parameter.
◆ apply()
◆ applymap()
◆ melt()
◆ transform()
Examples
applymap() function
➔ Applies a function to a DataFrame element-wise.
DataFrame.applymap(func)
● func: Python function that returns a single value from a single value.
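A minimal element-wise example. Since pandas 2.1, applymap() is deprecated in favour of DataFrame.map(), which has the same element-wise behaviour. The sketch therefore picks whichever method the installed version provides, which is a choice made here for portability rather than something the slides prescribe.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Use DataFrame.map when available (pandas >= 2.1), otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The function receives one cell value at a time and returns one value.
squared = elementwise(lambda x: x ** 2)

print(squared)
```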
melt() function
➔ Unpivots a DataFrame from wide format to long format.
DataFrame.melt(id_vars, value_vars, var_name, value_name, col_level)
● id_vars: column(s) to use as identifier variables.
● value_vars: column(s) to unpivot. If not specified, uses all columns that
are not set as id_vars.
● var_name: name to use for the ‘variable’ column. If None, it uses
frame.columns.name or ‘variable’.
● value_name: name to use for the ‘value’ column.
● col_level: if the columns are a MultiIndex, use this level to melt.
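To make the wide-to-long reshaping concrete, here is a small sketch with a hypothetical score table: each subject column becomes a set of rows, with the subject name in one column and the score in another.

```python
import pandas as pd

# Wide format: one row per student, one column per subject.
wide = pd.DataFrame({"name": ["An", "Binh"], "math": [8, 9], "art": [7, 6]})

# Long format: one row per (student, subject) pair.
long = wide.melt(
    id_vars=["name"],
    value_vars=["math", "art"],
    var_name="subject",
    value_name="score",
)

print(long)
```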
transform() function
➔ Calls a function (func) on the DataFrame and returns a DataFrame with
transformed values that has the same axis length as the original.
DataFrame.transform(func, axis, *args, **kwargs)
● func: function to use for transforming the data.
● axis: 0 or ‘index’: apply the function to each column; 1 or ‘columns’: apply
the function to each row.
● *args: positional arguments to pass to func.
● **kwargs: keyword arguments to pass to func.
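A short sketch of transform(), assuming small invented data. The helper scale() and its factor parameter are hypothetical; they only show how extra keyword arguments reach func.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function is applied to each column (axis=0) and must keep its length.
centered = df.transform(lambda col: col - col.mean())


def scale(col, factor):
    """Hypothetical helper: multiply every value of a column by factor."""
    return col * factor


# Extra keyword arguments are forwarded to func.
tripled = df.transform(scale, factor=3)

print(centered)
print(tripled)
```

Unlike an aggregation such as sum(), the result has the same shape as the input, which is what distinguishes transform() from agg().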
Categorical Data
➔ A pandas data type corresponding to categorical variables in statistics, e.g.
gender, social class, blood type.
➔ Use the standard pandas Categorical constructor to create a categorical
object
pandas.Categorical(values, categories, ordered)
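A minimal sketch of the constructor with an ordered category. The size labels are invented for illustration; setting ordered=True is what makes comparisons and min()/max() meaningful.

```python
import pandas as pd

sizes = pd.Categorical(
    ["small", "large", "medium", "small"],
    categories=["small", "medium", "large"],  # allowed values, in order
    ordered=True,
)

# Each value is stored as an integer code pointing into the categories.
codes = sizes.codes.tolist()

# With ordered=True, comparisons follow the category order, not the alphabet.
bigger_than_small = (pd.Series(sizes) > "small").tolist()

print(codes)
print(sizes.min(), sizes.max())
print(bigger_than_small)
```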
Working with time series data
➔ A time series is any data set whose values are measured at
different points in time. It can be:
◆ uniformly spaced, e.g. hourly weather measurements, daily
counts of web site visits, or monthly sales totals.
◆ irregularly spaced, e.g. timestamped data in a computer
system’s event log, or a history of 115 emergency calls.
➔ Pandas provides useful objects for working with time series data:
◆ Timestamp Object
◆ Period Object
◆ Timedelta Object
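The three objects can be sketched together as below. The dates are arbitrary examples: a Timestamp is a single point in time, a Timedelta is a duration, and a Period is an interval with a start and an end.

```python
import pandas as pd

# Timestamp: a single point in time.
ts = pd.Timestamp("2020-12-25 18:12")

# Timedelta: a duration that can be added to or subtracted from a Timestamp.
delta = pd.Timedelta(hours=6)
later = ts + delta

# Period: an interval, here the whole month of December 2020.
month = pd.Period("2020-12", freq="M")
in_month = month.start_time <= ts <= month.end_time

print(later)
print(month.start_time, month.end_time)
print(in_month)
```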
Period Object
➔ A Period object represents an interval of time. It can be used to
check whether a specific event occurs within a certain period, for
example when counting the flights that take off or averaging a stock
price over a period.
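import pandas as pd

# Create a time period (daily frequency is inferred from the string)
p1 = pd.Period('2020-12-25')
# Create a timestamp
t1 = pd.Timestamp('2020-12-25 18:12')
# Test whether the timestamp falls inside the period
p1.start_time < t1 < p1.end_time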


Nguyen Ngoc Thao, [Link]
