
04-Data Manipulation With Pandas

The document provides an introduction to Python's Pandas library, detailing its installation, features, and data structures such as Series and DataFrame. It covers various operations including data retrieval, handling missing data, filtering, addition, deletion, merging, and grouping. Additionally, it discusses statistical functions, transformation functions, categorical data, and working with time series data.

Uploaded by

Nguyễn Lê Vy
Copyright © All Rights Reserved

Introduction to Python Libraries

Nguyen Ngoc Thao, [Link]


In Charge of Deep Medicine,
Phan Chau Trinh Medical University
[Link]@[Link]
0935192556
Content
➢ Introduction
➢ Getting Started With Pandas
➢ Pandas vs SQL
➢ Pandas Features
➢ Pandas Data Structure
➢ Operations on Pandas
➢ Working with text data
➢ Working with time series data
Introduction
➔ Pandas is an essential tool for data analysis and
manipulation.
Pandas Getting Started
➔ Installing pandas: pip install pandas
➔ Importing pandas: import pandas
➔ Importing pandas under its usual alias: import pandas as pd
➔ Checking the pandas version: pd.__version__
Pandas Features
Pandas Data Structure
Pandas Series
➔ a one-dimensional labeled array capable of holding any
data type
Pandas DataFrame
➔ a two-dimensional, size-mutable data structure for tabular data
whose columns can hold different data types.
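A minimal sketch of the two data structures. The labels, column names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values, each addressed by an index label.
scores = pd.Series([8.5, 7.0, 9.2], index=["an", "binh", "chi"])

# A DataFrame: a table whose columns may hold different data types.
students = pd.DataFrame({
    "name": ["An", "Binh", "Chi"],
    "age": [20, 21, 19],
    "score": [8.5, 7.0, 9.2],
})

print(scores["binh"])    # look up a Series value by its label
print(students.shape)    # (number of rows, number of columns)
print(students.dtypes)   # one data type per column
```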

Operations on Pandas
➔ Retrieving Data from csv file
➔ Handling of missing data
➔ Data Extraction/Filter
➔ Data Addition/Deletion
➔ Concatenating DataFrames
➔ Merging/Joining DataFrames
➔ Data Grouping
Retrieving Data from CSV
➔ CSV (comma-separated values) files are a common file format for
transferring and storing data.
➔ Use the read_csv() function to load data from a CSV file; the
default delimiter is the comma character.
➔ Demo
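A small sketch of read_csv(). To stay self-contained it reads from an in-memory string via io.StringIO instead of a file on disk; the CSV content is invented for illustration. With a real file you would pass its path instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical content).
csv_text = "name,age,city\nAn,20,Hanoi\nBinh,21,Da Nang\n"

# read_csv() splits on commas by default; sep=";" etc. would change that.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())     # first rows of the table
print(df.dtypes)     # column types inferred while parsing
```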
Handling of missing data
Missing data
➔ a very common problem in real-life
scenarios
➔ either the value exists but was not
collected, or it never existed
➔ represented by None and NaN (Not a
Number), which indicate missing or null
values
➔ Checking for missing values using isnull() and notnull()
➔ Filling missing values using fillna(), replace() and interpolate()
➔ Dropping missing values using dropna()
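The checks and fixes listed above can be sketched as follows. The DataFrame is a made-up example, and the fill values are arbitrary choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [30.0, np.nan, 34.0],
    "city": ["Hue", None, "Hoi An"],
})

# isnull() marks missing cells with True; notnull() is its inverse.
missing_per_column = df.isnull().sum()

# fillna() replaces missing values, here with one default per column.
filled = df.fillna({"temp": 0.0, "city": "unknown"})

# interpolate() estimates numeric gaps from neighbouring values (linear by default).
interpolated = df["temp"].interpolate()

# dropna() removes every row that contains at least one missing value.
dropped = df.dropna()

print(missing_per_column)
print(interpolated)
```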
Data Filter
➔ Using filter() method
DataFrame.filter(items, like, regex, axis)

● items – list of axis labels to keep.
● like – keeps the labels that contain this string.
● regex – keeps the labels that match this regular expression.
● axis – {0 or ‘index’, 1 or ‘columns’, None}, default None. When
not specified, a DataFrame is filtered on its columns.
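A short sketch of each filter() parameter, using invented column and index labels. Note that filter() selects by label, not by cell value, and that items, like and regex are mutually exclusive.

```python
import pandas as pd

df = pd.DataFrame(
    {"math_score": [8, 6], "physics_score": [7, 9], "name": ["An", "Binh"]},
    index=["row_a", "row_b"],
)

by_items = df.filter(items=["name", "math_score"])  # exact column labels
by_like = df.filter(like="score")                   # labels containing "score"
by_regex = df.filter(regex="^phys")                 # labels matching a pattern
rows_like = df.filter(like="_b", axis=0)            # filter index labels instead

print(by_like)
print(rows_like)
```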
Data Addition
➔ Adding a new column: define a list of values and assign it to an
existing DataFrame under a new column name

➔ Adding a new row: concatenate the old DataFrame with a new one
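A minimal sketch of both additions, with hypothetical names and values. The list assigned as a column must have one entry per existing row.

```python
import pandas as pd

df = pd.DataFrame({"name": ["An", "Binh"], "age": [20, 21]})

# Add a column: assign a list with one value per row.
df["city"] = ["Hanoi", "Hue"]

# Add a row: build a one-row DataFrame and concatenate it.
new_row = pd.DataFrame({"name": ["Chi"], "age": [19], "city": ["Da Nang"]})
df = pd.concat([df, new_row], ignore_index=True)  # renumber the index 0..n-1

print(df)
```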
Data Deletion
➔ Using the drop() method
➔ Deleting a column: pass the column label to drop() with axis=1

➔ Deleting a row: pass the row's index label to drop() with axis=0
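A small sketch of drop() on a made-up DataFrame. By default drop() returns a new DataFrame and leaves the original unchanged; the columns= and index= keywords used here are equivalent to passing labels with axis=1 and axis=0.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["An", "Binh", "Chi"],
    "age": [20, 21, 19],
    "city": ["Hanoi", "Hue", "Da Nang"],
})

# Delete a column by its label.
without_city = df.drop(columns="city")

# Delete a row by its index label (here the default integer label 0).
without_first = df.drop(index=0)

print(without_city)
print(without_first)
```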
Data Merging/Joining
➔ Using the merge() method to combine data on common columns or indices.
➔ Using the join() method to combine the columns of two differently indexed
DataFrames into a single result DataFrame, based on a key column or an
index.

● with one unique key combination: on=[key]
● using multiple join keys: on=[key1, key2, ...]
● join type: how=‘inner’, how=‘left’, how=‘right’ or how=‘outer’
Joining Examples
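A minimal sketch of the four join types on a single key. The two tables and their column names are invented for illustration.

```python
import pandas as pd

employees = pd.DataFrame({"key": ["a", "b", "c"], "name": ["An", "Binh", "Chi"]})
salaries = pd.DataFrame({"key": ["b", "c", "d"], "salary": [900, 1200, 1000]})

inner = employees.merge(salaries, on="key", how="inner")  # keys in both tables
left = employees.merge(salaries, on="key", how="left")    # all keys from employees
right = employees.merge(salaries, on="key", how="right")  # all keys from salaries
outer = employees.merge(salaries, on="key", how="outer")  # keys from either table

# join() aligns on the index, so the key column is moved into the index first.
joined = employees.set_index("key").join(salaries.set_index("key"), how="left")

print(outer)
print(joined)
```

Rows without a match on the other side get NaN in the columns that come from that side.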
Data Grouping
➔ grouping the data by category and applying a function to each
group.
➔ often involves 3 operations:
● Splitting the Data Object
● Applying a function
● Combining the results
Examples of Splitting Data Objects
Using groupby() to split the DataFrame into subsets according to
some criteria.

● groupby('key')

● groupby(['key1','key2'])
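The split, apply and combine steps can be sketched as follows, on a made-up DataFrame with the key names used above.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b"],
    "key2": ["x", "y", "x", "x"],
    "value": [1, 2, 3, 4],
})

# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs.
for key, subset in df.groupby("key1"):
    print(key, len(subset))

# Apply + combine: one sum per group, collected into a single Series.
sums = df.groupby("key1")["value"].sum()

# Grouping by several keys gives a result indexed by key combinations.
multi = df.groupby(["key1", "key2"])["value"].sum()

print(sums)
print(multi)
```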
Statistical functions in Pandas
➔ computing a summary statistic
count()     Compute count of column values
sum()       Compute sum of column values
mean()      Compute mean of column
min()       Compute min of column values
max()       Compute max of column values
std()       Standard deviation of column
var()       Compute variance of column
sem()       Standard error of the mean of column
first()     Compute first of group values
last()      Compute last of group values
size()      Compute column sizes
describe()  Generates descriptive statistics
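A short sketch of a few of these functions, applied to a whole column and then per group. The data and names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "red", "blue", "blue"],
    "points": [10, 20, 30, 50],
})

# Statistics over a whole column.
total = df["points"].sum()
average = df["points"].mean()
summary = df["points"].describe()   # count, mean, std, min, quartiles, max

# The same functions computed once per group.
per_team = df.groupby("team")["points"].agg(
    ["count", "mean", "min", "max", "first", "last"]
)
group_sizes = df.groupby("team").size()

print(summary)
print(per_team)
```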
Transformation Functions
➔ Return a DataFrame with transformed values after applying the
function passed as a parameter.
◆ apply()
◆ applymap()
◆ melt()
◆ transform()
Examples
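applymap() function
➔ Used to apply a function to a DataFrame element-wise. Since pandas 2.1,
applymap() is deprecated in favor of the equivalent DataFrame.map().
DataFrame.applymap(func)
● func: Python function that returns a single value from a single value.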
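A minimal sketch of an element-wise transformation. The square() helper and the data are made up. Because pandas 2.1 renamed applymap() to DataFrame.map(), the sketch picks whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def square(x):
    """Hypothetical helper: takes one cell value, returns one value."""
    return x * x

# DataFrame.map() exists from pandas 2.1 on; older versions only have applymap().
if hasattr(pd.DataFrame, "map"):
    squared = df.map(square)
else:
    squared = df.applymap(square)

print(squared)
```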
melt() function
➔ used to unpivot a DataFrame from wide format to long format
DataFrame.melt([id_vars], [value_vars], var_name, value_name,
[col_level])
● [id_vars]: column(s) to use as identifier variables.
● [value_vars]: column(s) to unpivot. If not specified, uses all columns that
are not set as id_vars.
● var_name: name to use for the ‘variable’ column. If None, it uses
frame.columns.name or ‘variable’.
● value_name: name to use for the ‘value’ column.
● [col_level]: if columns are a MultiIndex, use this level to melt.
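A small sketch of melt() turning one column per subject into one row per (name, subject) pair. The table and the chosen var_name and value_name are illustrative.

```python
import pandas as pd

# Wide format: one row per student, one column per subject.
wide = pd.DataFrame({
    "name": ["An", "Binh"],
    "math": [8, 6],
    "physics": [7, 9],
})

# Long format: one row per (student, subject) pair.
long = wide.melt(
    id_vars=["name"],                # kept as identifier column
    value_vars=["math", "physics"],  # columns to unpivot
    var_name="subject",              # holds the former column names
    value_name="score",              # holds the former cell values
)

print(long)
```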
transform() function
➔ used to call a function (func) on a DataFrame, producing a DataFrame with
transformed values that has the same axis length as the original.
DataFrame.transform(func, axis, *args, **kwargs)
● func: Function to use for transforming the data
● axis: 0 or ‘index’: apply function to each column; 1 or ‘columns’: apply
function to each row.
● *args: Positional arguments to pass to func.
● **kwargs: Keyword arguments to pass to func.
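A minimal sketch of transform() with the default axis, where the function receives one column at a time. The data and the two lambdas are made up; the point is that the result keeps the shape of the input, unlike an aggregation.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Each column is passed to the function and must come back with the same length.
doubled = df.transform(lambda col: col * 2)

# Centre every column on its own mean.
centered = df.transform(lambda col: col - col.mean())

print(doubled)
print(centered)
```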
Categorical Data
➔ a pandas data type corresponding to categorical variables in statistics, e.g.
gender, social class, blood type.
➔ Use the standard pandas Categorical constructor to create a categorical
object
pandas.Categorical(values, categories, ordered)
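A short sketch of the Categorical constructor with an ordered set of categories. The size labels are an invented example.

```python
import pandas as pd

sizes = pd.Categorical(
    ["small", "large", "medium", "small"],
    categories=["small", "medium", "large"],  # the allowed values, in order
    ordered=True,                             # enables min(), max() and comparisons
)

print(sizes.categories)   # the category labels
print(sizes.codes)        # integer position of each value within categories
print(sizes.min(), sizes.max())

# A value outside the declared categories becomes missing (code -1).
with_unknown = pd.Categorical(["small", "huge"], categories=["small", "medium", "large"])
print(with_unknown.codes)
```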
Working with time series data
➔ A time series is any data set where the values are measured at
different points in time.
◆ uniformly spaced, e.g. hourly weather measurements, daily
counts of web site visits, or monthly sales totals.
◆ irregularly spaced, e.g. timestamped data in a computer
system’s event log, or a history of 115 emergency calls.
➔ Pandas provides useful objects for working with time series data:
◆ Timestamp Object
◆ Period Object
◆ Timedelta Object
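A minimal sketch of Timestamp and Timedelta arithmetic, plus date_range() as one way to build a uniformly spaced index. The dates and the offset are arbitrary.

```python
import pandas as pd

# Timestamp: a single point in time.
ts = pd.Timestamp("2020-12-25 18:12")

# Timedelta: a duration that can be added to or subtracted from timestamps.
delta = pd.Timedelta(days=1, hours=6)
later = ts + delta

# A uniformly spaced series of timestamps (daily frequency).
days = pd.date_range("2020-12-25", periods=3, freq="D")

print(later)
print(later - ts)   # subtracting two timestamps gives a Timedelta
print(days)
```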
Period Object
➔ A Period object represents an interval in time. It can be used to
check whether a specific event occurs within a certain period, for
example when monitoring the number of flights taking off or the
average stock price during a period.
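import pandas as pd

# Create a time period (daily frequency is inferred from the date string)
p1 = pd.Period('2020-12-25')
# Create a timestamp
t1 = pd.Timestamp('2020-12-25 18:12')
# Test whether the timestamp falls inside the period
p1.start_time < t1 < p1.end_time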
