Python Basic and Advanced-Day 11
Agenda
1 Python Basics
o Python - Overview
o Python - Environment Setup
o Python - Basic Syntax
2 Python Advanced
o Python - Classes/Objects
o Python - Regular Expressions
Python Basics
Pandas-groupby
A groupby operation involves one or more of the following steps on the original object:
o Splitting the Object
o Applying a function
o Combining the results
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
print(df)
# Prints a DataFrameGroupBy object (the split step), not the groups themselves
print(df.groupby('Team'))
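The three steps above can be sketched explicitly. This is a minimal example, assuming the same IPL-style data (with 'Kings' spelled consistently): it splits the rows by team, applies a sum to each team's Points, combines the results by hand, and then checks that a single groupby(...).sum() call produces the same totals.

```python
import pandas as pd

# Same IPL-style data as above, reduced to the two columns we need
df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'Kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
})

# 1. Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
# 2. Apply: sum the Points column of each sub-DataFrame
# 3. Combine: collect the per-group results into one Series
manual = pd.Series(
    {team: group['Points'].sum() for team, group in df.groupby('Team')}
)

# The same three steps in a single call
totals = df.groupby('Team')['Points'].sum()
print(totals)
```

The one-line form is the idiomatic choice; the manual loop is only there to make each step visible.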
print(df.groupby('Team').groups)
get_group()
import pandas as pd
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
# Split the rows by year, then select only the rows for 2014
grouped = df.groupby('Year')
print(grouped.get_group(2014))
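A small sketch of what .groups and get_group() return, assuming the Year and Points columns from the data above: .groups maps each key to the row labels in that group, and get_group(2014) returns the same rows as an ordinary boolean filter on Year == 2014.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
})

grouped = df.groupby('Year')

# .groups maps each key to the row labels that belong to it
print(grouped.groups)

# get_group(2014) selects the same rows as a boolean filter
g2014 = grouped.get_group(2014)
same_rows = df[df['Year'] == 2014]
print(g2014.equals(same_rows))
```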
Pandas-Merging/Joining
Pandas has full-featured, high-performance in-memory join operations that are idiomatically very similar to those of relational databases such as SQL.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=False)
The parameters are as follows:
•left − A DataFrame object.
•right − Another DataFrame object.
•on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
•left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
•right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
•left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
•right_index − Same usage as left_index for the right DataFrame.
•how − One of 'left', 'right', 'outer', 'inner'. Defaults to 'inner'. Each method is described below.
•sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False; setting it to True adds the cost of sorting, so leaving it False is usually faster.
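A minimal sketch of the on, left_on and right_on parameters. The employee tables below are hypothetical example data: when the key columns have different names, left_on/right_on name them separately and both key columns are kept; when the name is shared, on= is enough and a single key column remains.

```python
import pandas as pd

# Hypothetical example data: employee names and their departments
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alex', 'Amy', 'Allen']})
right = pd.DataFrame({'emp_id': [2, 3, 4], 'dept': ['HR', 'IT', 'Ops']})

# Key columns have different names, so use left_on/right_on
merged = pd.merge(left, right, left_on='id', right_on='emp_id', how='inner')
print(merged)

# When both frames share the key column name, on= is enough
right2 = right.rename(columns={'emp_id': 'id'})
merged_on = pd.merge(left, right2, on='id')
print(merged_on)
```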
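Joins are used to combine records from two or more tables in a database. The four most commonly used joins correspond to the how options as follows:
Merge Method   SQL Equivalent     Description
left           LEFT OUTER JOIN    Use keys from the left object
right          RIGHT OUTER JOIN   Use keys from the right object
outer          FULL OUTER JOIN    Use the union of keys
inner          INNER JOIN         Use the intersection of keys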
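To see the four join types side by side, here is a small sketch on two hypothetical frames whose keys overlap only partly (1, 2, 3 on the left and 2, 3, 4 on the right). It also uses indicator=True, which adds a _merge column recording where each row came from.

```python
import pandas as pd

# Hypothetical frames: keys 2 and 3 appear in both, 1 and 4 in only one
left = pd.DataFrame({'key': [1, 2, 3], 'lval': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'rval': ['x', 'y', 'z']})

sizes = {}
for how in ['left', 'right', 'outer', 'inner']:
    result = pd.merge(left, right, on='key', how=how)
    sizes[how] = len(result)
    print(how, sorted(result['key']))

# indicator=True labels each row as left_only, right_only or both
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)
```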
Pandas-IO Tools
The Pandas I/O API is a set of top-level reader functions, such as pd.read_csv(), that generally return a Pandas object.
import pandas as pd
TelecomData = pd.read_csv('Comcast_telecom_complaints_data.csv')
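The reader functions accept a file path or any file-like object. The sketch below assumes hypothetical CSV text held in memory (so it runs without the Comcast file), reads it with pd.read_csv(), and writes it back with the matching writer, to_csv().

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """Ticket #,Date,Status
1001,22-04-15,Closed
1002,04-08-15,Open
1003,18-04-15,Closed
"""

# read_csv accepts a path or any file-like object such as StringIO
data = pd.read_csv(io.StringIO(csv_text))
print(type(data))
print(data.head())

# Writing and reading back; index=False leaves the row labels out of the file
round_trip = pd.read_csv(io.StringIO(data.to_csv(index=False)))
```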
The string passed to pd.read_csv() is the path of the file; a relative path such as 'Comcast_telecom_complaints_data.csv' is resolved against the current working directory.
Dimensionality Check: the shape attribute returns the dimensions of a DataFrame or Series, for example (number of rows, number of columns) for a DataFrame.
print(TelecomData.shape)
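A short sketch of the dimensionality check on a hypothetical two-column DataFrame: shape is a (rows, columns) tuple for a DataFrame and a one-element tuple for a Series, and ndim gives the number of dimensions directly.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
series = df['a']

print(df.shape)       # (rows, columns)
print(series.shape)   # (length,)
print(df.ndim, series.ndim)
```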
Data Exploration Technique
The data exploration steps covered in this section are:
o Dimensionality Check
o Type of Dataset
o Slicing and Indexing
o Identifying unique elements
o Value Extraction
o Feature Mean
o Feature Median
o Feature Mode
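The "Type of Dataset" step has no example of its own. One reasonable reading, assumed here, is checking the Python type of the loaded object and the data type of each column. The small frame below is hypothetical example data.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    'count': [3, 5, 8],
    'price': [1.5, 2.0, 2.5],
    'city': ['Pune', 'Delhi', 'Goa'],
})

print(type(df))           # the container type: DataFrame
print(type(df['count']))  # a single column is a Series
print(df.dtypes)          # the data type of each column
df.info()                 # column names, non-null counts and dtypes
```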
Slicing and Indexing
You can use the : operator with the start index on its left and the end index on its right to output the corresponding slice. The element at the end index itself is excluded.
Slicing a list:
my_list = [1, 2, 3, 4, 5]
print(my_list[1:3])
Output- [2, 3]
Slicing a DataFrame using the iloc indexer (all rows, columns at positions 1 and 2):
TelecomData.iloc[:, 1:3]
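A sketch contrasting position-based and label-based slicing on a hypothetical four-column DataFrame. Positional slices (lists, iloc) exclude the end position, while label slices with loc include the end label.

```python
import pandas as pd

my_list = [1, 2, 3, 4, 5]
print(my_list[1:3])   # end index 3 is excluded

# Hypothetical example data with the default row labels 0..3
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8],
                   'c': [9, 10, 11, 12], 'd': [13, 14, 15, 16]})

by_position = df.iloc[:, 1:3]   # columns at positions 1 and 2: 'b', 'c'
by_label = df.loc[:, 'b':'c']   # label slice includes the end label: 'b', 'c'
rows_iloc = df.iloc[1:3]        # rows at positions 1 and 2
rows_loc = df.loc[1:3]          # rows labelled 1, 2 and 3
```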
Identifying unique elements
Calling unique() on the column of interest returns a NumPy array of that column's unique values, in order of first appearance.
TelecomData['Date'].unique()
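Alongside unique(), two related Series methods are often useful: nunique() counts the distinct values and value_counts() reports how often each occurs. The dates below are hypothetical example data.

```python
import pandas as pd

# Hypothetical dates; '22-04-15' appears twice
dates = pd.Series(['22-04-15', '04-08-15', '22-04-15', '18-04-15'])

print(dates.unique())        # unique values, in order of first appearance
print(dates.nunique())       # number of unique values
print(dates.value_counts())  # how often each value occurs
```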
Value Extraction
The values attribute (used without parentheses) of the column of interest returns a NumPy array with all the values of the column.
TelecomData['Ticket #'].values
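A minimal sketch of value extraction on a hypothetical column of ticket numbers. Recent pandas documentation recommends the to_numpy() method over the values attribute when a NumPy array is wanted; for a plain integer column both give the same array.

```python
import numpy as np
import pandas as pd

# Hypothetical ticket numbers
tickets = pd.Series([1001, 1002, 1003], name='Ticket #')

arr_values = tickets.values      # attribute: no parentheses
arr_numpy = tickets.to_numpy()   # method form
print(arr_values, type(arr_values))
```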
Feature Mean
Calling mean() on a column returns the mean of that column; calling it on a DataFrame returns the mean of each numeric column.
import pandas as pd
test = pd.read_excel('Test.xlsx')
test['values'].mean()
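Since Test.xlsx is not available here, the sketch below assumes a small hypothetical frame with a numeric 'values' column and a text column. In recent pandas versions, DataFrame.mean() raises an error on text columns unless numeric_only=True is passed.

```python
import pandas as pd

# Hypothetical stand-in for Test.xlsx
test = pd.DataFrame({'values': [10, 20, 30, 40], 'label': ['a', 'b', 'c', 'd']})

col_mean = test['values'].mean()          # mean of one column
all_means = test.mean(numeric_only=True)  # mean of every numeric column
print(col_mean)
print(all_means)
```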
Feature Median
Calling median() on the column of interest returns its median, the middle value of the sorted data.
test['values'].median()
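A small worked comparison on hypothetical numbers shows why the median is often preferred when a column has extreme values: one large value pulls the mean far up but leaves the median unchanged. With an even number of values, the median is the average of the two middle ones.

```python
import pandas as pd

# Hypothetical data with one extreme value
values = pd.Series([10, 20, 30, 40, 1000])
print(values.mean())    # (10 + 20 + 30 + 40 + 1000) / 5
print(values.median())  # middle of the sorted values

even = pd.Series([10, 20, 30, 40])
print(even.median())    # average of the two middle values, 20 and 30
```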
Feature Mode
Calling mode() on the column of interest returns its mode. Because a column can have several equally frequent values, mode() returns a Series rather than a single number.
test['values'].mode()
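A sketch on a hypothetical Series where two values tie for most frequent: mode() returns both, sorted, and indexing with [0] picks the first when a single value is needed.

```python
import pandas as pd

# Hypothetical data: 2 and 3 both occur twice
s = pd.Series([1, 2, 2, 3, 3])

modes = s.mode()       # a Series holding every mode
first_mode = modes[0]  # take one value when a scalar is needed
print(modes)
```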
Data Wrangling Techniques - Missing Values
import pandas as pd
import seaborn as sns
test = pd.read_excel('Test.xlsx')
# True for each column that contains at least one missing value
test.isna().any()
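A sketch on a hypothetical frame with one missing value per column: isna() marks each cell, .any() reduces that to one flag per column, and .sum() counts the missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN in a numeric column, None in a text column
test = pd.DataFrame({'values': [10.0, np.nan, 30.0], 'label': ['a', 'b', None]})

print(test.isna())        # element-wise True/False
print(test.isna().any())  # per column: is any value missing?
print(test.isna().sum())  # per column: how many values are missing
```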
Data Wrangling - Handling missing values (mean, median, mode)
Data wrangling, also called data munging, is the process of manually converting or mapping data from one raw format into another format. It often goes together with data visualization, which helps reveal problems such as outliers.
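The mean, median and mode treatments named in the heading can be sketched with fillna(). The Series below is hypothetical; pandas skips NaN when computing the statistics, so all three are computed from 10, 20, 20 and 90.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values
s = pd.Series([10, np.nan, 20, 20, np.nan, 90])

filled_mean = s.fillna(s.mean())      # fills with 35.0
filled_median = s.fillna(s.median())  # fills with 20.0
filled_mode = s.fillna(s.mode()[0])   # fills with 20.0
print(filled_mean.tolist())
```

The median and mode are less affected by extreme values (such as 90 here) than the mean, which is why they are often preferred for skewed columns.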
Example: Outlier Detection
# Points drawn beyond the box plot's whiskers are potential outliers
sns.boxplot(x=test['values'])
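The box plot's whiskers follow the common 1.5 × IQR rule (seaborn's boxplot uses whis=1.5 by default). The sketch below, on hypothetical values, computes the same bounds numerically so outliers can be listed rather than only seen.

```python
import pandas as pd

# Hypothetical data with one extreme value
values = pd.Series([10, 12, 13, 14, 15, 16, 200])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(lower, upper)
print(outliers)
```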
Outlier Treatment
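# Keep only the rows whose value is below 180, the cut-off used in this example
mask = test['values'] < 180
df1_outlier_rem = test[mask]
print(df1_outlier_rem)
# Redraw the box plot to check that the outliers are gone
sns.boxplot(x=df1_outlier_rem['values'])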
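Instead of a hand-picked cut-off such as 180, the bounds can be derived from the data. This sketch assumes hypothetical values and shows two treatments: dropping rows outside the 1.5 × IQR bounds, or capping values at the bounds with clip() so no rows are lost.

```python
import pandas as pd

# Hypothetical stand-in for the 'values' column
test = pd.DataFrame({'values': [10, 12, 13, 14, 15, 16, 200]})

q1, q3 = test['values'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop the rows outside the bounds (between is inclusive)
removed = test[test['values'].between(lower, upper)]

# Option 2: cap extreme values at the bounds instead of dropping rows
capped = test['values'].clip(lower=lower, upper=upper)
print(removed)
print(capped)
```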
Data manipulation techniques
Data manipulation is the process of changing data to make it easier to read or better organized. For example, a log of data could be sorted in alphabetical order, making individual entries easier to locate.
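The alphabetical-ordering example can be sketched with sort_values(). The log below is hypothetical example data.

```python
import pandas as pd

# Hypothetical log entries
log = pd.DataFrame({'name': ['Maria', 'Alex', 'Chen', 'Bob'], 'visits': [3, 5, 2, 4]})

by_name = log.sort_values('name')                       # alphabetical order
by_visits = log.sort_values('visits', ascending=False)  # most visits first
print(by_name)
```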
head(): print(test.head(6))   # first 6 rows
tail(): print(test.tail(6))   # last 6 rows
values: test['values'].values   # NumPy array of the column (an attribute, not a method)
groupby(): print(test.groupby('values'))   # prints a GroupBy object
concatenation:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], 'B': ['B4', 'B5', 'B6', 'B7'], 'C': ['C4', 'C5', 'C6', 'C7'], 'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
frames = [df1, df2]
# Stack df2 below df1; the result has index labels 0 to 7
result = pd.concat(frames)
print(result)
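When the frames being concatenated share index labels, pd.concat keeps the duplicates by default. This sketch, on two hypothetical one-column frames, shows the ignore_index, keys and axis options that control the result.

```python
import pandas as pd

# Hypothetical frames that both use index labels 0 and 1
a = pd.DataFrame({'A': ['A0', 'A1']})
b = pd.DataFrame({'A': ['A2', 'A3']})

stacked = pd.concat([a, b])                             # labels repeat: 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)       # fresh labels: 0, 1, 2, 3
labelled = pd.concat([a, b], keys=['first', 'second'])  # outer index level names each source
side_by_side = pd.concat([a, b], axis=1)                # columns placed next to each other
print(labelled)
```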