
Python Basic and Advanced-Day 11

This document provides an overview of Python basics and advanced concepts. It begins with an introduction to Python and covers fundamental topics like variables, data types, operators, control flow and more. It then discusses more advanced Python topics such as classes, objects, regular expressions, databases and networking. The document also includes sections on Pandas basics and advanced Pandas techniques like groupby operations, merging/joining dataframes, and input/output tools.

Uploaded by

Ashok Kumar
Copyright
© All Rights Reserved

Python Basics & Advanced

Content

1 Python Basics
o Python - Overview
o Python - Environment Setup
o Python - Basic Syntax
o Python - Variable Types
o Python - Basic Operators
o Python - Decision Making
o Python - Loops
o Python - Numbers
o Python - Strings
o Python - Lists
o Python - Tuples
o Python - Dictionary
o Python - Date & Time
o Python - Functions
o Python - Modules
o Python - Files I/O
o Python - Exception

2 Python Advanced
o Python - Classes/Objects
o Python - Reg Expressions
o Python - CGI Programming
o Python - Database Access
o Python - Networking
o Python - Sending Email
o Python - Multithreading
o Python - XML Processing
o Python - GUI Programming

Day- Agenda
o Quick recap
o Working with Excel file with Pandas.

Python Basics
Pandas-groupby

A groupby operation involves one or more of the following operations on the original object −
o Splitting the Object
o Applying a function
o Combining the results
import pandas as pd

# Note: 'kings' (lowercase) is a different key from 'Kings', so it forms its own group
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

print(df)

Split Data into Groups


A Pandas object can be split into groups by any of its keys. There are multiple ways to split an object, such as −
•obj.groupby('key')
•obj.groupby(['key1','key2'])
•obj.groupby(key,axis=1) (the axis argument is deprecated in recent pandas releases)

print(df.groupby('Team'))
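
Printing a GroupBy object only shows its description, so it can help to iterate over it. The following is a minimal sketch of splitting by one key and by two keys; the `scores` frame and the variable names are made up for illustration and are not the ipl_data above.

```python
import pandas as pd

# Small hypothetical frame, used only for this illustration
scores = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Kings"],
    "Year": [2014, 2015, 2014, 2014],
    "Points": [876, 789, 863, 741],
})

# Splitting by one key: iterating a GroupBy yields (key, sub-frame) pairs
sizes_by_team = {team: len(rows) for team, rows in scores.groupby("Team")}

# Splitting by two keys: each group key is a (Team, Year) tuple
pair_keys = [key for key, _ in scores.groupby(["Team", "Year"])]

print(sizes_by_team)
print(pair_keys)
```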

# The groups attribute maps each group key to the row labels in that group
print(df.groupby('Team').groups)


get_group()
grouped = df.groupby('Year')
print(grouped.get_group(2014))
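
The examples so far cover the splitting step. As a hedged sketch of the remaining two steps (applying a function and combining the results), an aggregation can be run per group; the `scores` frame below is a made-up sample, not the slide data.

```python
import pandas as pd

# Hypothetical sample data for illustration
scores = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Kings"],
    "Points": [876, 789, 863, 741],
})

# Split by Team, apply sum to Points, combine into one value per team
total_points = scores.groupby("Team")["Points"].sum()

# Several functions can be applied at once; the result has one column per function
summary = scores.groupby("Team")["Points"].agg(["sum", "mean"])

print(total_points)
print(summary)
```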
Pandas-Merging/Joining

Pandas has full-featured, high-performance in-memory join operations that are idiomatically very similar to those of relational databases like SQL.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False)
Here, we have used the following parameters −
•left − A DataFrame object.
•right − Another DataFrame object.
•on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
•left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
•right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
•left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
•right_index − Same usage as left_index for the right DataFrame.
•how − One of 'left', 'right', 'outer', 'inner'. Defaults to inner. Each method is described below.
•sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False in current pandas releases; leaving it off generally improves performance.
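
As a rough illustration of the left_on, right_on and right_index parameters described above, the sketch below joins two frames whose key columns have different names. The frames `students` and `marks` and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose key columns are named differently
students = pd.DataFrame({"student_id": [1, 2, 3], "Name": ["Alex", "Amy", "Allen"]})
marks = pd.DataFrame({"sid": [2, 3, 4], "Score": [81, 67, 90]})

# Key columns have different names, so use left_on/right_on instead of on
matched = pd.merge(students, marks, left_on="student_id", right_on="sid", how="inner")

# Join a column on the left against the index on the right
marks_by_id = marks.set_index("sid")
via_index = pd.merge(students, marks_by_id, left_on="student_id", right_index=True)

print(matched)
print(via_index)
```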
Merge Method    SQL Equivalent      Description
left            LEFT OUTER JOIN     Use keys from left object
right           RIGHT OUTER JOIN    Use keys from right object
outer           FULL OUTER JOIN     Use union of keys
inner           INNER JOIN          Use intersection of keys


left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

The outputs below are reproduced from the slide; the row order of the right and outer merges can differ between pandas versions.

print(left)
   id    Name subject_id
0   1    Alex       sub1
1   2     Amy       sub2
2   3   Allen       sub4
3   4   Alice       sub6
4   5  Ayoung       sub5

print(right)
   id   Name subject_id
0   1  Billy       sub2
1   2  Brian       sub4
2   3   Bran       sub3
3   4  Bryce       sub6
4   5  Betty       sub5

print(pd.merge(left, right, on='id'))
   id  Name_x subject_id_x Name_y subject_id_y
0   1    Alex         sub1  Billy         sub2
1   2     Amy         sub2  Brian         sub4
2   3   Allen         sub4   Bran         sub3
3   4   Alice         sub6  Bryce         sub6
4   5  Ayoung         sub5  Betty         sub5

print(pd.merge(left, right, on='subject_id', how='left'))
   id_x  Name_x subject_id  id_y Name_y
0     1    Alex       sub1   NaN    NaN
1     2     Amy       sub2   1.0  Billy
2     3   Allen       sub4   2.0  Brian
3     4   Alice       sub6   4.0  Bryce
4     5  Ayoung       sub5   5.0  Betty

print(pd.merge(left, right, on='subject_id', how='right'))
   id_x  Name_x subject_id  id_y Name_y
0   2.0     Amy       sub2     1  Billy
1   3.0   Allen       sub4     2  Brian
2   4.0   Alice       sub6     4  Bryce
3   5.0  Ayoung       sub5     5  Betty
4   NaN     NaN       sub3     3   Bran

print(pd.merge(left, right, how='outer', on='subject_id'))
   id_x  Name_x subject_id  id_y Name_y
0   1.0    Alex       sub1   NaN    NaN
1   2.0     Amy       sub2   1.0  Billy
2   3.0   Allen       sub4   2.0  Brian
3   4.0   Alice       sub6   4.0  Bryce
4   5.0  Ayoung       sub5   5.0  Betty
5   NaN     NaN       sub3   3.0   Bran

print(pd.merge(left, right, on='subject_id', how='inner'))
   id_x  Name_x subject_id  id_y Name_y
0     2     Amy       sub2     1  Billy
1     3   Allen       sub4     2  Brian
2     4   Alice       sub6     4  Bryce
3     5  Ayoung       sub5     5  Betty


Pandas-Merging/Joining

Joins are used to combine records from two or more tables in a database. The four most commonly used joins are the inner join, the left outer join, the right outer join and the full outer join.
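
To make the difference between the four joins concrete, here is a small sketch that counts the rows each one returns. The two frames are cut-down, hypothetical versions of the left and right tables; `indicator=True` is a pd.merge option that records where each row came from.

```python
import pandas as pd

# Hypothetical, shortened tables sharing the key column subject_id
left = pd.DataFrame({"subject_id": ["sub1", "sub2", "sub4"],
                     "student": ["Alex", "Amy", "Allen"]})
right = pd.DataFrame({"subject_id": ["sub2", "sub4", "sub3"],
                      "teacher": ["Billy", "Brian", "Bran"]})

# Row counts for each join type on the shared key
row_counts = {
    how: len(pd.merge(left, right, on="subject_id", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True adds a _merge column: left_only, right_only or both
outer = pd.merge(left, right, on="subject_id", how="outer", indicator=True)
print(outer[["subject_id", "_merge"]])
```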
Pandas-IO Tools

The Pandas I/O API is a set of top-level reader functions, accessed like pd.read_csv(), that generally return a Pandas object.

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
                names=None, index_col=None, usecols=None, ...)

TelecomData = pd.read_csv('Comcast_telecom_complaints_data.csv')
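
The sep, index_col and usecols parameters from the signature above can be tried without a file on disk. This is only a sketch: the in-memory text and its column names (ticket, city, status) are invented and are not the columns of the Comcast dataset.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated text standing in for a file on disk
raw = io.StringIO("ticket;city;status\n101;Atlanta;Closed\n102;Denver;Open\n")

# sep sets the delimiter, usecols keeps only some columns,
# index_col turns one of the kept columns into the row index
complaints = pd.read_csv(raw, sep=";", index_col="ticket", usecols=["ticket", "status"])
print(complaints)
```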
Pandas-IO Tools

Loading csv Files

# The argument is the path of the file; the r prefix keeps the backslashes from being read as escapes
TelecomData = pd.read_csv(r'C:\Self Creation-PPT\Machine Learning\Comcast_telecom_complaints_data.csv')

Loading excel Files

test = pd.read_excel(r'C:\Self Creation-PPT\Machine Learning\Test.xlsx')
Data Exploration Technique

o Dimensionality Check
o Type of Dataset
o Slicing and Indexing
o Identifying unique elements
o Value Extraction
o Feature Mean
o Feature Median
o Feature Mode

Data Exploration Technique- Dimensionality Check

Here we check the shape of a DataFrame or Series.

import pandas as pd
import numpy as np
import seaborn as sns
TelecomData = pd.read_csv('Comcast_telecom_complaints_data.csv')

TelecomData.shape
Data Exploration Technique- Type of Dataset

The type() function returns the type of an object.

# TelecomData is the DataFrame loaded with pd.read_csv earlier
type(TelecomData)
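
The shape and type checks can be compared on a DataFrame and on a single column. The frame below is a made-up example rather than the telecom data.

```python
import pandas as pd

# Hypothetical frame with 3 rows and 2 columns
frame = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# A DataFrame's shape is (rows, columns); a Series' shape is (rows,)
n_rows, n_cols = frame.shape
column_shape = frame["a"].shape

# Selecting one column gives a Series, not a DataFrame
frame_type = type(frame)
column_type = type(frame["a"])

print(n_rows, n_cols, column_shape)
print(frame_type.__name__, column_type.__name__)
```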
Data Exploration Technique- Slicing and Indexing

You can use the : operator with the start index on its left and the end index on its right to output the corresponding slice. The end index is not included.

Slicing a list:
numbers = [1, 2, 3, 4, 5]
numbers[1:3]
Output- [2, 3]

Slicing a DataFrame (df) using the iloc indexer:
TelecomData.iloc[:, 1:3]
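
One detail worth seeing in code: position-based slices (lists and iloc) leave out the end index, while label-based slices with loc include both ends. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

numbers = [1, 2, 3, 4, 5]
list_slice = numbers[1:3]  # positions 1 and 2

# Hypothetical frame with labelled rows and columns
frame = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3], "c": [7, 8, 9]},
                     index=["x", "y", "z"])

by_position = frame.iloc[:, 1:3]         # all rows, columns at positions 1 and 2
by_label = frame.loc["x":"y", "a":"b"]   # label slices include both end labels

print(list_slice)
print(by_position)
print(by_label)
```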
Data Exploration Technique- Identifying unique elements

Using unique() on the column of interest returns a NumPy array with the unique values of the column.

TelecomData['Date'].unique()
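
Alongside unique(), pandas also offers nunique() and value_counts() for the same kind of question. The sketch below uses a short invented Series of date strings, not the Date column of the dataset.

```python
import pandas as pd

# Hypothetical date strings with one repeated value
dates = pd.Series(["22-04-15", "04-08-15", "22-04-15", "18-04-15"])

distinct = dates.unique()      # distinct values in order of first appearance
n_distinct = dates.nunique()   # how many distinct values there are
counts = dates.value_counts()  # how often each value occurs

print(distinct)
print(n_distinct)
print(counts)
```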
Data Exploration Technique- Value Extraction

Using the values attribute (no parentheses) on the column of interest returns a NumPy array with all the values of the column.

TelecomData['Ticket #'].values
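
A short sketch of value extraction on an invented numeric Series; `to_numpy()` is the method form that pandas provides alongside the `values` attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical ticket numbers
tickets = pd.Series([250635, 223441, 242732], name="Ticket #")

as_array = tickets.values         # attribute access, no parentheses
same_array = tickets.to_numpy()   # method form of the same extraction

print(type(as_array).__name__, as_array)
```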
Data Exploration Technique- Feature Mean

Using mean() on a column returns the mean of that column. Called on the whole DataFrame, for example test.mean(numeric_only=True), it returns the mean of each numeric column.

import pandas as pd
import numpy as np
import seaborn as sns
test = pd.read_excel('Test.xlsx')
test['values'].mean()
Data Exploration Technique- Feature Median

Using the median() function on the column of interest returns its median.

# test is the DataFrame loaded with pd.read_excel earlier
test['values'].median()
Data Exploration Technique- Feature Mode

Using the mode() function on the column of interest returns its mode.

# test is the DataFrame loaded with pd.read_excel earlier
test['values'].mode()
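
The three measures can be compared side by side on a small invented Series. Note that mode() returns a Series, because a column can have more than one most frequent value.

```python
import pandas as pd

# Hypothetical values with one large entry that pulls the mean upward
values = pd.Series([10, 20, 20, 30, 120])

mean_value = values.mean()      # 200 / 5
median_value = values.median()  # middle value after sorting
modes = values.mode()           # Series of the most frequent value(s)

# A column with two equally frequent values has two modes
two_modes = pd.Series([1, 1, 2, 2, 3]).mode()

print(mean_value, median_value, modes.tolist(), two_modes.tolist())
```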
Data wrangling techniques-Missing Values

A dataset may contain missing values. Handling them so that they do not distort the analysis is called missing value treatment.
Data wrangling techniques- Detecting Missing Values

import pandas as pd
import numpy as np
import seaborn as sns
test = pd.read_excel('Test.xlsx')

# any() must be called; it reports, per column, whether at least one value is missing
test.isna().any()
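
A minimal sketch of the same check on an invented frame, with `isna().sum()` added to count the missing entries per column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing entry in the values column
frame = pd.DataFrame({"values": [1.0, np.nan, 3.0], "count": [1, 2, 3]})

has_missing = frame.isna().any()        # True/False per column
missing_by_column = frame.isna().sum()  # number of missing entries per column

print(has_missing)
print(missing_by_column)
```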
Data Wrangling- Handling missing values (mean, median, mode)

The process of manually converting or mapping data from one raw format into another format is called data wrangling. This includes munging and data visualization.

Example-

import numpy as np
from sklearn.impute import SimpleImputer

# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; SimpleImputer replaces it
# strategy can be 'mean', 'median' or 'most_frequent'
missingValueImputer = SimpleImputer(missing_values=np.nan, strategy='mean')

missingValueImputer = missingValueImputer.fit(test[["values"]])
test[["values"]] = missingValueImputer.transform(test[["values"]])
test.isna().any()
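
For a single column, the same mean imputation can also be sketched with pandas alone. This is an alternative to the scikit-learn imputer, not the method used on the slide, and the frame is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
frame = pd.DataFrame({"values": [10.0, np.nan, 30.0, np.nan]})

# mean() skips NaN by default, so the fill value is (10 + 30) / 2
fill_value = frame["values"].mean()
frame["values"] = frame["values"].fillna(fill_value)

print(frame)
```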
Data Wrangling- Outlier Detection

# Points drawn beyond the whiskers of the box plot are candidate outliers
sns.boxplot(x=test['values'])
Data Wrangling- Outlier Treatment

# Keep only the rows whose value is below 180
# (named mask rather than filter, to avoid shadowing the built-in filter function)
mask = test['values'].values < 180
df1_outlier_rem = test[mask]
df1_outlier_rem

sns.boxplot(x=df1_outlier_rem['values'])
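
The slide picks the cut-off of 180 by reading the box plot. One common way to derive a cut-off numerically is the 1.5 × IQR rule, which is also the default whisker rule of box plots. The sketch below applies it to invented numbers; it is an assumption that this rule suits the Test.xlsx data.

```python
import pandas as pd

# Hypothetical values with one obvious outlier
values = pd.Series([100, 110, 120, 130, 140, 400])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

outliers = values[is_outlier]
kept = values[~is_outlier]

print(lower, upper)
print(outliers.tolist())
```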
Data manipulation techniques

Data manipulation is the process of changing data to make it easier to read or better organized. For example, a log of data could be sorted in alphabetical order, making individual entries easier to locate.

head()
print(test.head(6))

tail()
print(test.tail(6))

values
test['values'].values

groupby()
print(test.groupby('values'))

concatenation
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], 'B': ['B4', 'B5', 'B6', 'B7'], 'C': ['C4', 'C5', 'C6', 'C7'], 'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
frames = [df1, df2]
result = pd.concat(frames)

merging
# df1 and df2 above share no (A, B) pairs, so this inner merge returns an empty DataFrame
result = pd.merge(df1, df2, on=['A', 'B'])
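
The difference between concatenation and merging shows up clearly when only some key values match. In this sketch the two frames are hypothetical and share exactly one (A, B) pair.

```python
import pandas as pd

# Hypothetical frames that share the key pair ("A1", "B1")
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"], "C": ["C0", "C1"]})
df2 = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"], "D": ["D1", "D2"]})

# concat stacks rows regardless of their values
stacked = pd.concat([df1[["A", "B"]], df2[["A", "B"]]])

# merge keeps only rows whose key values match in both frames (inner join by default)
joined = pd.merge(df1, df2, on=["A", "B"])

print(len(stacked))
print(joined)
```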


Merging of Excel files:-

import pandas as pd

# Read all three files into pandas dataframes
marketing_analyst_names = pd.read_excel("MarketingAnalystNames.xlsx")
sales_rep_names = pd.read_excel("SalesRepNames.xlsx")
senior_leadership_names = pd.read_excel("SeniorLeadershipNames.xlsx")

# Create a list of the dataframes in the order you want them appended
all_df_list = [marketing_analyst_names, sales_rep_names, senior_leadership_names]

# Append all the dataframes in all_df_list
# Pandas aligns the rows on matching column names
appended_df = pd.concat(all_df_list)

# Write the appended dataframe to an excel file
# Add index=False parameter to not include row numbers
appended_df.to_excel("AllCompanyNames.xlsx", index=False)
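
How pd.concat aligns columns can be checked without any Excel files. In the sketch below the two in-memory frames stand in for the workbooks, and their names and columns are invented. Columns present in only one frame are filled with NaN, and `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Hypothetical frames standing in for two of the Excel files
analysts = pd.DataFrame({"Name": ["Asha", "Ben"], "Team": ["Marketing", "Marketing"]})
reps = pd.DataFrame({"Name": ["Cara"], "Region": ["West"]})

# Rows are appended; columns are matched by name, missing cells become NaN
combined = pd.concat([analysts, reps], ignore_index=True)

print(combined)
```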
Thank you
