Reading Material For Data Handling Using Pandas-I

Uploaded by

rajwinder kaur
Copyright
© © All Rights Reserved

Informatics Practices

(Code No. 065)


CLASS XII
2020-2021

By
B. Naresh​, PGT Computer Science
JNV Khammam
----------------------------------------------------------------------------------------------------------
Blue Print​:

Unit No   Unit Name                                             Marks

1         Data Handling using Pandas and Data Visualization     30

2         Database Query using SQL                              25

3         Introduction to Computer Networks                     7

4         Societal Impacts                                      8

          Practical                                             30

          Total                                                 100

Unit-1:
Data Handling using Pandas and Data Visualization

Data Handling using Pandas -I

Module: A module is a file that contains Python definitions such as functions. It is a .py file which holds
executable Python code or statements.

Creating Module​: mod_sample.py

def welcome(user):

    print("Welcome to the IP Class",user)

Importing Module:​ import mod_sample

mod_sample.welcome("JNV Khammam")

o/p: Welcome to the IP Class JNV Khammam

Package: A package is a namespace which contains multiple packages or modules. It is a directory
which contains a special file, __init__.py.

The __init__.py file tells Python to treat the directory that contains it as a package.

Creating a package​: package1

Module: pack_sample.py

def welcome(user):

    print("Welcome to the IP Class",user)

Mandatory script file:__init__.py (empty script file)

Then create the package 'package1' by placing these two scripts in a folder named package1.

Importing functions from package​: from package1 import pack_sample

pack_sample.welcome("Package")

o/p: Welcome to the IP Class Package

Library:​ ​It is a collection of various packages. There is no difference between package and python
library conceptually.

Framework: It is a collection of various libraries which defines the overall structure and flow of the code.

Pandas:

Pandas is a popular open source Python library used for data analysis. It provides two main data
structures for working with data:

● Series
● Dataframes

Installation of Pandas​:

Pandas can be installed by using the command: pip install pandas

Series:

Series is a 1-Dimensional array defined in python pandas to store any data type.

A series is a pandas data structure that represents a one-dimensional, array-like object containing an
array of data (of any NumPy data type) and an associated array of data labels, called its index.

Creating a Series​: A series can be created by using pandas series function called Series().

Syntax:

<Series Name>=<pd>.Series(<list name>, <index name>,....)

Note: <list name> is mandatory, index name is optional. If index is not mentioned then Series will
assign indexes from 0,1,2,.... .

Creating a series from array:

In order to create a series from an array, we have to import the numpy module and use its
array() function.

import pandas as pd

import numpy as np

data = np.array(['q', 'w', 'r', 't', 'y'])

ser = pd.Series(data)

print("Series by using arrays:")

print(ser)

o/p:Series by using arrays:

0 q

1 w

2 r

3 t

4 y
dtype: object

Creating a series from array with index :

In order to create a series from an array with index, we have to provide an index with the same
number of elements as it is in the array.

import pandas as pd

import numpy as np

data = np.array(['q', 'w', 'r', 't', 'y'])

ser = pd.Series(data, index =[10, 11, 12, 13, 14])

print("Series by using arrays with index:")

print(ser)

o/p: Series by using arrays with index:

10 q

11 w

12 r

13 t

14 y

dtype: object

Creating a series from Lists​:

In order to create a series from a list, we have to first create a list after that we can create a series
from the list.

#Python program to create a series with index and without index.

import pandas as pd

data=[5,7,9,4]

s=pd.Series(data)

print("Series with default index:")

print(s)

index=['q','w','e','r']
si=pd.Series(data,index)

print("Series with given index:")

print(si)

o/p: Series with default index:

0 5

1 7

2 9

3 4

dtype: int64

Series with given index:

q 5

w 7

e 9

r 4

dtype: int64

# Python program to create a series using a multi dimensional array.

import pandas as pd

data=[[5,7],[4,8],[6,1]]

s=pd.Series(data)

print("Series with multi dimensional array :")

print(s)

o/p:

Series with multi dimensional array :

0 [5, 7]

1 [4, 8]

2 [6, 1]

dtype: object

Note​: For data=[[5,7],[4,8],[6,6,4,3]]


o/p:Series with multi dimensional array :

0 [5, 7]

1 [4, 8]

2 [6, 6, 4, 3]

dtype: object

Series by using dictionary :​

In order to create a series from a dictionary, we have to first create a dictionary. After that we can
make a series using a dictionary. Dictionary keys are used to construct an index.

import pandas as pd

dict1={'a':9,'b':8,'c':3}

s=pd.Series(dict1)

print("Series by using dictionary :")

print(s)

o/p: Series by using dictionary :

a 9

b 8

c 3

dtype: int64

Note: While creating a series by using a dictionary, keys of the dictionary are taken as indexes for the
series.

Creating a series from Scalar value:

In order to create a series from scalar value, an index must be provided. The scalar value will be
repeated to match the length of the index.

import pandas as pd

s=pd.Series(10, index =[0, 1, 2, 3, 4, 5])

print("Series with a scalar value and index")

print(s)

o/p: Series with a scalar value and index


0 10

1 10

2 10

3 10

4 10

5 10

dtype: int64

Note: If the index is not given, a Series with a single value and index 0 (zero) is created.
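The note above can be checked with a short sketch. The variable names are illustrative only and do not come from the original material.

```python
import pandas as pd

# Scalar with an explicit index: the value is repeated for every label.
s_indexed = pd.Series(10, index=['a', 'b', 'c'])

# Scalar without an index: a one-element Series labelled 0.
s_single = pd.Series(10)

print(len(s_indexed))          # number of elements follows the index
print(list(s_single.index))    # default index holds the single label 0
```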

Mathematical operations on Series: Several common mathematical operations can be performed on a
pandas series to simplify data analysis in Python.

FUNCTION                    USE

s.sum()                     Returns the sum of all values in the series

s.mean()                    Returns the mean of all values in the series. Equal to s.sum()/s.count()

s.std()                     Returns the standard deviation of all values

s.min() or s.max()          Returns the minimum or maximum value in the series

s.idxmin() or s.idxmax()    Returns the index of the minimum or maximum value in the series

s.median()                  Returns the median of all values

s.mode()                    Returns the mode(s) of the series

s.value_counts()            Returns a series with the frequency of each value

s.describe()                Returns a series of summary statistics (for numeric data: count, mean,
                            std, min, quartiles and max)

Note: ‘s’ in the above table indicates the name of the series.

#Python program to implement ​Mathematical operations on Series

import pandas as pd

s=pd.Series([4,2,1,5,6,2])

print("Sum of values in the series is:",s.sum())

print("Mean of values in the series is:",s.mean())


print("Standard deviation of values in the series is:",s.std())

print("Median of values in the series is:",s.median())

print("Max of values in the series is:",s.max())

print("index of min value in the series is:",s.idxmin())

print("Mode of values in the series is:",s.mode())

print("Frequency of each value in the series is:",s.value_counts())

print("Information of values in the series is:",s.describe())

o/p:

Sum of values in the series is: 20

Mean of values in the series is: 3.3333333333333335

Standard deviation of values in the series is: 1.9663841605003503

Median of values in the series is: 3.0

Max of values in the series is: 6

index of min value in the series is: 2

Mode of values in the series is: 0 2

dtype: int64

Frequency of each value in the series is: 2 2

6 1

5 1

4 1

1 1

dtype: int64

Information of values in the series is: count 6.000000

mean 3.333333

std 1.966384

min 1.000000

25% 2.000000

50% 3.000000
75% 4.750000

max 6.000000

dtype: float64

Head and Tail functions on Series​:

To view a small sample of a Series, use the head() and the tail() methods.

head() returns the first n rows(observe the index values). The default number of elements to display
is five.

Syntax​:

<Series name>.head(n)

# Python program to fetch first two rows from the given series.

import pandas as pd

data=[4,6,7,4,2,35,10,5]

s=pd.Series(data)

print("Original series is:")

print(s)

print("First two rows of series is:")

print(s.head(2))

o/p:

Original series is:

0 4

1 6

2 7

3 4

4 2

5 35

6 10

7 5

dtype: int64

First two rows of series is:

0 4

1 6

dtype: int64
Note: If no value is passed to the head() function, then the output will be -

Original series is:

0 4

1 6

2 7

3 4

4 2

5 35

6 10

7 5

dtype: int64

Head rows of series is:

0 4

1 6

2 7

3 4

4 2

dtype: int64

tail()​ :

tail() returns the last n rows(observe the index values). The default number of elements to display is
five.

Syntax​:

<Series name>.tail(n)

# Python program to fetch the last three rows from the given series.

import pandas as pd

data=[4,6,7,4,2,35,10,5]

s=pd.Series(data)

print("Original series is:")

print(s)

print("Last 3 rows of series is:")

print(s.tail(3))
o/p:

Original series is:

0 4

1 6

2 7

3 4

4 2

5 35

6 10

7 5

dtype: int64

Last 3 rows of series is:

5 35

6 10

7 5

dtype: int64

Note: If no value is passed to the tail() function, then the output will be -

Original series is:

0 4

1 6

2 7

3 4

4 2

5 35

6 10

7 5

dtype: int64

Tail rows of series is:

3 4

4 2

5 35

6 10

7 5
dtype: int64

Selection, Indexing and Slicing on Series​:

Selection​: We can select a value from the series by using its corresponding index.

Syntax:

<Series name>[<index number>]

# Python program to implement selection on pandas series

import pandas as pd

data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']

s=pd.Series(data)

print("second value in the series is:")

print(s[1])

o/p:

second value in the series is:

New Delhi

Indexing​:
Pandas​ ​Series.index attribute is used to get or set the index labels of the given Series object.

Syntax: <Series name>.index

# Python program to implement indexing on pandas series.

import pandas as pd

data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']

s=pd.Series(data)

print("Series before adding index is:")

print(s)

ind=[11,12,13,14,15]

s.index=ind

print("Series after adding index is:")

print(s)

o/p:
Series before adding index is:

0 Hyderabad

1 New Delhi

2 Chenai

3 Bangalore

4 Kolkata

dtype: object

Series after adding index is:

11 Hyderabad

12 New Delhi

13 Chenai

14 Bangalore

15 Kolkata

dtype: object

# Python program to get index labels from the pandas series

import pandas as pd

data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']

ind=[4,5,6,7,8]

s=pd.Series(data,ind)

print("Series is:")

print(s)

print("index labels of the series is:")

print(s.index)

o/p:

Series is:

4 Hyderabad

5 New Delhi

6 Chenai
7 Bangalore

8 Kolkata

dtype: object

index labels of the series is:

Int64Index([4, 5, 6, 7, 8], dtype='int64')

Note: pandas 2.0 and later print this as Index([4, 5, 6, 7, 8], dtype='int64').

# Python program to get default index from the series

import pandas as pd

data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']

s=pd.Series(data)

print("Series is:")

print(s)

print("index labels of the series is:")

print(s.index)

o/p:

Series is:

0 Hyderabad

1 New Delhi

2 Chenai

3 Bangalore

4 Kolkata

dtype: object

index labels of the series is:

RangeIndex(start=0, stop=5, step=1)

Slicing: A slicing operation extracts part of the series based on the given parameters.

Syntax:

<Series name>[<start>:<stop>:<step>]

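Note: start, stop and step are optional

Default values: start=0, stop=n (n is the length of the series; the element at position stop is excluded, so the slice ends at position n-1), step=1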
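A minimal sketch of how omitted slice parameters fall back to their defaults, using the same city list as the examples in this section. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['Hyderabad', 'New Delhi', 'Chenai', 'Bangalore', 'Kolkata'])

first_three = s[:3]     # start omitted: positions 0, 1, 2 (position 3 is excluded)
every_other = s[::2]    # step of 2: positions 0, 2, 4
from_third = s[2:]      # stop omitted: runs through the last element

print(first_three.tolist())
print(every_other.tolist())
print(from_third.tolist())
```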


# Python program to perform slicing operation on series.

import pandas as pd

data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']

s=pd.Series(data)

print("Original series is:")

print(s)

print("Sliced series is:")

print(s[1:3])

o/p:Original series is:

0 Hyderabad

1 New Delhi

2 Chenai

3 Bangalore

4 Kolkata

dtype: object

Sliced series is:

1 New Delhi

2 Chenai

dtype: object
Data Frames​:

A Data Frame is a two-dimensional (2-D) data structure defined in pandas which consists of rows and
columns.

A Data Frame stores an ordered collection of columns that can hold data of different types.

Characteristics of Data Frames​:

➢ It has two indices (​two axes​)


○ Row index (axis=0)->known as index
○ Column index (axis=1)->known as column-name
➢ Value in the Data Frame will be identifiable by the combination of row index and column
index.
➢ Indices can be of any type. They can be numbers or letters or strings.
➢ Columns need not all hold the same type of data. Different columns can have
data of different types.
➢ Value is mutable. We can change value in the data frames.
➢ Size is mutable. We can add or delete rows or columns in a data frame.
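The characteristics listed above can be seen in a small sketch. The data and labels here are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Ravi', 20], ['Sam', 25]],
                  columns=['Name', 'Age'],   # column index (axis=1)
                  index=['p', 'q'])          # row index (axis=0)

print(df.shape)            # (rows, columns)
print(df.loc['q', 'Age'])  # a value is identified by row label plus column label

df.loc['q', 'Age'] = 26    # value is mutable
df['Marks'] = [99, 95]     # size is mutable: a new column is added
print(df)
```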

Creation of Data Frames​:

Syntax: <Data Frame Name>=pandas.DataFrame(

<2D data structure>,

columns=<column sequence>,

index=<index sequence>,............)

Note: NaN in Python:

NaN, standing for "not a number", is a special floating-point value used to represent a result that is
undefined or unrepresentable. For example, 0/0 is undefined as a real number and is, therefore,
represented by NaN in floating-point arithmetic. pandas also uses NaN to mark missing values.
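A brief sketch of how NaN behaves inside a pandas Series. It assumes NumPy is installed alongside pandas; the values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([98, 99, np.nan])   # np.nan marks the missing third value

print(s.dtype)            # the integers are stored as floats so NaN can sit beside them
print(s.isna().tolist())  # True where a value is missing
print(np.nan == np.nan)   # NaN never compares equal, so use isna() to test for it
print(s.sum())            # sum() skips NaN by default
```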

Creating Dataframe from List​:

import pandas as pd

name=['Ravi','Sam','Kunal']
df=pd.DataFrame(name)

print(df)

o/p:

       0

0   Ravi

1    Sam

2  Kunal

Creating Dataframe from the array​:

import pandas as pd

import numpy as np

data=[5,7,8,9,3]

a=np.array(data)

df=pd.DataFrame(a)

print("array is:")

print(a)

print("Data Frame is:")

print(df)

o/p:

array is:

[5 7 8 9 3]

Data Frame is:

   0

0  5

1  7

2  8

3  9

4  3
Creating Dataframe from the series​:

import pandas as pd

import numpy as np

data=[5,7,8,9,3]

a=np.array(data)

s=pd.Series(a)

df=pd.DataFrame(s)

print("Series is:")

print(s)

print("Data Frame is:")

print(df)

o/p:

Series is:

0 5

1 7

2 8

3 9

4 3

dtype: int32

Data Frame is:

   0

0  5

1  7

2  8

3  9

4  3

Creating Dataframe from another Dataframe​:

import pandas as pd
import numpy as np

data=[5,7,8,9,3]

a=np.array(data)

df=pd.DataFrame(a)

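df2=df.copy()   # copy() builds a new, independent Data Frame; plain df2=df only gives the same object a second name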

print("array is:")

print(a)

print("Data Frame is:")

print(df2)

o/p:

array is:

[5 7 8 9 3]

Data Frame is:

   0

0  5

1  7

2  8

3  9

4  3

Creating Dataframe from list of dictionaries​:

Without index and without column:

import pandas as pd

d1={'name':'Ravi','age':25,'marks':99}

d2={'name':'Sam','age':20,'marks':95}

data=[d1,d2]

df=pd.DataFrame(data)

print(df)

o/p:
name age marks

0 Ravi 25 99

1 Sam 20 95

With index and without column:

import pandas as pd

d1={'name':'Ravi','age':25,'marks':99}

d2={'name':'Sam','age':20,'marks':95}

data=[d1,d2]

ind=['p','q']

df=pd.DataFrame(data,ind)

print(df)

o/p:

name age marks

p Ravi 25 99

q Sam 20 95

Without index and with column:

import pandas as pd

d1={'name':'Ravi','age':25,'marks':99}

d2={'name':'Sam','age':20,'marks':95}

data=[d1,d2]

df=pd.DataFrame(data,columns=['name','marks'])

print(df)

o/p:

name marks

0 Ravi 99

1 Sam 95

With index and with column:

import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}

d2={'name':'Sam','age':20,'marks':95}

data=[d1,d2]

df=pd.DataFrame(data,index=['p','q'],columns=['name','marks'])

print(df)

o/p:

name marks

p Ravi 99

q Sam 95

Creating Dataframe from dictionary of Series​:

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Student details:")

print(df)

o/p:

Student details:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Note: Keys of the dictionary become the columns of the dataframe.

Creation of Dataframe from Text/CSV Files :


From CSV Files:

The pandas library provides features with which we can read a CSV file in full, or in part for only a
selected group of columns and rows. A CSV file contains comma separated values.

Input as CSV File:


The csv file is a text file in which the values in the columns are separated by a comma. Let's consider
the following data present in the file named input2.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as
input2.csv, choosing the All Files (*.*) option in Notepad's Save As dialog.

Content-

S.No,Name,Age,City,Salary

1,Tom,28,Toronto,20000

2,Lee,32,HongKong,3000

3,Steven,43,Bay Area,8300

4,Ram,38,Hyderabad,3900

Reading a CSV File


The read_csv function of the pandas library is used to read the content of a CSV file into the Python
environment as a pandas DataFrame. The function can read a file from the operating system when
given the proper path to the file.

import pandas as pd

data=pd.read_csv('C:/Users/JNVP-39/Desktop/input2.csv')

print(data)

o/p:

S.No Name Age City Salary

0 1 Tom 28 Toronto 20000

1 2 Lee 32 HongKong 3000

2 3 Steven 43 Bay Area 8300

3 4 Ram 38 Hyderabad 3900

Note: An additional column starting with zero has been created by pandas as the index.
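custom index
The index_col argument specifies a column of the CSV file to use as the index. (The remaining examples read sample.txt, which is assumed here to hold the same data as input2.csv.)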

import pandas as pd

data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',index_col=['S.No'])

print(data)

o/p:

        Name  Age       City  Salary

S.No

1        Tom   28    Toronto   20000

2        Lee   32   HongKong    3000

3     Steven   43   Bay Area    8300

4        Ram   38  Hyderabad    3900

header_names
Specify the names of the header using the names argument.

import pandas as pd

data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',names=['a', 'b', 'c','d','e'])

print(data)

o/p:

a b c d e

0 S.No Name Age City Salary

1 1 Tom 28 Toronto 20000

2 2 Lee 32 HongKong 3000

3 3 Steven 43 Bay Area 8300

4 4 Ram 38 Hyderabad 3900

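Observe that the custom names are used as the header, but the header row in the file has not been
eliminated; it appears as the first data row. Now, we use the header argument to remove that.

header gives the row number of the header in the file. With header=0, the first row is treated as the
header and is replaced by the custom names. If the header is in a row other than the first, pass that
row number to header. This will skip the preceding rows.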

import pandas as pd

data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',names=['a','b','c','d','e'],header=0)
print(data)

o/p:

   a       b   c          d      e

0  1     Tom  28    Toronto  20000

1  2     Lee  32   HongKong   3000

2  3  Steven  43   Bay Area   8300

3  4     Ram  38  Hyderabad   3900

skiprows
skiprows skips the specified number of rows at the start of the file. The first row that remains is then used as the header.

import pandas as pd

data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt', skiprows=2)

print(data)

o/p:

2 Lee 32 HongKong 3000

0 3 Steven 43 Bay Area 8300

1 4 Ram 38 Hyderabad 3900
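The read_csv options above can be tried without a file on disk. The sketch below keeps the sample data in memory with io.StringIO, which is an assumption made for portability. The usecols argument, which is not used in the examples above, is one way to read only selected columns.

```python
import io
import pandas as pd

# Same content as the sample file, held in memory so no file path is needed.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

default_df = pd.read_csv(io.StringIO(csv_text))                     # default 0,1,2,... index
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col='S.No')   # S.No becomes the row index
renamed_df = pd.read_csv(io.StringIO(csv_text), header=0,
                         names=['a', 'b', 'c', 'd', 'e'])           # replace the header row
subset_df = pd.read_csv(io.StringIO(csv_text),
                        usecols=['Name', 'Salary'])                 # selected columns only

print(indexed_df)
print(renamed_df)
print(subset_df)
```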

Accessing values in dataframe​:

Accessing a particular value: <Data frame name>[<column name>][<index>]

import pandas as pd

d1={'name':'Ravi','age':25,'marks':99}

d2={'name':'Sam','age':20,'marks':95}

data=[d1,d2]

ind=['p','q']

df=pd.DataFrame(data,ind)

print(df)

print("Accessing a particular value:")


print(df['name']['q'])

print("Accessing a row and all the columns:")

print(df.loc['p',:])

print("Accessing multiple rows and one column:")

print(df.loc['p':'q','name'])

print("Accessing multiple rows and multiple columns:")

print(df.loc['p':'q','name':'marks'])

o/p:

name age marks

p Ravi 25 99

q Sam 20 95

Accessing a particular value:

Sam

Accessing a row and all the columns:

name Ravi

age 25

marks 99

Name: p, dtype: object

Accessing multiple rows and one column:

p Ravi

q Sam

Name: name, dtype: object

Accessing multiple rows and multiple columns:

name age marks

p Ravi 25 99

q Sam 20 95
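Besides the forms shown above, a single value can also be read with the loc, iloc and at accessors. This is a minimal sketch on the same two-row data; the variable names are illustrative.

```python
import pandas as pd

data = [{'name': 'Ravi', 'age': 25, 'marks': 99},
        {'name': 'Sam', 'age': 20, 'marks': 95}]
df = pd.DataFrame(data, index=['p', 'q'])

by_label = df.loc['q', 'name']     # row label, then column label
by_position = df.iloc[1, 0]        # row position 1, column position 0
single_cell = df.at['p', 'marks']  # access to one cell by labels

print(by_label, by_position, single_cell)
```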

Iterating over rows and columns in Pandas DataFrame​:

Iteration is a general term for taking each item of something, one after another. A pandas DataFrame
consists of rows and columns, so iterating over a dataframe is similar to iterating over a dictionary:

just as we iterate over the keys of a dictionary, we iterate over the column labels of a

dataframe.

In a pandas DataFrame we can iterate in two ways:

● Iterating over rows


● Iterating over columns

Iterating over rows: To iterate over a DataFrame, we can use the following functions −

● iterrows() − iterate over the rows as (index,series) pairs


● iteritems() − iterate over the columns as (column label,series) pairs; this method was removed in pandas 2.0, where items() does the same job
● itertuples() − iterate over the rows as namedtuples

iterrows():
iterrows() returns the iterator yielding each index value along with a series containing the data in
each row.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("applying iterrows on Data Frame:")

for row_index,row in df.iterrows():

    print(row_index,row)

o/p:

Original Data Frame is:

Name Age
0 Ravi 20

1 Sam 25

2 Kunal 21

applying iterrows on Data Frame:

0 Name Ravi

Age 20

Name: 0, dtype: object

1 Name Sam

Age 25

Name: 1, dtype: object

2 Name Kunal

Age 21

Name: 2, dtype: object

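iteritems():
Iterates over each column as a (key, value) pair, with the column label as key and the column values as a Series object.
Note: iteritems() was removed in pandas 2.0; on current versions use items(), which behaves the same way.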

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
print("applying iteritems on Data Frame:")
for key,value in df.items():   # items() replaces iteritems(), which was removed in pandas 2.0
    print(key,value)

o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
applying iteritems on Data Frame:
Name 0 Ravi
1 Sam
2 Kunal
Name: Name, dtype: object
Age 0 20
1 25
2 21
Name: Age, dtype: int64

itertuples():
itertuples() method will return an iterator yielding a named tuple for each row in the DataFrame. The
first element of the tuple will be the row’s corresponding index value, while the remaining values are
the row values.

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
print("applying itertuples on Data Frame:")
for row in df.itertuples():
    print(row)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
applying itertuples on Data Frame:
Pandas(Index=0, Name='Ravi', Age=20)
Pandas(Index=1, Name='Sam', Age=25)
Pandas(Index=2, Name='Kunal', Age=21)
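Since each row comes back as a named tuple, its fields can be read as attributes. The sketch below collects the rows whose Age is above 20; the threshold is an arbitrary choice made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

older = []
for row in df.itertuples():
    # Each field of the named tuple is available as an attribute.
    if row.Age > 20:
        older.append((row.Index, row.Name))

print(older)
```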

Iterating over Columns :

In order to iterate over columns, we create a list of the dataframe's columns and then iterate

through that list to pull out each column.

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
columns=list(df)
print("List of columns:")
print(columns)
print("Third element of the column:")
for i in columns:
    print (df[i][2])

o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
List of columns:
['Name', 'Age']
Third element of the column:
Kunal
21
Operations on rows and columns​:

● add

● select

● delete

● rename

Column Selection​:

We will understand this by selecting a column from the DataFrame.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)
print("Column selection:")

print(df['Age'])

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Column selection:

0 20

1 25

2 21

Name: Age, dtype: int64

Column Addition

We will understand this by adding a new column to an existing data frame.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)
marks=[98,99]

ser_marks=pd.Series(marks)

df['Marks']=ser_marks

print("Adding Marks column:")

print(df)

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Adding Marks column:

Name Age Marks

0 Ravi 20 98.0

1 Sam 25 99.0

2 Kunal 21 NaN
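The NaN in the output above appears because a Series is matched to the data frame by index label, not by position. A short sketch of this, along with two other ways to add a column, follows. The 'School' and 'AgeNextYear' columns are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

# A Series is matched on index labels, so labels 0 and 2 are filled and label 1 is left as NaN.
df['Marks'] = pd.Series([98, 99], index=[0, 2])

# A scalar is repeated for every row.
df['School'] = 'JNV'

# A derived column computed from an existing one.
df['AgeNextYear'] = df['Age'] + 1

print(df)
```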

Column Deletion

Columns can be deleted or popped; let us take an example to understand how.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("Deleting a column using del:")

del df['Age']

print(df)

print("Deleting a column using pop:")

df.pop('Name')

print(df)

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Deleting a column using del:

Name

0 Ravi

1 Sam

2 Kunal

Deleting a column using pop:

Empty DataFrame

Columns: []

Index: [0, 1, 2]
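Another option, not used in the example above, is drop(columns=...), which returns a new data frame and leaves the original untouched. The sketch also shows that pop() hands back the removed column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

without_age = df.drop(columns=['Age'])   # returns a new DataFrame; df is unchanged
popped = df.pop('Age')                   # removes 'Age' from df and returns it as a Series

print(list(without_age.columns))
print(list(df.columns))
print(popped.tolist())
```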
Column Rename​:

We can rename one, some or all of the columns. In order to rename one or some columns, we
will use the rename() function.

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.rename(columns={'Name':'New Name'},inplace=True)
print("After column rename:")
print(df)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
After column rename:
New Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21

Note​:
In order to rename all the columns, use df.columns=<list of columns>

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.columns=['New Name','New Age']
print("After renaming all columns:")
print(df)

o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
After renaming all columns:
New Name New Age
0 Ravi 20
1 Sam 25
2 Kunal 21

Row Selection​:

Selection by Label

Rows can be selected by passing a row label to the loc indexer, which takes square brackets: df.loc[<label>].

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name,index=['p','q','r'])

ser_age=pd.Series(age,index=['p','q','r'])

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("Row Selection by label:")

print(df.loc['q'])

o/p:

Original Data Frame is:

Name Age
p Ravi 20

q Sam 25

r Kunal 21

Row Selection by label:

Name Sam

Age 25

Name: q, dtype: object

Selection by integer location

Rows can be selected by passing an integer location to the iloc indexer, which takes square brackets: df.iloc[<position>].

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name,index=['p','q','r'])

ser_age=pd.Series(age,index=['p','q','r'])

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("Row Selection by integer location:")

print(df.iloc[2])

o/p:

Original Data Frame is:

Name Age

p Ravi 20
q Sam 25

r Kunal 21

Row Selection by integer location:

Name Kunal

Age 21

Name: r, dtype: object

Slice Rows

Multiple rows can be selected using ‘ : ’ operator.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("Row Slicing:")

print(df[1:2])

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25
2 Kunal 21

Row Slicing:

Name Age

1 Sam 25

Addition of Rows

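Add new rows to a DataFrame using the append function. This function will append the rows at the

end. Note: DataFrame.append() was removed in pandas 2.0; on current versions, pd.concat([df,new_df]) gives the same result.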

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

new_df=df[1:2]

df=pd.concat([df,new_df])   # same result as df.append(new_df), which was removed in pandas 2.0

print("Appended Data Frame is:")

print(df)

o/p:

Original Data Frame is:

Name Age

0 Ravi 20
1 Sam 25

2 Kunal 21

Appended Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

1 Sam 25
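In the output above the appended row keeps its old label 1, so the label is duplicated. As a sketch of one way to avoid this, pd.concat can renumber the rows with ignore_index=True. The extra record 'Anu' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})
new_row = pd.DataFrame({'Name': ['Anu'], 'Age': [22]})   # 'Anu' is a made-up record

# ignore_index=True renumbers the rows 0..n-1 instead of keeping the old labels.
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```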

Deletion of Rows

Use index label to delete or drop rows from a DataFrame. If label is duplicated, then multiple rows

will be dropped.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

df=df.drop(1)

print("New Data Frame is:")

print(df)

o/p:
Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

New Data Frame is:

Name Age

0 Ravi 20

2 Kunal 21
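The remark about duplicated labels can be seen in a small sketch. The data frame below is built with label 1 used twice on purpose.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal', 'Sam'],
                   'Age': [20, 25, 21, 25]},
                  index=[0, 1, 2, 1])     # label 1 appears twice

remaining = df.drop(1)                    # removes every row labelled 1

print(remaining)
```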

Row Rename​:

We can rename one, some or all of the rows. In order to rename one or some rows, we will use
the rename() function.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

df.rename(index={0:'a'},inplace=True)

print("After renaming a row:")

print(df)
o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

After renaming a row:

Name Age

a Ravi 20

1 Sam 25

2 Kunal 21

Note​:
In order to rename all the rows, we will use df.index=<list of index>

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

df.index=['a','s','d']

print("After renaming all the rows:")


print(df)

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

After renaming all the rows:

Name Age

a Ravi 20

s Sam 25

d Kunal 21

Head and Tail functions in Data Frames​:

head():

To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the

first n rows (observe the index values). The default number of elements to display is five.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)
print("First two rows:",df.head(2))

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

First two rows: Name Age

0 Ravi 20

1 Sam 25

tail():

tail() returns the last n rows (observe the index values). The default number of elements to display is

five.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

print("Last row:",df.tail(1))
o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Last row: Name Age

2 Kunal 21

Indexing using Labels​:

We can make one of the columns the row index of the data frame by using the function
set_index().

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Original Data Frame is:")

print(df)

df.set_index('Name',inplace=True)

print("After changing row index Data Frame is:")

print(df)

o/p:

Original Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21
After changing row index Data Frame is:

Age

Name

Ravi 20

Sam 25

Kunal 21

Boolean indexing in Data Frames​:

Boolean indexing helps us to select data from a DataFrame using a boolean vector. In the approach
shown here, the DataFrame is given a boolean index and rows are then selected by that index. Below
are the steps for boolean indexing in data frames.

● Create a dictionary of data.


● Convert it into a DataFrame object and assign a boolean vector as its index.
● Now, access the data using boolean indexing.

import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.index=[True,False,True]
print("Data Frame with boolean index is:")
print(df)
print("Data Frame with True location is:")
print(df.loc[True])
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Data Frame with boolean index is:
Name Age
True Ravi 20
False Sam 25
True Kunal 21
Data Frame with True location is:
Name Age
True Ravi 20
True Kunal 21
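A boolean vector can also be produced by a condition on a column and used directly to filter rows, without changing the index. This is a different technique from the one in the example above, shown here as a minimal sketch with an arbitrary threshold.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

mask = df['Age'] > 20        # boolean Series: False, True, True
older = df[mask]             # keeps only the rows where the mask is True

print(mask.tolist())
print(older)
```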

Joining, Merging and Concatenation on Data Frames​:

Merge
pandas.merge() method is used for merging two data frames.

Its commonly used arguments are:

● Data frame names


● how - the type of merge: 'left', 'right', 'inner' (the default) or 'outer'
● on - the name of the common column

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df1=pd.DataFrame(data)

print("First Data Frame is:")

print(df1)

name2=['Ravi','Sam','Kunal']

marks=[99,98]

ser2_name=pd.Series(name2)

ser2_marks=pd.Series(marks)

data2={'Name':ser2_name,'Marks':ser2_marks}

df2=pd.DataFrame(data2)

print("Second Data Frame is:")

print(df2)

df=pd.merge(df1,df2,how='left',on='Name')

print("Merged Data Frame is:")

print(df)
o/p:

First Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Second Data Frame is:

Name Marks

0 Ravi 99.0

1 Sam 98.0

2 Kunal NaN

Merged Data Frame is:

Name Age Marks

0 Ravi 20 99.0

1 Sam 25 98.0

2 Kunal 21 NaN

Note: Similarly, we can pass 'right', 'inner' or 'outer' to how.
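To compare the merge types, the sketch below uses two small frames whose names only partly overlap. The record 'Anu' and its marks are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})
right = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Anu'], 'Marks': [99, 98, 90]})  # 'Anu' is made up

inner = pd.merge(left, right, how='inner', on='Name')      # names present in both frames
left_join = pd.merge(left, right, how='left', on='Name')   # every row of the left frame
outer = pd.merge(left, right, how='outer', on='Name')      # names present in either frame

print(inner)
print(left_join)
print(outer)
```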

Join​:

The join method uses the index of the dataframes.

Use <dataframe 1>.join(<dataframe 2>) to join.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df1=pd.DataFrame(data)

print("First Data Frame is:")

print(df1)

marks=[99,98]

ser_marks=pd.Series(marks)

data2={'Marks':ser_marks}

df2=pd.DataFrame(data2)
print("Second Data Frame is:")

print(df2)

df=df1.join(df2)

print("Data Frame after join is:")

print(df)

o/p:

First Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Second Data Frame is:

Marks

0 99

1 98

Data Frame after join is:

Name Age Marks

0 Ravi 20 99.0

1 Sam 25 98.0

2 Kunal 21 NaN

Concatenate​:

Concatenate uses pandas.concat(<List of data frames>). By default the data frames are stacked one below the other.

import pandas as pd

name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df1=pd.DataFrame(data)

print("First Data Frame is:")

print(df1)

marks=[99,98]
ser_marks=pd.Series(marks)

data2={'Marks':ser_marks}

df2=pd.DataFrame(data2)

print("Second Data Frame is:")

print(df2)

df=pd.concat([df1,df2])

print("Data Frame after concatenation is:")

print(df)

o/p:

First Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Second Data Frame is:

Marks

0 99

1 98

Data Frame after concatenation is:

Name Age Marks

0 Ravi 20.0 NaN

1 Sam 25.0 NaN

2 Kunal 21.0 NaN

0 NaN NaN 99.0

1 NaN NaN 98.0

Importing/Exporting Data between CSV files and Data Frames​:

Import data from CSV file to Data Frame​: We can import data from CSV File to Data Frame by
using read_csv() function.

Export data from Data Frame to CSV File: ​We can export data from Data Frame to CSV File by
using to_csv() function.

Syntax: <data frame name>.to_csv(<File Path>,.....)

import pandas as pd
name=['Ravi','Sam','Kunal']

age=[20,25,21]

ser_name=pd.Series(name)

ser_age=pd.Series(age)

data={'Name':ser_name,'Age':ser_age}

df=pd.DataFrame(data)

print("Data Frame is:")

print(df)

df.to_csv('C:/Users/JNVP-39/Desktop/output.csv')

o/p:

Data Frame is:

Name Age

0 Ravi 20

1 Sam 25

2 Kunal 21

Along with the above output, a CSV file with the given name will be created at the given path.
By default the row index is written as the first column of the file.
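The export and import steps can be checked together without touching the disk. The sketch below writes to an in-memory buffer (an assumption made so that no file path is needed) and passes index=False so that the row labels are left out of the CSV text.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

buffer = io.StringIO()            # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False)    # index=False leaves the 0,1,2 row labels out of the file
csv_text = buffer.getvalue()
print(csv_text)

restored = pd.read_csv(io.StringIO(csv_text))   # read the same text back
print(restored.equals(df))
```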

--------- ******* --------
