Reading Material For Data Handling Using Pandas-I
By
B. Naresh, PGT Computer Science
JNV Khammam
----------------------------------------------------------------------------------------------------------
Blue Print:
4 Societal Impacts 8
Practical 30
Total 100
Unit-1:
Data Handling using Pandas and Data Visualization
Module: A module is a file that contains Python functions. It is a .py file holding executable Python
code or statements.
def welcome(user):
mod_sample.welcome("JNV Khammam")
The __init__.py file tells Python that the folder containing it is a package.
Module: pack_sample.py
def welcome(user):
Then create the package ‘package1’ by adding these two scripts to the folder package1.
pack_sample.welcome("Package")
Library: It is a collection of various packages. Conceptually, there is no difference between a package
and a Python library.
Framework: It is a collection of various libraries that also defines the overall flow of the code.
Pandas:
Pandas is one of the most popular open-source Python libraries used for data analysis. Data can be
analyzed in pandas using two data structures:
● Series
● Dataframes
Installation of Pandas:
Series:
A Series is a one-dimensional array defined in pandas that can store any data type.
It is a pandas data structure representing a one-dimensional array-like object that contains an
array of data (of any NumPy data type) and an associated array of data labels, called its index.
Creating a Series: A series can be created by using the pandas function Series().
Syntax:
<Series name> = pd.Series(<list name>, index=<index name>)
Note: <list name> is mandatory, <index name> is optional. If the index is not mentioned, then the
Series will be assigned the indexes 0, 1, 2, ....
In order to create a series from an array, we have to import a numpy module and have to use the
array() function.
import pandas as pd
import numpy as np
data = np.array(['q','w','r','t','y'])
ser = pd.Series(data)
print(ser)
0 q
1 w
2 r
3 t
4 y
dtype: object
In order to create a series from an array with an index, we have to provide an index with the same
number of elements as there are in the array.
import pandas as pd
import numpy as np
data = np.array(['q','w','r','t','y'])
ser = pd.Series(data, index=[10,11,12,13,14])
print(ser)
10 q
11 w
12 r
13 t
14 y
dtype: object
In order to create a series from a list, we first create the list and then create the series from it.
import pandas as pd
data=[5,7,9,4]
s=pd.Series(data)
print(s)
index=['q','w','e','r']
si=pd.Series(data,index)
print(si)
0 5
1 7
2 9
3 4
dtype: int64
q 5
w 7
e 9
r 4
dtype: int64
import pandas as pd
data=[[5,7],[4,8],[6,1]]
s=pd.Series(data)
print(s)
o/p:
0 [5, 7]
1 [4, 8]
2 [6, 1]
dtype: object
0 [5, 7]
1 [4, 8]
2 [6, 6, 4, 3]
dtype: object
In order to create a series from a dictionary, we have to first create a dictionary. After that we can
make a series using a dictionary. Dictionary keys are used to construct an index.
import pandas as pd
dict1={'a':9,'b':8,'c':3}
s=pd.Series(dict1)
print(s)
a 9
b 8
c 3
dtype: int64
Note: While creating a series from a dictionary, the keys of the dictionary are taken as the indexes
of the series.
In order to create a series from scalar value, an index must be provided. The scalar value will be
repeated to match the length of the index.
import pandas as pd
s=pd.Series(10,index=[1,2,3,4,5])
print(s)
1 10
2 10
3 10
4 10
5 10
dtype: int64
Note: If the index is not mentioned, a Series with one value and index 0 (zero) is created.
Mathematical operations on Series: Several important math operations can be performed on a pandas
series to simplify data analysis in Python and save a lot of time.
FUNCTION USE
s.describe() Returns a series with information like mean, mode etc depending on
type of data passed
Note: ‘s’ in the above table indicates the name of the series.
import pandas as pd
s=pd.Series([4,2,1,5,6,2])
o/p:
dtype: int64
6 1
5 1
4 1
1 1
dtype: int64
mean 3.333333
std 1.966384
min 1.000000
25% 2.000000
50% 3.000000
75% 4.750000
max 6.000000
dtype: float64
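A few commonly used operations can be tried on the same series. The following is only a minimal sketch; the variable names (total, average, counts, summary, doubled) are chosen for illustration and do not come from the original material.

```python
import pandas as pd

s = pd.Series([4, 2, 1, 5, 6, 2])

total = s.sum()            # sum of all the values
average = s.mean()         # arithmetic mean
counts = s.value_counts()  # how many times each value occurs
summary = s.describe()     # count, mean, std, min, quartiles and max

print(total)
print(average)
print(counts)
print(summary)

# Arithmetic on a series is applied element by element
doubled = s * 2
print(doubled)
```

Because describe() returns a series, a single statistic can be picked out by its label, for example summary['mean'].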
To view a small sample of a Series, use the head() and the tail() methods.
head() returns the first n rows (observe the index values). The default number of elements to
display is five.
Syntax:
<Series name>.head(n)
# Python program to fetch first two rows from the given series.
import pandas as pd
data=[4,6,7,4,2,35,10,5]
s=pd.Series(data)
print(s)
print(s.head(2))
o/p:
0 4
1 6
2 7
3 4
4 2
5 35
6 10
7 5
dtype: int64
0 4
1 6
dtype: int64
Note: If no value is passed to the head() function, the first five rows are returned and the output will be -
0 4
1 6
2 7
3 4
4 2
5 35
6 10
7 5
dtype: int64
0 4
1 6
2 7
3 4
4 2
dtype: int64
tail() :
tail() returns the last n rows (observe the index values). The default number of elements to display
is five.
Syntax:
<Series name>.tail(n)
# Python program to fetch the last three rows from the given series.
import pandas as pd
data=[4,6,7,4,2,35,10,5]
s=pd.Series(data)
print(s)
print(s.tail(3))
o/p:
0 4
1 6
2 7
3 4
4 2
5 35
6 10
7 5
dtype: int64
5 35
6 10
7 5
dtype: int64
Note: If no value is passed to the tail() function, the last five rows are returned and the output will be -
0 4
1 6
2 7
3 4
4 2
5 35
6 10
7 5
dtype: int64
3 4
4 2
5 35
6 10
7 5
dtype: int64
Selection: We can select a value from the series by using its corresponding index.
Syntax:
<Series name>[<index>]
import pandas as pd
data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']
s=pd.Series(data)
print(s[1])
o/p:
New Delhi
Indexing:
Pandas Series.index attribute is used to get or set the index labels of the given Series object.
import pandas as pd
data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']
s=pd.Series(data)
print("Series before adding index is:")
print(s)
ind=[11,12,13,14,15]
s.index=ind
print(s)
o/p:
Series before adding index is:
0 Hyderabad
1 New Delhi
2 Chenai
3 Bangalore
4 Kolkata
dtype: object
11 Hyderabad
12 New Delhi
13 Chenai
14 Bangalore
15 Kolkata
dtype: object
import pandas as pd
data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']
ind=[4,5,6,7,8]
s=pd.Series(data,ind)
print("Series is:")
print(s)
print(s.index)
o/p:
Series is:
4 Hyderabad
5 New Delhi
6 Chenai
7 Bangalore
8 Kolkata
dtype: object
import pandas as pd
data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']
s=pd.Series(data)
print("Series is:")
print(s)
print(s.index)
o/p:
Series is:
0 Hyderabad
1 New Delhi
2 Chenai
3 Bangalore
4 Kolkata
dtype: object
RangeIndex(start=0, stop=5, step=1)
Slicing: A slicing operation extracts a part of the series based on the given start, stop and step
parameters. The element at the stop position is not included.
Syntax:
<Series name>[<start>:<stop>:<step>]
import pandas as pd
data=['Hyderabad','New Delhi','Chenai','Bangalore','Kolkata']
s=pd.Series(data)
print(s)
print(s[1:3])
0 Hyderabad
1 New Delhi
2 Chenai
3 Bangalore
4 Kolkata
dtype: object
1 New Delhi
2 Chenai
dtype: object
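The step parameter and slicing by label are not shown in the example above. The following minimal sketch illustrates both; the index labels 'a' to 'e' and the variable names are assumptions made only for this illustration.

```python
import pandas as pd

data = ['Hyderabad', 'New Delhi', 'Chenai', 'Bangalore', 'Kolkata']
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])

every_second = s.iloc[::2]   # start and stop omitted, step of 2
by_position = s.iloc[1:3]    # positions 1 and 2; stop position 3 is excluded
by_label = s.loc['b':'d']    # labels 'b' to 'd'; the stop label is included

print(every_second)
print(by_position)
print(by_label)
```

Note the difference: slicing by position leaves out the stop value, while slicing by label includes it.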
Data Frames:
A Data Frame is a two-dimensional (2-D) data structure defined in pandas which consists of rows and
columns.
A Data Frame stores an ordered collection of columns that can hold data of different types.
Syntax:
<DataFrame name> = pd.DataFrame(<data>,
columns=<column sequence>,
index=<index sequence>, ...)
NaN, standing for "not a number", is a special floating-point value used to represent any value that is
undefined or unrepresentable. For example, 0/0 is undefined as a real number and is, therefore,
represented by NaN. In pandas, NaN is also used to mark missing data.
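How NaN behaves inside a series can be checked with a short experiment. This is a minimal sketch; the values 10 and 30 are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, 30])

print(s.dtype)           # float64, because NaN is a floating-point value
print(s.isna())          # True only where the value is missing
print(np.nan == np.nan)  # False: NaN is not equal to anything, even itself
print(s.sum())           # 40.0, NaN values are skipped by default
```

Since NaN never compares equal to itself, isna() should be used to find missing values instead of the == operator.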
import pandas as pd
name=['Ravi','Sam','Kunal']
df=pd.DataFrame(name)
print(df)
o/p:
0 Ravi
1 Sam
2 Kunal
import pandas as pd
import numpy as np
data=[5,7,8,9,3]
a=np.array(data)
df=pd.DataFrame(a)
print("array is:")
print(a)
print(df)
o/p:
array is:
[5 7 8 9 3]
0 5
1 7
2 8
3 9
4 3
Creating Dataframe from the series:
import pandas as pd
import numpy as np
data=[5,7,8,9,3]
a=np.array(data)
s=pd.Series(a)
df=pd.DataFrame(s)
print("Series is:")
print(s)
print(df)
o/p:
Series is:
0 5
1 7
2 8
3 9
4 3
dtype: int32
0 5
1 7
2 8
3 9
4 3
import pandas as pd
import numpy as np
data=[5,7,8,9,3]
a=np.array(data)
df=pd.DataFrame(a)
df2=df
print("array is:")
print(a)
print(df2)
o/p:
array is:
[5 7 8 9 3]
0 5
1 7
2 8
3 9
4 3
import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}
d2={'name':'Sam','age':20,'marks':95}
data=[d1,d2]
df=pd.DataFrame(data)
print(df)
o/p:
name age marks
0 Ravi 25 99
1 Sam 20 95
import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}
d2={'name':'Sam','age':20,'marks':95}
data=[d1,d2]
ind=['p','q']
df=pd.DataFrame(data,ind)
print(df)
o/p:
name age marks
p Ravi 25 99
q Sam 20 95
import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}
d2={'name':'Sam','age':20,'marks':95}
data=[d1,d2]
df=pd.DataFrame(data,columns=['name','marks'])
print(df)
o/p:
name marks
0 Ravi 99
1 Sam 95
import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}
d2={'name':'Sam','age':20,'marks':95}
data=[d1,d2]
df=pd.DataFrame(data,index=['p','q'],columns=['name','marks'])
print(df)
o/p:
name marks
p Ravi 99
q Sam 95
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Student details:")
print(df)
o/p:
Student details:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
The pandas library provides features with which we can read a CSV file in full, or in part for only
a selected group of columns and rows. A CSV file contains comma-separated values.
You can create this file using Windows Notepad by copying and pasting the data below. Save the file
as input2.csv using the Save As > All Files (*.*) option in Notepad.
Content-
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
import pandas as pd
data=pd.read_csv('C:/Users/JNVP-39/Desktop/input2.csv')
print(data)
o/p:
Note: An additional column starting from zero has been created by pandas as the index.
custom index
The index_col argument specifies a column of the CSV file to be used as the index.
import pandas as pd
data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',index_col=['S.No'])
print(data)
o/p:
header_names
Specify the names of the header using the names argument.
import pandas as pd
data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',names=['a','b','c','d','e'])
print(data)
o/p:
a b c d e
Observe that the custom names are now used as the header, but the header row of the file has not
been eliminated; it appears as the first row of data. Now, we use the header argument to remove it.
If the header is in a row other than the first, pass the row number to header. This will skip the
preceding rows.
import pandas as pd
data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt',names=['a','b','c','d','e'],header=0)
print(data)
o/p:
a b c d e
skiprows
skiprows skips the specified number of rows from the start of the file.
import pandas as pd
data=pd.read_csv('C:/Users/JNVP-39/Desktop/sample.txt', skiprows=2)
print(data)
o/p:
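The read_csv() arguments described above can be compared side by side without depending on a file path. The following minimal sketch keeps the content of input2.csv in a string and reads it through io.StringIO; the variable names are assumptions made only for this illustration.

```python
import io
import pandas as pd

# Same content as input2.csv, kept in memory so that no file path is needed
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Default: pandas adds its own index 0, 1, 2, 3
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col: the S.No column becomes the index
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col='S.No')

# names with header=0: the header row of the file is replaced by custom names
named_df = pd.read_csv(io.StringIO(csv_text),
                       names=['a', 'b', 'c', 'd', 'e'], header=0)

# skiprows=2: the first two lines are skipped, so the next line acts as header
skipped_df = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(default_df)
print(indexed_df)
print(named_df)
print(skipped_df)
```

With a real file, io.StringIO(csv_text) is simply replaced by the path of the file.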
import pandas as pd
d1={'name':'Ravi','age':25,'marks':99}
d2={'name':'Sam','age':20,'marks':95}
data=[d1,d2]
ind=['p','q']
df=pd.DataFrame(data,ind)
print(df)
print(df.loc['p',:])
print(df.loc['p':'q','name'])
print(df.loc['p':'q','name':'marks'])
o/p:
p Ravi 25 99
q Sam 20 95
Sam
name Ravi
age 25
marks 99
p Ravi
q Sam
p Ravi 25 99
q Sam 20 95
Iteration is a general term for taking each item of something, one after another. A pandas DataFrame
consists of rows and columns, so iterating over a DataFrame directly works like iterating over a
dictionary: just as a dictionary yields its keys, a DataFrame yields its column labels.
Iterating over rows: To iterate over the rows of the DataFrame, we can use the following functions −
iterrows():
iterrows() returns an iterator yielding each index value along with a series containing the data in
that row.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
for row_index,row in df.iterrows():
    print(row_index,row)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
0 Name Ravi
Age 20
1 Name Sam
Age 25
2 Name Kunal
Age 21
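iteritems():
Iterates over each column as a key, value pair, with the column label as key and the column values as
a Series object. iteritems() was deprecated in pandas 1.5 and removed in pandas 2.0; items() behaves
the same way and works in both old and new versions.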
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
print("applying iteritems on Data Frame:")
for key,value in df.items():  # items() replaces iteritems(), which was removed in pandas 2.0
    print(key,value)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
applying iteritems on Data Frame:
Name 0 Ravi
1 Sam
2 Kunal
Name: Name, dtype: object
Age 0 20
1 25
2 21
Name: Age, dtype: int64
itertuples():
itertuples() method will return an iterator yielding a named tuple for each row in the DataFrame. The
first element of the tuple will be the row’s corresponding index value, while the remaining values are
the row values.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
print("applying itertuples on Data Frame:")
for row in df.itertuples():
    print(row)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
applying itertuples on Data Frame:
Pandas(Index=0, Name='Ravi', Age=20)
Pandas(Index=1, Name='Sam', Age=25)
Pandas(Index=2, Name='Kunal', Age=21)
In order to iterate over columns, we need to create a list of the dataframe columns and then iterate
through that list.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
columns=list(df)
print("List of columns:")
print(columns)
print("Third element of the column:")
for i in columns:
    print(df[i][2])
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
List of columns:
['Name', 'Age']
Third element of the column:
Kunal
21
Operations on rows and columns:
● add
● select
● delete
● rename
Column Selection:
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print("Column selection:")
print(df['Age'])
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Column selection:
0 20
1 25
2 21
Name: Age, dtype: int64
Column Addition
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
marks=[98,99]
ser_marks=pd.Series(marks)
df['Marks']=ser_marks
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Age Marks
0 Ravi 20 98.0
1 Sam 25 99.0
2 Kunal 21 NaN
Column Deletion
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
del df['Age']
print(df)
df.pop('Name')
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name
0 Ravi
1 Sam
2 Kunal
Empty DataFrame
Columns: []
Index: [0, 1, 2]
Column Rename:
We can rename one, some or all of the columns. In order to rename one or some columns, we
will use the rename() function.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.rename(columns={'Name':'New Name'},inplace=True)
print("After column rename:")
print(df)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
After column rename:
New Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Note:
In order to rename all the columns, use df.columns=<list of columns>
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.columns=['New Name','New Age']
print("After renaming all columns:")
print(df)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
After renaming all columns:
New Name New Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Row Selection:
Selection by Label
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name,index=['p','q','r'])
ser_age=pd.Series(age,index=['p','q','r'])
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print(df.loc['q'])
o/p:
Name Age
p Ravi 20
q Sam 25
r Kunal 21
Name Sam
Age 25
Selection by integer location
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name,index=['p','q','r'])
ser_age=pd.Series(age,index=['p','q','r'])
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print(df.iloc[2])
o/p:
Name Age
p Ravi 20
q Sam 25
r Kunal 21
Name Kunal
Age 21
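The difference between loc (labels) and iloc (integer positions) is easiest to see when both are used on the same data frame. This is a minimal sketch built on the data from the examples above; the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]},
                  index=['p', 'q', 'r'])

row_by_label = df.loc['q']          # row whose label is 'q'
row_by_position = df.iloc[1]        # second row, counting from 0

cell_by_label = df.loc['r', 'Age']  # row label and column label
cell_by_position = df.iloc[2, 1]    # row position and column position

label_slice = df.loc['p':'q']       # both 'p' and 'q' are included
position_slice = df.iloc[0:2]       # position 2 is excluded

print(row_by_label)
print(cell_by_label)
print(label_slice)
```

Both forms of each pair select the same data here; only the way of addressing it differs.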
Slice Rows
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print("Row Slicing:")
print(df[1:2])
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Row Slicing:
Name Age
1 Sam 25
Addition of Rows
Add new rows to a DataFrame by concatenating another DataFrame to it with pd.concat(); the new rows
are placed at the end. Older versions of pandas provided df.append() for this, but it was deprecated
in pandas 1.4 and removed in pandas 2.0.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
new_df=df[1:2]
df=pd.concat([df,new_df])
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
1 Sam 25
Deletion of Rows
Use the index label to delete or drop rows from a DataFrame. If the label is duplicated, then all the
rows with that label will be dropped.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df=df.drop(1)
print(df)
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Age
0 Ravi 20
2 Kunal 21
Row Rename:
We can rename one, some or all of the rows. In order to rename one or some rows, we will use
the rename() function.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
df.rename(index={0:'a'},inplace=True)
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Age
a Ravi 20
1 Sam 25
2 Kunal 21
Note:
In order to rename all the rows, we will use df.index=<list of index>
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
df.index=['a','s','d']
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Age
a Ravi 20
s Sam 25
d Kunal 21
head():
To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the
first n rows (observe the index values). The default number of elements to display is five.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print("First two rows:",df.head(2))
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
0 Ravi 20
1 Sam 25
tail():
tail() returns the last n rows (observe the index values). The default number of elements to display is
five.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
print("Last row:",df.tail(1))
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
2 Kunal 21
We can make one of the columns as row index label for the data frame by using the function
set_index().
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
df.set_index('Name',inplace=True)
print("After changing row index Data Frame is:")
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
After changing row index Data Frame is:
Age
Name
Ravi 20
Sam 25
Kunal 21
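Once a column has been made the index, its values can be used as row labels, and reset_index() turns the index back into an ordinary column. The following is a minimal sketch under the assumption that inplace is not used, so the original data frame stays unchanged; the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

by_name = df.set_index('Name')          # returns a new frame; df is unchanged
age_of_sam = by_name.loc['Sam', 'Age']  # look up a row by its name
restored = by_name.reset_index()        # 'Name' becomes a column again

print(by_name)
print(age_of_sam)
print(restored)
```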
Boolean indexing helps us to select the data from the DataFrames using a boolean vector. We need a
DataFrame with a boolean index to use the boolean indexing. Below are the steps for boolean
indexing in data frames.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print("Original Data Frame is:")
print(df)
df.index=[True,False,True]
print("Data Frame with boolean index is:")
print(df)
print("Data Frame with True location is:")
print(df.loc[True])
o/p:
Original Data Frame is:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Data Frame with boolean index is:
Name Age
True Ravi 20
False Sam 25
True Kunal 21
Data Frame with True location is:
Name Age
True Ravi 20
True Kunal 21
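In practice, the boolean vector is often produced by a condition on a column instead of being stored as the index. The following minimal sketch shows that usage as a supplement to the example above; the condition Age > 20 and the variable names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

mask = df['Age'] > 20                # boolean Series: False, True, True
older = df[mask]                     # keeps only the rows where mask is True
older_names = df.loc[mask, 'Name']   # same rows, but only the 'Name' column

print(mask)
print(older)
print(older_names)
```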
Merge
The pandas.merge() function is used for merging two data frames on one or more common columns.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df1=pd.DataFrame(data)
print(df1)
name2=['Ravi','Sam','Kunal']
marks=[99,98]
ser2_name=pd.Series(name2)
ser2_marks=pd.Series(marks)
data2={'Name':ser2_name,'Marks':ser2_marks}
df2=pd.DataFrame(data2)
print(df2)
df=pd.merge(df1,df2,how='left',on='Name')
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Name Marks
0 Ravi 99.0
1 Sam 98.0
2 Kunal NaN
Name Age Marks
0 Ravi 20 99.0
1 Sam 25 98.0
2 Kunal 21 NaN
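The how argument decides which rows are kept when the two data frames do not contain the same names. The following minimal sketch compares three values of how; the student 'Anil' and the marks 90 are hypothetical values added only to make the difference visible.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})
df2 = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Anil'], 'Marks': [99, 98, 90]})

left = pd.merge(df1, df2, how='left', on='Name')    # all rows of df1
inner = pd.merge(df1, df2, how='inner', on='Name')  # only names found in both
outer = pd.merge(df1, df2, how='outer', on='Name')  # names from either frame

print(left)
print(inner)
print(outer)
```

Rows that have no match in the other data frame get NaN in the columns coming from that frame.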
Join:
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df1=pd.DataFrame(data)
print(df1)
marks=[99,98]
ser_marks=pd.Series(marks)
data2={'Marks':ser_marks}
df2=pd.DataFrame(data2)
print("Second Data Frame is:")
print(df2)
df=df1.join(df2)
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Second Data Frame is:
Marks
0 99
1 98
Name Age Marks
0 Ravi 20 99.0
1 Sam 25 98.0
2 Kunal 21 NaN
Concatenate:
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df1=pd.DataFrame(data)
print(df1)
marks=[99,98]
ser_marks=pd.Series(marks)
data2={'Marks':ser_marks}
df2=pd.DataFrame(data2)
print(df2)
df=pd.concat([df1,df2])
print(df)
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Marks
0 99
1 98
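The result of the concatenation is not shown in the output above. The following minimal sketch builds the same two data frames and inspects what pd.concat() returns, both in the default direction (rows) and with axis=1 (columns); the variable names are chosen only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})
df2 = pd.DataFrame({'Marks': [99, 98]})

# Default (axis=0): rows of df2 are placed below the rows of df1.
# Columns missing in one frame are filled with NaN.
stacked = pd.concat([df1, df2])

# axis=1: columns of df2 are placed beside df1, matched on the index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```

In the stacked result the original index values are kept, so the labels 0 and 1 appear twice; passing ignore_index=True to pd.concat() renumbers the rows.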
Import data from CSV file to Data Frame: We can import data from CSV File to Data Frame by
using read_csv() function.
Export data from Data Frame to CSV File: We can export data from Data Frame to CSV File by
using to_csv() function.
import pandas as pd
name=['Ravi','Sam','Kunal']
age=[20,25,21]
ser_name=pd.Series(name)
ser_age=pd.Series(age)
data={'Name':ser_name,'Age':ser_age}
df=pd.DataFrame(data)
print(df)
df.to_csv('C:/Users/JNVP-39/Desktop/output.csv')
o/p:
Name Age
0 Ravi 20
1 Sam 25
2 Kunal 21
Along with the above output, a CSV file will be created in the mentioned path and with the mentioned
file name.
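Export and import can be checked together by writing a data frame to a CSV file and reading it back. This is a minimal sketch; a temporary folder is assumed here instead of the Desktop path used above, and index=False is an optional argument that leaves the row labels out of the file.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Ravi', 'Sam', 'Kunal'], 'Age': [20, 25, 21]})

# A temporary folder is used so that the example runs on any computer
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'output.csv')
    df.to_csv(path, index=False)   # write without the index column
    restored = pd.read_csv(path)   # read the same file back

print(restored)
```

Without index=False, the index is written as an extra unnamed first column, which then shows up as a column when the file is read again.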