
Data Handling Using Pandas - 1-2-1

For Python programs, we use data handling with Pandas.

Uploaded by sarichauhan973
Copyright © All Rights Reserved

Data Handling using Pandas – 1

Data Analysis: Data analysis means analyzing a large set of data points to get answers to questions related to that dataset.

Python Libraries: A Python library is a collection of functions and methods that allows us to perform many actions without writing the code ourselves. There are different Python libraries:

1) Python Imaging Library (PIL): PIL is one of the core libraries for image manipulation in Python.
2) NumPy Library: This library provides mathematical functions, scientific computing, etc.
3) Pandas: Pandas provides data manipulation and analysis.
4) Matplotlib: This library is for data visualization.

Data processing is an important part of analyzing data because data is not always available in the desired format. This is best explained through the Data Life Cycle. Data is stored in different formats, such as a .csv file, an Excel file or an HTML file. This data is converted into a single format and stored in one place; this is called Data Warehousing. After the data is stored, analysis is performed on it. Once the analysis is done, we can plot the data in the form of a graph, which is called Data Visualization. This entire sequence of operations for data analysis can be performed easily and effectively with Python and its libraries.
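The Data Life Cycle described above can be sketched in a few lines of Pandas. This is a minimal illustration under stated assumptions, not a program from these notes: the sales figures are hypothetical, io.StringIO stands in for a real .csv file on disk, and a DataFrame (Pandas' two-dimensional structure) holds the combined data. The visualization step, which would use Matplotlib, is left out.

```python
import io

import pandas as pd

# Hypothetical sales data; io.StringIO stands in for a .csv file on disk
csv_text = "month,sales\nJAN,120\nFEB,95\n"

# A second, hypothetical source in a different format: a Python dictionary
extra = {"month": ["MARCH", "APRIL"], "sales": [130, 110]}

# 1) Convert both sources into a single format (a pandas DataFrame)
from_csv = pd.read_csv(io.StringIO(csv_text))
from_dict = pd.DataFrame(extra)

# 2) Store them together in one place
combined = pd.concat([from_csv, from_dict], ignore_index=True)

# 3) Analyse the stored data
total = combined["sales"].sum()
average = combined["sales"].mean()

print(combined)
print("Total sales:", total)
print("Average sales:", average)
```

In a complete workflow, the last step would pass combined to Matplotlib to draw a graph.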

Matplotlib: Matplotlib is a 2D plotting library that helps in visualizing features. It is used in Python because it is a robust, free and easy-to-use library for data visualization. It is easy to learn and understand.

Matplotlib is a comprehensive library for creating static, animated and interactive visualizations in Python.

PANDAS: Pandas is a Python module that makes data science and data analysis easy. It is the most popular Python package for data science, and it offers powerful and flexible data structures that make data analysis and manipulation easy.

Pandas or Python Pandas

Pandas is Python's library for data analysis. Pandas has derived its name from "Panel Data", a term for multidimensional structured datasets. The main author of Pandas is Wes McKinney.

Pandas is an open-source, BSD-licensed library built for the Python programming language. Pandas offers high-performance, easy-to-use data structures and data analysis tools.

To work with Pandas in Python, we need to import the pandas library into our Python environment by writing:

import pandas as pd
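A quick way to confirm that the import works. This sketch prints the installed Pandas version, which is worth knowing because a few outputs in these notes (such as printed dtypes) differ between versions, and previews the one-dimensional Series and the two-dimensional DataFrame. The variable names and data are hypothetical; pd is only the conventional alias for pandas, not a requirement.

```python
import pandas as pd

# Installed version; some printed outputs (e.g. dtypes) vary between releases
print("pandas version:", pd.__version__)

# Hypothetical data: a one-dimensional Series and a two-dimensional DataFrame
marks = pd.Series([45, 54, 34])
table = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "marks": [45, 54, 34]})

# ndim reports the number of dimensions of each structure
print(marks.ndim, table.ndim)  # prints: 1 2
```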

The following tasks can be done using Pandas:

1. It can read and write data in many different formats, such as CSV and Excel files, holding values of many data types, such as int, float and string.
2. Columns can be inserted into or deleted from a Pandas data structure.
3. It offers good IO capabilities, as it can easily pull data from a MySQL database directly into a DataFrame.
4. It can easily select subsets of data from bulky datasets and can even combine multiple datasets together.
5. It has the functionality to find and fill missing data.
6. It supports reshaping of data into different forms.
7. It supports advanced time series functionality.
8. It supports visualization by integrating libraries such as Matplotlib and Seaborn.

Pandas Data Structures: A data structure is a way of storing and organizing data in a computer so that it can be accessed and worked with in an appropriate way. Pandas provides the following three data structures:

1) Series: It is a one-dimensional structure storing homogeneous, mutable data.
2) DataFrame: It is a two-dimensional structure storing heterogeneous, mutable data.
3) Panel: It is a three-dimensional way of storing items.

We will discuss Series and DataFrame here, not Panel. (Panel has been removed from current versions of Pandas.)

Difference between Series and DataFrame:

Property | Series | DataFrame
Dimensions | Series is 1-dimensional. | DataFrame is 2-dimensional.
Type of data | Homogeneous, i.e. all the elements of a Series object must be of the same data type. | Heterogeneous, i.e. a DataFrame object can have elements of different data types.
Mutability | Value-mutable, i.e. the values of its elements can change. Size-immutable, i.e. the size of a Series object cannot be changed. | Value-mutable, i.e. the values of its elements can change. Size-mutable, i.e. the size of a DataFrame object can be changed.

Series
Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of integers:

12 34 54 67 87 23 10

A Series object has two main components:
• An array of actual data
• An associated array of indexes or data labels
Both components are one-dimensional arrays of the same length. The index is used to access individual data values.

Creating Series Objects
A Series type object can be created in many ways using the Pandas library's Series(). For this, we need to import the Pandas and NumPy modules with import statements.

1) Creating an empty Series object by using Series() with no parameter: To create an empty Series, we can just write:
<Series Object>=pandas.Series()
For Example:
import pandas as pd
s1=pd.Series()
print(s1)
Output:
Series([], dtype: float64)
(In pandas 2.0 and later, the default dtype of an empty Series is object, so the output is Series([], dtype: object).)

2) Creating a non-empty Series object: To create non-empty Series objects, we need to specify arguments for the data and indexes as per the following syntax:
<Series Object>=pandas.Series(data,index=idx)
where idx is a list of indexes and data is the data part of the Series object. It can be one of the following:
a) A Python sequence
b) An ndarray
c) A Python dictionary
d) A scalar value

a) Specify data as a Python sequence: The simplest way to create a Series type object is to give a sequence (such as a list) of values as the argument of Series():
<Series Object>=pandas.Series(<any Python sequence>)

Example 1: To create a Series using a list.
import pandas as pd
marks=[45,54,34,65]
marks_series=pd.Series(marks)
print("CREATED SERIES IS ")
print(marks_series)

Output:
CREATED SERIES IS
0    45
1    54
2    34
3    65
dtype: int64

Example 2: To create a Series using a tuple.
import pandas as pd
cost=(239,438,279,376)
cost_series=pd.Series(cost)
print("CREATED SERIES IS ")
print(cost_series)

Output:
CREATED SERIES IS
0    239
1    438
2    279
3    376
dtype: int64

Example 3: To create a Series using a string.
import pandas as pd
word="Information"        # avoid naming the variable str, which hides Python's built-in str()
s1=pd.Series(list(word))  # list() splits the string into its characters
print(s1)

Output:
0     I
1     n
2     f
3     o
4     r
5     m
6     a
7     t
8     i
9     o
10    n
dtype: object

b) Specify data as an ndarray: The data can also be an ndarray. For this purpose, we also import NumPy in our program.

Example 1: Program to create a Series using NumPy's arange().
import pandas as pd
import numpy as np
a1=np.arange(1,11,2)
s1=pd.Series(a1)
print(s1)
Output:
0    1
1    3
2    5
3    7
4    9
dtype: int32
(The integer dtype shown depends on the platform and NumPy version; many systems print int64.)

Example 2: Program to create a Series object using an ndarray that has 5 elements in the range 1 to 15.
import pandas as pd
import numpy as np
s2=pd.Series(np.linspace(1,15,5))
print(s2)
Output:
0     1.0
1     4.5
2     8.0
3    11.5
4    15.0
dtype: float64

Example 3: Program to create a Series object using an ndarray that is created by tiling the list [3,5,7] twice.
import pandas as pd
import numpy as np
s3=pd.Series(np.tile([3,5,7],2))
print(s3)
Output:
0    3
1    5
2    7
3    3
4    5
5    7
dtype: int32

c) Specify data as a Python dictionary: We can also create a Series object using a dictionary. The keys of the dictionary become the index of the Series, and the values of the dictionary become the data of the Series object.
Example:
import pandas as pd
d1={"JAN":31,"FEB":28,"MARCH":31,"APRIL":30,"MAY":31,"JUNE":30}
s2=pd.Series(d1)
print(s2)
Output:
JAN      31
FEB      28
MARCH    31
APRIL    30
MAY      31
JUNE     30
dtype: int64

d) Specify data as a scalar value: The data in a Series can be a single (scalar) value. The scalar value is repeated to match the length of the index.
Example:
import pandas as pd
series1=pd.Series(200,index=range(2011,2021,2))
print(series1)
Output:
2011    200
2013    200
2015    200
2017    200
2019    200
dtype: int64

Creating Series Objects – Additional Functionality
The following are additional features of Series() that we can use to create Pandas Series objects.

i) Specifying/adding NaN values in a Series object: Sometimes we need to create a Series object of a certain size, but we do not have complete data available at that time. In such cases, we can fill the missing data with a NaN (Not a Number) value. NaN is defined in the NumPy module, so we can use np.nan to specify a missing value.
Example:
import pandas as pd
import numpy as np
s1=pd.Series([1,2,3,4,np.nan,5,6])   # use np.nan; the older alias np.NaN was removed in NumPy 2.0
print(s1)
Output:
0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
5    5.0
6    6.0
dtype: float64

ii) Specify index as well as data with Series(): We can provide indexes of our choice along with the values while creating a Series type object.
Example:
import pandas as pd
months=["JAN","FEB","MARCH","APRIL","MAY","JUNE"]
no_days=[31,28,31,30,31,30]
s1=pd.Series(no_days,index=months)
print(s1)
Output:
JAN      31
FEB      28
MARCH    31
APRIL    30
MAY      31
JUNE     30
dtype: int64

iii) Specify data type along with data and index: We can also specify the data type along with the data and index with Series().
Example:
import pandas as pd
import numpy as np
series2=pd.Series(data=[10,20,30,40,50],index=['a','b','c','d','e'],dtype=np.float64)
print(series2)
Output:
a    10.0
b    20.0
c    30.0
d    40.0
e    50.0
dtype: float64

iv) Using a mathematical function/expression to create the data array in a Series: Series() allows us to use a function or expression that calculates the values of the data sequence.
Example:
import numpy as np
import pandas as pd
a=np.arange(11,16)
series1=pd.Series(index=a,data=a**2)   # each data value is the square of its index value
print(series1)
Output:
11    121
12    144
13    169
14    196
15    225
dtype: int32

Series Object Attributes: When we create a Series type object, all information related to it is available through attributes. We can use these attributes in the following format to get information about the Series object:
<Series Object>.<attribute name>

Common Attributes of Series Object:

Attribute | Description
<Series Object>.index | The index of the Series.
<Series Object>.values | Returns the Series as an ndarray or ndarray-like object, depending on the dtype.
<Series Object>.dtype | Returns the dtype object of the underlying data.
<Series Object>.shape | Returns a tuple of the shape of the data.
<Series Object>.nbytes | Returns the number of bytes in the data.
<Series Object>.ndim | Returns the number of dimensions of the data.
<Series Object>.size | Returns the number of elements in the data.
<Series Object>.itemsize | Returns the size of the dtype of the items in the data. (Removed in pandas 1.0; use <Series Object>.dtype.itemsize instead.)
<Series Object>.hasnans | Returns True if there are any NaN values, False otherwise.
<Series Object>.empty | Returns True if the Series object is empty, False otherwise.

Example:
import pandas as pd
s4=pd.Series(range(1,15,3), index=[x for x in 'abcde'])
print(s4)
l1=[]
s5=pd.Series(l1)
print(s5)
index=s4.index
val=s4.values
dt=s4.dtype
shape=s4.shape
nb=s4.nbytes
nd=s4.ndim
size=s4.size
itemsize=s4.dtype.itemsize   # s4.itemsize was removed in pandas 1.0
has=s4.hasnans
empty=s4.empty
empty1=s5.empty

print(index)
print(val)
print(dt)
print(shape)
print(nb)
print(nd)
print(size)
print(itemsize)
print(has)
print(empty)
print(empty1)

Output:
a     1
b     4
c     7
d    10
e    13
dtype: int64
Series([], dtype: float64)
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
[ 1  4  7 10 13]
int64
(5,)
40
1
5
8
False
False
True

Accessing a Series Object and its Elements

We can access a Series type object after it is created. We can access its indexes separately, its data separately, and even its individual elements and slices.

i) Accessing Individual Elements: To access an individual element of a Series object, we give its index in square brackets ([]) along with its name.
Syntax: <Series Object>[<valid index>]
To access an element by its position instead of its index label, use <Series Object>.iloc[<position>].
Example:
import pandas as pd
s1=pd.Series(range(1,20,2),index=list("abcdefghij"))
print(s1)
print(s1["f"])   # index label "f"; s1.iloc[5] gives the same element by position

Output:
a     1
b     3
c     5
d     7
e     9
f    11
g    13
h    15
i    17
j    19
dtype: int64
11

ii) Extracting Slices from a Series Object: We can also extract a slice from a Series object to retrieve a subset.

When Start and Stop are integers, slicing in a Series object takes place position-wise, not index-wise.

To extract a slice, we specify it as [Start:Stop:Step]. Start and Stop signify the positions of elements, not their indexes, and the element at position Stop is not included.
Example:
import pandas as pd
s1=pd.Series(range(1,20,2),index=list("abcdefghij"))
print(s1)
print(s1["f"])
s1.index=['p','q','r','s','t','u','v','w','x','y']
print(s1[0:6])   # positions 0 to 5
print(s1[5:])    # position 5 to the end
s1["u"]=90       # "u" is the index label at position 5
print(s1["u"])
Output:
a     1
b     3
c     5
d     7
e     9
f    11
g    13
h    15
i    17
j    19
dtype: int64
11
p     1
q     3
r     5
s     7
t     9
u    11
dtype: int64
u    11
v    13
w    15
x    17
y    19
dtype: int64
90

Operations on Series Object:

i) Modifying Elements of a Series Object: The data values of a Series object can be easily modified through item assignment:
<Series Object>[<index>]=<new_value>
Example:
import pandas as pd
s1=pd.Series(range(1,20,2),index=list("abcdefghij"))
print(s1)
s1["f"]=90       # "f" is the index label at position 5; s1.iloc[5]=90 does the same by position
print(s1["f"])
Output:
a     1
b     3
c     5
d     7
e     9
f    11
g    13
h    15
i    17
j    19
dtype: int64
90

ii) Renaming Indexes: We can change or rename the indexes of a Series object by assigning a new index array to its index attribute.
Example:
import pandas as pd
series2=pd.Series(data=[10,20,30,40,50])
print(series2)
series2.index=['A','B','C','D','E']
print(series2)
Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64
A    10
B    20
C    30
D    40
E    50
dtype: int64

The head() and tail() functions:
head() is used to display the first n rows of a pandas object, i.e. a Series, and tail() returns the last n rows of a pandas object.
Syntax:
<pandas object>.head(<n>)
<pandas object>.tail(<n>)
If you do not provide a value for n, head() and tail() return the first 5 and the last 5 rows, respectively, of the pandas object (Series).

Vector Operations on Series Objects
Vector operations mean that if we apply a function or expression to a Series object, it is applied individually to each item of the object. Series objects support vectorized operations, just like ndarrays.
Suppose we have a Series object s1 as:
0    10
1    20
2    30
3    40
dtype: int64
Then the following are legal operations: s1+2, s1*3, s1**3, etc.

Arithmetic Operations on Series Objects

We can perform arithmetic such as addition and subtraction on two Series objects, and the result is calculated from the corresponding elements of the two Series objects.

However, the operation is performed on matching indexes. For example, if the first object has indexes 0, 1 and 2, arithmetic is performed only with the elements of the second object that have indexes 0, 1 and 2; for every index found in only one of the two objects, the result is NaN.

We can store the result of an arithmetic operation in another Series object. For example, if we write s3=s1+s2, where s1 and s2 are Series objects, then s3 will also be a Series object.

Filtering Entries in a Series Object: We can filter out entries from a Series object using expressions of Boolean type. The syntax is:
<Series object>[<Boolean exp on Series object>]
• When we apply a comparison operator directly to a Pandas Series object, the condition is applied to each and every element of the Series object.
For example, suppose we have a Series object s1 as:
0    10
1    20
2    30
3    40
dtype: int64
If we give the condition as print(s1>10), then the output will be:
0    False
1     True
2     True
3     True
dtype: bool
• When we apply this condition with the Series object inside [], it returns a filtered result containing only the elements for which the condition is True.
For example, suppose we have a Series object s1 as:
0    10
1    20
2    30
3    40
dtype: int64
If we give the condition as print(s1[s1>10]), then the output will be:
1    20
2    30
3    40
dtype: int64

Sorting Series Values:

We can sort a Series object on the basis of its values or its indexes.
• Sorting on the basis of values: We can sort a Series object on the basis of its values using the sort_values() function, as per the following syntax:
<Series-Object>.sort_values([ascending=True|False])
The argument ascending is optional and takes the value True by default.
• Sorting on the basis of indexes: We can sort a Series object on the basis of its indexes using the sort_index() function, as per the following syntax:
<Series-Object>.sort_index([ascending=True|False])
The argument ascending is optional and takes the value True by default.

Program to implement Vector Operation on Series object

import pandas as pd
s1=pd.Series(range(1,11,2))
print(s1)
print(s1*4)

Output:
0    1
1    3
2    5
3    7
4    9
dtype: int64
0     4
1    12
2    20
3    28
4    36
dtype: int64

Program to implement Arithmetic on Series object

import pandas as pd
s1=pd.Series(range(1,11,2))
s2=pd.Series(range(11,21,2))
print(s1+s2)

Output:
0    12
1    16
2    20
3    24
4    28
dtype: int64

Program to implement Arithmetic on Series object with mismatched Index

import pandas as pd
s1=pd.Series(range(1,11,2))
s2=pd.Series(range(11,21,2))
s2.index=[1,2,3,4,5]
print(s1+s2)

Output:
0     NaN
1    14.0
2    18.0
3    22.0
4    26.0
5     NaN
dtype: float64

Program to implement filtering conditions in a Series object
import pandas as pd
s1=pd.Series(range(1,20,3))
print(s1)
print(s1>10)
print(s1[s1>10])

Output:
0     1
1     4
2     7
3    10
4    13
5    16
6    19
dtype: int64
0    False
1    False
2    False
3    False
4     True
5     True
6     True
dtype: bool
4    13
5    16
6    19
dtype: int64

Program to implement head() and tail() of a Series Object

import pandas as pd
s1=pd.Series(range(1,20,3))
print(s1.head(3))
print(s1.tail(2))
print(s1.head())
print(s1.tail())

Output:
0    1
1    4
2    7
dtype: int64
5    16
6    19
dtype: int64
0     1
1     4
2     7
3    10
4    13
dtype: int64
2     7
3    10
4    13
5    16
6    19
dtype: int64
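head() and tail() also accept a negative n, and an n larger than the length of the Series simply returns every element. The sketch below reuses the data of the head() and tail() program above; the variable names are illustrative only.

```python
import pandas as pd

s1 = pd.Series(range(1, 20, 3))  # 1, 4, 7, 10, 13, 16, 19 (same data as above)

first_two = s1.head(2)           # first 2 elements
all_but_last_two = s1.head(-2)   # negative n: every element except the last 2
all_but_first_two = s1.tail(-2)  # negative n: every element except the first 2
everything = s1.head(100)        # n larger than the Series: all 7 elements

print(first_two.tolist())          # [1, 4]
print(all_but_last_two.tolist())   # [1, 4, 7, 10, 13]
print(all_but_first_two.tolist())  # [7, 10, 13, 16, 19]
print(len(everything))             # 7
```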
Program to sort a Series on the basis of values

import pandas as pd
s1=pd.Series(range(1,10,2))
print(s1)
l1=[10,20,40,30,60,50,100,80]
s2=pd.Series(l1)
print(s1.sort_values(ascending=False))
print(s2.sort_values(ascending=False))
print(s2.sort_values())

Output:
0    1
1    3
2    5
3    7
4    9
dtype: int64
4    9
3    7
2    5
1    3
0    1
dtype: int64
6    100
7     80
4     60
5     50
2     40
3     30
1     20
0     10
dtype: int64
0     10
1     20
3     30
2     40
5     50
4     60
7     80
6    100
dtype: int64

Program to sort a Series on the basis of index

import pandas as pd
l1=[10,20,40,30,60]
s2=pd.Series(l1)
s2.index=[1,4,3,2,5]
print(s2.sort_index())
print(s2.sort_index(ascending=False))

Output:
1    10
2    30
3    40
4    20
5    60
dtype: int64
5    60
4    20
3    40
2    30
1    10
dtype: int64
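One point the sorting programs leave implicit: sort_values() and sort_index() return a new, sorted Series and leave the original unchanged, unless inplace=True is passed. The sketch below reuses the data of the sort_index() program and also chains sort_values() with head() to pick the largest values; the variable names are illustrative only.

```python
import pandas as pd

# Same data as the sort_index() program above
s2 = pd.Series([10, 20, 40, 30, 60], index=[1, 4, 3, 2, 5])

# sort_values() returns a NEW Series; s2 itself keeps its original order
by_value = s2.sort_values()
unchanged = s2.tolist()          # [10, 20, 40, 30, 60]

# Sorting in descending order and taking head(3) gives the 3 largest values
top_three = s2.sort_values(ascending=False).head(3)

# inplace=True sorts s2 itself; the call then returns None
s2.sort_index(inplace=True)

print(by_value)
print(top_three)
print(s2)
```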
