Panda data structures and its importance in Python.pdf

Sanjivani Rural Education Society's
Sanjivani College of Engineering, Kopargaon 423603.
-Department of Structural Engineering-
Course Title: (Python-SY & TY B.TECH Structure)
Pandas
By
Mr. Sumit S. Kolapkar (Assistant Professor)
Mail Id- kolapkarsumitst@sanjivani.org.in

➢ What is Data Analysis-
• It is the process of inspecting, cleansing, transforming
and modelling data with the goal of discovering useful
information, informing conclusions and supporting
decision making.
➢ Python Libraries for Data Analysis-

➢ What is Pandas-
• The name refers to both “panel data” and “Python
data analysis”; the library was developed by Wes McKinney in
2008.
• Used for working with data sets.
• It provides functions for analysing, cleaning, exploring and
manipulating data.
• Reads and writes data in different formats such as CSV, ZIP,
text and JSON.
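As a rough illustration of the read/write point above, the sketch below writes a small table to CSV text and reads it back. The column names and values are made up for this example, and an in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"member": ["B1", "B2", "C1"], "span_m": [4.5, 6.0, 3.2]})

# Write to CSV text; index=False leaves out the row labels.
csv_text = df.to_csv(index=False)

# Read the same text back into a new DataFrame.
df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)
```

The same two calls accept a file path such as "data.csv" in place of the buffer.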

➢ Excel vs. Pandas-
• Pandas shines for analysis work, whereas Excel
shines for building small applications and
presentations.
• An Excel worksheet cannot hold more than 1,048,576 rows
(about 1.05 million), and many present-day datasets are
larger than that.
• For analysis, pandas/Python is more powerful
than Excel: you can work with more data, faster, and
automate a lot more.

➢ Importance of Pandas in Python-
• Pandas allows us to analyze big data and make
conclusions based on statistical theories.
• Pandas can clean messy data sets, and make them
readable and relevant.
• Easy handling of missing data (represented as NaN)
in both floating-point and non-floating-point data.
• Size mutability: columns can be inserted and deleted
from DataFrames and higher-dimensional objects.
• Data set merging and joining. Flexible reshaping and
pivoting of data sets.
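A minimal sketch of three of the points above (missing data, size mutability, merging). The beam names and numbers are hypothetical and serve only as sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (NaN).
loads = pd.DataFrame({"beam": ["B1", "B2", "B3"], "load_kN": [12.0, np.nan, 9.5]})

# Missing data: count the NaN values, then fill them with a chosen value.
n_missing = int(loads["load_kN"].isna().sum())
filled = loads.fillna({"load_kN": 0.0})

# Size mutability: add a column, then delete it again.
filled["load_N"] = filled["load_kN"] * 1000
filled = filled.drop(columns=["load_N"])

# Merging: join with a second table on the shared "beam" column.
spans = pd.DataFrame({"beam": ["B1", "B2", "B3"], "span_m": [4.0, 5.0, 6.0]})
merged = filled.merge(spans, on="beam")

print(n_missing)
print(merged)
```

Filling with 0.0 is only one possible choice; dropping the incomplete rows with dropna() is another.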

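➢ Pandas data structures- Three types
• Series- One-dimensional labelled array capable
of holding data of any type (integer, string, float etc.).
pd.Series(data)
• DataFrame- Two-dimensional data structure with
rows and columns, just like a table.
pd.DataFrame(data)
• Panel- A 3D container of data. Panel has been removed
from recent pandas versions, so only Series and
DataFrame are used in the rest of these notes.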
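A short sketch of the difference in dimensionality between a Series and a DataFrame, using the ndim and shape attributes. The values are arbitrary.

```python
import pandas as pd

s = pd.Series([10, 20, 30])                        # one-dimensional, labelled
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})  # two-dimensional table

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (2, 2)
```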
➢ Installing Pandas-
• pip install pandas
➢ Importing Pandas-
• import pandas as pd

➢ Series data structure-
• Is a one-dimensional array capable of storing
various data types.
• Syntax: pandas.Series(data=None, index=None,
dtype=None, name=None, copy=False)
• Parameters:
• data: array-like, iterable, dict or scalar value- Contains the data stored in the Series.
• index: array-like or Index (1d)- Labels for the values; defaults to 0, 1, 2, ...
• dtype: str, numpy.dtype, or ExtensionDtype,
optional- Data type of the values.
• name: str, optional- Name given to the Series.
• copy: bool, default False- Whether to copy the input data.

• Example (empty Series)-
import pandas as pd
a=pd.Series( )
print(a)
• Example-
import pandas as pd
X = [3,4,5,6,7,8]
var = pd.Series(X)
print(var)
print(var[4])
OUTPUT
0 3
1 4
2 5
3 6
4 7
5 8
dtype: int64
7 ...value at index 4

Example-
import pandas as pd
dic={"name":['python','c','c++','java'],"popularity":[90,65,70,85],
"rank":[1.0,4.0,3.0,2.0]}
var=pd.Series(dic)
print(var)
• OUTPUT
name [python, c, c++, java]
popularity [90, 65, 70, 85]
rank [1.0, 4.0, 3.0, 2.0]
dtype: object ...because each element of the Series is a whole
Python list, which pandas stores as a generic object.
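For comparison, a sketch of a Series built from a dictionary whose values are single numbers rather than lists. The scores are made up; the point is that the keys become the index labels and the dtype is numeric instead of object.

```python
import pandas as pd

# Hypothetical scores; each dictionary value is a single number.
popularity = {"python": 90, "c": 65, "c++": 70, "java": 85}
s = pd.Series(popularity)

print(s)            # the dictionary keys become the index labels
print(s["python"])  # label-based lookup
print(s.dtype)      # an integer dtype, since all values are integers
```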

➢ Series data structure- Change the index
• Example-
import pandas as pd
X = [3,4,5,6]
var = pd.Series(X, index=["a", "s", "d", "f"])
print(var)
OUTPUT
a 3
s 4
d 5
f 6
dtype: int64

• Example-
import pandas as pd
X = [3,4,5,6]
var = pd.Series(X, index=["a", "s", "d", "f"], dtype="float", name="python")
print(var)
OUTPUT
a 3.0
s 4.0
d 5.0
f 6.0
Name: python, dtype: float64

Example-
import pandas as pd
X = pd.Series(12,index=[1,2,3,4,5,6,7])
print(X)
OUTPUT
1 12
2 12
3 12
4 12
5 12
6 12
7 12
dtype: int64

Example-
import pandas as pd
X1 = pd.Series(12,index=[1,2,3,4,5,6,7])
X2 = pd.Series(12,index=[1,2,3,4])
print(X1+X2)
OUTPUT
1 24.0
2 24.0
3 24.0
4 24.0
5 NaN
6 NaN
7 NaN
dtype: float64
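Note- 1. NumPy arrays of different lengths (7 and 4) cannot be added and raise a
broadcasting error, whereas pandas aligns the two Series on their index labels
and returns NaN for labels present in only one of them.
2. Because NaN is a floating-point value, the result dtype becomes float64.
This is how pandas handles missing data.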
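The alignment behaviour in the note can be sketched as follows. The add method with fill_value is one way to avoid the NaN results, by treating a label missing from one Series as 0; the NumPy part reproduces the broadcasting error for arrays of length 7 and 4.

```python
import numpy as np
import pandas as pd

x1 = pd.Series(12, index=[1, 2, 3, 4, 5, 6, 7])
x2 = pd.Series(12, index=[1, 2, 3, 4])

# Treat labels missing from one Series as 0 instead of producing NaN.
total = x1.add(x2, fill_value=0)
print(total)

# NumPy arrays carry no labels, so mismatched lengths cannot be aligned.
numpy_failed = False
try:
    np.full(7, 12) + np.full(4, 12)
except ValueError as err:
    numpy_failed = True
    print("NumPy error:", err)
```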

➢ Data frames in Pandas-
DataFrame: 1. Is a two-dimensional, size-mutable,
heterogeneous tabular data structure with labeled axes
(rows and columns).
2. A pandas DataFrame consists of three principal
components: the data, the rows, and the columns.
3. A pandas DataFrame can be created from lists,
dictionaries, a list of dictionaries, etc.
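Point 3 mentions a list of dictionaries. A minimal sketch of that case is shown here with made-up records: each dictionary becomes one row, and a key missing from a record shows up as NaN.

```python
import pandas as pd

# Hypothetical records; each dictionary is one row.
rows = [
    {"name": "python", "rank": 1},
    {"name": "java", "rank": 2},
    {"name": "c++"},  # "rank" is missing in this record
]
df = pd.DataFrame(rows)
print(df)
```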

➢ Data frames in Pandas- Using List
import pandas as pd
L = [1,2,3,4,5,6,7]
var = pd.DataFrame(L)
print(var)
OUTPUT
0
0 1
1 2
2 3
3 4
4 5
5 6
6 7

➢ Data frames in Pandas- Using Dictionary
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5]}
var = pd.DataFrame(dic)
print(var)
OUTPUT
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5

➢ Data frames in Pandas- Selecting columns
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5]}
var = pd.DataFrame(dic,columns=["a"])
print(var)
OUTPUT
a
0 1
1 2
2 3
3 4
4 5

import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5],1:[1,2,3,4,5]}
var = pd.DataFrame(dic,columns=["a",1])
print(var)
OUTPUT
a 1
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5

import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5],1:[1,2,3,4,5]}
var=pd.DataFrame(dic,columns=["a",1],index=["s","u","m","i","t"])
print(var)
OUTPUT
a 1
s 1 1
u 2 2
m 3 3
i 4 4
t 5 5

import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,14,5],1:[1,2,3,4,5]}
var = pd.DataFrame(dic)
print(var)
print(var["b"][3])
OUTPUT
a b 1
0 1 1 1
1 2 2 2
2 3 3 3
3 4 14 4
4 5 5 5
14
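The lookup var["b"][3] first selects the column and then the row. As a sketch of an alternative, the same value can be read in a single step with the loc, at and iloc accessors.

```python
import pandas as pd

dic = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 14, 5], 1: [1, 2, 3, 4, 5]}
var = pd.DataFrame(dic)

print(var.loc[3, "b"])  # row label 3, column label "b"
print(var.at[3, "b"])   # same lookup, for a single value only
print(var.iloc[3, 1])   # row position 3, column position 1
```

All three lines print 14. The single-step form is also the safer choice when assigning a new value to a cell.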

➢ Data frames in Pandas- Nested List
import pandas as pd
List_1 = [[1,2,3,4,5],[11,12,13,14,15],[21,22,23,24,25]]
var = pd.DataFrame(List_1)
print(var)
OUTPUT
0 1 2 3 4
0 1 2 3 4 5
1 11 12 13 14 15
2 21 22 23 24 25

➢ Data frames in Pandas- Using Series
import pandas as pd
sr ={"a":pd.Series([1,2,3,4]),"b":pd.Series([11,12,13,14])}
var = pd.DataFrame(sr)
print(var)
OUTPUT
a b
0 1 11
1 2 12
2 3 13
3 4 14

➢ Arithmetic Operations in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var = pd.DataFrame(sr)
print(var)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
Addition of A and B-
var["C"]=var["A"]+var["B"]
print(var)
OUTPUT
A B C
0 1 11 12
1 2 12 14
2 3 13 16
3 4 14 18

➢ Comparison Operations in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
var1["Python"]=var1["A"]<=2
print(var1)
OUTPUT
A B Python
0 1 11 True
1 2 12 True
2 3 13 False
3 4 14 False
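A True/False column like the one above can also be used to select rows. The sketch below assumes the same A and B data; the names mask and small are chosen only for this illustration.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14]})

mask = var1["A"] <= 2  # boolean Series, one value per row
small = var1[mask]     # keeps only the rows where mask is True
print(small)
```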

➢ Delete and Insert Data in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
var1.insert(1,"Python",var1["A"])
print(var1)
OUTPUT
A Python B
0 1 1 11
1 2 2 12
2 3 3 13
3 4 4 14
Note- The arguments of insert are: column position (index), column name, data to be inserted.

import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
var1["Python"]=var1["A"][:3]
print(var1)
OUTPUT
A B Python
0 1 11 1.0
1 2 12 2.0
2 3 13 3.0
3 4 14 NaN
Note- The slice [:3] copies only the first three rows; the remaining row gets NaN,
so the new column becomes float.

import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B C
0 1 11 21
1 2 12 22
2 3 13 23
3 4 14 24
var2 = var1.pop("B")
print(var2)
OUTPUT
0 11
1 12
2 13
3 14
Name: B, dtype: int64
Note- pop removes column B from var1 and returns it as a Series.
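Besides pop, two other common ways to delete a column are drop and del. The sketch below uses the same A, B, C data to show the difference between them.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14], "C": [21, 22, 23, 24]})

# drop returns a new DataFrame and leaves var1 unchanged.
without_b = var1.drop(columns=["B"])

# del removes the column from var1 in place.
del var1["C"]

print(list(without_b.columns))  # ['A', 'C']
print(list(var1.columns))       # ['A', 'B']
```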

➢ Creation of CSV files in Pandas-
Differences between CSV and XLS (Excel) file-
• A CSV file is a plain text format in which values are
separated by commas (Comma Separated Values).
• The XLS file format is an Excel binary file format
which holds information about all the worksheets in a
file, including both content and formatting.

import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv")
Note- Creates a new CSV file in the current working directory (usually the folder
from which the script is run).
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", index=False) # to leave out the index column

import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", header=False) # to leave out the header row
OR
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", header=[11,12,13]) # to write 11, 12, 13 as the column names
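To see what the index option changes, the sketch below builds the CSV text in memory instead of writing python.csv, then reads it back with read_csv. Calling to_csv without a path returns the text as a string.

```python
import io
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14], "C": [21, 22, 23, 24]})

# With no path given, to_csv returns the CSV text instead of writing a file.
with_index = var1.to_csv()
no_index = var1.to_csv(index=False)
print(with_index.splitlines()[0])  # header row starts with an empty index field
print(no_index.splitlines()[0])    # header row holds only the column names

# Reading back: index_col=0 uses the first column as the row labels.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(restored)
```

Without index_col=0, the saved index would come back as an extra unnamed data column.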
