Sanjivani Rural Education Society's
Sanjivani College of Engineering, Kopargaon 423603.
-Department of Structural Engineering-
Course Title: (Python-SY & TY B.TECH Structure)
Pandas
By
Mr. Sumit S. Kolapkar (Assistant Professor)
Mail Id- kolapkarsumitst@sanjivani.org.in
➢ What is Data Analysis-
• It is the process of inspecting, cleansing, transforming
and modelling data with the goal of discovering useful
information, informing conclusions and supporting
decision making.
➢ Python Libraries for Data Analysis-
➢ What is Pandas-
• The name refers to both "panel data" and "Python
data analysis"; the library was developed by Wes McKinney in
2008.
• Used for working with data sets.
• It has functions for analysing, cleaning, exploring and
manipulating data.
• Reads and writes data in different formats such as CSV, Zip,
text and JSON.
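The reading and writing of different formats mentioned above can be sketched as follows. This is only a minimal illustration: the column names member and length_m and all values are hypothetical, and io.StringIO is used as an assumption so that no file has to exist on disk.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({"member": ["B1", "B2", "C1"], "length_m": [3.0, 4.5, 3.2]})

# JSON: write the table to a string, then read it back.
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text))

# Plain text: tab-separated values are read with read_csv and sep="\t".
text = "member\tlength_m\nB1\t3.0\nB2\t4.5\n"
df_from_text = pd.read_csv(io.StringIO(text), sep="\t")

print(df_from_json)
print(df_from_text)
```

In practice a file name such as "data.json" would normally be passed in place of the StringIO object.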
➢ Excel vs. Pandas-
• Pandas shines when doing analysis work, whereas Excel
shines for building small applications and
presentations.
• An Excel worksheet cannot hold more than 1,048,576 rows
(about 1 million), and many present-day datasets are
larger than that.
• pandas/Python is more powerful than Excel for such work:
you can work with more data, faster, and
automate a lot more.
➢ Importance of Pandas in Python-
• Pandas allows us to analyse large data sets and draw
conclusions based on statistical theories.
• Pandas can clean messy data sets and make them
readable and relevant.
• Easy handling of missing data (represented as NaN)
in both floating point and non-floating point data.
• Size mutability: columns can be inserted and deleted
from DataFrames and higher-dimensional objects.
• Data set merging and joining. Flexible reshaping and
pivoting of data sets.
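A short sketch of the points listed above (missing data, size mutability and merging) might look like this. The beam names, loads and spans are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value (NaN).
loads = pd.DataFrame({"beam": ["B1", "B2", "B3"], "load_kN": [12.0, np.nan, 15.0]})

n_missing = loads["load_kN"].isna().sum()                    # count the missing values
filled = loads.fillna({"load_kN": loads["load_kN"].mean()})  # replace NaN by the column mean
dropped = loads.dropna()                                     # or drop the incomplete rows

# Size mutability: a column can be added and deleted again.
loads["checked"] = False
del loads["checked"]

# Merging two tables on a shared key column.
spans = pd.DataFrame({"beam": ["B1", "B2", "B3"], "span_m": [3.0, 4.5, 6.0]})
merged = pd.merge(filled, spans, on="beam")
print(n_missing)
print(merged)
```

Whether to fill or to drop missing values depends on the analysis; both are shown here only to illustrate the available calls.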
➢ Pandas data structures- Three types
• Series- One dimensional labelled array capable
of holding data of any type (integer, string, float etc).
pd.Series(data)
• Data frames- Two dimensional data structure with
labelled rows and columns, just like a table.
• Panel- A 3D container of data. Panel has been removed
from recent pandas versions; a DataFrame with a MultiIndex is used instead.
➢ Installing Pandas-
• pip install pandas
➢ Importing Pandas-
• import pandas as pd
➢ Series data structure-
• Is a one dimensional labelled array which is capable of storing
various data types.
• Syntax: pandas.Series(data=None, index=None,
dtype=None, name=None, copy=False,
fastpath=False)
• Parameters:
• data: array-like, iterable, dict or scalar value- Contains the data stored in the Series.
• index: array-like or Index (1d)
• dtype: str, numpy.dtype, float, or ExtensionDtype,
optional
• name: str, optional
• copy: bool, default False
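The parameters listed above can be tried out together in one call. The following is a minimal sketch; the array values, the labels x, y, z and the name "demo" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([10, 20, 30])

# data, index, dtype and name are set explicitly; copy=True asks pandas
# to copy the input instead of sharing memory with `values`.
s = pd.Series(data=values, index=["x", "y", "z"], dtype="float64", name="demo", copy=True)

values[0] = 99   # changing the source array afterwards does not change the Series
print(s)
print(s.index.tolist(), s.dtype, s.name)
```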
➢ Series data structure-
• import pandas as pd
a=pd.Series( )
print(a)
• Example-
import pandas as pd
X = [3,4,5,6,7,8]
var = pd.Series(X)
print(var)
print(var[4])
OUTPUT
0 3
1 4
2 5
3 6
4 7
5 8
dtype: int64
7 ...value at index 4
➢ Series data structure-
Example-
import pandas as pd
dic={"name":['python','c','c++','java'],"popularity":[90,65,70,85],
"rank":[1.0,4.0,3.0,2.0]}
var=pd.Series(dic)
print(var)
• OUTPUT
name [python, c, c++, java]
popularity [90, 65, 70, 85]
rank [1.0, 4.0, 3.0, 2.0]
dtype: object ...because each value stored in the Series is a
Python list, not a single number or string.
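For comparison, a dictionary of single (scalar) values gives a Series whose index labels are the dictionary keys. This sketch reuses the language names and popularity values from the example above; the variable name popularity is chosen here for illustration.

```python
import pandas as pd

# A dict of scalar values: the keys become the index labels.
popularity = pd.Series({"python": 90, "c": 65, "c++": 70, "java": 85})

print(popularity)
print(popularity["c++"])   # label-based access to one value
print(popularity.dtype)    # an integer dtype, because every value is an integer
```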
➢ Series data structure- Change the index
• Example-
import pandas as pd
X = [3,4,5,6]
var = pd.Series(X, index=["a", "s", "d", "f"])
print(var)
OUTPUT
a 3
s 4
d 5
f 6
dtype: int64
➢ Series data structure- Set the index, dtype and name
• Example-
import pandas as pd
X = [3,4,5,6]
var = pd.Series(X, index=["a", "s", "d", "f"],
dtype="float", name="python")
print(var)
OUTPUT
a 3.0
s 4.0
d 5.0
f 6.0
Name: python, dtype: float64
➢ Series data structure- Scalar value with an index
Example-
import pandas as pd
X = pd.Series(12,index=[1,2,3,4,5,6,7])
print(X)
OUTPUT
1 12
2 12
3 12
4 12
5 12
6 12
7 12
dtype: int64
➢ Series data structure- Adding two Series
Example-
import pandas as pd
X1 = pd.Series(12,index=[1,2,3,4,5,6,7])
X2 = pd.Series(12,index=[1,2,3,4])
print(X1+X2)
OUTPUT
1 24.0
2 24.0
3 24.0
4 24.0
5 NaN
6 NaN
7 NaN
dtype: float64
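Note- 1. Adding two NumPy arrays of lengths 7 and 4 raises a broadcasting
error, whereas pandas aligns the two Series on their index labels and gives
NaN for labels that are present in only one of them.
2. The result is float64 because NaN, the pandas marker for missing data, is a floating point value.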
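The difference described in the note can be checked directly. The sketch below reuses X1 and X2 from the example (written as x1 and x2) and additionally shows Series.add with fill_value, which is one possible way of treating a missing label as 0 instead of producing NaN.

```python
import numpy as np
import pandas as pd

x1 = pd.Series(12, index=[1, 2, 3, 4, 5, 6, 7])
x2 = pd.Series(12, index=[1, 2, 3, 4])

# NumPy arrays of length 7 and 4 cannot be broadcast together.
try:
    x1.to_numpy() + x2.to_numpy()
    numpy_failed = False
except ValueError:
    numpy_failed = True
print(numpy_failed)

# pandas aligns on labels; fill_value=0 stands in for labels missing on one side.
total = x1.add(x2, fill_value=0)
print(total)
```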
➢ Data frames in Pandas-
DataFrame: 1. Is a two-dimensional, size-mutable,
heterogeneous tabular data structure with labeled axes
(rows and columns).
2. A pandas DataFrame consists of three principal
components: the data, the rows and the columns.
3. A pandas DataFrame can be created from lists,
dictionaries, a list of dictionaries etc.
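Creation from a list of dictionaries, mentioned in point 3, can be sketched as below. The keys "a" and "b" and the values are hypothetical; one row deliberately leaves out "b" to show how pandas fills the gap.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 20},
    {"a": 3},            # "b" is missing in this row and becomes NaN
]
df = pd.DataFrame(rows)

print(df)
print(df.shape)                                 # (number of rows, number of columns)
print(df.index.tolist(), df.columns.tolist())   # row labels and column labels
```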
➢ Data frames in Pandas- Using List
import pandas as pd
L = [1,2,3,4,5,6,7]
var = pd.DataFrame(L)
print(var)
OUTPUT
0
0 1
1 2
2 3
3 4
4 5
5 6
6 7
➢ Data frames in Pandas- Using Dictionary
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5]}
var = pd.DataFrame(dic)
print(var)
OUTPUT
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
➢ Data frames in Pandas- To work on column
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5]}
var = pd.DataFrame(dic,columns=["a"])
print(var)
OUTPUT
a
0 1
1 2
2 3
3 4
4 5
➢ Data frames in Pandas- To work on column
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5],1:[1,2,3,4,5]}
var = pd.DataFrame(dic,columns=["a",1])
print(var)
OUTPUT
a 1
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
➢ Data frames in Pandas- To work on column
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,4,5],1:[1,2,3,4,5]}
var=pd.DataFrame(dic,columns=["a",1],
index=["s","u","m","i","t"])
print(var)
OUTPUT
a 1
s 1 1
u 2 2
m 3 3
i 4 4
t 5 5
➢ Data frames in Pandas- To work on column
import pandas as pd
dic = {"a":[1,2,3,4,5],"b":[1,2,3,14,5],1:[1,2,3,4,5]}
var = pd.DataFrame(dic)
print(var)
print(var["b"][3])
OUTPUT
a b 1
0 1 1 1
1 2 2 2
2 3 3 3
3 4 14 4
4 5 5 5
14
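Besides var["b"][3], a single element can also be reached with the indexers .loc, .iloc and .at. The sketch below uses the same dictionary as the example; the variable names by_label, by_position and by_at are chosen here for illustration.

```python
import pandas as pd

dic = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 14, 5], 1: [1, 2, 3, 4, 5]}
var = pd.DataFrame(dic)

by_label = var.loc[3, "b"]      # row label 3, column label "b"
by_position = var.iloc[3, 1]    # fourth row, second column (positions start at 0)
by_at = var.at[3, "b"]          # label-based access to a single value

print(by_label, by_position, by_at)

# Assigning through .loc changes the DataFrame itself.
var.loc[3, "b"] = 4
print(var["b"].tolist())
```

Using one indexer call for both row and column is generally preferred when a value is to be assigned, since it works on the DataFrame directly.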
➢ Data frames in Pandas- Nested List
import pandas as pd
List_1 = [[1,2,3,4,5],[11,12,13,14,15],[21,22,23,24,25]]
var = pd.DataFrame(List_1)
print(var)
OUTPUT
0 1 2 3 4
0 1 2 3 4 5
1 11 12 13 14 15
2 21 22 23 24 25
➢ Data frames in Pandas- Using Series
import pandas as pd
sr ={"a":pd.Series([1,2,3,4]),"b":pd.Series([11,12,13,14])}
var = pd.DataFrame(sr)
print(var)
OUTPUT
a b
0 1 11
1 2 12
2 3 13
3 4 14
➢ Arithmetic Operations in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var = pd.DataFrame(sr)
print(var)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
Addition of A and B-
var["C"]=var["A"]+var["B"]
print(var)
OUTPUT
A B C
0 1 11 12
1 2 12 14
2 3 13 16
3 4 14 18
➢ Arithmetic Operations in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
var1["Python"]=var1["A"]<=2
print(var1)
OUTPUT
A B Python
0 1 11 True
1 2 12 True
2 3 13 False
3 4 14 False
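The Boolean column produced by a comparison can also be used to select rows, and the other arithmetic operators work column-wise in the same way as addition. A minimal sketch, in which the column names D and E are assumptions added for illustration:

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14]})

mask = var1["A"] <= 2    # Boolean Series, one value per row
small = var1[mask]       # keep only the rows where the mask is True
print(small)

# Subtraction and multiplication are applied element by element.
var1["D"] = var1["B"] - var1["A"]
var1["E"] = var1["A"] * var1["B"]
print(var1)
```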
➢ Delete and Insert Data in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
var1.insert(1,"Python",var1["A"])
print(var1)
OUTPUT
A Python B
0 1 1 11
1 2 2 12
2 3 3 13
3 4 4 14
Note- The three arguments of insert() are the column position (index), the column name and the data to be inserted.
➢ Delete and Insert Data in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B
0 1 11
1 2 12
2 3 13
3 4 14
var1["Python"]=var1["A"][:3]
print(var1)
OUTPUT
A B Python
0 1 11 1.0
1 2 12 2.0
2 3 13 3.0
3 4 14 NaN
Note- The slice [:3] copies only the first three values of column A; row 3 has no matching value and is filled with NaN.
➢ Delete and Insert Data in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
OUTPUT
A B C
0 1 11 21
1 2 12 22
2 3 13 23
3 4 14 24
var2 = var1.pop("B")
print(var2)
OUTPUT
0 11
1 12
2 13
3 14
Name: B, dtype: int64
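pop() removes a column and returns it. Two other common ways of deleting data are drop() and the del statement, sketched below with the same table; the names without_b and without_row0 are chosen here for illustration.

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14], "C": [21, 22, 23, 24]})

without_b = var1.drop(columns=["B"])   # returns a new DataFrame, var1 is unchanged
without_row0 = var1.drop(index=[0])    # drops the row labelled 0
del var1["C"]                          # deletes column "C" from var1 itself

print(without_b.columns.tolist())
print(without_row0.index.tolist())
print(var1.columns.tolist())
```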
➢ Creation of CSV files in Pandas-
Differences between CSV and XLS (Excel) file-
• A CSV (Comma Separated Values) file is a plain text format in which values are
separated by commas.
• XLS is the binary Excel workbook format,
which holds information about all the worksheets in a
file, including both content and formatting.
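That a CSV file is nothing more than plain text can be seen by asking pandas for the CSV content as a string instead of writing a file. A minimal sketch with a small table invented for illustration:

```python
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2], "B": [11, 12]})

# With no file name, to_csv() returns the CSV content as a string.
csv_text = var1.to_csv(index=False)
lines = csv_text.splitlines()

print(csv_text)
print(lines)   # one text line per row, values separated by commas
```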
➢ Creation of CSV files in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv")
Note- This creates a new CSV file named python.csv in the current working
directory (usually the folder from which the script is run).
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", index=False) # ...to leave out the index column
➢ Creation of CSV files in Pandas-
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", header=False) # ...to leave out the header row
OR
import pandas as pd
sr = {"A":[1,2,3,4],"B":[11,12,13,14],"C":[21,22,23,24]}
var1 = pd.DataFrame(sr)
print(var1)
var1.to_csv("python.csv", header=[11,12,13]) # ...to write custom column names
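The effect of the index option shows up when the data is read back with read_csv. The following sketch writes to an in-memory io.StringIO buffer instead of python.csv, which is an assumption made so that the example leaves no file behind.

```python
import io
import pandas as pd

var1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [11, 12, 13, 14], "C": [21, 22, 23, 24]})

# Written without the index: read_csv creates a fresh 0, 1, 2, ... index.
buffer = io.StringIO()
var1.to_csv(buffer, index=False)
buffer.seek(0)
same = pd.read_csv(buffer)

# Written with the index: index_col=0 turns the first column back into the index.
buffer = io.StringIO()
var1.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(same)
print(restored)
```

Without index_col=0 in the second case, the saved index would come back as an extra data column.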
THANK YOU....
