Data Analysis with Python
Dr B R Ambedkar National Institute of Technology, Jalandhar
Internship (2 Months)
26-05-2022 to 23-07-2022
Roll No: 19113054
Instructor: K.S. Rana
Acknowledgement
Firstly, I would like to thank Rail Coach Factory for giving me such a great opportunity to do my internship project in their esteemed organization at its Technical Training Centre, Kapurthala.
Finally, I would like to thank my family and friends for all the support and encouragement. I would also like to thank my fellow students for many helpful discussions and good ideas along the way.
Contents
DATA ANALYSIS
Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains.
Data requirements
Data are necessary as inputs to the analysis, which is specified based upon the requirements of those directing the analysis (or the customers, who will use the finished product of the analysis).
Data collection
Data is collected from a variety of sources. The requirements may be communicated by analysts to custodians of the data, such as Information Technology personnel within an organization. The data may also be collected from sensors in the environment, including traffic cameras, satellites, and recording devices. It may also be obtained through interviews, downloads from online sources, or reading documentation.
Data processing
The phases of the intelligence cycle used to convert
raw information into actionable intelligence or
knowledge are conceptually similar to the phases in
data analysis.
Data cleaning
Once processed and organized, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning arises from problems in the way that the data are entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccuracies, assessing the overall quality of existing data, deduplication, and column segmentation.
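As a sketch of how these cleaning tasks look in pandas (the rows and column names here are invented purely for illustration):

```python
import pandas as pd
import numpy as np

# A small, made-up table with the kinds of problems described above:
# duplicate rows, missing values, and an obviously wrong entry.
df = pd.DataFrame({
    "name": ["Asha", "Asha", "Ravi", "Meena", None],
    "age": [29.0, 29.0, -5.0, 41.0, 33.0],   # -5 is an invalid age
    "city": ["Delhi", "Delhi", "Pune", None, "Jalandhar"],
})

df = df.drop_duplicates()                  # deduplication
df = df.dropna(subset=["name"])            # drop records missing a key field
df.loc[df["age"] < 0, "age"] = np.nan      # flag impossible values as missing
df["city"] = df["city"].fillna("Unknown")  # fill remaining gaps with a default

print(df)
```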
Exploratory data analysis
Once the datasets are cleaned, they can be analysed. Analysts may apply a variety of techniques, referred to as exploratory data analysis, to begin understanding the messages contained within the obtained data. The process of data exploration may result in additional data cleaning or additional requests for data; thus, the iterative phases described above begin again. Descriptive statistics, such as the average or median, can be generated to aid in understanding the data. Data visualization is also used, allowing the analyst to examine the data in a graphical format in order to obtain additional insights regarding the messages within the data.
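A minimal sketch of generating such descriptive statistics with pandas (the sample values are invented for illustration):

```python
import pandas as pd

# Invented sample data, just to show what describe() summarises.
df = pd.DataFrame({"temperature": [21.5, 23.0, 19.8, 25.1, 22.4, 20.9]})

mean_temp = df["temperature"].mean()
median_temp = df["temperature"].median()
summary = df["temperature"].describe()   # count, mean, std, min, quartiles, max

print(f"mean={mean_temp:.2f}, median={median_temp:.2f}")
```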
Modelling and algorithms
Mathematical formulas or models (known as algorithms) may be applied to the data in order to identify relationships among the variables, for example, correlation or causation. In general terms, models may be developed to evaluate a specific variable based on other variable(s) contained within the dataset, with some residual error depending on the implemented model's accuracy.
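One hedged sketch of this idea, using NumPy to measure correlation and fit a simple linear model (the data points are invented; in practice they would come from the dataset under study):

```python
import numpy as np

# Toy data: y depends roughly linearly on x, plus noise (values invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

# Correlation between the two variables.
r = np.corrcoef(x, y)[0, 1]

# Fit y ≈ slope*x + intercept by least squares; the residuals reflect
# how well the model explains the data.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

print(f"r={r:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
```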
Python
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability through the use of significant indentation. Python is dynamically typed and garbage-collected.
NumPy Library
NumPy, which stands for Numerical Python, is a library
consisting of multidimensional array objects and a
collection of routines for processing those arrays. Using
NumPy, mathematical and logical operations on arrays
can be performed.
Operations using NumPy
Using NumPy, a developer can perform the following operations:
Mathematical and logical operations on arrays.
Fourier transforms and routines for shape manipulation.
Operations related to linear algebra. NumPy has in-built functions for linear algebra and random number generation.
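A small runnable sketch of the mathematical and logical operations listed above (the arrays are invented examples):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b            # element-wise addition
product = a * b          # element-wise multiplication
mask = a > 2             # logical operation yields a boolean array
dot = np.dot(a, b)       # linear algebra: dot product

print(total, product, mask, dot)
```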
Converting a Python list to a NumPy array:
numpy_array = np.array(list_to_convert)

Creating an array of zeros (the dtype can be int or float, as required):
a = np.zeros(shape, dtype=type_of_zeros)
e.g. a = np.zeros((3,4), dtype=np.float16)

Similar to np.zeros:
a = np.ones((3,4), dtype=np.int32)

np.full(shape_as_tuple, value_to_fill, dtype=type_you_want)
a = np.full((2,3), 1, dtype=np.float16)
a would be:
array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float16)

np.empty(shape_as_tuple, dtype=int) returns an uninitialized array, so its
contents are arbitrary; a = np.empty((2,2), dtype=np.int16) might produce:
array([[25824, 25701],
       [ 2606,  8224]], dtype=int16)
Getting an array of evenly spaced values with np.arange and np.linspace:

np.linspace(start, stop, num=50, endpoint=bool_value, retstep=bool_value)
e.g. np.linspace(1, 2, num=5, endpoint=False, retstep=True)

np.arange(start=where_to_start, stop=where_to_stop, step=step_size)

Other common operations:

x = np.array([1,2,3])                 # create an array from a list
x = np.ones((3,2,4), dtype=np.int16)  # 3-D array of ones
x = np.ones((2,3), dtype=np.int16)
x.dtype                               # produces dtype('int16')

y = np.array([[1,3],[5,6]])
x = np.copy(y)                        # copy an array

array_name.T                          # transpose of an array
np.dot(matrix1, matrix2)              # matrix multiplication

a = np.array([[1,2,3],[4,8,16]])
z = np.cross(x, y)                    # cross product
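A runnable sketch tying the creation routines above together (the values are arbitrary examples):

```python
import numpy as np

evens = np.arange(0, 10, 2)                # evenly spaced with a step size
pts, step = np.linspace(1, 2, num=5, endpoint=False, retstep=True)

m = np.full((2, 3), 1, dtype=np.float16)   # constant-filled array
prod = np.dot(np.eye(2), np.array([[1, 3], [5, 6]]))  # matrix product

print(evens, pts, step)
```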
Pandas Library
Pandas is a software library written for the Python
programming language for data manipulation and
analysis. In particular, it offers data structures and
operations for manipulating numerical tables and time
series.
Library features:
DataFrame object for data manipulation with integrated indexing.
Tools for reading and writing data between in-memory data structures and different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, fancy indexing, and subsetting of large data sets.
Data structure column insertion and deletion.
Group-by engine allowing split-apply-combine operations on data sets.
Data set merging and joining.
Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
Time series functionality: date range generation [6] and frequency conversions, moving window statistics, moving window linear regressions, date shifting and lagging.
Provides data filtration.
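A few of the features listed above can be sketched on an invented sample table (the column names and values are illustrative only):

```python
import pandas as pd

# Invented sample table to illustrate filtration, group-by and pivoting.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Pune", "Pune"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [100, 120, 80, 95],
})

recent = df[df["year"] == 2021]                  # data filtration
by_city = df.groupby("city")["sales"].sum()      # split-apply-combine
pivot = df.pivot(index="city", columns="year", values="sales")  # reshaping

print(by_city)
```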
Matplotlib Library
Matplotlib is a library for creating static, animated, and interactive visualizations in Python. With Matplotlib you can:
Embed plots in JupyterLab and Graphical User Interfaces.
Use a rich array of third-party packages built on Matplotlib.
Seaborn Library
Seaborn is a Python data visualization library based on
matplotlib. It provides a high-level interface for drawing
attractive and informative statistical graphics.
It provides beautiful default styles and color palettes to make statistical plots more attractive. It is built on top of the matplotlib library and is closely integrated with the data structures from pandas.
Seaborn aims to make visualization the central part of exploring and understanding data. It provides dataset-oriented APIs, so that we can switch between different visual representations of the same variables for a better understanding of the dataset.
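A minimal sketch of Seaborn's dataset-oriented API (the sample data is invented; the Agg backend is used so no display is required):

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen, no display needed
import pandas as pd
import seaborn as sns

# Invented sample data for illustration.
df = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu"],
    "temperature": [21.5, 23.0, 19.8, 25.1],
})

ax = sns.barplot(data=df, x="day", y="temperature")  # dataset-oriented API
ax.set_title("Temperature by day")
ax.figure.savefig("temps.png")
```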
The weather data set is a time-series data set with per-hour information about the weather conditions at a particular location. It records temperature, dew point temperature, relative humidity, visibility, wind speed, pressure, and conditions. The data is available as a CSV file. We are going to analyse this data using a pandas data frame.
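Since the CSV itself is not reproduced here, the analysis can be sketched on a tiny stand-in frame; the exact column names below are assumptions, chosen to match the fields described above:

```python
import pandas as pd

# In the report the data comes from a CSV, e.g.:
#   weather = pd.read_csv("weather.csv")
# The file isn't included here, so a tiny stand-in frame with assumed
# column names is used instead.
weather = pd.DataFrame({
    "Temp_C": [-1.8, -1.8, 0.1, 2.6],
    "Rel Hum_%": [86, 87, 69, 64],
    "Wind Speed_km/h": [4, 4, 7, 11],
    "Weather": ["Fog", "Fog", "Clear", "Cloudy"],
})

# Typical exploratory operations on such a dataset:
print(weather["Weather"].unique())          # distinct weather conditions
print(weather["Weather"].value_counts())    # how often each occurs
clear_hours = weather[weather["Weather"] == "Clear"]
mean_temp = weather["Temp_C"].mean()
```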
COVID-19 Dataset Analysis With Python
This data is available as a CSV file, downloaded from Kaggle. We will analyse this data using a pandas data frame.
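As the Kaggle CSV is not reproduced in this text, here is a hedged sketch of the analysis on an invented stand-in; the column names (State, Confirmed, Deaths) are assumptions for illustration only:

```python
import pandas as pd

# The report would load the Kaggle CSV with pd.read_csv(...); a small
# invented stand-in with assumed column names sketches the analysis.
covid = pd.DataFrame({
    "State": ["Punjab", "Punjab", "Kerala", "Kerala"],
    "Confirmed": [100, 150, 200, 260],
    "Deaths": [2, 3, 1, 2],
})

totals_by_state = covid.groupby("State")[["Confirmed", "Deaths"]].max()
worst = totals_by_state["Confirmed"].idxmax()
print(totals_by_state)
```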
IPL 2008-2020 Dataset Analysis With
Python
Data is taken from Kaggle and contains ball-by-ball information from IPL 2008 to IPL 2020. We are going to analyse this data using a pandas data frame.
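A hedged sketch of how ball-by-ball data is typically analysed with pandas; the column names (batsman, batsman_runs, bowling_team) and the rows are assumptions invented for illustration:

```python
import pandas as pd

# Invented stand-in for a ball-by-ball IPL table; column names assumed.
balls = pd.DataFrame({
    "batsman": ["V Kohli", "V Kohli", "MS Dhoni", "MS Dhoni", "MS Dhoni"],
    "batsman_runs": [4, 6, 1, 6, 0],
    "bowling_team": ["CSK", "CSK", "RCB", "RCB", "RCB"],
})

runs_by_batsman = balls.groupby("batsman")["batsman_runs"].sum()
top_scorer = runs_by_batsman.idxmax()          # highest run total
sixes = balls[balls["batsman_runs"] == 6]      # filter: all sixes
```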
Netflix Dataset Analysis With Python
This Netflix dataset has information about the TV shows and movies available on Netflix up to 2021. The dataset is available on the Kaggle website for free.
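The kinds of questions asked of this dataset can be sketched on an invented stand-in frame; the column names (type, title, release_year) are assumptions for illustration:

```python
import pandas as pd

# Invented stand-in rows with assumed column names.
netflix = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
    "title": ["A", "B", "C", "D"],
    "release_year": [2019, 2020, 2021, 2021],
})

counts = netflix["type"].value_counts()            # movies vs TV shows
recent = netflix[netflix["release_year"] == 2021]  # titles from 2021
```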
Census Dataset Analysis 2011 With
Python
The data used here is from the 2011 Census of India, for each district. This data is available as a CSV file, downloaded from Kaggle.
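A hedged sketch of district-level census analysis; the column names (District, State, Population, Literate) and all values below are invented for illustration:

```python
import pandas as pd

# Invented stand-in for a district-level census table; columns assumed.
census = pd.DataFrame({
    "District": ["Jalandhar", "Kapurthala", "Pune"],
    "State": ["Punjab", "Punjab", "Maharashtra"],
    "Population": [2_000_000, 800_000, 9_400_000],
    "Literate": [1_500_000, 550_000, 6_700_000],
})

census["Literacy_Rate"] = census["Literate"] / census["Population"] * 100
state_pop = census.groupby("State")["Population"].sum()
```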
Data Visualisation Techniques
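A minimal sketch of three common visualisation techniques in Matplotlib (the data is invented, and the Agg backend is used so no display is required):

```python
import matplotlib
matplotlib.use("Agg")              # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

# A few common visualisation techniques on invented data.
x = np.arange(1, 6)
y = np.array([2, 3, 5, 4, 6])

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(x, y)                 # line plot: trends over an ordered axis
axes[0].set_title("Line")
axes[1].bar(x, y)                  # bar chart: comparing categories
axes[1].set_title("Bar")
axes[2].hist(y, bins=3)            # histogram: distribution of values
axes[2].set_title("Histogram")
fig.savefig("visualisations.png")
```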
Page | 45
Page | 46
Page | 47
Page | 48
Page | 49
Page | 50
Page | 51