JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA
KAKINADA – 533 003, Andhra Pradesh, India
B. Tech AI & ML (R23-COURSE STRUCTURE & SYLLABUS)
(Applicable from the academic year 2023-24 and onwards)
III B. Tech I Semester
EXPLORATORY DATA ANALYSIS WITH PYTHON
(SKILL ENHANCEMENT COURSE)
L T P C: 3 0 0 3
Course Objectives: The main objectives of the course are to
Introduce the fundamentals of Exploratory Data Analysis.
Cover essential exploratory techniques for understanding multivariate data by summarizing it through statistical and graphical methods.
Evaluate models and select the best model.
UNIT-I: Exploratory Data Analysis Fundamentals: Understanding data science, The
significance of EDA, Steps in EDA, Making sense of data, Numerical data, Categorical data,
Measurement scales, Comparing EDA with classical and Bayesian analysis, Software tools
available for EDA, Getting started with EDA.
Sample Experiments:
1. a) Download the dataset from Kaggle using the following link:
https://www.kaggle.com/datasets/sukhmanibedi/cars4u
b) Install the Python libraries required for Exploratory Data Analysis (numpy, pandas,
matplotlib, seaborn)
2. Perform basic NumPy array operations and explore NumPy built-in functions.
3. Load the dataset into a pandas DataFrame
4. Select rows and columns in the DataFrame
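Experiments 2 to 4 above can be sketched as follows. This is a minimal illustration, not a prescribed solution: a small in-memory CSV stands in for the downloaded Kaggle file, and its column names (Name, Year, Price) are assumptions made only for this example.

```python
import io

import numpy as np
import pandas as pd

# Experiment 2: basic NumPy array operations and built-in functions.
arr = np.arange(1, 7).reshape(2, 3)      # [[1 2 3], [4 5 6]]
doubled = arr * 2                        # element-wise arithmetic
col_sums = arr.sum(axis=0)               # sum down each column
arr_mean = arr.mean()                    # mean of all elements

# Experiment 3: load a dataset into a pandas DataFrame.
# The CSV text is a hypothetical stand-in; with the real download,
# pd.read_csv would be pointed at the downloaded file instead.
csv_text = """Name,Year,Price
Car A,2015,5.5
Car B,2018,9.0
Car C,2012,3.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Experiment 4: select rows and columns.
prices = df["Price"]                     # one column as a Series
subset = df[["Name", "Price"]]           # several columns as a DataFrame
first_row = df.iloc[0]                   # row by integer position
recent = df.loc[df["Year"] >= 2015, ["Name", "Year"]]  # boolean filter plus columns

print(df.head())
print(recent)
```

Here iloc selects by position and loc selects by label or boolean mask; keeping the two apart avoids most indexing surprises in early experiments.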
UNIT-II: Visual Aids for EDA: Technical requirements, Line chart, Bar charts, Scatter plot
using seaborn, Polar chart, Histogram, Choosing the best chart
Case Study: EDA with Personal Email, Technical requirements, Loading the dataset, Data
transformation, Data cleansing, Applying descriptive statistics, Data refactoring, Data
analysis.
Sample Experiments:
1. Apply different visualization techniques using a sample dataset
a. Line Chart b. Bar Chart c. Scatter Plot d. Bubble Plot
2. Generate a scatter plot using the seaborn library for the iris dataset
3. Apply the following visualization techniques to a sample dataset
a. Area Plot b. Stacked Plot c. Pie Chart d. Table Chart
4. Generate the following charts for a dataset
a. Polar Chart b. Histogram c. Lollipop Chart
Case Study: Perform Exploratory Data Analysis with Personal Email Data
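Several of the chart types listed in the experiments above can be sketched in one figure. The example below is an assumption-laden illustration: the data is synthetic, and it uses only matplotlib so that it runs with minimal dependencies. For experiment 2, seaborn's scatterplot function would be the natural choice instead of the plain matplotlib scatter call shown here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10)
y = x ** 2
values = rng.normal(loc=50, scale=10, size=200)   # synthetic measurements

fig = plt.figure(figsize=(12, 7))

ax1 = fig.add_subplot(2, 3, 1)
ax1.plot(x, y)
ax1.set_title("Line chart")

ax2 = fig.add_subplot(2, 3, 2)
ax2.bar(["A", "B", "C"], [5, 3, 8])
ax2.set_title("Bar chart")

ax3 = fig.add_subplot(2, 3, 3)
ax3.scatter(x, y, s=20 + 10 * x)     # varying marker size gives a bubble plot
ax3.set_title("Scatter / bubble plot")

ax4 = fig.add_subplot(2, 3, 4)
ax4.hist(values, bins=15)
ax4.set_title("Histogram")

ax5 = fig.add_subplot(2, 3, 5, projection="polar")
theta = np.linspace(0, 2 * np.pi, 100)
ax5.plot(theta, 1 + np.cos(theta))
ax5.set_title("Polar chart")

ax6 = fig.add_subplot(2, 3, 6)
ax6.stem(x, y)                       # stems with markers give a lollipop look
ax6.set_title("Lollipop chart")

fig.tight_layout()
fig.savefig("eda_charts.png")        # hypothetical output file name
```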
UNIT-III: Data Transformation: Merging database-style dataframes, Concatenating along
an axis, Merging on index, Reshaping and pivoting, Transformation techniques,
Handling missing data, Mathematical operations with NaN, Filling missing values,
Discretization and binning, Outlier detection and filtering, Permutation and random sampling,
Benefits of data transformation, Challenges.
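The merging, concatenation, and reshaping topics in this unit can be sketched with pandas as follows. The frames, column names, and values are hypothetical and chosen only to make the join behaviour visible.

```python
import pandas as pd

students = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame({"id": [1, 2, 4], "score": [82, 74, 91]})

# Database-style merge on a shared key column (inner join keeps ids 1 and 2).
inner = pd.merge(students, marks, on="id", how="inner")

# Outer join keeps every id from both frames; unmatched cells become NaN.
outer = pd.merge(students, marks, on="id", how="outer")

# Concatenation along the row axis (axis=0) stacks frames vertically.
more_students = pd.DataFrame({"id": [5], "name": ["Kiran"]})
stacked = pd.concat([students, more_students], axis=0, ignore_index=True)

# Merging on index: join aligns the two frames on their index values.
by_index = students.set_index("id").join(marks.set_index("id"), how="left")

# Reshaping: long format to wide format with pivot.
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "subject": ["math", "physics", "math", "physics"],
    "score": [82, 79, 74, 88],
})
wide = long_df.pivot(index="id", columns="subject", values="score")

print(outer)
print(wide)
```

The how argument decides which keys survive the merge, so comparing the row counts of inner and outer is a quick check for unmatched keys.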
Sample Experiments:
1. Perform the following operations
a) Merging Dataframes b) Reshaping with Hierarchical Indexing
c) Data Deduplication d) Replacing Values
2. Apply different missing-data handling techniques
a) NaN values in mathematical operations b) Filling in missing data
c) Forward and Backward filling of missing values d) Filling with index values
e) Interpolation of missing values
3. Apply different data transformation techniques
a) Renaming axis indexes b) Discretization and Binning
c) Permutation and Random Sampling d) Dummy variables
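A minimal sketch of experiments 1 to 3 above, using small hand-made Series and DataFrames. All values, labels, and bin edges are assumptions for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# NaN in mathematical operations: pandas skips NaN by default, NumPy does not.
pandas_sum = s.sum()                 # NaN values are ignored
numpy_sum = np.sum(s.to_numpy())     # NaN propagates through the sum

filled_zero = s.fillna(0)            # fill with a constant
forward = s.ffill()                  # carry the last valid value forward
backward = s.bfill()                 # pull the next valid value backward
interpolated = s.interpolate()       # linear interpolation between neighbours

# Deduplication, value replacement, and renaming column labels.
df = pd.DataFrame({"city": ["Kakinada", "Kakinada", "Guntur"], "code": [-1, -1, 7]})
deduped = df.drop_duplicates()
replaced = deduped.replace(-1, np.nan)
renamed = df.rename(columns=str.upper)

# Discretization and binning with explicit bin edges.
ages = pd.Series([5, 17, 30, 64, 80])
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])

# Dummy (one-hot) variables and a reproducible random sample.
dummies = pd.get_dummies(groups)
sample = ages.sample(n=3, random_state=0)
```

Which fill strategy is appropriate depends on the data: forward filling suits ordered records such as time series, while interpolation assumes values change smoothly between observations.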
UNIT-IV: Descriptive Statistics: Distribution function, Measures of central tendency,
Measures of dispersion, Types of kurtosis, Calculating percentiles, Quartiles, Grouping
Datasets, Correlation, Understanding univariate, bivariate, multivariate analysis, Time Series
Analysis
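Grouping, correlation, and a first time series operation from the topic list above can be sketched as follows. The automobile-style columns and the daily series are hypothetical stand-ins, not taken from any prescribed dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "fuel": ["petrol", "diesel", "petrol", "diesel", "petrol"],
    "engine_cc": [1000, 1500, 1200, 2000, 1600],
    "price": [4.0, 7.5, 5.0, 10.0, 7.0],
})

# Grouping: split by a categorical column, then aggregate each group.
mean_price = df.groupby("fuel")["price"].mean()

# Bivariate analysis: Pearson correlation between two numeric columns.
r = df["engine_cc"].corr(df["price"])

# Correlation matrix over the numeric columns (a first multivariate view).
corr_matrix = df[["engine_cc", "price"]].corr()

# Time series: index by date, then resample to a coarser frequency.
ts = pd.Series(
    [10, 12, 11, 15, 14, 13],
    index=pd.date_range("2023-01-01", periods=6, freq="D"),
)
three_day_mean = ts.resample("3D").mean()

print(mean_price)
print(three_day_mean)
```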
Sample Experiments:
1. Study the following distributions on sample data
a) Uniform Distribution b) Normal Distribution
c) Gamma Distribution d) Exponential Distribution
e) Poisson Distribution f) Binomial Distribution
2. Perform Data Cleaning on a sample dataset.
3. Compute measures of central tendency on a sample dataset
a) Mean b) Median c) Mode
4. Explore Measures of Dispersion on a sample dataset
a) Variance b) Standard Deviation c) Skewness d) Kurtosis
5. a) Calculate percentiles on a sample dataset
b) Calculate the Interquartile Range (IQR) and visualize it using box plots
6. Perform the following analysis on an automobile dataset.
a) Bivariate analysis b) Multivariate analysis
7. Perform Time Series Analysis on the Open Power Systems dataset
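Experiments 1, 3, 4, and 5 above can be sketched with NumPy and pandas. The distribution parameters and the small data Series are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Experiment 1: draw samples from several distributions (parameters are arbitrary).
samples = {
    "uniform": rng.uniform(0, 1, 1000),
    "normal": rng.normal(0, 1, 1000),
    "gamma": rng.gamma(2.0, 1.0, 1000),
    "exponential": rng.exponential(1.0, 1000),
    "poisson": rng.poisson(3.0, 1000),
    "binomial": rng.binomial(10, 0.5, 1000),
}

data = pd.Series([2, 4, 4, 5, 7, 9, 30])

# Experiment 3: measures of central tendency.
mean, median, mode = data.mean(), data.median(), data.mode()[0]

# Experiment 4: dispersion and shape (pandas uses sample formulas, ddof=1).
variance, std = data.var(), data.std()
skewness, kurt = data.skew(), data.kurt()

# Experiment 5: quartiles, IQR, and the common 1.5 * IQR outlier fences.
q1, q3 = data.quantile(0.25), data.quantile(0.75)
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(f"mean={mean:.2f} median={median} mode={mode} IQR={iqr}")
```

The single large value pulls the mean well above the median and produces positive skewness, which is the kind of pattern a box plot makes visible. matplotlib's boxplot uses the same 1.5 * IQR rule for its whiskers by default.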
UNIT-V: Model Development and Evaluation: Unified machine learning workflow, Data
pre-processing, Data preparation, Training sets and corpus creation, Model creation and
training, Model evaluation, Best model selection and evaluation, Model deployment
Case Study: EDA on Wine Quality Data Analysis
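The workflow steps listed in this unit (data preparation, train/test split, model training, evaluation) can be sketched with NumPy alone. This is an illustrative assumption rather than the course's prescribed method: the data is synthetic, the model is a straight-line least-squares fit, and the metrics are computed from their definitions. In practice, scikit-learn's train_test_split and its r2_score, mean_absolute_error, and mean_squared_error functions cover the same steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data preparation: synthetic data with a known linear relationship plus noise.
X = rng.uniform(0, 10, size=200)
y = 3.0 * X + 2.0 + rng.normal(0, 1.0, size=200)

# Training and test sets: shuffle indices, then hold out 20% for evaluation.
idx = rng.permutation(X.size)
train_idx, test_idx = idx[:160], idx[160:]

# Model creation and training: least-squares fit of a straight line.
slope, intercept = np.polyfit(X[train_idx], y[train_idx], deg=1)

# Model evaluation on the held-out data only.
y_true = y[test_idx]
y_pred = slope * X[test_idx] + intercept
mae = np.mean(np.abs(y_true - y_pred))                # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)                 # mean squared error
r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
```

Evaluating on held-out data rather than the training data is what makes the scores usable for comparing candidate models and selecting the best one.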
Sample Experiments:
1. Perform hypothesis testing using the statsmodels library
a) Z-Test b) T-Test
2. Develop a model and perform model evaluation using different metrics such as prediction
score, R2 score, MAE, and MSE.
Case Study: Perform Exploratory Data Analysis with Wine Quality Dataset
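To show what the hypothesis test in experiment 1 computes, the sketch below implements a two-sided one-sample z-test by hand with NumPy and the standard library. It is a simplified stand-in, not the statsmodels API: the function name one_sample_z_test and the synthetic heights data are hypothetical. In the experiment itself, the ztest function in statsmodels.stats.weightstats would be used. A one-sample t-test uses the same statistic but takes its p-value from Student's t distribution with n - 1 degrees of freedom.

```python
import math

import numpy as np


def one_sample_z_test(sample, mu0):
    """Two-sided one-sample z-test using the sample standard deviation.

    Returns (statistic, p_value). The p-value comes from the standard normal
    distribution, which is reasonable only for fairly large samples.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    std_err = sample.std(ddof=1) / math.sqrt(n)
    z = (sample.mean() - mu0) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


rng = np.random.default_rng(1)
heights = rng.normal(loc=170, scale=8, size=200)   # synthetic data

# Null hypothesis equal to the sample mean: statistic 0, no evidence against it.
z_same, p_same = one_sample_z_test(heights, mu0=heights.mean())

# Null hypothesis far from the data: large statistic, very small p-value.
z_far, p_far = one_sample_z_test(heights, mu0=160)

print(f"z={z_far:.2f}, p={p_far:.3g}")
```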
Text Book:
1. Suresh Kumar Mukhiya, Usman Ahmed, Hands-On Exploratory Data Analysis with
Python, Packt Publishing, 2020.
Reference Books:
1. Ronald K. Pearson, Exploratory Data Analysis Using R, CRC Press, 2020.
2. Radhika Datar, Harish Garg, Hands-On Exploratory Data Analysis with R: Become an
expert in exploratory data analysis using R packages, 1st Edition, Packt Publishing,
2019.
Web References:
1. https://github.com/PacktPublishing/Hands-on-Exploratory-Data-Analysis-with-Python
2. https://www.analyticsvidhya.com/blog/2022/07/step-by-step-exploratory-data-analysis-eda-using-python/#h-conclusion
3. https://github.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook