Naan Mudhalvan Data Analytics Course For Engineering Students

The document outlines a Data Analytics course project conducted by an engineering student at Anna University, focusing on Exploratory Data Analysis (EDA) across three datasets: Global Superstore Sales, COVID-19 Global Data, and YouTube Trending Videos. Each section includes steps for data loading, cleaning, analysis, and visualization, along with insights derived from the analyses. The project aims to provide practical experience in data analytics techniques and tools.

Uploaded by

thabeswar2003

NAAN MUDHALVAN DATA ANALYTICS

COURSE FOR ENGINEERING STUDENTS

SUBMITTED BY:
SURYA.R(au422522104305)

NM1069-DATA ANALYTICS

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Year/Semester-III/VI

UNIVERSITY COLLEGE OF ENGINEERING VILLUPURAM

(A CONSTITUENT COLLEGE OF ANNA UNIVERSITY CHENNAI)

VILLUPURAM – 605 103

ANNA UNIVERSITY: CHENNAI 600 025

APRIL 2025
UNIVERSITY COLLEGE OF ENGINEERING VILLUPURAM
(A CONSTITUENT COLLEGE OF ANNA UNIVERSITY CHENNAI)

VILLUPURAM – 605 103

Department of Computer Science and Engineering

Bonafide record of work done in the Computer Laboratory of University College of Engineering Villupuram for NM1069-NAAN MUDHALVAN Data Analytics Course by Google during the year 2024-2025
by………………………………………........Reg.No: .......................................................
Studying in the Sixth Semester B.E. (Computer Science and Engineering).

Staff In-Charge Head of the Department

Submitted for the practical examination held at University College of Engineering Villupuram on ………………….

Internal Examiner External Examiner

INDEX
S.NO TOPIC SIGN

1. EDA ON GLOBAL SUPERSTORE SALES DATASET

2. EDA ON COVID-19 GLOBAL DATASET

3. EDA ON YOUTUBE TRENDING VIDEOS DATASET
EX.No:1 EDA ON GLOBAL SUPERSTORE SALES DATASET

EXPLORATORY DATA ANALYSIS (EDA):

Exploratory Data Analysis (EDA) is the process of examining and understanding a
dataset before applying any modeling or predictive techniques. It involves summarizing the
dataset’s main characteristics using statistical measures and visualizations to uncover patterns,
spot anomalies, test hypotheses, and check assumptions. EDA typically includes cleaning the
data (handling missing values and duplicates), generating descriptive statistics (like mean,
median, and standard deviation), and using plots such as histograms, bar charts, and line graphs
to visualize trends and relationships. This step is crucial for gaining insights and making
informed decisions about the direction of further analysis or modeling.

DATA SOURCE:

Dataset link: https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset

STEP 1: LOAD THE DATASET

PROGRAM:

import pandas as pd

file_path = "/content/GLOBAL DATASTORE.csv"

df = pd.read_csv(file_path)

OUTPUT:
STEP 2: DATA CLEANING

• Check and remove missing values

• Remove duplicates

PROGRAM:

df.dropna(inplace=True)

df.drop_duplicates(inplace=True)
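
The program above removes missing values and duplicates but does not show the "check" part of the step. A minimal sketch of that check, assuming a small made-up frame in place of the real dataset (the `sample` frame and its values are illustrative, not taken from the superstore file):

```python
import pandas as pd

# Made-up stand-in for the superstore data; values are illustrative only.
sample = pd.DataFrame({
    "Region": ["East", "West", "West", None, "South"],
    "Sales": [100.0, 250.0, 250.0, 80.0, 120.0],
    "Profit": [10.0, 30.0, 30.0, 5.0, None],
})

# Count missing values per column before dropping anything.
missing_per_column = sample.isna().sum()

# Count rows that are exact copies of an earlier row.
duplicate_rows = sample.duplicated().sum()

print("Missing values per column:\n", missing_per_column)
print("Duplicate rows:", duplicate_rows)

# Same cleaning as in the program above, without modifying the frame in place.
cleaned = sample.dropna().drop_duplicates()
print("Rows before:", len(sample), "Rows after:", len(cleaned))
```

Running the counts first makes it clear how many rows the cleaning step discards and which columns are responsible.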

STEP 3: SUMMARY STATISTICS

PROGRAM:

sales_summary = df["Sales"].describe()[["mean", "50%", "std"]]

profit_summary = df["Profit"].describe()[["mean", "50%", "std"]]

print("Sales Summary:\n", sales_summary)

print("Profit Summary:\n", profit_summary)

OUTPUT:

Sales Summary:
mean 246.498440
50% 85.000000
std 487.567175
Name: Sales, dtype: float64
Profit Summary:
mean 28.610982
50% 9.240000
std 174.340972
Name: Profit, dtype: float64
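
In the summary above, the "50%" row returned by describe() is the median. A small sketch on made-up numbers (the `values` series is an assumption for illustration) shows the equivalence and how a few large values pull the mean above the median, as in the Sales output:

```python
import pandas as pd

# Made-up, right-skewed values; not taken from the dataset.
values = pd.Series([10, 20, 30, 1000], name="Sales")

# Same selection as in the program above.
summary = values.describe()[["mean", "50%", "std"]]

# The 50th percentile reported by describe() matches median().
median = values.median()

print(summary)
print("Median:", median)
print("Mean is larger than median:", summary["mean"] > median)
```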
STEP 4: ANALYSIS
Total Sales per Region
PROGRAM:
sales_per_region = df.groupby("Region")["Sales"].sum()
print(sales_per_region)
OUTPUT:
Region
Africa 783776
Canada 66932
Caribbean 324281
Central 2822399
Central Asia 752839
EMEA 806184
East 678834
North 1248192
North Asia 848349
Oceania 1100207
South 1600960
Southeast Asia 884438
West 725514
Name: Sales, dtype: int64
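
To make the regional comparison easier to read, the totals can be ranked and expressed as a share of overall sales. The sketch below re-enters the totals from the printed output above instead of loading the file; the `ranked` and `share_pct` names are illustrative.

```python
import pandas as pd

# Region totals copied from the printed output above.
sales_per_region = pd.Series({
    "Africa": 783776,
    "Canada": 66932,
    "Caribbean": 324281,
    "Central": 2822399,
    "Central Asia": 752839,
    "EMEA": 806184,
    "East": 678834,
    "North": 1248192,
    "North Asia": 848349,
    "Oceania": 1100207,
    "South": 1600960,
    "Southeast Asia": 884438,
    "West": 725514,
}, name="Sales")

# Sort from highest to lowest total sales.
ranked = sales_per_region.sort_values(ascending=False)

# Each region's percentage of the overall total, rounded to one decimal.
share_pct = (ranked / ranked.sum() * 100).round(1)

print(pd.DataFrame({"Sales": ranked, "Share (%)": share_pct}))
```

With the real DataFrame, the same two lines can be applied directly to the result of df.groupby("Region")["Sales"].sum().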

Line Chart: Year-wise Sales Trend

PROGRAM:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
plt.figure(figsize=(10, 6))
yearly_sales = df.groupby('Year')['Sales'].sum()
sns.lineplot(x=yearly_sales.index, y=yearly_sales.values, marker='o',
color='orange')
plt.title("Year-wise Sales Trend")
plt.xlabel("Year")
plt.ylabel("Total Sales")
plt.tight_layout()
plt.show()
OUTPUT:

GOOGLE COLAB LINK:

https://colab.research.google.com/drive/12ok_SXN84wnqSQL9AzV4OA4e7kDohCWQ?usp=sharing
STEP 6: INSIGHTS
Bar Chart – Sales by Region:
• Going by the region totals printed in Step 4, Central shows the highest total sales (2,822,399), followed by South (1,600,960) and North (1,248,192).
• Canada (66,932) and the Caribbean (324,281) lag behind, indicating potential for growth or marketing focus.

Line Chart – Year-wise Sales Trend:

• Sales have shown a steady upward trend year over year.
• This may indicate a growing business or improved operations/logistics over time.
Ex.No:2 EDA ON COVID-19 GLOBAL DATASET

INTRODUCTION:
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has had a profound global
impact since early 2020, affecting millions of lives and disrupting economies. To better
understand the spread, trends, and regional impact of the virus, data-driven approaches such as
Exploratory Data Analysis (EDA) are essential. By exploring confirmed cases, recoveries, and
deaths, this analysis aims to uncover insights into the progression of the pandemic, identify the
most affected states, and visualize daily trends in new infections.

Dataset link: https://www.kaggle.com/datasets/ COVID-19 in India

GOOGLE COLAB LINK:

https://colab.research.google.com/drive/1VEuFN6gRCyIMnIEkwccqlMqENFi11BRv?usp=sharing

STEP 1: LOAD AND INSPECT THE DATASET

PROGRAM:
import pandas as pd
df = pd.read_csv('path_to_covid_dataset.csv')
print(df.head())
OUTPUT:
PROGRAM:
print(df.columns)

print(df.info())

OUTPUT:

STEP 2: HANDLE MISSING DATA AND CONVERT DATES

PROGRAM:
df.fillna(0, inplace=True)
df['Date'] = pd.to_datetime(df['Date'])

STEP 3: COMPUTE METRICS

a) Total confirmed, recovered, and death cases per state:
PROGRAM:
statewise_total=df.groupby('State/UnionTerritory')[['Confirmed','Cured',
'Deaths']].max().reset_index()
print(statewise_total)

OUTPUT:
b) State with the highest number of confirmed cases:
PROGRAM:
top_state=statewise_total[statewise_total['Confirmed']==
statewise_total['Confirmed'].max()]
print("State with highest confirmed cases:\n", top_state)
OUTPUT:
State with highest confirmed cases:
State/UnionTerritory Confirmed Cured Deaths
27 Maharashtra 6363442 6159676 134201

c) Daily trend of new cases:

PROGRAM:
daily_cases = df.groupby('Date')['Confirmed'].sum().diff().fillna(0)
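
The per-state max() and the daily diff() both only make sense if Confirmed is a running (cumulative) count, which is how the programs above treat it. A minimal sketch on made-up data, assuming two hypothetical states "A" and "B" over three days:

```python
import pandas as pd

# Made-up cumulative counts for two hypothetical states over three days.
covid = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-01-01", "2021-01-01",
        "2021-01-02", "2021-01-02",
        "2021-01-03", "2021-01-03",
    ]),
    "State/UnionTerritory": ["A", "B", "A", "B", "A", "B"],
    "Confirmed": [10, 5, 15, 5, 25, 12],
})

# For cumulative data, the latest (largest) value is the state's total.
# Summing instead would count the same cases once per day.
state_totals = covid.groupby("State/UnionTerritory")["Confirmed"].max()

# Adding all states per date gives the national cumulative count.
national_cumulative = covid.groupby("Date")["Confirmed"].sum()

# Day-to-day difference gives new cases; the first day has no previous
# value, so fillna(0) sets it to zero.
daily_new = national_cumulative.diff().fillna(0)

print(state_totals)
print(daily_new)
```

Note that fillna(0) reports zero new cases for the first date in the series, so the first point on the trend line is a placeholder, not a measurement.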

STEP 4: VISUALIZATIONS
a) Pie Chart: Top 5 States by Confirmed Cases
PROGRAM:
import matplotlib.pyplot as plt
top5_states = statewise_total.sort_values('Confirmed', ascending=False).head(5)
plt.figure(figsize=(8, 8))
plt.pie(top5_states['Confirmed'],labels=top5_states['State/UnionTerritory'],
autopct='%1.1f%%', startangle=140)
plt.title('Top 5 Indian States by Confirmed COVID-19 Cases')
plt.show()
OUTPUT:

b) Line Graph: Daily Trend of Confirmed Cases

PROGRAM:
plt.figure(figsize=(10, 6))
plt.plot(daily_cases.index, daily_cases.values, color='blue')
plt.title('Daily New Confirmed COVID-19 Cases in India')
plt.xlabel('Date')
plt.ylabel('New Cases')
plt.grid(True)
plt.show()
OUTPUT:

STEP 5: OBSERVATION
• Top affected states (e.g., Maharashtra, Kerala, Karnataka) account for the majority of confirmed cases.
• Trend graph shows multiple waves: sharp increases followed by declines.
• Lockdown periods and vaccination rollouts align with noticeable trend changes.
• Deaths and recovery rates vary by region and wave, highlighting healthcare disparities.
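
The observation about death and recovery rates can be backed by two derived columns. The sketch below uses only the Maharashtra row from the printed output in Step 3(b); the rate column names are illustrative, and the same lines would apply to the full statewise_total frame.

```python
import pandas as pd

# Single row copied from the printed output for the top state.
statewise_total = pd.DataFrame({
    "State/UnionTerritory": ["Maharashtra"],
    "Confirmed": [6363442],
    "Cured": [6159676],
    "Deaths": [134201],
})

# Deaths and recoveries as a percentage of confirmed cases.
statewise_total["Death rate (%)"] = (
    statewise_total["Deaths"] / statewise_total["Confirmed"] * 100
)
statewise_total["Recovery rate (%)"] = (
    statewise_total["Cured"] / statewise_total["Confirmed"] * 100
)

print(statewise_total.round(2))
```

Sorting the full frame by either rate column would show which states sit above or below the national picture.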
Ex.No:3 EDA ON YOUTUBE TRENDING VIDEOS DATASET

INTRODUCTION:
YouTube has become a dominant platform for video sharing, content
creation, and audience engagement worldwide. The YouTube Trending Videos
Dataset provides a snapshot of videos that were trending in various regions over
time, offering valuable insights into user preferences, content popularity, and
engagement metrics.
This Exploratory Data Analysis (EDA) aims to uncover trends in video
categories, the frequency of trending videos across different channels, and
patterns in user interactions such as views, likes, and comments. By analyzing
this data, we can better understand what makes a video trend, which content types
perform best, and how users engage with trending content.
DATA SOURCE:

Dataset link: https://www.kaggle.com/datasets/anushabellam/Trending videos on Youtube

STEP 5: OBSERVATION
• Top Categories: Certain categories like music, entertainment, and news dominate the trending list.
• Channel Popularity: A few channels consistently produce trending content.
• Engagement Patterns: There's a strong positive correlation between views and likes.
• Outliers: Some videos have extremely high views but relatively low likes/comments, suggesting passive viewing.
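
The programs for this exercise are not included in the text, so the following is only a sketch of how observations like these could be checked. The data is made up, and the column names `category`, `views`, and `likes` are assumptions that may not match the actual dataset.

```python
import pandas as pd

# Made-up trending-video records; column names are hypothetical.
videos = pd.DataFrame({
    "category": ["Music", "Music", "Entertainment", "News", "Music", "Entertainment"],
    "views": [1000, 5000, 2000, 800, 12000, 3000],
    "likes": [100, 480, 150, 60, 1300, 310],
})

# How often each category appears among trending videos.
category_counts = videos["category"].value_counts()

# Pearson correlation between views and likes.
views_likes_corr = videos["views"].corr(videos["likes"])

# Likes per view; unusually low values point to passive viewing.
videos["like_rate"] = videos["likes"] / videos["views"]

print(category_counts)
print("Correlation between views and likes:", round(views_likes_corr, 3))
print(videos.sort_values("like_rate").head(3))
```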
