Summer Industry Internship-II Report
on
Visualization of Covid-19 Patterns using Data Analytics
During
III Year II Semester Summer
Submitted to
The Department of Information Technology
In partial fulfillment of the academic requirements of Jawaharlal Nehru Technological University
For
The award of the degree of
Bachelor of Technology
in Information Technology
By
P. Devender Reddy 20311A1205
Department of Information Technology
Sreenidhi Institute of Science and Technology
CERTIFICATE
This is to certify that this Summer Industry Internship-II report on “Visualization of Covid-19
Patterns using Data Analytics”, submitted by P. Devender Reddy (20311A1205) in the year 2023 in
partial fulfillment of the academic requirements of Jawaharlal Nehru Technological University for
the award of the degree of Bachelor of Technology in Information Technology, is a bona fide record
of the summer industry internship work carried out under our guidance during the III B.Tech IT
II Semester, to be evaluated in the IV B.Tech IT I Semester. This report has not been submitted to
any other institute or university for the award of any degree.
External Examiner
Date:
DECLARATION
I, P. Devender Reddy (20311A1205), a student of SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY,
YAMNAMPET, GHATKESAR, studying in the IV year I semester of INFORMATION TECHNOLOGY, solemnly
declare that the Summer Industry Internship-II report titled “VISUALIZATION OF COVID-19 PATTERNS
USING DATA ANALYTICS” is submitted to SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY in partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
INFORMATION TECHNOLOGY.
It is declared, to the best of my knowledge, that the work reported does not form part of any
dissertation submitted to any other University or Institute for the award of any degree.
ACKNOWLEDGEMENT
I would like to express my gratitude to all the people behind the scenes who helped me transform an
idea into a real application.
I would like to express my heartfelt gratitude to my parents, without whom I would not have been
privileged to achieve and fulfill my dreams. I am grateful to our principal, Dr. T. Ch. Siva Reddy,
who most ably runs the institution and has had a major hand in enabling me to do my project.
I profoundly thank the Head of the Department of Computer Science & Engineering, who has been an
excellent guide and also a great source of inspiration for my work.
I would like to thank my Project Coordinator for the technical guidance, constant encouragement, and
support in carrying out my project at college.
The satisfaction and euphoria that accompany the successful completion of a task would be incomplete
without the mention of the people who made it possible, and whose constant guidance and
encouragement crowned all the efforts with success. In this context, I would like to thank all the
other staff members, both teaching and non-teaching, who extended their timely help and eased my task.
INDEX
Abstract
Libraries Used
UML Diagrams
Code
Output Screenshots
Conclusion
References
ABSTRACT
This abstract emphasizes the integration of advanced technologies, such as machine learning and
artificial intelligence, in enhancing the capabilities of data analytics. These technologies
empower organizations to extract meaningful insights from complex datasets, automate decision
processes, and anticipate future trends, fostering a more proactive and strategic approach to
decision-making. Furthermore, the abstract addresses the challenges associated with data analytics
implementation, including data quality, privacy concerns, and the need for skilled professionals.
It discusses the importance of a robust data governance framework to ensure the reliability and
ethical use of data in decision-making processes.
The abstract concludes by underscoring the far-reaching impact of data analytics on organizational
efficiency, innovation, and agility. It advocates a holistic approach to data-driven
decision-making, emphasizing the need for a cultural shift within organizations to embrace a
data-centric mindset. As data analytics continues to evolve, its role in driving intelligent
decision-making will be a cornerstone of sustainable growth and competitive advantage in the
digital era.
1. INTRODUCTION
The COVID-19 pandemic has reshaped the global landscape, challenging healthcare systems, economies,
and societal norms. Amid this unprecedented crisis, understanding the multifaceted patterns of the
virus's spread, its impact on communities, and the effectiveness of interventions is pivotal.
Leveraging the power of data analytics, this project aims to unravel the intricate web of COVID-19
by visually presenting its patterns and trends.
Objective:
The primary objective of this project is to employ data analytics techniques to analyze and
interpret diverse COVID-19 datasets. By transforming raw data into insightful visual
representations, this endeavor seeks to provide a comprehensive understanding of the virus's
spread, its regional variations, demographic influences, and the efficacy of mitigation strategies.
Data Aggregation Platforms: Existing systems encompass various data aggregation platforms that
compile COVID-19-related data from global health organizations, government sources, and research
institutions. These platforms serve as repositories for diverse datasets, including infection
rates, testing numbers, vaccination progress, and demographic information.
Visualization Tools and Dashboards: Several visualization tools and dashboards have been developed
to represent COVID-19 data in graphical formats. These tools often provide interactive features,
allowing users to explore data trends, geographical distributions, and comparative analyses through
maps, charts, and graphs.
Data Analytics Models: The current system involves the use of data analytics models to analyze and
derive insights from COVID-19 datasets. Techniques such as machine learning, statistical analysis,
and predictive modeling are employed to identify patterns, correlations, and predictive trends in
the data.
Public Accessibility: Efforts have been made to make COVID-19 data and visualizations accessible to
the public. Many existing systems prioritize user-friendly interfaces, enabling individuals,
healthcare professionals, policymakers, and researchers to access and interpret the data easily.
PROPOSED SYSTEM:
2. Data-Driven Decision-Making:
Software Requirements and Hardware:
The software and hardware requirements for a project involving the visualization of COVID-19
patterns using data analytics depend on the scale of analysis and on the tools and platforms
chosen. Here is a general overview:
Software Requirements:
1. Data Analysis Tools: Software like Python (with libraries like Pandas, NumPy, Matplotlib,
Seaborn), R, or SQL for data manipulation, analysis, and statistical modeling.
2. Visualization Libraries: Tools such as Matplotlib, Seaborn, Plotly, Tableau, or Power BI for
creating graphs, charts, maps, and interactive dashboards.
3. Geospatial Tools: Geographic Information System (GIS) software like QGIS or ArcGIS for mapping
and analyzing spatial data if geographical analysis is involved.
4. Data Cleaning and Preprocessing Tools: Software like Excel, OpenRefine, or Python/R packages for
data cleaning, preprocessing, and transforming raw data into usable formats.
5. Version Control and Collaboration: Platforms like GitHub for version control and collaboration
if working in a team.
Hardware Requirements:
1. Computing Power: Depending on the size of the datasets and the complexity of the analyses, a
computer with a reasonable amount of RAM (8 GB or more), a fast processor, and sufficient storage
space.
2. Graphics Processing Unit (GPU): For larger-scale data processing or machine learning models, a
dedicated GPU can significantly speed up computations.
3. Internet Connectivity: Access to reliable internet to download data, access APIs, and
collaborate if using cloud-based services or remote data sources.
4. Storage: Sufficient storage space to store datasets and processed information.
Data Analytics and Decision Making
Data analytics plays a pivotal role in decision-making across various domains, including business,
healthcare, and public policy. Here is a breakdown of how data analytics influences decision-making:
1. Data-Driven Insights:
- Understanding Patterns: Data analytics involves exploring and analyzing vast datasets to identify
trends, correlations, and patterns that might not be apparent otherwise. This helps decision-makers
understand current scenarios better.
- Predictive Analysis: Advanced analytics techniques enable predictive modeling, forecasting future
trends based on historical data. This assists in proactive decision-making rather than reactive
responses.
2. Informed Decision-Making:
- Evidence-Based Decisions: Data analytics provides evidence and facts to support decision-making.
Instead of relying solely on intuition or assumptions, decision-makers have empirical data to guide
them.
- Risk Assessment: Analyzing data helps in assessing the risks associated with various choices. It
allows decision-makers to weigh potential outcomes and make informed choices that mitigate risks.
3. Optimizing Operations:
Steps involved in the Data Analytics Project
Our project involves analyzing and visualizing COVID-19 data using the Python or R programming
language and relevant packages like ggplot2. Here are the steps to consider when building on this
project:
**Step 1: Data Acquisition and Preprocessing**
1. Data Collection: Gather updated COVID-19 datasets from reliable sources like government health
departments or organizations providing global health statistics.
2. Data Cleaning: Perform data cleaning tasks, including handling missing values, formatting dates,
and ensuring data consistency.
3. Exploratory Data Analysis (EDA): Conduct exploratory analysis to understand the structure and
contents of the data, checking for outliers or anomalies.

1. Time Series Visualization: Use ggplot2 or similar packages to create time series plots showing
the cumulative confirmed cases over time, both globally and for specific regions or countries.
2. Trend Analysis: Analyze trends by fitting trend lines (linear regression or other methods) to
visualize the growth rate of cases, focusing on regions like China versus the rest of the world.
3. Geographical Visualization: Use geographical data to plot the spread of COVID-19 on maps using
tools like ggplot2, possibly comparing cases across different countries or regions.
4. Comparative Analysis: Compare the cumulative cases between China and other countries using
statistical methods or visualization techniques.
5. Top Countries Analysis: Identify and visualize the countries with the highest total cases,
possibly using bar plots or other visualizations to represent the data.
6. Interpret Findings: Analyze the visualizations, trends, and comparisons to derive meaningful
insights about the spread and impact of COVID-19.
9. Documentation: Document the code, the methodologies used, and any assumptions made during the
analysis for future reference or sharing.
By following these steps, you can effectively conduct data analytics and visualization of COVID-19
patterns, gaining insights into the spread and impact of the virus across different regions or
countries.
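The cleaning, trend, and top-countries steps above can be sketched in a few lines of R. This is only an illustrative sketch, not the project's code: the `raw` table, its country labels "A" and "B", the `new_cases` column, and the choice to treat a missing daily count as zero are all assumptions made for the example. A real analysis would load a dataset with `read_csv()` instead.

```r
library(dplyr)

# Hypothetical daily counts (not real COVID-19 data).
raw <- tibble(
  country   = c("A", "A", "A", "B", "B", "B"),
  date      = c("2020-03-01", "2020-03-02", "2020-03-03",
                "2020-03-01", "2020-03-02", "2020-03-03"),
  new_cases = c(10, NA, 30, 5, 5, 5)
)

# Data cleaning: parse the dates and replace a missing daily count with zero
# (an assumption made only for this sketch).
clean <- raw %>%
  mutate(date = as.Date(date),
         new_cases = coalesce(new_cases, 0))

# Cumulative cases per country: the quantity shown in a time series plot.
cumulative <- clean %>%
  group_by(country) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cum_cases = cumsum(new_cases)) %>%
  ungroup()

# Trend analysis: the slope of a linear fit is the average growth per day.
fit_a <- lm(cum_cases ~ as.numeric(date),
            data = filter(cumulative, country == "A"))
slope_a <- unname(coef(fit_a)[2])

# Top countries analysis: rank countries by their highest cumulative count.
top_countries <- cumulative %>%
  group_by(country) %>%
  summarize(total_cases = max(cum_cases)) %>%
  arrange(desc(total_cases))
```

Taking `max(cum_cases)` per country works because a cumulative count never decreases, so its maximum is the latest total.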
Libraries Used:
readr: This library is used for reading and parsing data files, particularly CSV files in this case
(read_csv function).
ggplot2: A powerful data visualization library in R used for creating a wide variety of graphs and
plots (ggplot, geom_line, geom_smooth, geom_vline, geom_text, scale_y_log10 functions).
dplyr: This library is employed for data manipulation tasks like filtering, grouping, summarizing,
and arranging data (group_by, summarize, mutate, filter, top_n functions).
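As a hedged illustration of how geom_smooth and scale_y_log10 work together, the sketch below uses a made-up series that doubles every two days. The `growth` data frame and the `doubling_days` calculation are assumptions for the example, not part of the project's datasets or code.

```r
library(ggplot2)

# Hypothetical series that doubles every two days (not real COVID-19 data).
growth <- data.frame(day = 0:9)
growth$cum_cases <- 100 * 2^(growth$day / 2)

# On a log10 y-axis, exponential growth appears as a straight line,
# so a linear trend line fits it closely.
plt_log <- ggplot(growth, aes(day, cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_y_log10() +
  ylab("Cumulative confirmed cases (log scale)")

# The same idea numerically: regress log10(cases) on time and
# convert the slope into a doubling time in days.
fit <- lm(log10(cum_cases) ~ day, data = growth)
doubling_days <- log10(2) / unname(coef(fit)["day"])
```

Printing `plt_log` draws the chart. The doubling time is a convenient single number for comparing growth between regions when the series is roughly exponential.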
UML Diagrams
Fig 1.1
Activity Diagram
Fig 1.2
Project code:
library(readr)
library(ggplot2)
library(dplyr)

# Read the worldwide cumulative confirmed cases and print them
confirmed_cases_worldwide <- read_csv("datasets/confirmed_cases_worldwide.csv")
confirmed_cases_worldwide

# Line plot of cumulative confirmed cases over time
ggplot(confirmed_cases_worldwide, aes(date, cum_cases)) +
  geom_line() +
  ylab("Cumulative confirmed cases")

# Read the China vs. rest-of-world data and inspect its structure
confirmed_cases_china_vs_world <- read_csv("datasets/confirmed_cases_china_vs_world.csv")
glimpse(confirmed_cases_china_vs_world)

# One line per group, colored by is_china
plt_cum_confirmed_cases_china_vs_world <- ggplot(confirmed_cases_china_vs_world) +
  geom_line(aes(date, cum_cases, group = is_china, color = is_china)) +
  ylab("Cumulative confirmed cases")
plt_cum_confirmed_cases_china_vs_world

# Landmark events to annotate on the plot
who_events <- tribble(
  ~ date, ~ event,
  "2020-01-30", "Global health\nemergency declared",
  "2020-03-11", "Pandemic\ndeclared",
  "2020-02-13", "China reporting\nchange"
) %>%
  mutate(date = as.Date(date))

# Add a dashed vertical line and a text label for each event
plt_cum_confirmed_cases_china_vs_world +
  geom_vline(aes(xintercept = date), data = who_events, linetype = "dashed") +
  geom_text(aes(date, label = event), data = who_events, y = 1e5)

# Keep only the China rows from 15 February 2020 onward
china_after_feb15 <- confirmed_cases_china_vs_world %>%
  filter(is_china == "China", date >= "2020-02-15")

# Using china_after_feb15, draw a line plot of cum_cases vs. date
# Add a smooth trend line using linear regression, no error bars
ggplot(china_after_feb15, aes(date, cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  ylab("Cumulative confirmed cases")

# The definition of not_china was lost from the listing; this filter is an
# assumption that every row not labelled "China" belongs to the rest of the world
not_china <- confirmed_cases_china_vs_world %>%
  filter(is_china != "China")

# Same plot with a linear trend line for the rest of the world
plt_not_china_trend_lin <- ggplot(not_china, aes(date, cum_cases)) +
  geom_line() +
  geom_smooth(method = "lm", se = FALSE) +
  ylab("Cumulative confirmed cases")

# See the result
plt_not_china_trend_lin

# Modify the plot to use a logarithmic scale on the y-axis
plt_not_china_trend_lin +
  scale_y_log10()

# Run this to get the data for each country
confirmed_cases_by_country <- read_csv("datasets/confirmed_cases_by_country.csv")
glimpse(confirmed_cases_by_country)

# Group by country, summarize to calculate total cases, find the top 7
top_countries_by_total_cases <- confirmed_cases_by_country %>%
  group_by(country) %>%
  summarize(total_cases = max(cum_cases)) %>%
  top_n(7, total_cases)

# See the result
top_countries_by_total_cases
Outputs:
Fig 1.4
Fig 1.5
Fig 1.6
Fig 1.7
CONCLUSION:
This project examined COVID-19 data using data analytics and visualization tools. The analysis
revealed diverse patterns and trends, highlighting the virus's differential impact across regions.
Visual representations served as crucial tools, aiding decision-making and fostering public
awareness. Insights gleaned from trend analyses and statistical models offered a glimpse into the
pandemic's trajectory, supporting proactive measures. The project's visualizations present clear,
accessible information for stakeholders and communities, contributing to the collective fight
against the pandemic. While it is only a snapshot of an ongoing narrative, this project underscores
the vital role of data analytics in understanding and responding to the dynamic nature of the
COVID-19 crisis. Continued data-driven strategies remain pivotal.
BIBLIOGRAPHY
[1] Grady Booch, James Rumbaugh, Ivar Jacobson. The Unified Modeling Language User Guide.
Addison-Wesley, Reading, Mass., 1999.
[2] www.coderanch.com.
[3] www.w3schools.com.
[4] www.wikipedia.org.
APPENDIX-A: UNIFIED MODELING LANGUAGE
The UML also contains organizational constructs for arranging models into packages that permit
software teams to partition large systems into workable pieces, to understand and control
dependencies among the packages, and to manage the versioning of model units in a complex
development environment. It contains constructs for representing implementation decisions and for
organizing run-time elements into components.
UML is not a programming language. Tools can provide code generators from UML into a variety of
programming languages, as well as construct reverse-engineered models from existing programs. The
UML is not a highly formal language intended for theorem proving. The Unified Modelling Language
(UML) is a general-purpose visual modelling language that is used to specify, visualize, construct,
and document the artifacts of a software system. It captures decisions and understanding about
systems that must be constructed. It is used to understand, design, browse, configure, maintain,
and control information about such systems. It is intended for use with all development methods,
lifecycle stages, application domains, and media. The modelling language is intended to unify past
experience about modelling techniques and to incorporate current software best practices into a
standard approach. UML includes semantic concepts, notation, and guidelines. It has static, dynamic,
environmental, and organizational parts. It is intended to be supported by interactive visual
modelling tools that have code generators and report writers. The UML specification does not define
a standard process but is intended to be useful with an iterative development process. It is
intended to support most existing object-oriented development processes. The UML captures
information about the static structure and dynamic behaviour of a system. A system is modelled as a
collection of discrete objects that interact to perform work that ultimately benefits an outside
user. The static structure defines the kinds of objects important to a system and to its
implementation, as well as the relationships among the objects. The dynamic behaviour defines the
history of objects over time and the communications among objects to accomplish goals. Modelling a
system from several separate but related viewpoints permits it to be understood for different
purposes.
APPENDIX B: ABSTRACT
Batch No:
Roll No: 20311A1205
Name: P. Devender Reddy
Title: Visualization of Covid-19 using Data Analytics
ABSTRACT
In the dynamic landscape of today's information-driven era, organizations face unprecedented
volumes of data generated from diverse sources. The strategic utilization of this vast and varied
data has become imperative for informed decision-making. This abstract explores the pivotal role of
data analytics in driving intelligent decision-making across various sectors. The paper delves into
the fundamental principles of data analytics, encompassing descriptive, diagnostic, predictive, and
prescriptive analytics techniques. It highlights the transformative power of data analytics in
uncovering patterns, trends, and insights from structured and unstructured data, thus enabling
organizations to gain a competitive edge.
Moreover, the abstract emphasizes the integration of advanced technologies, such as machine
learning and artificial intelligence, in enhancing the capabilities of data analytics. These
technologies empower organizations to extract meaningful insights from complex datasets, automate
decision processes, and anticipate future trends, fostering a more proactive and strategic approach
to decision-making. Furthermore, the abstract addresses the challenges associated with data
analytics implementation, including data quality, privacy concerns, and the need for skilled
professionals.
The abstract concludes by underscoring the far-reaching impact of data analytics on organizational
efficiency, innovation, and agility. As data analytics continues to evolve, its role in driving
intelligent decision-making will undoubtedly be a cornerstone for sustainable growth and
competitive advantage in the digital era.
APPENDIX C: CORRELATION BETWEEN THE SUMMER INDUSTRY
APPENDIX D: DOMAIN OF INTERNSHIP AND NATURE OF INTERNSHIP
Student 1: P. DEVENDER REDDY
Table 3: Domain of the Project/Internship work (please tick ✓ the appropriate domain for your
project)
Domain of the Project:
- Artificial Intelligence, Machine Learning and Deep Learning
- Computer Networks, Information Security, Cybersecurity
- Data Warehousing, Data Mining, Big Data Analytics
- Cloud Computing, Internet of Things
- Software Engineering, Image Processing
Batch No:16 Title
6 VISUALIZATION OF COVID-19 PATTERNS USING DATA ANALYTICS ✓