Module 1 - Introduction to Data Analytics and Life Cycle
• Data Analytics Lifecycle overview:
• Key Roles for a Successful Analytics Project
• Analytics Background
• Overview of the Data Analytics Lifecycle Project
Why – Data Analytics
• Activities around the globe generate vast volumes of
data daily, in the form of log files, web server records,
transactional data, and various other data.
• In addition to this, social media websites also
generate enormous amounts of data.
• There is a need to use all of this generated data to
derive value out of it and make impactful decisions.
• Data analytics serves this purpose.
• Data analytics is the process of exploring and analyzing
large datasets to find hidden patterns and unseen trends,
discover correlations, and derive valuable insights.
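As a small illustration of what "discovering correlations" can look like in practice, the sketch below computes the Pearson correlation coefficient between two columns of a toy dataset. The data (daily ad spend and daily sales) and the function name `pearson` are invented for illustration only; real projects would typically work on far larger datasets with a statistics library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: daily ad spend vs. daily sales
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
r = pearson(ad_spend, sales)
print(f"correlation = {r:.3f}")
```

A value close to +1 or -1 suggests a strong linear relationship; a value near 0 suggests little or none. Correlation alone does not show that one variable causes the other.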
Data Analytics – Data Science
Data Analytics
• Data analytics is the science of analyzing raw datasets in order
to derive conclusions.
• It enables us to discover patterns in the raw data and draw
valuable information from them.
Data Science
• Data science is the field that focuses on everything related to
data cleansing, preparation, and analysis.
• It also copes with both unstructured and structured data.
• It is the integration of mathematics, statistics, programming, data
capturing, problem-solving, and the capability of looking at problems
in a different way, as well as tasks like cleaning and preparing the data.
Key Data Creation Statistics 2023
• 1GB of data can create 350,000 emails.
• 3.5 quintillion bytes of data is created every day.
• Skype has 3 billion minutes of calls per day.
• 5 billion Snapchat videos and photos are shared
per day.
• 333.2 billion emails are sent per day.
• 20% of people online watch online games.
• Revenue from Bing is over $7 billion.
• People spend $1 million per minute online.
Size of Data
Unit        Symbol   Size in bytes
Megabyte    MB       1000^2
Gigabyte    GB       1000^3
Terabyte    TB       1000^4
Petabyte    PB       1000^5
Exabyte     EB       1000^6
Zettabyte   ZB       1000^7
Yottabyte   YB       1000^8
How much data is created every day in 2023?
3.5 quintillion bytes of data is created every day.
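To relate the daily figure to the size table, the following sketch converts a raw byte count into the largest suitable decimal unit. The function name `humanize` is illustrative, and the kilobyte step (1000^1 bytes) is added here because the conversion needs it, although the table above starts at the megabyte.

```python
# Decimal (SI) units: each step is a factor of 1000
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

# One quintillion is 10^18, so 3.5 quintillion bytes is 3.5 * 1000^6 bytes
daily_bytes = 3.5e18
print(humanize(daily_bytes))  # prints "3.5 EB"
```

In other words, the quoted 3.5 quintillion bytes per day corresponds to 3.5 exabytes per day.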
Real Life Applications – Case Studies
1. Security
• Data analytics, and more specifically predictive analysis, has
helped reduce crime rates in certain areas.
• In many major cities, historical and geographical data has been
used to isolate specific areas where crime rates could surge.
• On that basis, while arrests could not be made on a whim,
police patrols could be increased.
• Thus, with the help of data analytics, crime rates dropped
in these areas.
1. Security - India
• Inauguration of the Crime and Criminal Tracking Network &
Systems (CCTNS) - Predictive Policing
• The National Crime Records Bureau (NCRB) holds key information
that can be analysed to predict crime patterns and devise
prevention strategies.
• GIS Based Crime Mapping and Analysis: A Case Study of
Mudugiri Town Police Station Jurisdiction, Tumkur District,
Karnataka, India.
• https://www.sentinelassam.com/editorial/data-analysis-as-tool
2. Transportation
• Through data analytics it’s possible to improve vehicle performance, reduce
costs, improve processes, establish strategies, optimize routes and times,
and foresee and identify problems, among others.
• Transportation analytics draws on a variety of data ecosystems, helping industry
leaders apply advanced techniques such as machine learning, Big Data processing,
and geospatial analysis to optimize business strategies in the sector.
• Predictive analytics in enterprises can answer questions such as “What is the
most efficient route to effective distribution?”
• It can also anticipate events that may affect transportation, such as weather, road
closures, strikes, maintenance, traffic, and risk zones, and estimate the impact of
development projects to help identify alternatives that do not obstruct mobility.
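The "most efficient route" question can be sketched as a shortest-path problem. The example below uses Dijkstra's algorithm on a tiny, invented road network (the node names and travel times are hypothetical) and then re-plans after a simulated road closure. It is a minimal sketch, not a description of any production routing system.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, minutes in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + minutes, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical road network: travel times in minutes along one-way links
roads = {
    "Depot": {"A": 10, "B": 15},
    "A": {"Customer": 25, "B": 3},
    "B": {"Customer": 12},
    "Customer": {},
}
print(shortest_route(roads, "Depot", "Customer"))
# prints (25, ['Depot', 'A', 'B', 'Customer'])

# Simulate a road closure on the A -> B link and plan again
closed = {node: dict(links) for node, links in roads.items()}
del closed["A"]["B"]
print(shortest_route(closed, "Depot", "Customer"))
# prints (27, ['Depot', 'B', 'Customer'])
```

In practice the edge weights would come from live traffic, weather, and historical averages rather than fixed numbers.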
2. Transportation - India
• A mix of real-time traffic data, weather updates, and longer-term averages
such as time taken and average speeds is factored in to determine the
routes that will deliver maximum fuel and time efficiency. Intelligent route
modelling alone has the potential to reduce emissions per vehicle per trip
by a sizable percentage.
• Tamil Nadu uses a web-enabled, GIS-based Road Accident
Database Management System (RADMS), a ‘first of its kind’
system that maps road crashes, identifies the most crash-prone
hot spots, and pinpoints corrective action. It has now been
adopted as the template for replication at a national level in all
other states.
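The idea of identifying crash-prone hot spots can be illustrated with simple grid binning: each crash location is assigned to a square cell, and the cells with the most crashes are reported. This is a simplified sketch and not how RADMS itself is implemented; the coordinates, the cell size, and the function name `crash_hotspots` are all invented for illustration.

```python
from collections import Counter

def crash_hotspots(crashes, cell_size=0.01, top_n=3):
    """Bin (lat, lon) crash points into square grid cells and return the busiest cells."""
    counts = Counter(
        (int(lat // cell_size), int(lon // cell_size)) for lat, lon in crashes
    )
    # Each result is ((lat_cell, lon_cell), number_of_crashes)
    return counts.most_common(top_n)

# Hypothetical crash locations as (latitude, longitude) pairs
crashes = [
    (13.0827, 80.2707), (13.0829, 80.2701), (13.0821, 80.2709),
    (13.0675, 80.2376), (13.0678, 80.2371),
    (12.9716, 80.2214),
]
for cell, count in crash_hotspots(crashes, top_n=2):
    print(cell, count)
```

With a cell size of 0.01 degrees, each cell covers roughly one kilometre; a real analysis would also weight crashes by severity and traffic volume.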
2. Transportation - India
• Haryana Vision Zero (HVZ) was a programme launched in 2017, organised by the Haryana
Government, in collaboration with WRI India, NASSCOM and Raahgiri Foundation.
• Loosely modelled after ‘Vision Zero’ in Sweden, the project aimed at achieving zero road
fatalities in Haryana.
• Under HVZ, a team of road safety experts collaborated to analyse the data associated with
road fatalities in a particular district and then audit stretches to identify whether human
errors, engineering errors, or lack of infrastructure were the causes behind such incidents.
• The recommendations based on these analyses were sent to the public bodies concerned
to implement the changes required.
• Using the data became an effective way of identifying blackspots, organising road safety
awareness campaigns, and introducing necessary measures such as e-challaning and
building dedicated pedestrian infrastructure in the areas that required them.
3. Advanced Personalization
• Millions of users utilize smartphones, smartwatches, and other
electronic devices worldwide. They all produce enormous amounts
of data. Businesses use this data to tailor different actions on the
product or app to increase sales. Based on user information and
behavior, personalization has a limitless future.
• For instance, shopping sites like Amazon and Flipkart recommend
products based on consumers' preferences, tastes, and genres.
The software includes data analysis and machine learning models
that automatically recognize user characteristics and show them
products accordingly.
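A very simple form of personalization is "customers who bought X also bought Y". The sketch below counts how often items appear together in past purchases and recommends the most frequent companions. It uses plain co-occurrence counting rather than a machine learning model, and the baskets and function names are hypothetical; it does not represent how any particular shopping site works.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """For each item, count how often every other item appears in the same basket."""
    co = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, item, top_n=2):
    """Return up to top_n items most often bought together with the given item."""
    return [other for other, _ in co.get(item, Counter()).most_common(top_n)]

# Hypothetical purchase history: one list of items per order
baskets = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "case", "earbuds"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "phone"))   # prints ['case', 'charger']
print(recommend(co, "laptop"))  # prints ['mouse']
```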
4. Healthcare
• 1. Medical Image Analysis
• 2. Genetics & Genomics
• 3. Drug Development
• 4. Virtual assistance for patients and
customer support
• 5. Precision Medicine
• 6. Prediction of new diseases and their spread
5. Smart Cities
6. Social media
Innovative Case Studies – Real Time Data
• When digging into data in search of insights, it's better to know what's going on right now
– rather than yesterday, last week, or last month. This is why real-time data is increasingly
becoming the most valuable source of information for businesses.
• Working with real-time data often requires more sophisticated data and analytics
infrastructure, which means more expense, but the benefit is that we’re able to act on
information as it happens.
• This could involve analyzing clickstream data from visitors to our website to work out
what offers and promotions to put in front of them, or in financial services, it could mean
monitoring transactions as they take place around the world to watch out for warning
signs of fraud.
• Social media sites like Facebook analyze hundreds of gigabytes of data per second for
various use cases, including serving up advertising and preventing the spread of fake
news.
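Monitoring transactions as they happen can be sketched with a sliding window: each new amount is compared with the recent history, and values that deviate strongly are flagged. The class name `StreamMonitor`, the window size, and the threshold are assumptions made for this example; real fraud detection uses many more signals than the amount alone.

```python
from collections import deque
from statistics import mean, stdev

class StreamMonitor:
    """Flag values that lie far from the mean of a sliding window of recent values."""

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.recent = deque(maxlen=window)  # oldest values drop out automatically
        self.threshold = threshold          # allowed distance in standard deviations
        self.min_history = min_history      # values needed before judging

    def check(self, amount):
        """Return True if amount looks unusual, then add it to the window."""
        suspicious = False
        if len(self.recent) >= self.min_history:
            mu = mean(self.recent)
            sigma = stdev(self.recent)
            if sigma > 0 and abs(amount - mu) / sigma > self.threshold:
                suspicious = True
        self.recent.append(amount)
        return suspicious

# Hypothetical stream of transaction amounts arriving one at a time
monitor = StreamMonitor()
stream = [50, 52, 48, 51, 49, 50, 53, 47, 5000]
flags = [monitor.check(amount) for amount in stream]
print(flags)  # only the final 5000 is flagged
```

Because each value is checked the moment it arrives, the alert can be raised before the next transaction is processed, which is the benefit of real-time data described above.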
Top 10 of 2022
• 1 Your recommendations
• 2 How does picture recognition work?
• 3 Data science has transformed gaming
• 4 What impact does it have on education?
• 5 Tracking locations
• 6 Advertising makes use of analytics
• 7 Your media needs are controlled by it
• 8 In what ways does it benefit sports?
• 9 It has improved healthcare
• 10 Banking is made easier with it
• 11 Environmental, social, and governance (ESG) - Driving Sustainable
Innovations: Data Analytics for ESG Data Challenges
Data Analytics For Good
• Predicting Covid spread, vaccines, etc.
• Fake Information identification
• Climate change, Disaster reporting
• Data analytics for wildlife protection, ocean cleanup,
etc. (Destination Earth program using Digital Twins)
• Agriculture
Health Analytics
Precision medicine
Current Analytical Architecture
• Data sources must be well understood
• EDW – Enterprise Data Warehouse
• Applications read data from the EDW
• Data scientists get data for downstream analytics processing
Emerging Data Ecosystem - New Analytics Approach
• Four main groups of players
– Data devices
• Games, smartphones, computers, etc.
– Data collectors
• Phone and TV companies, Internet, Gov’t, etc.
– Data aggregators – make sense of data
• Websites, credit bureaus, media archives, etc.
– Data users and buyers
• Banks, law enforcement, marketers, employers, etc.
Emerging Data Analytics Ecosystem
Key Roles for the
New Data Analytics Ecosystem
1. Deep analytical talent
– Advanced training in quantitative
disciplines – e.g., math, statistics,
machine learning
2. Data savvy professionals
– Savvy but less technical than
group 1
3. Technology and data enablers
– Support people – e.g., DB
admins, programmers, etc.
Recurring Data Analytics Activities
1. Reframe challenges and problems in the
system as analytics challenges
2. Design, implement, and deploy statistical
models and data mining techniques on data
3. Develop insights that lead to actionable
recommendations
Profile of Data Analytics - 5 Main Skills
• Quantitative skill – e.g., math, statistics
• Technical aptitude – e.g., software
engineering, programming
• Skeptical mindset and critical thinking –
ability to examine work critically
• Curious and creative – passionate about
data and finding creative solutions
• Communicative and collaborative – can
articulate ideas, can work with others