Unit 3: DSBDA
Data Analytics Life Cycle (All Phases)
Data Analytics Life Cycle
The Data Analytics Life Cycle is a step-by-step process used to solve
data-related problems using analytics. It helps transform raw data into
valuable insights for decision-making. It consists of six phases:
1. Discovery
● This is the first phase, where the goal is to understand the
problem.
● Involves identifying:
○ Business goals
○ Key stakeholders
○ Available resources
● Analysts gather domain knowledge, define objectives, and assess
tools and technologies.
📌 Example: A bank wants to reduce customer churn. In this phase,
analysts study why customers are leaving.
2. Data Preparation
● Also called Data Preprocessing or Data Wrangling.
● In this phase, raw data is:
○ Collected
○ Cleaned
○ Integrated from multiple sources
● Data is transformed into a usable format for analysis.
📌 Example: Removing missing values, correcting data types, and
merging data from different departments.
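A minimal stdlib-only Python sketch of these preparation steps, assuming two hypothetical department extracts keyed by a `customer_id` field. Rows with missing values are dropped, string fields are converted to proper types, and the two sources are merged into one record per customer. The field names and sample values are illustrative only.

```python
# Hypothetical raw extracts from two departments (all values arrive as strings).
billing = [
    {"customer_id": "1", "monthly_charge": "49.99"},
    {"customer_id": "2", "monthly_charge": ""},        # missing value
    {"customer_id": "3", "monthly_charge": "19.50"},
]
support = [
    {"customer_id": "1", "complaints": "2"},
    {"customer_id": "3", "complaints": "0"},
]


def clean(rows):
    """Drop rows that contain any empty field."""
    return [r for r in rows if all(v.strip() for v in r.values())]


def prepare(billing_rows, support_rows):
    """Clean both sources, fix data types, and merge them on customer_id."""
    # Correct data types: id -> int, charge -> float, complaints -> int.
    charges = {int(r["customer_id"]): float(r["monthly_charge"])
               for r in clean(billing_rows)}
    complaints = {int(r["customer_id"]): int(r["complaints"])
                  for r in clean(support_rows)}
    # Inner join: keep only customers present in both cleaned sources.
    return [
        {"customer_id": cid, "monthly_charge": charges[cid],
         "complaints": complaints[cid]}
        for cid in sorted(charges.keys() & complaints.keys())
    ]


if __name__ == "__main__":
    for record in prepare(billing, support):
        print(record)
```

Customer 2 disappears because its charge is missing. Whether to drop or fill such rows is a project-specific decision.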
3. Model Planning
● This phase focuses on deciding:
○ Which analytical techniques to use
○ Which algorithms or models might fit best
● Tools like R, Python, or SQL can be used to explore data visually
and statistically.
📌 Example: Choosing regression for prediction, or clustering to group
similar customers.
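One common way to explore data statistically at this stage is to check how strongly each candidate feature correlates with the target before choosing a technique. The sketch below computes Pearson's correlation coefficient by hand on a small, made-up churn dataset. The feature names and numbers are hypothetical, and correlation is only a first screening step, not a model choice in itself.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1 = customer churned, 0 = customer stayed.
churned = [1, 0, 1, 0, 1, 0]
features = {
    "complaints": [4, 0, 3, 1, 5, 0],
    "tenure_years": [1, 6, 2, 5, 1, 7],
}

if __name__ == "__main__":
    for name, values in features.items():
        print(f"{name}: r = {pearson(values, churned):+.2f}")
```

A strong positive or negative value suggests the feature is worth including in a classification model. A value near zero suggests it adds little on its own.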
4. Model Building
● Models are actually built in this phase using:
○ Machine learning
○ Statistical methods
● Training data is used to build models; test data is used to
evaluate them.
📌 Example: Building a classification model to predict if a customer will
churn.
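To illustrate the train/test idea, here is a deliberately tiny classifier: a one-feature threshold rule (a "decision stump") that is fitted on training data and then scored on held-out test data. The dataset, the complaints feature, and the fixed split are hypothetical. A real project would typically use a library such as scikit-learn and a randomized split.

```python
# Hypothetical (complaints, churned) pairs; 1 = churned, 0 = stayed.
train = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1), (2, 1)]
test = [(0, 0), (1, 0), (4, 1), (3, 1), (3, 0)]


def accuracy(threshold, data):
    """Fraction of rows where 'complaints >= threshold' matches the label."""
    hits = sum((x >= threshold) == bool(y) for x, y in data)
    return hits / len(data)


def fit_stump(data):
    """Pick the threshold that maximizes accuracy on the training data."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


if __name__ == "__main__":
    t = fit_stump(train)                          # build the model on training data
    print("threshold:", t)
    print("test accuracy:", accuracy(t, test))    # evaluate on unseen data
```

The threshold is chosen using only the training rows, so the test accuracy gives a fairer estimate of how the rule would behave on new customers.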
5. Communicate Results
● Findings are shared with stakeholders using:
○ Charts
○ Dashboards
○ Reports
● Helps business users understand the impact and take action.
📌 Example: A dashboard showing which customer segments are at
highest risk of leaving.
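Before results reach a dashboard, they are usually aggregated into a compact summary. This sketch groups hypothetical per-customer churn predictions by segment and prints a small text table sorted by risk. The segment names and predictions are invented for illustration, and a real dashboard tool would render the same numbers as a chart.

```python
from collections import defaultdict

# Hypothetical model output: (segment, predicted_to_churn)
predictions = [
    ("students", True), ("students", True), ("students", False),
    ("families", False), ("families", True), ("families", False),
    ("seniors", False), ("seniors", False),
]


def churn_rate_by_segment(rows):
    """Return {segment: share of customers predicted to churn}."""
    totals, churners = defaultdict(int), defaultdict(int)
    for segment, churn in rows:
        totals[segment] += 1
        churners[segment] += churn   # True counts as 1, False as 0
    return {s: churners[s] / totals[s] for s in totals}


if __name__ == "__main__":
    rates = churn_rate_by_segment(predictions)
    # Highest-risk segments first, as a dashboard would highlight them.
    for segment, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{segment:<10} {rate:6.1%}")
```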
6. Operationalize
● Final phase where the model is:
○ Deployed into production
○ Integrated into business operations
● Also involves creating final reports and monitoring the model over
time.
📌 Example: The churn prediction model is integrated into the CRM
system to alert sales teams.
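Once deployed, a model is typically wrapped in a small function that the business system calls, and its live performance is tracked. The sketch below assumes a simple threshold-based churn rule; `THRESHOLD`, `BASELINE_ACCURACY`, and `TOLERANCE` are hypothetical values. It flags customers for the sales team and warns when accuracy on recently labeled cases falls well below the accuracy measured at build time. The CRM integration itself is not shown.

```python
THRESHOLD = 2            # hypothetical: flag customers with >= 2 complaints
BASELINE_ACCURACY = 0.8  # hypothetical accuracy measured when the model was built
TOLERANCE = 0.1          # hypothetical allowed drop before raising a warning


def should_alert(complaints):
    """Scoring function the CRM would call for each customer."""
    return complaints >= THRESHOLD


def needs_retraining(recent):
    """recent: list of (complaints, actually_churned) pairs from live data."""
    correct = sum(should_alert(x) == bool(y) for x, y in recent)
    live_accuracy = correct / len(recent)
    return live_accuracy < BASELINE_ACCURACY - TOLERANCE


if __name__ == "__main__":
    print(should_alert(3))   # True means notify the sales team
    print(needs_retraining([(0, 0), (3, 1), (1, 1), (0, 1)]))
```

Monitoring like this is what tells the team when the model has drifted and should be rebuilt.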
✅ Conclusion
The Data Analytics Life Cycle ensures a structured and effective way
to convert data into actionable insights. Each phase plays a crucial role
in solving real-world business problems with data science.
Driving Data Deluge
The term "Data Deluge" refers to the rapid and massive increase in
data being generated from various sources every second. The very
large datasets produced by this explosion are commonly called Big Data.
There are several key drivers that contribute to this data deluge:
1. Increase in Digital Devices
● Devices like smartphones, laptops, tablets, and wearables
constantly generate data.
● These devices are connected to the internet and produce
real-time data such as location, messages, videos, etc.
📌 Example: A smartphone using GPS, fitness tracking, and apps all at
once.
2. Internet of Things (IoT)
● Billions of devices like smart TVs, smartwatches, cars, home
appliances, and industrial machines are connected to the internet.
● These devices collect, transmit, and store huge amounts of
sensor and machine data.
📌 Example: A smart refrigerator monitoring temperature and sending
alerts.
3. Social Media Platforms
● Sites like Facebook, Instagram, Twitter, and YouTube generate
vast amounts of data through posts, comments, likes, shares, and videos.
● This data is largely unstructured, large in volume, and produced at
high speed.
📌 Example: 500+ hours of video are uploaded to YouTube every
minute.
4. Cloud Computing
● Cloud platforms offer virtually unlimited, on-demand storage and
processing power, encouraging organizations to store and analyze
more data than ever before.
📌 Example: Amazon Web Services (AWS), Google Cloud, and
Microsoft Azure hosting enterprise data.
5. Digital Transactions
● Online activities like banking, e-commerce, ticket bookings, and
online payments generate structured data.
● This includes timestamps, transaction IDs, user data, etc.
📌 Example: Every swipe of a credit card creates a new data record.
6. Multimedia Content
● High-definition videos, images, and audio shared online consume
large amounts of storage.
● Content from cameras, CCTV, drones, and video streaming adds to
the data deluge.
📌 Example: Netflix streaming or Zoom video meetings across the
world.
7. Business and Government Data
● Enterprises and governments collect vast amounts of data for
decision-making.
● This includes customer data, healthcare records, census data,
security footage, etc.
📌 Example: A city using data from sensors and traffic cameras to
manage congestion.
✅ Conclusion
The data deluge is driven by the growing number of connected
devices, online activity, and digital services. Managing and analyzing
this massive data is crucial for organizations to stay competitive, make
decisions, and innovate.
Data Science (Business Intelligence And Data Science)
✅ Definition of Data Science:
Data Science is the process of extracting meaningful insights and
knowledge from structured and unstructured data using mathematics,
statistics, machine learning, programming, and domain knowledge.
🔍 What is Business Intelligence (BI)?
Business Intelligence (BI) involves:
● Collecting, processing, and analyzing historical data
● Generating reports, dashboards, and visualizations
● Helping organizations monitor performance and make decisions
📌 BI answers questions like: "What happened?" and "Why did it
happen?"
🤖 What is Data Science?
Data Science is a more advanced approach that includes:
● BI techniques, plus
● Predictive analytics, machine learning, and AI
● Applications such as forecasting, pattern detection, and decision
automation
📌 Data Science answers: "What will happen?" and "What should we do
next?"
🆚 Difference Between Business Intelligence and Data Science:
Feature | Business Intelligence (BI) | Data Science
Data Type | Structured | Structured + Unstructured
Focus | Descriptive (Past/Present) | Predictive and Prescriptive (Future)
Tools Used | SQL, Excel, Tableau | Python, R, Hadoop, ML libraries
Techniques | Reporting, Dashboards | Statistics, AI, ML, Data Mining
Main Question | "What happened?" and "Why did it happen?" | "What will happen?" and "What should we do next?"
User Type | Business Analysts | Data Scientists, Analysts
✅ How They Work Together:
● BI helps in understanding past performance.
● Data Science builds on BI to predict trends, automate
processes, and optimize decisions.
● Many organizations use BI for reporting and Data Science for
advanced analytics.
📌 Example:
● BI shows sales dropped last month.
● Data Science predicts sales for next month and suggests
marketing strategies.
📝 Conclusion:
Business Intelligence and Data Science are both essential parts of
modern data-driven decision-making.
● BI helps in understanding the "what and why" of past data.
● Data Science provides deeper insights and helps predict and
optimize future outcomes.
Big Data (Characteristics, Sources)
Big Data refers to extremely large and complex datasets that
traditional data processing tools cannot handle efficiently. It includes
data from various sources, in various formats, and at very high speed.
✅ Characteristics of Big Data (The 5 Vs)
Big Data is typically described using 5 key characteristics, often
called the 5 Vs:
1. Volume
● Refers to the huge amount of data generated every second.
📌 Example: Facebook generates terabytes of data daily.
2. Velocity
● Describes the speed at which data is generated and processed.
📌 Example: Stock market data changes every second.
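High-velocity data is often processed as a stream rather than stored first and analyzed later. A common pattern is a sliding time window. The sketch below keeps the prices seen in the last `window` seconds in a `collections.deque` and reports a moving average as each hypothetical tick arrives. The window length and tick values are assumptions for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class MovingAverage:
    """Average of values whose timestamps fall within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.ticks = deque()   # (timestamp, price) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, price):
        self.ticks.append((timestamp, price))
        self.total += price
        # Evict ticks that have fallen out of the time window.
        while self.ticks[0][0] <= timestamp - self.window:
            _, old = self.ticks.popleft()
            self.total -= old
        return self.total / len(self.ticks)


if __name__ == "__main__":
    avg = MovingAverage(window=3)
    for t, p in [(0, 100.0), (1, 102.0), (2, 101.0), (3, 105.0)]:
        print(t, round(avg.add(t, p), 2))
```

Each tick is handled in constant amortized time, so the average stays current no matter how fast ticks arrive.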
3. Variety
● Refers to the different types of data:
○ Structured (tables)
○ Semi-structured (XML, JSON)
○ Unstructured (videos, emails, images)
📌 Example: Tweets, videos, logs, GPS data, etc.
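To make variety concrete, the sketch below reads the same kind of customer record from a structured source (CSV) and a semi-structured source (JSON) using only Python's standard `csv` and `json` modules, then maps both into one common shape. The field names and sample data are hypothetical. Unstructured data such as images or video would need entirely different tooling and is not shown.

```python
import csv
import io
import json

# Structured: fixed columns, one record per row.
csv_text = "user,city\nasha,Pune\nravi,Mumbai\n"

# Semi-structured: nested fields; optional keys may be missing.
json_text = '[{"user": "meera", "address": {"city": "Nashik"}}, {"user": "tom"}]'


def from_csv(text):
    return [{"user": r["user"], "city": r["city"]}
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    # .get() tolerates records that lack the nested "address" object.
    return [{"user": r["user"], "city": r.get("address", {}).get("city")}
            for r in json.loads(text)]


if __name__ == "__main__":
    for record in from_csv(csv_text) + from_json(json_text):
        print(record)
```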
4. Veracity
● Refers to the trustworthiness and quality of the data.
● Big data may have inconsistencies, noise, or errors.
📌 Example: User-entered data may be inaccurate or missing.
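Veracity problems are usually caught with simple validation rules before analysis. This sketch counts missing and implausible values in hypothetical user-entered records. The field names and the accepted age range of 0 to 120 are assumptions chosen for illustration.

```python
records = [
    {"name": "Asha", "age": "34"},
    {"name": "", "age": "29"},        # missing name
    {"name": "Ravi", "age": "-5"},    # implausible age
    {"name": "Meera", "age": "abc"},  # not a number
]


def quality_report(rows):
    """Count missing names and ages that are not valid integers in 0..120."""
    missing_name = sum(1 for r in rows if not r["name"].strip())
    bad_age = 0
    for r in rows:
        try:
            age = int(r["age"])
        except ValueError:
            bad_age += 1
            continue
        if not 0 <= age <= 120:
            bad_age += 1
    return {"rows": len(rows), "missing_name": missing_name, "bad_age": bad_age}


if __name__ == "__main__":
    print(quality_report(records))
```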
5. Value
● The most important "V": the ability to extract meaningful insights
from data to make decisions and create business value.
📌 Example: Predicting customer behavior to increase sales.
🔍 Sources of Big Data
Big Data comes from multiple sources, including:
1. Social Media
● Platforms like Facebook, Instagram, and Twitter generate huge
amounts of unstructured data: images, videos, comments.
2. Sensor Data / IoT Devices
● Smart devices, GPS, wearables, and home automation systems collect
real-time sensor data.
3. Transaction Data
● Online purchases, bank records, ATM transactions, and e-commerce
activity produce mostly structured data.
4. Machine Data / Logs
● Web servers, network devices, system logs provide valuable data
for analysis.
5. Multimedia Content
● Audio, video, CCTV footage, and satellite images produce large
volumes of data with heavy storage needs.
6. Government and Public Data
● Weather reports, census data, healthcare records, and public
services generate large amounts of data.
✅ Conclusion
Big Data is defined by its Volume, Velocity, Variety, Veracity, and
Value, and is sourced from social media, sensors, transactions, logs,
and more. Managing and analyzing this data helps in decision-making,
innovation, and improving services across industries.
Descriptive, Diagnostic, Predictive Analysis
Types of Data Analysis
Data Analytics is broadly divided into different types based on the
purpose and nature of insights. The three most commonly used types
are:
🔷 1. Descriptive Analysis – “What happened?”
● This type of analysis helps in summarizing and understanding
past data.
● It answers what has occurred in a business or system using
historical data.
● Used for reporting, dashboards, and KPIs.
✅ Key Features:
● Based on historical data
● Uses graphs, charts, and tables
● Helps monitor performance and trends
📌 Example: Monthly sales report, website traffic, attendance reports.
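Descriptive analysis is mostly aggregation. The sketch below totals hypothetical daily sales by month and reports each month's total and the month-over-month change, the kind of figures a monthly sales report or KPI dashboard would show. The data is invented and is assumed to be sorted by date.

```python
# Hypothetical (date "YYYY-MM-DD", amount) sales records, sorted by date.
sales = [
    ("2024-01-05", 1200), ("2024-01-20", 800),
    ("2024-02-03", 900), ("2024-02-25", 600),
    ("2024-03-10", 1700),
]


def monthly_totals(rows):
    """Sum sales per "YYYY-MM" month, in the order months first appear."""
    totals = {}
    for date, amount in rows:
        month = date[:7]
        totals[month] = totals.get(month, 0) + amount
    return totals


def month_over_month(totals):
    """Percentage change of each month relative to the previous one."""
    months = list(totals)
    return {m: 100 * (totals[m] - totals[p]) / totals[p]
            for p, m in zip(months, months[1:])}


if __name__ == "__main__":
    totals = monthly_totals(sales)
    changes = month_over_month(totals)
    for month, total in totals.items():
        change = changes.get(month)
        note = "" if change is None else f" ({change:+.1f}%)"
        print(f"{month}: {total}{note}")
```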
🔷 2. Diagnostic Analysis – “Why did it happen?”
● It goes one step beyond descriptive analysis.
● Focuses on finding the root cause of a problem or event.
● Uses data mining, correlation analysis, and drill-down
techniques.
✅ Key Features:
● Explains causes and reasons
● Involves comparison and data relationships
● Helps in understanding failures or successes
📌 Example: Analyzing why sales dropped in a particular region last
month.
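Drill-down means breaking an aggregate into finer parts to locate where a change came from. This sketch takes hypothetical sales for two months, splits the change by region, and reports which region contributed most to the drop. The regions and figures are invented, and locating the region is only the first step toward the actual root cause.

```python
# Hypothetical sales per (month, region).
sales = {
    ("May", "North"): 500, ("May", "South"): 450, ("May", "West"): 300,
    ("Jun", "North"): 490, ("Jun", "South"): 250, ("Jun", "West"): 310,
}


def drill_down(data, before, after):
    """Return {region: change in sales from `before` to `after`}."""
    regions = sorted({region for _, region in data})
    return {r: data[(after, r)] - data[(before, r)] for r in regions}


if __name__ == "__main__":
    changes = drill_down(sales, "May", "Jun")
    print("total change:", sum(changes.values()))
    worst = min(changes, key=changes.get)
    print("largest drop:", worst, changes[worst])
```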
🔷 3. Predictive Analysis – “What is likely to happen?”
● Uses historical data + statistical models + machine learning to
predict future outcomes.
● Helps in forecasting trends, behaviors, and risks.
✅ Key Features:
● Uses predictive models and algorithms
● Helps in proactive decision-making
● Improves business planning
📌 Example: Predicting customer churn, forecasting demand, or
detecting fraud.
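A simple predictive technique is to fit a straight-line trend by least squares and extend it one step ahead. The sketch below does this by hand for hypothetical monthly demand figures, using slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). Real forecasting would usually account for seasonality and uncertainty, which this sketch ignores.

```python
def fit_line(ys):
    """Least-squares line y = a + b*x for x = 0, 1, 2, ...; returns (a, b)."""
    xs = range(len(ys))
    n = len(ys)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


# Hypothetical units sold in the last five months.
demand = [100, 110, 118, 131, 140]

if __name__ == "__main__":
    a, b = fit_line(demand)
    next_month = len(demand)          # x index of the month to predict
    print(f"forecast: {a + b * next_month:.1f} units")
```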
📊 Comparison Table:
Type | Main Question | Focus | Techniques Used | Example
Descriptive | What happened? | Past data summary | Reporting, dashboards | Monthly sales report
Diagnostic | Why did it happen? | Root cause analysis | Drill-down, correlation, queries | Analyzing customer complaints
Predictive | What will happen? | Forecasting the future | ML models, regression, time series | Predicting product demand
✅ Conclusion
These three types of analysis—Descriptive, Diagnostic, and
Predictive—play a vital role in the Data Analytics Life Cycle.
● Descriptive tells what happened,
● Diagnostic tells why it happened, and
● Predictive tells what is likely to happen next.
Together, they help businesses make smarter, data-driven
decisions.