BDA notes part 1
Big Data is a collection of data that is huge in volume and grows exponentially with time. Data
in petabytes (10^15 bytes) is called Big Data. It is stated that almost 90% of today’s data has
been generated in the past few years.
Big Data is bringing about changes in our lives because it allows diverse and heterogeneous
data to be fully integrated and analysed to help us make decisions.
Big Data is the term for a collection of data sets so large and complex that it becomes difficult
to process them using on-hand database management tools or traditional data processing applications.
Examples:
90% of the world’s data has been created in the last two years.
Walmart handles more than 1 million customer transactions every hour.
Facebook stores, accesses and analyses 30+ petabytes of user-generated data.
230+ million tweets are created every day.
Big Data is very important for medium-sized to large organizations because it enables them to
gather, store, manage and manipulate extremely large amounts of data, arriving at extremely
high velocity and in an extremely wide variety.
Applications of Big Data Analytics:
1. Improved Decision Making:
Rather than making decisions blindly, companies consider big data analytics before reaching
any conclusion. Big Data Analytics has boosted the decision-making process to a great extent.
2. New Products and Services:
Big Data Analytics is used by various firms to create new products and services for their
customers. Through big data, companies analyse different customers’ opinions about their
products and how their products are perceived.
3. Big Data in Educational Sector:
Big Data benefits the educational sector in managing data related to students. Analysing
students’ capabilities based on this data can help teachers nurture their future in a
better way.
4. Price Optimization:
Through big data, companies analyse which prices have yielded the maximum profit under
various historical market conditions. Through big data solutions, they set their product’s
price according to customers’ willingness to pay under different circumstances.
5. Recommendation Engines:
Online searching has been made easy with the help of recommendation engines powered by Big
Data Analytics. Companies analyse every customer’s data and then make recommendations
accordingly. These recommendations are largely based on the customer’s activities during
their last visit to the platform and on their real-time activities.
6. Healthcare:
Big data enhances the overall operational efficiency of healthcare companies. Big Data Analytics
can help them find better cures for diseases by recognizing unknown connections
and hidden patterns.
7. Fraud Detection:
Customer information can be analysed to predict general trends and spot fraudulent
behaviour.
8. Agriculture:
Big Data provides granular data on rainfall patterns, water cycles and enables farmers to
make smart decisions such as what crops to plant for better profitability and when to
harvest.
Big Data Analytics is the use of advanced analytical techniques against very large, diverse
datasets that include structured, semi-structured and unstructured data from different sources
and of different sizes.
Types of Big Data Analytics:
1. Descriptive Analysis:
As the name suggests, it describes: it explains what is happening based on incoming data.
2. Predictive Analysis:
As the name suggests, it predicts: it forecasts what might happen in the future based on data
trends and patterns.
3. Prescriptive Analysis:
Determines the best course of action based on data insights. It goes beyond prediction by
recommending actions to achieve desired outcomes.
e.g. Google’s self-driving cars (they analyse sensor data, traffic patterns and road conditions to
make real-time driving decisions; if an obstacle is detected, the system prescribes actions
like slowing down, changing lanes, or stopping to ensure safety).
4. Diagnostic Analysis:
Examines data to determine why something happened, for example by drilling down into past
data to find the cause of a sudden drop in sales.
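To make the four types concrete, here is a minimal Python sketch over a made-up list of daily sales figures (the numbers, the promotion flags, and the simple rules are all hypothetical, chosen only to illustrate the difference between the four questions):
```python
# Toy illustration of descriptive, diagnostic, predictive and prescriptive
# analysis on a made-up list of daily sales (hypothetical data).
from statistics import mean

daily_sales = [120, 130, 125, 90, 95, 140, 150]   # units sold per day
promo_days  = [False, False, False, False, False, True, True]

# Descriptive: what is happening?
print("Average daily sales:", mean(daily_sales))

# Diagnostic: why did it happen? Compare promotion vs non-promotion days.
promo = [s for s, p in zip(daily_sales, promo_days) if p]
no_promo = [s for s, p in zip(daily_sales, promo_days) if not p]
print("With promotion:", mean(promo), "Without:", mean(no_promo))

# Predictive: what might happen next? Naive linear trend over the week.
slope = (daily_sales[-1] - daily_sales[0]) / (len(daily_sales) - 1)
forecast = daily_sales[-1] + slope
print("Naive forecast for tomorrow:", round(forecast, 1))

# Prescriptive: what should we do? A simple rule on top of the forecast.
action = "run a promotion" if forecast < mean(daily_sales) else "keep current plan"
print("Recommended action:", action)
```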
Big Data Architecture:
A big data architecture is designed to handle the ingestion, processing and analysis of data that
is too large or complex for traditional database systems.
Ingestion: Used in data capture. It collects different types of data from different sources or
platforms and analyses the data, whether structured or unstructured, and where it comes from.
Data Storage: Used to store data at rest for batch analysis, whereas real-time message ingestion
is used to capture and buffer real-time (streaming) data.
Batch Processing: Stored data is handed over to batch processing and divided into batches. The
batch jobs pass the data to the analytical data store for analysis before forwarding it for
further processing or insights.
e.g. When a 50 MB video recording from a camera is uploaded as a WhatsApp status, it is
automatically compressed to 5-6 MB due to processing in the analytical data store. This
happens because an algorithm or compression technique is applied which reduces the file
size while maintaining acceptable quality.
Machine Learning: It processes both batch and streaming data. It analyses data in batches at
scheduled intervals and also processes streaming data for instant insights.
During streaming, if the internet speed drops or data runs out, the system automatically lowers
the video quality to ensure smooth playback. Afterwards, during a photo upload, if a photo fails
but shows as "processing", data analytics and reporting tools help track details like the device,
location and upload time.
Orchestration: Automates workflows (eliminating the need for manual intervention) and ensures
that tasks run in the correct sequence, with proper coordination and management.
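As a rough illustration of orchestration, the sketch below runs hypothetical pipeline stages (ingest, store, batch-process, report) in a fixed order and aborts if any stage fails; real orchestrators such as Apache Airflow or Oozie provide this at scale with scheduling and retries.
```python
# Minimal orchestration sketch: run pipeline stages in order, stop on failure.
# The stage functions are hypothetical placeholders, not a real pipeline.

def ingest():
    print("ingesting raw events...")
    return True

def store():
    print("writing events to storage...")
    return True

def batch_process():
    print("running batch aggregation...")
    return True

def report():
    print("publishing analytical results...")
    return True

PIPELINE = [("ingest", ingest), ("store", store),
            ("batch_process", batch_process), ("report", report)]

def orchestrate(stages):
    """Run each stage in sequence; abort the run if any stage reports failure."""
    for name, stage in stages:
        if not stage():
            print(f"stage '{name}' failed, aborting downstream stages")
            return False
    print("pipeline completed")
    return True

orchestrate(PIPELINE)
```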
1. Data Capture: It refers to the process of collecting data from a variety of sources. This
includes everything from social media posts to sensor readings.
2. Data Storage: It is the process of storing the data in a way that makes it accessible for
future analysis.
3. Data Processing: This is where algorithms are used to analyse the data and extract
insights.
4. Data Visualization: It is the process of representing the data in a way that is easy for
humans to understand.
e.g. flow charts, use-case diagrams, graphs or other charts are used for data visualization.
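A compact sketch of the four steps on made-up temperature readings (the file name readings.csv and the values are hypothetical; matplotlib is assumed to be available for the visualization step):
```python
# Sketch of the four lifecycle steps on made-up sensor readings.
import csv
import matplotlib.pyplot as plt

# 1. Data capture: pretend these values arrived from a temperature sensor.
readings = [21.5, 22.1, 23.0, 22.8, 24.2, 25.0]

# 2. Data storage: persist them so they can be analysed later.
with open("readings.csv", "w", newline="") as f:
    csv.writer(f).writerows([[i, r] for i, r in enumerate(readings)])

# 3. Data processing: extract a simple insight (here, a moving average).
window = 3
moving_avg = [sum(readings[i:i + window]) / window
              for i in range(len(readings) - window + 1)]

# 4. Data visualization: present the result as a chart.
plt.plot(readings, label="raw readings")
plt.plot(range(window - 1, len(readings)), moving_avg, label="moving average")
plt.xlabel("sample")
plt.ylabel("temperature (°C)")
plt.legend()
plt.show()
```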
Challenges of Big Data:
1. Quick Data Growth: The amount of data stored in companies’ data centers and databases is
increasing rapidly. As these datasets grow exponentially with time, they become extremely
difficult to handle.
2. Storage: Such large amounts of data are difficult for organizations to store and manage
without appropriate tools and technology.
3. Syncing across data sources: When an organization imports data from different data
sources, data from one source might not be up to date compared with data from another
source.
4. Security: Securing these huge datasets is one of the daunting challenges of Big Data.
Some big data stores can be attractive targets for hackers or advanced persistent threats.
5. Unreliable Data: Big data cannot be completely accurate and may contain some
redundant or incomplete data.
6. Miscellaneous Challenges: More challenges exist, such as generating insights in a timely
manner or recruiting and retaining big data professionals.
Data Stream Management System (DSMS):
It is a specialized system designed to process and manage continuous data streams in real-
time. Unlike traditional database management systems (DBMS) that store and process static
data, a DSMS continuously ingests, analyses, and queries dynamic data streams.
Key features: continuous, real-time processing of unbounded streams; standing (continuous)
queries alongside ad-hoc queries; bounded working storage with approximate results; archival
storage for historical analysis.
Components:
1. Data Stream: A continuous flow of data coming from sources like sensors, social media,
or transactions. It never stops and keeps updating in real-time.
2. Stream processor: The brain of the DSMS. It processes incoming data, applies filters,
aggregates information, and runs computations in real-time.
3. Standing queries: Queries that run continuously on streaming data, updating results as
new data arrives. Example: A query that always shows the average temperature from
sensors.
4. Adhoc queries: One-time queries that analyse the current data stream. Example: A user
asks, "What was the peak website traffic in the last hour?"
5. Archival storage: A place where old data is stored permanently for historical analysis and
backup. Example: A database keeping records of all financial transactions.
6. Limited working storage: A small temporary memory space used to process real-time
data, as storing everything is impossible. Example: Only keeping the last 10 minutes of
sensor readings to detect trends.
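A minimal sketch of how a standing query can run on top of limited working storage, assuming a made-up stream of temperature readings and keeping only the last five values in memory:
```python
# Standing query sketch: maintain the average of the last N sensor readings,
# keeping only a bounded window in memory (limited working storage).
from collections import deque

WINDOW = 5                      # only the last 5 readings are kept
working_storage = deque(maxlen=WINDOW)

def on_new_reading(value):
    """Called for every element of the stream; updates the standing query."""
    working_storage.append(value)
    avg = sum(working_storage) / len(working_storage)
    print(f"reading={value:>5.1f}  current average={avg:.2f}")

# Simulated data stream (hypothetical temperature readings).
stream = [20.1, 20.5, 21.0, 35.2, 21.3, 21.1, 20.9]
for reading in stream:
    on_new_reading(reading)
```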
Drivers of Big Data:
Big data is driven by several key factors that make it grow and become more important.
More Data Sources: Every day, people and machines create huge amounts of data
through social media, online shopping, smart devices, and sensors. The more sources we
have, the bigger the data gets.
Faster Internet & Technology: With better internet speeds and advanced technologies
like cloud computing, data can be collected, stored, and processed quickly.
Cheaper Storage: Storing large amounts of data used to be expensive, but now it's much
cheaper, allowing companies to keep and analyse more information.
Artificial Intelligence (AI) & Machine Learning: AI systems learn from big data,
improving their accuracy and making predictions, which in turn drives the need for even
more data.
The Internet of Things (IoT): Smart devices like fitness trackers, home assistants, and
self-driving cars are constantly generating data, adding to the big data explosion.
Data Stream Models:
A data stream model is a way to handle and process continuous, fast-flowing data in real
time. Unlike traditional databases, where data is stored and then analysed, data stream
models analyse data as it arrives.
1. Time-Based Model:
Processes data that arrives within a fixed time window (e.g., the readings from the last 10
minutes); see the sketch after this list.
2. Count-Based Model:
Divides data into fixed chunks and processes each batch separately.
5. Sketch-Based Model:
Uses approximations to handle large data streams efficiently.
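A small sketch contrasting the count-based and time-based models on made-up, timestamped events (the window sizes of 3 events and 6 seconds are arbitrary):
```python
# Sketch contrasting a count-based window (last N events) with a
# time-based window (events from the last T seconds). Events are made up.
from collections import deque

events = [  # (timestamp in seconds, value)
    (0, 10), (1, 12), (3, 11), (6, 15), (7, 14), (12, 20),
]

# Count-based model: always keep the last 3 events, regardless of time.
count_window = deque(maxlen=3)
for ts, value in events:
    count_window.append((ts, value))
print("count-based window (last 3 events):", list(count_window))

# Time-based model: keep only events from the last 6 seconds.
HORIZON = 6
latest_ts = events[-1][0]
time_window = [(ts, v) for ts, v in events if latest_ts - ts < HORIZON]
print("time-based window (last 6 seconds):", time_window)
```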
Streaming Methods:
Streaming methods are techniques used to process and analyse continuous data streams in
real-time. Instead of storing data first and then analysing it, these methods handle data as it
arrives.
1. Batch Processing:
3. Micro-Batch Processing:
o A mix of batch and real-time processing where small chunks of data are processed
frequently (see the sketch after this list).
4. Window-Based Processing:
Processes only the data that falls inside a fixed or sliding window, such as the last 5 minutes
of events.
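As referenced under micro-batch processing above, here is a minimal sketch in which events are buffered into small fixed-size batches and each batch is processed as soon as it fills up (the batch size of 4 and the event values are arbitrary):
```python
# Micro-batch sketch: events are collected into small fixed-size batches
# and each batch is processed as soon as it is full. Event values are made up.

BATCH_SIZE = 4
buffer = []

def process_batch(batch):
    """Stand-in for real batch logic: here we just report a sum."""
    print(f"processing {len(batch)} events, total={sum(batch)}")

def on_event(value):
    buffer.append(value)
    if len(buffer) >= BATCH_SIZE:
        process_batch(buffer)
        buffer.clear()

# Simulated event stream.
for event in [3, 7, 2, 8, 5, 1, 9, 4, 6]:
    on_event(event)

# Flush whatever is left when the stream (or interval) ends.
if buffer:
    process_batch(buffer)
```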
Data Synopsis:
Data synopsis is a technique used to create a small, summarized version of large data sets. It
helps in quickly analysing and processing data without storing or handling the full dataset.
This is especially useful in real-time data streams, where data is too large to store entirely.
1. Sampling:
2. Sketching:
o Example: Estimating the number of unique visitors on a website without storing all
IP addresses (a toy sketch of this idea appears after this list).
3. Histogram:
o Divides data into ranges and counts how many values fall into each range.
4. Wavelet Transform:
5. Sliding Windows:
Keeps and summarizes only the most recent portion of the stream, discarding older items as
new ones arrive.
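The sketching example above (counting unique visitors without storing every IP address) can be approximated with a tiny probabilistic counter. The code below is a toy Flajolet-Martin-style estimate based on trailing zero bits of hash values; the IP addresses are made up, and real systems use refined variants such as HyperLogLog:
```python
# Toy Flajolet-Martin-style sketch: estimate the number of distinct items
# in a stream by tracking the maximum number of trailing zero bits seen
# in their hash values. The IP addresses below are made up.
import hashlib

def trailing_zeros(n):
    if n == 0:
        return 32
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

max_zeros = 0
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3",
          "10.0.0.4", "10.0.0.2", "10.0.0.5"]

for ip in stream:
    h = int(hashlib.md5(ip.encode()).hexdigest(), 16) & 0xFFFFFFFF
    max_zeros = max(max_zeros, trailing_zeros(h))

# The estimate is 2^R, where R is the largest run of trailing zeros observed.
print("estimated distinct visitors:", 2 ** max_zeros)
print("actual distinct visitors:  ", len(set(stream)))
```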
Summarization Techniques:
When dealing with large amounts of data, it’s not always possible to store or analyse
everything. These techniques help by reducing data while keeping the most important
information.
1. Sampling means selecting a small part of the data that represents the whole dataset.
Instead of analysing every piece of data, we work with a smaller, manageable sample.
🔹 Example:
Imagine a company receives 1 million customer reviews. Instead of analysing all of them, they
randomly pick 10,000 reviews to understand customer sentiment (a combined sketch of
sampling and filtering follows at the end of this section).
2. Filtering removes irrelevant or unnecessary data, keeping only what is important.
🔹 Example:
A weather monitoring system collects temperature, humidity, and wind speed data. If a
researcher is only interested in temperature, they filter out the other data.
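A combined sketch of both techniques, assuming hypothetical review strings and sensor records:
```python
# Sampling and filtering sketch on made-up records.
import random

# 1. Sampling: keep a small random subset of a large set of reviews.
reviews = [f"review #{i}" for i in range(1_000_000)]
sample = random.sample(reviews, k=10_000)       # 1% of the reviews
print("sampled reviews:", len(sample))

# 2. Filtering: keep only the field of interest from mixed sensor records.
records = [
    {"temperature": 21.5, "humidity": 60, "wind_speed": 12},
    {"temperature": 23.1, "humidity": 55, "wind_speed": 8},
    {"temperature": 19.8, "humidity": 70, "wind_speed": 15},
]
temperatures = [r["temperature"] for r in records]
print("temperatures only:", temperatures)
```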