Faculty Name: Ravula Kartheek M.Tech. (Ph.D)

UNIT-2
STREAM PROCESSING
Stream Processing:
Stream processing is a data processing method in big data analytics that involves the real-time
analysis of continuous data streams. With the growth of the Internet of Things (IoT), social media,
and other sources of real-time data, the volume of data being generated has increased
exponentially. Stream processing provides a way to handle this data by allowing for the
processing of data in real-time, as it is being generated, rather than processing data in batches.

Stream processing systems typically consist of three components: sources of data, processing
engines, and sinks. The data sources can be sensors, social media feeds, or any other source that
provides a continuous stream of data. The processing engines are responsible for processing and
analyzing the data streams in real-time, and the sinks are the endpoints where the processed data
is stored or sent for further analysis.

Stream processing is used in a variety of applications, including fraud detection, predictive
maintenance, real-time monitoring, and sentiment analysis. It is particularly useful in applications
where time is of the essence, and decisions need to be made quickly based on the data. By
processing data in real-time, organizations can respond to events as they happen, rather than
after the fact, which can lead to more effective decision-making.

Stream processing systems use various techniques such as filtering, aggregation, and pattern
matching to process the data streams. These techniques are applied in real-time, and the results
are continuously updated, providing real-time insights into the data. Stream processing systems
are typically designed to be scalable and fault-tolerant, allowing them to handle large volumes of
data and operate reliably in the face of hardware failures or other issues.
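
To make the filtering and aggregation steps concrete, here is a minimal, hypothetical sketch in Python: an unbounded source of readings is filtered for plausible values and aggregated into per-batch summaries before being handed to a sink. The source, thresholds, and batch size are illustrative assumptions, not part of any particular platform.

    import random

    def readings():
        # Stands in for a real stream source (sensors, a Kafka topic, a socket, ...).
        while True:
            yield random.gauss(20.0, 5.0)

    def pipeline(source, batch_size=100, low=0.0, high=50.0):
        batch = []
        for value in source:
            if low <= value <= high:          # filtering: drop implausible readings
                batch.append(value)
            if len(batch) == batch_size:      # aggregation: emit a summary per batch
                yield {"count": len(batch), "mean": sum(batch) / len(batch)}
                batch = []

    # Example: print the first three aggregated summaries (acting as the sink).
    stream = pipeline(readings())
    for _ in range(3):
        print(next(stream))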

In summary, stream processing is a powerful data processing method in big data analytics that
allows organizations to process data in real-time, as it is being generated. It is particularly useful
in applications where time is of the essence and decisions need to be made quickly based on the
data. Stream processing systems use various techniques to process data streams, and they are
designed to be scalable and fault-tolerant to handle large volumes of data and operate reliably.

Mining Data Streams:


Mining data streams is the process of extracting knowledge and insights from continuous data
streams in real-time or near-real-time. In big data analytics, data streams can come from various
sources, such as sensors, social media feeds, log files, and other data sources.

Mining data streams involves the application of various data mining techniques to analyze data
streams and identify patterns, trends, and anomalies. The data mining techniques used in stream
mining are similar to those used in batch processing, but they are adapted to work with
continuous data streams.

One of the primary challenges in mining data streams is the high volume and velocity of data.
Data streams can produce an enormous amount of data, making it challenging to store and
process the data efficiently. Additionally, data streams can be dynamic, meaning that the
distribution of the data can change over time, requiring the use of adaptive algorithms to handle
the changing data distribution.

Another challenge in mining data streams is the need for real-time analysis of the data. Real-time
analysis of data streams is essential in applications such as fraud detection, predictive
maintenance, and real-time monitoring, where timely decisions need to be made based on the
data.

To address these challenges, stream mining systems are designed to be scalable, adaptive, and
efficient. These systems use various techniques, such as sliding windows, sampling, and
approximate algorithms, to process data streams efficiently. Additionally, stream mining systems
use adaptive algorithms that can adjust to the changing data distribution over time.

In summary, mining data streams in big data analytics is the process of extracting knowledge
and insights from continuous data streams in real-time or near-real-time. The challenges of
mining data streams include the high volume and velocity of data, the dynamic nature of data
streams, and the need for real-time analysis of the data. Stream mining systems are designed to
be scalable, adaptive, and efficient, using various techniques to process data streams efficiently
and adapt to changing data distributions.

Introduction to Streams Concepts:


In big data analytics, a stream is a continuous flow of data generated from various sources, such
as sensors, social media feeds, log files, and other data sources. Streams can be unbounded or
bounded in size, and they can be either real-time or near-real-time.

Stream processing refers to the continuous processing of data streams in real-time or near-real-
time, where data is processed as it arrives rather than waiting for the entire dataset to be
available. Stream processing is often used in applications where time is critical, and real-time
analysis of data is required, such as fraud detection, stock market analysis, and real-time
monitoring.

Streams are different from batch processing, where data is processed in batches after the data is
collected. Batch processing can be used to process large volumes of data, but it is not suitable
for real-time analysis of data. In contrast, stream processing enables the processing of data as
soon as it is generated, providing real-time insights into the data.

Stream processing systems typically consist of three components: sources of data, processing
engines, and sinks. The sources of data can be any data source that provides a continuous
stream of data. The processing engines are responsible for processing and analyzing the data
streams in real-time, and the sinks are the endpoints where the processed data is stored or sent
for further analysis.

Stream processing systems use various techniques such as filtering, aggregation, and pattern
matching to process the data streams. These techniques are applied in real-time, and the results
are continuously updated, providing real-time insights into the data. Stream processing systems
are typically designed to be scalable and fault-tolerant, allowing them to handle large volumes
of data and operate reliably in the face of hardware failures or other issues.

In summary, streams in big data analytics are a continuous flow of data generated from various
sources. Stream processing refers to the continuous processing of data streams in real-time or
near-real-time, enabling real-time analysis of data. Stream processing systems consist of sources
of data, processing engines, and sinks, and use various techniques to process data streams.
Stream processing systems are designed to be scalable and fault-tolerant, allowing them to
handle large volumes of data and operate reliably.

Stream Data Model and Architecture:


The stream data model is a way of representing and processing continuous data streams in big
data analytics. The stream data model is different from traditional batch processing models,
which process data in large batches after the data has been collected.

In the stream data model, data is processed as it arrives in real-time or near-real-time, providing
immediate insights into the data. The stream data model is often used in applications where
time is critical, such as fraud detection, stock market analysis, and real-time monitoring.

The stream data architecture consists of three components: sources of data, processing engines,
and sinks. The sources of data can be any data source that provides a continuous stream of
data. The processing engines are responsible for processing and analyzing the data streams in
real-time, and the sinks are the endpoints where the processed data is stored or sent for further
analysis.

The processing engines in the stream data architecture use various techniques such as filtering,
aggregation, and pattern matching to process the data streams. These techniques are applied in
real-time, and the results are continuously updated, providing real-time insights into the data.
Stream processing systems are typically designed to be scalable and fault-tolerant, allowing
them to handle large volumes of data and operate reliably in the face of hardware failures or
other issues.

The stream data model can be represented using various data structures, such as streams,
tuples, and windows. Streams are continuous sequences of data, and tuples are individual pieces
of data within the stream. Windows are subsets of the data stream that are used for analysis and
processing.
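
As an illustration of these three structures, the following Python sketch models a stream as an iterator, a tuple as a named record, and a window as the most recent N records. The Reading fields and the demo data are assumptions made for the example, not a prescribed schema.

    from collections import deque, namedtuple

    # A "tuple" in the stream: one timestamped record (field names are illustrative).
    Reading = namedtuple("Reading", ["ts", "sensor", "value"])

    def sliding_window(stream, size):
        # Yield the latest `size` tuples each time a new tuple arrives.
        buf = deque(maxlen=size)
        for record in stream:
            buf.append(record)
            yield tuple(buf)

    # Example: a tiny in-memory "stream" of three readings, windowed two at a time.
    demo = [Reading(1, "s1", 0.5), Reading(2, "s1", 0.7), Reading(3, "s2", 0.6)]
    for w in sliding_window(iter(demo), size=2):
        print(w)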

The stream data architecture can be implemented using various open-source stream processing
platforms such as Apache Flink, Apache Kafka, and Apache Storm. These platforms provide a
scalable and fault-tolerant infrastructure for stream processing, allowing developers to focus on
the analysis and processing of the data streams.

In summary, the stream data model is a way of representing and processing continuous data
streams in real-time or near-real-time. The stream data architecture consists of sources of data,
processing engines, and sinks, and uses various techniques such as filtering, aggregation, and
pattern matching to process the data streams. The stream data model can be represented using
various data structures such as streams, tuples, and windows, and can be implemented using
open-source stream processing platforms such as Apache Flink, Apache Kafka, and Apache
Storm.

Stream Computing:
Stream computing is a type of real-time data processing that involves analyzing and acting on
data as it flows through a system, rather than processing it in batch mode. In big data analytics,
stream computing is used to analyze continuous data streams from various sources, including
social media, sensors, and other sources.

Stream computing involves several key components, including data ingestion, data processing,
and data analysis. Data ingestion involves receiving and processing data streams from various
sources. Data processing involves transforming the data streams into a format that can be
analyzed and acted upon. Data analysis involves using various techniques, such as machine
learning, statistical analysis, and data visualization, to gain insights from the data streams.

Stream computing systems are designed to handle large volumes of data in real-time or near-
real-time. These systems use distributed architectures to scale horizontally, allowing them to
handle large volumes of data and processing tasks simultaneously. Stream computing systems
also use complex event processing (CEP) algorithms to detect patterns and anomalies in the
data streams, allowing analysts to respond to events as they occur.
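
As a toy example of complex event processing, the sketch below fires an alert when several events of one kind arrive within a short time horizon. The event format, the rule, and the thresholds are assumptions chosen for illustration only.

    from collections import deque

    def detect_burst(events, kind="error", n=3, horizon=60.0):
        # events: iterable of (timestamp_in_seconds, event_kind) pairs.
        recent = deque()
        for ts, ev in events:
            if ev != kind:
                continue
            recent.append(ts)
            # Drop timestamps that have fallen outside the horizon.
            while recent and ts - recent[0] > horizon:
                recent.popleft()
            if len(recent) >= n:
                yield ts                      # pattern matched at this time

    # Example: three "error" events within 60 seconds trigger an alert at t=70.
    demo = [(10, "error"), (30, "ok"), (40, "error"), (70, "error")]
    print(list(detect_burst(demo)))           # -> [70]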

Stream computing has several advantages over traditional batch processing. Firstly, stream
computing provides real-time insights into data, allowing analysts to make informed decisions
quickly. Secondly, stream computing is highly scalable, allowing organizations to process large
volumes of data from multiple sources simultaneously. Finally, stream computing is adaptable,
allowing analysts to adjust the data processing and analysis algorithms as needed.

Several open-source stream computing platforms are available, including Apache Flink, Apache
Spark Streaming, and Apache Storm. These platforms provide a scalable, distributed
infrastructure for stream computing, allowing organizations to process and analyze large
volumes of data in real-time.

In summary, stream computing is a type of real-time data processing that involves analyzing
and acting on data as it flows through a system. Stream computing systems are designed to
handle large volumes of data in real-time or near-real-time, and use distributed architectures
and complex event processing algorithms to scale horizontally and detect patterns and
anomalies in data streams. Stream computing has several advantages over traditional batch
processing, including real-time insights, scalability, and adaptability, and several open-source
platforms are available for implementing stream computing in big data analytics.

Big Stream Computing in Health Care

Sampling Data in a Stream:


Sampling is a technique used in big data analytics to extract a subset of data from a larger data
stream. Sampling involves selecting a representative subset of the data stream, which can be
analyzed to gain insights about the entire data stream.

Sampling is often used when it is impractical or impossible to analyze the entire data stream in
real-time, due to limitations in processing power, network bandwidth, or storage capacity.

There are several techniques for sampling data in a stream. One common technique is simple
random sampling, where a random subset of data is selected from the data stream. Another
technique is stratified sampling, where the data stream is partitioned into several subsets based
on certain criteria, and a sample is selected from each subset. Stratified sampling can improve
the accuracy of the sample by ensuring that it includes data from all segments of the stream.
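
One standard way to draw a simple random sample from a stream whose length is not known in advance is reservoir sampling (not named above, but a common choice for this setting). A minimal sketch:

    import random

    def reservoir_sample(stream, k):
        # Maintains a uniform random sample of k items from a stream of unknown length.
        sample = []
        for i, item in enumerate(stream):
            if i < k:
                sample.append(item)
            else:
                j = random.randint(0, i)      # inclusive of both endpoints
                if j < k:
                    sample[j] = item          # new item kept with probability k / (i + 1)
        return sample

    # Example: sample 5 items from a stream of one million integers.
    print(reservoir_sample(range(1_000_000), k=5))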

Sampling can be performed at different points in the data processing pipeline. In some cases,
sampling is performed on the raw data stream before any processing is performed, while in
other cases, sampling is performed on the processed data stream. Sampling at different points
in the pipeline can affect the accuracy of the sample and the processing time required.

Sampling is often used in conjunction with other techniques for processing data streams, such
as filtering and aggregation. Filtering involves selecting a subset of data from the data stream
based on certain criteria, while aggregation involves combining multiple data points into a
single data point. Sampling, filtering, and aggregation can be used together to process large
volumes of data in real-time, providing insights into the data stream without requiring the
processing of the entire data stream.

In summary, sampling is a technique used in big data analytics to extract a subset of data from a
larger data stream. Sampling can be performed using different techniques, such as simple
random sampling and stratified sampling, and can be performed at different points in the data
processing pipeline. Sampling is often used in conjunction with other techniques, such as
filtering and aggregation, to process large volumes of data in real-time and gain insights into
the data stream.

Filtering Streams:
Filtering is a technique used in big data analytics to extract relevant information from a stream
of data. Filtering involves selecting a subset of data from a larger data stream based on certain
criteria. This technique is often used to identify and extract data that meets specific conditions,
which can then be further analyzed to gain insights about the data stream.

Filtering in a data stream can be performed using different types of filters, such as time-based
filters, value-based filters, and pattern-based filters. Time-based filters are used to select data
within a specific time range, such as data that was collected during a certain time period. Value-
based filters are used to select data based on certain attribute values, such as selecting data
from a specific location or data with certain characteristics. Pattern-based filters are used to
select data that matches a specific pattern, such as detecting anomalies or specific events in the
data stream.
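
A minimal sketch of the three filter types over a hypothetical stream of event dictionaries; the field names ts and value, and the thresholds, are assumptions made for the example.

    def time_filter(events, since):
        # Time-based: keep events at or after a given timestamp.
        return (e for e in events if e["ts"] >= since)

    def value_filter(events, threshold):
        # Value-based: keep events whose reading exceeds a threshold.
        return (e for e in events if e["value"] > threshold)

    def jump_filter(events, jump):
        # Pattern-based: keep events whose value jumps sharply versus the previous one.
        prev = None
        for e in events:
            if prev is not None and abs(e["value"] - prev["value"]) > jump:
                yield e
            prev = e

    # Example usage on a tiny in-memory stream; filters compose like a pipeline.
    demo = [{"ts": 1, "value": 10.0}, {"ts": 2, "value": 10.5}, {"ts": 3, "value": 25.0}]
    print(list(jump_filter(value_filter(time_filter(demo, since=1), threshold=5.0), jump=10.0)))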

Filtering can be performed on the raw data stream or on the processed data stream. Filtering on
the raw data stream can reduce the amount of data that needs to be processed, while filtering
on the processed data stream can provide more precise results.

Filtering can be performed using different tools and technologies, such as stream processing
platforms, database management systems, and programming languages. Stream processing
platforms, such as Apache Flink and Apache Kafka, provide built-in support for filtering data
streams. Database management systems, such as Apache Cassandra and MongoDB, provide
filtering capabilities for querying large volumes of data. Programming languages, such as
Python and Java, provide libraries and frameworks for filtering data streams.

Filtering is a key technique in big data analytics, as it allows organizations to extract relevant
information from large volumes of data. Filtering can be used in conjunction with other
techniques, such as sampling and aggregation, to process and analyze data streams in real-time.
By filtering data streams, organizations can identify patterns, trends, and anomalies, which can
be used to gain insights and make informed decisions.

Counting Distinct Elements in a Stream:


Counting distinct elements in a stream is a common problem in big data analytics. It involves
determining the number of unique elements in a data stream, which can be useful for
identifying patterns, detecting anomalies, and performing other types of analysis.

Counting distinct elements in a stream can be challenging due to the large volume and high
velocity of data. Traditional methods for counting distinct elements, such as maintaining a list of
all elements and counting unique elements, are not feasible for large data streams.

There are several techniques for counting distinct elements in a stream that are designed for big
data analytics. One popular technique is the HyperLogLog algorithm, which uses a probabilistic
data structure to estimate the number of distinct elements in a data stream. The HyperLogLog
algorithm can provide accurate estimates with low memory usage, making it well-suited for
processing large data streams.
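
For illustration only, here is a heavily simplified HyperLogLog-style estimator in Python. The register count, the hash choice, and the omission of the small- and large-range corrections used in real implementations are all simplifications of the published algorithm.

    import hashlib

    def hll_estimate(stream, b=12):
        # m = 2**b registers; each stores the largest bit-rank seen for its bucket.
        m = 1 << b
        registers = [0] * m
        for item in stream:
            h = int(hashlib.sha1(str(item).encode()).hexdigest(), 16)
            idx = h & (m - 1)                  # low b bits pick a register
            rest = h >> b
            rank = 1                           # 1-based position of the lowest set bit in `rest`
            while rest & 1 == 0 and rank < 64:
                rest >>= 1
                rank += 1
            registers[idx] = max(registers[idx], rank)
        alpha = 0.7213 / (1 + 1.079 / m)       # bias-correction constant for large m
        return alpha * m * m / sum(2.0 ** -r for r in registers)

    # Example: estimate the number of distinct values among 100,000 ids (50,000 distinct).
    print(round(hll_estimate(str(i % 50_000) for i in range(100_000))))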

A closely related technique is the Count-Min Sketch algorithm, which uses a similar probabilistic
data structure to estimate the frequency of each element in a data stream. The Count-Min Sketch
is primarily a frequency estimator rather than a distinct counter, but it is often used alongside
cardinality sketches such as HyperLogLog when both element frequencies and the number of
distinct elements are needed.

Both the HyperLogLog and Count-Min Sketch algorithms are designed to work with data
streams that have high velocity and large volume, and are well-suited for big data analytics.
They provide accurate estimates (of distinct counts and element frequencies, respectively)
while requiring significantly less memory than exact methods.

In summary, counting distinct elements in a stream is a common problem in big data analytics.
Exact methods are not feasible for large data streams, so probabilistic sketches are used instead:
HyperLogLog for distinct counts and the Count-Min Sketch for element frequencies. These
algorithms are designed for large data streams, provide accurate approximate answers, and
require far less memory than exact methods.

Estimating Moments:
In big data analytics, estimating moments is a technique used to summarize and understand
the statistical properties of a data stream. Moments are a set of mathematical measures that
provide information about the shape, center, and spread of a probability distribution. Common
moments include the mean, variance, skewness, and kurtosis.

Estimating moments in big data analytics is challenging due to the large volume and high
velocity of data. Traditional methods for estimating moments, such as computing the sample
mean and variance, are not feasible for large data streams.

There are several techniques for estimating moments in big data analytics, including the
following:

• Counting-based methods: These methods use a probabilistic data structure, such as a counting Bloom filter, to estimate the count of each element in a data stream. The estimated counts can be used to compute moments, such as the mean and variance, of the data stream.
• Sketch-based methods: These methods use a data structure, such as a Count-Min Sketch or a Space-Saving algorithm, to estimate the frequency of each element in a data stream. The estimated frequencies can be used to compute moments of the data stream.
• Sampling-based methods: These methods randomly sample a subset of the data stream and use the sampled data to estimate moments of the entire data stream. Sampling-based methods can provide accurate estimates of moments with less computational resources than other methods, but may introduce sampling error.
• Streaming algorithms: These are one-pass algorithms that estimate moments of the data stream without storing all the data; examples include the t-digest and Greenwald-Khanna algorithms (for quantiles). These algorithms maintain a compact summary of the data stream that can be used to estimate moments.

Estimating moments in big data analytics is important for understanding the statistical
properties of data streams. By estimating moments, organizations can identify patterns, trends,
and anomalies in the data stream, and make informed decisions based on this information.
However, due to the large volume and high velocity of data, traditional methods for estimating
moments are not feasible, and specialized techniques such as counting-based, sketch-based,
sampling-based, and streaming algorithms are used instead.
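
As a concrete example of a streaming (one-pass) moment estimator, here is a minimal sketch of Welford's online algorithm for the mean and variance; it is a classic technique in the spirit of the last bullet above, though not one named in the text.

    def streaming_mean_variance(stream):
        # One pass, constant memory: running mean and sum of squared deviations.
        n, mean, m2 = 0, 0.0, 0.0
        for x in stream:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)          # uses the already-updated mean
        variance = m2 / (n - 1) if n > 1 else 0.0
        return n, mean, variance

    # Example: moments of one million pseudo-random values, without storing them.
    import random
    print(streaming_mean_variance(random.gauss(10, 2) for _ in range(1_000_000)))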

Counting Ones in a Window:


Counting ones in a window is a technique used in big data analytics to count the number of
times a specific event occurs within a specified time window. This technique is commonly used
for monitoring and analyzing real-time data streams, such as social media feeds, sensor data,
or log files.

The process of counting ones in a window involves defining a time window, which is a fixed
period of time during which the counting will take place. The window can be of any duration,
such as 10 seconds, 1 minute, or 1 hour, depending on the requirements of the analysis.

Once the window has been defined, the data within the window is analyzed to count the
number of times a specific event occurs. The event is typically represented as a binary value,
where a "one" indicates that the event occurred and a "zero" indicates that it did not occur.

The counting can be performed using a sliding window or a tumbling window, depending on
the requirements of the analysis.

A sliding window moves along the data stream, and at each point in time, counts the number
of ones within the window. As the window slides along the data stream, the count is updated
accordingly.

A tumbling window divides the data stream into fixed-length segments, or buckets, and counts
the number of ones within each bucket. The buckets do not overlap, and each bucket
represents a separate time window.
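
A minimal sketch of both window styles over a stream of 0/1 values, assuming the window is small enough to keep in memory (exact counting; very large windows call for approximate summaries instead):

    from collections import deque

    def sliding_ones(bits, window_size):
        # Emit the count of 1s among the most recent `window_size` bits.
        window = deque(maxlen=window_size)
        ones = 0
        for b in bits:
            if len(window) == window_size:
                ones -= window[0]             # this bit is about to be evicted
            window.append(b)
            ones += b
            yield ones

    def tumbling_ones(bits, window_size):
        # Emit (bucket_index, count of 1s) for each non-overlapping bucket.
        count, seen, bucket = 0, 0, 0
        for b in bits:
            count += b
            seen += 1
            if seen == window_size:
                yield bucket, count
                bucket, count, seen = bucket + 1, 0, 0

    demo = [1, 0, 1, 1, 0, 0, 1, 0]
    print(list(sliding_ones(demo, 4)))        # [1, 1, 2, 3, 2, 2, 2, 1]
    print(list(tumbling_ones(demo, 4)))       # [(0, 3), (1, 1)]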

Counting ones in a window is a useful technique for monitoring and analyzing real-time data
streams. It can be used to detect anomalies, identify patterns, and track trends in the data. The
technique is particularly useful when dealing with large volumes of data, where it is not feasible
to analyze the entire data stream in real-time. By focusing on a specific time window, the
analysis can be performed more efficiently, while still providing valuable insights into the data
stream.

Decaying Window:
A decaying window is a technique used in big data analytics to analyze a data stream by giving
more weight to recent events and less weight to older events. This technique is commonly used
for monitoring and analyzing data streams that exhibit time-varying patterns or trends, such as
stock prices, website traffic, or social media feeds.

The process of using a decaying window involves defining a time window, similar to the process
of counting ones in a window. However, in a decaying window, the weight given to each event
within the window decreases over time. This means that recent events have a higher weight than
older events.

The weight assigned to each event within the window can be determined using a decay
function, which is a mathematical function that assigns a weight to each event based on its age.
Commonly used decay functions include exponential decay, linear decay, and logarithmic decay.
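
A minimal sketch of an exponentially decaying window: every existing contribution is scaled by (1 - c) when a new element arrives, so recent elements dominate the running total. The decay constant c below is an illustrative choice.

    def decaying_sum(stream, c=0.01):
        # Exponentially decaying aggregate: an element t steps old carries weight (1 - c)**t.
        total = 0.0
        for value in stream:
            total = total * (1.0 - c) + value
            yield total

    # Example: a constant stream of 1s converges toward 1 / c = 100.
    *_, last = decaying_sum([1] * 2000, c=0.01)
    print(round(last, 1))                     # approximately 100.0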

Once the weight for each event has been determined, the data within the window is analyzed to
detect patterns, trends, or anomalies. The analysis can be performed using various techniques,
such as machine learning algorithms, statistical models, or time-series analysis.

Decaying windows are useful for analyzing data streams that exhibit time-varying patterns or
trends, where recent events are more important than older events. By giving more weight to
recent events, the analysis can be more sensitive to changes in the data stream, allowing
organizations to detect trends or anomalies in real-time. However, it is important to choose an
appropriate decay function and time window size to ensure that the analysis is accurate and
efficient.

Decaying Window Algorithm

Real Time Analytics Platform (RTAP) Applications:


Real-time analytics platforms are applications used in big data analytics to provide real-time
insights and analysis on streaming data. These platforms are designed to process large volumes
of data in real-time, enabling organizations to make faster and more informed decisions based
on current data.

Real-time analytics platforms have a wide range of applications in big data analytics, including:

• Fraud detection: Real-time analytics platforms can be used to detect fraudulent activities, such as credit card fraud, in real-time. The platform can analyze transactions as they occur and identify any suspicious patterns or activities.
• Predictive maintenance: Real-time analytics platforms can be used to monitor equipment and machinery in real-time and predict when maintenance is required. This can help prevent equipment breakdowns and reduce downtime.
• Customer experience: Real-time analytics platforms can be used to analyze customer behavior in real-time, such as website clicks or social media interactions. This can provide organizations with insights into customer preferences and behavior, enabling them to improve the customer experience.
• Supply chain management: Real-time analytics platforms can be used to monitor inventory levels and track shipments in real-time. This can help organizations optimize their supply chain and reduce costs.
• Energy management: Real-time analytics platforms can be used to monitor energy consumption in real-time and identify areas where energy can be saved. This can help organizations reduce their energy costs and improve their sustainability.

Overall, real-time analytics platforms are essential tools for organizations that require real-time
insights and analysis on streaming data. These platforms can help organizations improve their
operational efficiency, reduce costs, and enhance the customer experience.

Case Study – Real Time Sentiment Analysis:


Real-time sentiment analysis is the process of using natural language processing (NLP) and
machine learning algorithms to analyze text data and determine the sentiment expressed within
it. This technique has many applications in big data analytics, including analyzing social media
feeds, customer reviews, and customer support interactions. Here are a few case studies that
showcase real-time sentiment analysis in action:

• Twitter sentiment analysis for product launch: A company that was launching a new product used real-time sentiment analysis to monitor the Twitter feeds of users discussing the product. The sentiment analysis helped the company understand how customers were reacting to the product and identify any issues or concerns they had. This enabled the company to make changes to the product before its official launch, improving its chances of success.
• Customer support sentiment analysis: A company that provides customer support services used real-time sentiment analysis to analyze customer interactions and identify any negative sentiment expressed by customers. The sentiment analysis helped the company to quickly address customer issues and improve the overall customer experience.
• Political sentiment analysis: During an election campaign, a political party used real-time sentiment analysis to monitor social media feeds and determine the sentiment expressed towards their candidate. The sentiment analysis helped the party to adjust their campaign strategy in real-time, improving their chances of success.
• News sentiment analysis: A news organization used real-time sentiment analysis to monitor news stories and determine the sentiment expressed towards different topics. The sentiment analysis helped the organization to identify trending topics and create content that was relevant and engaging to their audience.
• E-commerce sentiment analysis: An e-commerce company used real-time sentiment analysis to analyze customer reviews and feedback. The sentiment analysis helped the company to identify product issues and improve their product offerings based on customer feedback.

Overall, real-time sentiment analysis is a powerful tool that can provide organizations with
valuable insights into customer sentiment and behavior. By analyzing data in real-time,
organizations can quickly respond to issues and make data-driven decisions that improve the
customer experience and increase customer satisfaction.

Case Study – Stock Market Predictions:


Predicting stock market trends and movements is a complex and challenging task, but big data
analytics can help by analyzing large amounts of data to identify patterns and correlations. Here
are a few case studies that showcase the application of big data analytics in stock market
predictions:

• Stock price prediction using machine learning algorithms: A financial services company used big data analytics to predict stock prices using machine learning algorithms. The company analyzed a variety of data sources, including financial reports, news articles, and social media feeds, to identify patterns and correlations that could help predict future stock prices. The predictive models were then used to make investment decisions.
• Sentiment analysis for stock market prediction: A hedge fund used big data analytics and sentiment analysis to predict stock market trends. The fund analyzed social media feeds and news articles to determine the sentiment expressed towards different companies and industries. The sentiment analysis helped the fund to make investment decisions based on market sentiment.
• Time-series analysis for stock market prediction: A financial institution used big data analytics and time-series analysis to predict stock market trends. The institution analyzed historical stock market data to identify patterns and trends that could be used to predict future movements in the stock market. The predictive models were then used to make investment decisions.
• Market basket analysis for stock market prediction: A data analytics firm used market basket analysis, a technique used in retail to identify items frequently purchased together, to predict stock market movements. The firm analyzed historical stock market data to identify patterns and correlations between different stocks. The analysis helped the firm to identify stocks that were likely to move together, enabling them to make investment decisions based on market trends.

Overall, big data analytics can provide valuable insights into stock market trends and
movements. By analyzing large amounts of data from various sources, organizations can identify
patterns and correlations that can be used to predict future movements in the stock market.
However, it is important to note that stock market prediction is a complex and uncertain task,
and organizations should exercise caution when making investment decisions based on
predictive models.
