Unit 1 Windowing

Windowing in big data involves processing unbounded data streams by dividing them into finite sets based on criteria like time or tuple count. Different types of windows include fixed, sliding, tumbling, and hopping, each serving specific analytical needs. Event streaming allows for real-time data processing and insights, supported by components like event producers, message brokers, and consumers, while addressing challenges such as data volume and complexity.


Concept of Windowing in Big Data
• Windowing is one of the most frequently used processing methods for streams of data.
• An unbounded stream of data (events) is split into finite sets, or windows, based on specified criteria, such as time.
• A window can be conceptualized as an in-memory table in which events are added and removed based on a set of policies, as in the sketch below.
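A minimal Python sketch of that in-memory-table view, assuming a simple age-based eviction policy; the class and method names are illustrative, not from any particular streaming framework:

```python
from collections import deque
import time

class Window:
    """An in-memory table of events governed by an eviction policy."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.events = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, payload, ts=None):
        """Admit a new event, then apply the eviction policy."""
        self.events.append((ts if ts is not None else time.time(), payload))
        self.evict()

    def evict(self):
        """Policy: remove events older than max_age_seconds."""
        cutoff = time.time() - self.max_age
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def contents(self):
        return [payload for _, payload in self.events]
```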
Time-Based Windows
● A time-based window is defined by a rowtime interval: its criteria select a finite set of rows using a timestamp-based specification.
● In time-based windowing, data is grouped into specific time intervals. These could be, for example, 1-minute, 5-minute, or 1-hour windows.
● As new data arrives, it is assigned to the appropriate time interval, and computations are performed on the data within that interval. This type of windowing is useful for analyzing trends and patterns over time, as in the sketch below.
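As a concrete illustration, a small Python sketch that assigns (timestamp, value) events to fixed 1-minute intervals; the helper names are invented for this example:

```python
from collections import defaultdict

def window_start(ts, size_seconds):
    """Map an event timestamp to the start of its time interval."""
    return ts - (ts % size_seconds)

def group_by_time_window(events, size_seconds=60):
    """Group (timestamp, value) events into fixed time intervals."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[window_start(ts, size_seconds)].append(value)
    return dict(windows)

# Three events assigned to 1-minute windows
events = [(0, 10), (59, 20), (61, 30)]
print(group_by_time_window(events))  # {0: [10, 20], 60: [30]}
```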
Tuple-Count-Based Windows
● In tuple-count-based windowing, data is grouped based on a fixed number of data points. For example, you could define a window size of 100 data points; as soon as 100 data points arrive, the computation is performed on that batch of data. This type of windowing is useful when you want to process data in chunks of a certain size.
● Tuples are grouped into a single window based on time or count; in the non-overlapping case, any tuple belongs to exactly one window.
● Storm core has support for processing groups of tuples that fall within a window. Windows are specified with the following two parameters, illustrated in the sketch below:
1. Window length - the length or duration of the window
2. Sliding interval - the interval at which the window slides
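The two parameters can be imitated in plain Python. This is a framework-free sketch, not Storm's actual API; here `window_length` and `sliding_interval` are both counts of tuples:

```python
from collections import deque

def count_windows(stream, window_length, sliding_interval):
    """Yield windows of window_length tuples, emitting a new window
    after every sliding_interval tuples. When sliding_interval equals
    window_length, the windows tumble (no overlap)."""
    buffer = deque(maxlen=window_length)  # keeps only the newest tuples
    for count, tup in enumerate(stream, start=1):
        buffer.append(tup)
        # emit once the first window fills, then every sliding_interval tuples
        if count >= window_length and (count - window_length) % sliding_interval == 0:
            yield list(buffer)

# Windows of 4 tuples, sliding by 2
for w in count_windows(range(10), window_length=4, sliding_interval=2):
    print(w)
# [0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]
```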
Movement of Windows

1) Fixed
2) Sliding
3) Tumbling
4) Hopping
Windows can be:
• fixed/tumbling: time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.
• sliding: windows have fixed lengths but are separated by a time interval (step). Typically the window length is a multiple of the step. Each event may belong to several windows.
• session: windows have varying sizes and are defined by the data itself, which should carry some session identifier; see the sketch below.
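Session windows are not covered by a later slide, so here is a hedged Python sketch. It assumes the common convention that a session closes after a gap of inactivity; the gap threshold and function name are illustrative:

```python
def session_windows(events, gap_seconds):
    """Group (timestamp, payload) events, assumed sorted by timestamp,
    into sessions. A new session starts whenever the gap since the
    previous event exceeds gap_seconds."""
    sessions, current, last_ts = [], [], None
    for ts, payload in events:
        if last_ts is not None and ts - last_ts > gap_seconds:
            sessions.append(current)  # inactivity gap: close the session
            current = []
        current.append((ts, payload))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

# A 30-second inactivity gap splits these events into two sessions
events = [(0, "a"), (10, "b"), (100, "c"), (110, "d")]
print(len(session_windows(events, gap_seconds=30)))  # 2
```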
Fixed Window Movement
● Fixed windows, often referred to as "tumbling windows," are non-overlapping
windows that are defined by a fixed size or time interval.
● Data is grouped into these windows, and each window is processed
independently.
● When the window size or time interval is reached, the window "tumbles" to the
next set of data points.
● Fixed windows are particularly useful for discrete, non-overlapping analyses on
distinct chunks of data.
Sliding Window Movement
● Sliding windows are overlapping windows that continuously move forward
in time by a specified increment, also known as the "slide" or "hop."
● These windows allow for capturing trends and patterns that might span
multiple time intervals.
● As new data arrives, the window slides by the specified step size,
incorporating new data while also retaining some overlap with previous
data.
● This overlap enables more comprehensive analysis of evolving trends, as in the moving-average sketch below.
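A hedged Python sketch of one common sliding-window variant: the window slides on every arriving event, and because windows overlap, each event contributes to many successive moving averages. Names are illustrative:

```python
from collections import deque

def sliding_average(events, window_seconds):
    """Emit the average of the last window_seconds of (timestamp, value)
    events each time a new event arrives."""
    window, total = deque(), 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()  # slide: evict expired events
            total -= old
        yield ts, total / len(window)

# 60-second window sliding over three events
for ts, avg in sliding_average([(0, 2), (30, 4), (70, 6)], 60):
    print(ts, avg)   # (0, 2.0), (30, 3.0), (70, 5.0)
```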
Tumbling Window Movement
● A “tumbling window” is a collection of rows that are aggregated to produce a smaller number of output rows, such as “the sum of the last twenty rows” or “the sum of the rows in the last hour”. One row is returned for every group of rows.
● Tumbling windows are the same as fixed windows.
● They are non-overlapping windows defined by a fixed size or time interval.
● The data is partitioned into these windows, and computations are performed independently on each window. Once the window's boundary is reached, it "tumbles" to the next set of data points, as in the sketch below.
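A minimal sketch of "one row per group" in Python, assuming events arrive in timestamp order; a (window_start, sum) row is emitted each time the window tumbles:

```python
def tumbling_sums(events, size_seconds):
    """Consume (timestamp, value) events in timestamp order and emit one
    (window_start, sum) row per window. Each event lands in exactly one
    window; crossing a boundary makes the window "tumble"."""
    current_start, total = None, 0
    for ts, value in events:
        start = ts - (ts % size_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            yield current_start, total  # one output row per group of rows
            current_start, total = start, 0
        total += value
    if current_start is not None:
        yield current_start, total

# 60-second tumbling windows
print(list(tumbling_sums([(0, 1), (30, 2), (65, 5)], 60)))
# [(0, 3), (60, 5)]
```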
Hopping Window Movement
● Hopping windows are a variation of sliding windows: fixed-size windows that advance by a regular interval, the "hop".
● When the hop is smaller than the window size, adjacent windows overlap, as with sliding windows; when it is larger, gaps appear between windows.
● The hop size determines how frequently a new window starts.
● Hopping windows are useful when you want to balance overlap for comprehensive analysis against efficient processing; the sketch below enumerates the hopping windows an event falls into.
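A small Python sketch that enumerates which hopping windows an event falls into, assuming half-open windows that start at non-negative multiples of the hop:

```python
def hopping_windows_for(ts, size_seconds, hop_seconds):
    """Return the start times of every hopping window containing ts,
    assuming half-open windows [start, start + size_seconds) that start
    at non-negative multiples of hop_seconds."""
    first = ((ts - size_seconds) // hop_seconds + 1) * hop_seconds
    start = max(0, first)
    starts = []
    while start <= ts:
        starts.append(start)
        start += hop_seconds
    return starts

# 60-second windows hopping every 20 seconds: the windows overlap,
# so one event belongs to several of them
print(hopping_windows_for(65, size_seconds=60, hop_seconds=20))
# [20, 40, 60]
```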
Event Streaming
● Event streaming refers to the practice of processing and analyzing real-time
data as a continuous stream of events.
● These events could be anything from user interactions on a website to
sensor readings in an industrial setting.
● Event streaming systems enable organizations to capture, process, and respond to events as they happen, allowing for real-time insights, decision-making, and actions.
Key Concepts

● Events: Events are discrete pieces of data that represent occurrences in the system or the external environment. They can be generated by applications, sensors, devices, users, or any other source.

● Event Stream: An event stream is a sequence of events that occur over time. Event streams can vary in volume, velocity, and variety, depending on the sources and use cases.
Components of Event Streaming
➔ Event Producers: These are the sources that generate and emit events. They can
be applications, sensors, databases, IoT devices, social media feeds, etc.
➔ Message Broker: The message broker serves as an intermediary that accepts
events from producers and delivers them to consumers (subscribers). Popular
message broker technologies include Apache Kafka, RabbitMQ, and Amazon
Kinesis.
➔ Event Consumers: Consumers subscribe to specific event types and receive events from the message broker. Consumers can be applications, analytics systems, real-time dashboards, or any component that processes events. A toy producer/broker/consumer sketch follows.
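To make the three roles concrete, here is a toy in-memory broker in Python. This is a teaching sketch only, not how Kafka, RabbitMQ, or Amazon Kinesis actually work; the class and topic names are invented:

```python
from collections import defaultdict

class ToyBroker:
    """A minimal in-memory message broker: producers publish events to
    topics; consumers subscribe with callbacks and receive each event."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self.subscribers[topic]:
            callback(event)

# Producer: an IoT sensor; consumer: a real-time dashboard
broker = ToyBroker()
broker.subscribe("sensor.temperature", lambda e: print("dashboard got:", e))
broker.publish("sensor.temperature", {"device": "t-101", "celsius": 21.5})
```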
Benefits
● Real-Time Insights: Event streaming enables organizations to gain insights
from data as events occur, leading to better decision-making and faster
responses.
● Continuous Processing: Event streaming supports continuous processing
of data, enabling businesses to react to changing conditions immediately.
● Flexibility: Event streaming platforms can handle a variety of data types and
sources, making them suitable for diverse use cases.
● IoT and Monitoring: Event streaming is ideal for IoT applications and real-time monitoring of systems, enabling proactive maintenance and rapid issue resolution.
● Event-Driven Architectures: Event streaming supports event-driven
architectures, which allow applications to react to events without needing to
constantly poll for updates.
● Data Integration: Event streaming can help integrate data from various
sources and systems, providing a unified view of operations.
Challenges

● Data Volume and Velocity: Managing high volumes of events and ensuring low-latency
processing can be challenging.
● Data Quality: Ensuring the accuracy and reliability of events is crucial for making
informed decisions.
● Complexity: Designing, deploying, and maintaining event streaming systems can be
complex, especially for organizations new to this approach.
● Scalability and Resource Management: As event loads increase, scaling the system
while managing resources becomes important.
Architecture
● Event Producers: Event producers are the sources that generate and emit events.
These can be applications, sensors, devices, databases, IoT devices, or any other data
source that generates events. Producers send events to the event streaming platform
for processing.
● Event Ingestion Layer: The event ingestion layer is responsible for receiving events
from producers and preparing them for processing. This layer might include
components like message brokers, event gateways, or data ingestion pipelines.
Popular message broker technologies like Apache Kafka, RabbitMQ, or cloud-based
services like Amazon Kinesis can be used here.
● Event Stream Processing: This layer processes the incoming events in real time. It
involves various components that analyze, filter, transform, enrich, and aggregate the
events. Complex event processing (CEP) engines, stream processing frameworks like
Apache Flink, Apache Spark Streaming, or even custom applications can be used for
this purpose.
Contd.
● State Management: For stateful event processing, a state management
component stores and maintains the state required for processing events over
time. This can be a key-value store, a database, or an in-memory data grid.
Stateful processing is useful for maintaining context and aggregating data over
windows of time.
● Event Consumers: Consumers subscribe to specific event types or topics and
receive processed events for further action or analysis. Consumers can be
applications, microservices, analytics platforms, real-time dashboards, or
downstream systems that require the processed event data.
● Event Schemas and Metadata: Managing event schemas and metadata is crucial
for maintaining data consistency and compatibility as events evolve over time.
Schema registries can be used to enforce schema validation and compatibility
checks.
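A hedged sketch of stateful processing: a plain dict stands in for the key-value state store, keeping a per-key running count per time window. Production systems (e.g. Flink's keyed state) add fault tolerance and checkpointing, which this toy ignores:

```python
from collections import defaultdict

class KeyedWindowCounter:
    """A stateful operator that counts events per (key, time window).
    A plain dict stands in for the key-value state store."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.state = defaultdict(int)  # (key, window_start) -> count

    def process(self, key, ts):
        window_start = ts - (ts % self.window)
        self.state[(key, window_start)] += 1
        return self.state[(key, window_start)]

counter = KeyedWindowCounter(window_seconds=60)
counter.process("user-1", 5)
counter.process("user-1", 42)
print(counter.process("user-1", 70))  # 1 -- a new window began at t=60
```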
Stream Processing

● Stream processing is a data management technique that involves ingesting a continuous data stream to quickly analyze, filter, transform, or enhance the data in real time. Once processed, the data is passed off to an application, data store, or another stream processing engine.

● Stream processing services and architectures are growing in popularity because they allow enterprises to combine data feeds from various sources. Sources can include transactions, stock feeds, website analytics, connected devices, operational databases, weather reports, and other commercial services.
How does stream processing work?
● Stream processing architectures help simplify the data management tasks
required to consume, process and publish the data securely and reliably. Stream
processing starts by ingesting data from a publish-subscribe service, performs an
action on it and then publishes the results back to the publish-subscribe service
or another data store. These actions can include processes such as analyzing, filtering, transforming, combining, or cleaning data; the pipeline sketch below walks through this ingest-act-publish loop.
● Stream processing commonly connotes the notion of real-time analytics, which is
a relative term. Real time could mean five minutes for a weather analytics app,
millionths of a second for an algorithmic trading app or a billionth of a second for
a physics researcher.
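A self-contained Python sketch of the ingest-act-publish loop described above; the generator functions stand in for a publish-subscribe service, and the sensor data and filtering rule are invented:

```python
def ingest():
    """Stand-in for a publish-subscribe source: yields raw events."""
    yield from [{"sensor": "t-101", "celsius": 21.5},
                {"sensor": "t-102", "celsius": -999.0},  # bad reading
                {"sensor": "t-103", "celsius": 19.0}]

def act(events):
    """Filter out bad readings, then transform Celsius to Fahrenheit."""
    for e in events:
        if e["celsius"] > -100:                          # filtering
            e["fahrenheit"] = e["celsius"] * 9 / 5 + 32  # transforming
            yield e

def publish(events):
    """Stand-in for publishing results back to a topic or data store."""
    for e in events:
        print("publish:", e)

publish(act(ingest()))  # ingest -> act -> publish
```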
How does stream processing work?
● However, this notion of real-time points to something important about how the stream
processing engine packages up bunches of data for different applications. The stream
processing engine organizes data events arriving in short batches and presents them to
other applications as a continuous feed, as in the micro-batch sketch below. This simplifies the logic for application developers combining and recombining data from various sources and from different time scales.
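A minimal Python sketch of that packaging step: the stream is cut into short batches and handed downstream as one continuous feed. The batch size here is count-based for simplicity; real engines often batch by time as well:

```python
import itertools

def micro_batches(stream, batch_size):
    """Cut an event stream into short fixed-size batches and present
    them downstream as one continuous feed of lists."""
    it = iter(stream)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

# Downstream code sees a continuous feed of small batches
for batch in micro_batches(range(7), batch_size=3):
    print(batch)
# [0, 1, 2], [3, 4, 5], [6]
```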
Why is stream processing needed?

● Stream processing is needed to:
● Develop adaptive and responsive applications
● Help enterprises improve real-time business analytics
● Facilitate faster decision-making
● Improve decision-making with increased context
● Improve the user experience
● Create new applications that use a wider variety of data sources
