Timing is Everything
Understanding Event-Time Processing in Flink SQL
Sharon Xie, Flink Babysitter
Founding Engineer @ Decodable
Agenda
- Flink’s event time processing model
- Use cases in Flink SQL
What is Apache Flink
Stateful Computations over Data Streams
● Highly Scalable
● Exactly-once processing semantics
● Event time semantics and watermarks
● Layered APIs: Streaming SQL (easy to use) ↔ DataStream (expressive)
Context
● Flink streaming mode
● Flink SQL
● Flink versions: 1.18 and 1.19
When is event time used?
● Monitoring and alerting
● Time-based compute or analytics
Event
Immutable record containing the details of something that happened at some point in time.
Time in Flink
Event Time
● The time at which the event happened
Processing Time
● The time at which the event is observed by Flink
Event time vs Processing Time
● Event time < processing time: an event is observed only after it happens
● The lag between the two is arbitrary
● Events can arrive out of order
Challenges
How do you know when all of the events for a particular window have been received?
Watermark
● Measures the progress of event time
● Tracks the maximum event time seen so far (minus a configured delay)
● Signals completeness: no more events with an event time at or before the watermark are expected
Define Watermark
CREATE TABLE sensors (
  id BIGINT,
  `value` INTEGER,
  _time TIMESTAMP(3),
  WATERMARK FOR _time AS _time - INTERVAL '3' MINUTES
) WITH (
  'scan.watermark.emit.strategy' = 'on-event',
  ...
);
Watermark Generation (on-event)
Quiz
There is a window that ends at 1:05. When can the window close?
Quiz - Answer
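A worked version of the quiz, reconstructed under stated assumptions because the answer slide was an image: it assumes the `sensors` table's 3-minute delay and on-event emission, so the watermark W is the largest `_time` seen so far minus 3 minutes, and a window can close once W reaches the window's end. Millisecond-level boundary details are ignored.

```latex
\[
W \;=\; \max_{e \,\in\, \text{events seen so far}} t_e \;-\; 3\,\text{min}
\]
\[
\text{window ending at } 1{:}05 \text{ can close}
\iff W \ge 1{:}05
\iff \max_{e} t_e \ge 1{:}08
\]
```

In words: the window closes only once some event with an event time of 1:08 or later has been observed, regardless of the wall-clock time at which that event arrives.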
Multiple sources/partitions
Idle source/partition
● If a partition is idle (receives no events), its watermark does not advance, which holds back the watermark downstream
● No window results are produced
● Solutions
○ Configure a source idle timeout
■ SET 'table.exec.source.idle-timeout' = '1 min';
○ Balance the partitions
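Why one idle partition stalls everything, sketched as a formula (this is Flink's general rule for combining watermarks, not something taken from the slides): an operator reading several partitions uses the minimum of their watermarks. An idle partition's watermark never moves, so it pins the minimum. With `table.exec.source.idle-timeout` set, a partition that has produced no events for that long is marked idle and left out of the minimum until it emits data again.

```latex
\[
W_{\text{operator}} \;=\; \min_{p \,\in\, \text{active (non-idle) partitions}} W_p
\]
```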
Implications
● Trade-off between correctness and latency
● Latency
○ A window's results are only emitted after the window closes
● Correctness
○ Late events, which arrive after their window has closed, are discarded
Correctness vs Latency
In general:
Alerting and monitoring: favor latency
Timely analytics: favor correctness
But…can I have both?
● Yes! Flink can process and emit “updates” (a changelog)
● No watermark is needed
● The downstream system must support “updates”
● It’s costly: Flink needs to keep global state
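A minimal sketch of the “updates” approach, assuming the `sensors` table from the Define Watermark slide: a regular `GROUP BY` (no window TVF) emits a count per sensor and minute immediately, then revises it whenever another row for that bucket arrives, including late rows. The bucketing by minute is an illustrative choice. The output is an updating changelog, so the sink must accept updates (for example, an upsert sink keyed on `id` and `minute_bucket`).

```sql
-- Regular (non-windowed) aggregation: the watermark is not used to close
-- anything, so every row, late or not, updates its bucket's count.
SELECT id,
       FLOOR(_time TO MINUTE) AS minute_bucket,
       COUNT(*)               AS readings
FROM sensors
GROUP BY id, FLOOR(_time TO MINUTE);
```

The cost shows up as state: one counter per (`id`, minute) is kept indefinitely unless a state TTL such as `table.exec.state.ttl` is configured, and expiring state can in turn make results for very late rows wrong.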
Trade-offs
Quick Summary
● Timely responses and analytics are based on event time
● Flink uses watermarks to account for out-of-order events
● Watermarks allow a trade-off between accuracy and latency
Flink SQL Event-time Usage
● Windowed Aggregations
● Window join
● Temporal join
Windowing
Group unbounded events into finite-sized, time-based buckets, over which computations are applied.
Window Types
● Tumble / Fixed
● Hop / Sliding
● Cumulative
● Session
Window Types - Tumble/Fixed
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-tvf/
● Fixed window size
● No overlap
● Each event belongs to exactly one window
Flink SQL (Window TVF)
● TVF - Table-Valued Function
● Returns a new relation with all columns of the original stream plus 3 additional columns:
○ window_start, window_end, window_time
Example - Tumble Window
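A minimal sketch against the `sensors` table from the Define Watermark slide (not the query shown in the talk, whose example was an image). The 5-minute size and the `MAX` aggregate are arbitrary choices for illustration.

```sql
-- The TVF keeps every column of sensors and appends
-- window_start, window_end and window_time.
SELECT id, `value`, _time, window_start, window_end, window_time
FROM TABLE(
  TUMBLE(TABLE sensors, DESCRIPTOR(_time), INTERVAL '5' MINUTE));

-- Window aggregation: window_start and window_end must be in the GROUP BY.
-- Each window's row is emitted once the watermark passes window_end.
SELECT id, window_start, window_end,
       COUNT(*)     AS readings,
       MAX(`value`) AS max_value
FROM TABLE(
  TUMBLE(TABLE sensors, DESCRIPTOR(_time), INTERVAL '5' MINUTE))
GROUP BY id, window_start, window_end;
```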
Window Types - Hop/Slide
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-tvf/
● Fixed window size
● Windows overlap when slide < window size
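A hop window sketch on the same `sensors` table; the sizes are illustrative assumptions. Note that `HOP` takes the slide before the size.

```sql
-- 10-minute windows that start every 5 minutes. Because size / slide = 2,
-- every reading is counted in two overlapping windows.
SELECT id, window_start, window_end, MAX(`value`) AS max_value
FROM TABLE(
  HOP(TABLE sensors, DESCRIPTOR(_time),
      INTERVAL '5' MINUTE,    -- slide
      INTERVAL '10' MINUTE))  -- size
GROUP BY id, window_start, window_end;
```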
Window Types - Cumulative
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-tvf/
● Similar to a tumbling window, but with early firing at each step interval
● Defined by a max window size and a window step
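A cumulative window sketch on the `sensors` table, with an assumed 1-hour step and 1-day max size (the max size must be a whole multiple of the step). `CUMULATE` takes the step before the max size.

```sql
-- Windows share the day's start and grow in 1-hour steps:
-- [00:00, 01:00), [00:00, 02:00), ..., [00:00, 24:00).
-- Each step fires early with a running count for the day so far.
SELECT id, window_start, window_end, COUNT(*) AS readings_so_far
FROM TABLE(
  CUMULATE(TABLE sensors, DESCRIPTOR(_time),
           INTERVAL '1' HOUR,  -- step
           INTERVAL '1' DAY))  -- max window size
GROUP BY id, window_start, window_end;
```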
Window Types - Session
😃 Supported in Flink 1.19
● A new window is created when the gap between consecutive event times exceeds the session gap
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/sql/queries/window-tvf/#session
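A session window sketch for Flink 1.19+, on the `sensors` table with an assumed 10-minute gap. `PARTITION BY id` gives each sensor its own sessions.

```sql
-- A sensor's session ends after 10 minutes (in event time) without a reading;
-- its next reading starts a new session.
SELECT id, window_start, window_end, COUNT(*) AS readings
FROM TABLE(
  SESSION(TABLE sensors PARTITION BY id, DESCRIPTOR(_time), INTERVAL '10' MINUTE))
GROUP BY id, window_start, window_end;
```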
Window Join
● A window join adds the dimension of time into the join criteria themselves.
● Use case: computing click-through events
Example - tumble window join
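A tumble window join sketch for the click-through use case. The tables `ad_impressions(ad_id, user_id, impression_time)` and `ad_clicks(ad_id, user_id, click_time)` are hypothetical, each assumed to declare a `WATERMARK` on its time column; this is not the talk's example, which was an image.

```sql
-- An impression and a click match only when both land in the same 5-minute
-- tumbling window. The window_start/window_end equalities are what make this
-- a window join; results are emitted when the window closes.
SELECT i.ad_id, i.user_id, i.impression_time, c.click_time,
       i.window_start, i.window_end
FROM (
  SELECT * FROM TABLE(
    TUMBLE(TABLE ad_impressions, DESCRIPTOR(impression_time), INTERVAL '5' MINUTE))
) i
JOIN (
  SELECT * FROM TABLE(
    TUMBLE(TABLE ad_clicks, DESCRIPTOR(click_time), INTERVAL '5' MINUTE))
) c
ON  i.ad_id = c.ad_id
AND i.user_id = c.user_id
AND i.window_start = c.window_start
AND i.window_end = c.window_end;
```

A click that happens shortly after its impression but falls just past a window boundary does not match, which is one motivation for overlapping windows.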
Example - hop window join
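The same hypothetical tables joined over hop windows (10-minute windows sliding every 5 minutes, sizes assumed). A pair matches if both events fall into at least one common window, so a click within 5 minutes of its impression always matches; a pair that shares two windows is emitted once per window.

```sql
SELECT i.ad_id, i.user_id, i.impression_time, c.click_time,
       i.window_start, i.window_end
FROM (
  SELECT * FROM TABLE(
    HOP(TABLE ad_impressions, DESCRIPTOR(impression_time),
        INTERVAL '5' MINUTE, INTERVAL '10' MINUTE))   -- slide, size
) i
JOIN (
  SELECT * FROM TABLE(
    HOP(TABLE ad_clicks, DESCRIPTOR(click_time),
        INTERVAL '5' MINUTE, INTERVAL '10' MINUTE))
) c
ON  i.ad_id = c.ad_id
AND i.user_id = c.user_id
AND i.window_start = c.window_start
AND i.window_end = c.window_end;
```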
Temporal Join
● Enriches a stream with the value the joined record had at each row's event time.
● Example: continuously computing the price of each order based on the exchange rate in effect when the order was placed
Example - temporal join
Ref: https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/joins/#temporal-joins
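A sketch of the exchange-rate example as an event-time temporal join. The tables `currency_rates` and `orders(order_id, price, currency, order_time)` are hypothetical, and connector options are omitted as in the `sensors` DDL. The versioned side needs a primary key and a watermark; `orders` is assumed to declare a watermark on `order_time`.

```sql
-- Versioned table: one current row per currency, changing over time.
CREATE TABLE currency_rates (
  currency        STRING,
  conversion_rate DECIMAL(32, 2),
  update_time     TIMESTAMP(3),
  WATERMARK FOR update_time AS update_time,
  PRIMARY KEY (currency) NOT ENFORCED
) WITH (
  -- connector/format options omitted (e.g. a changelog source such as upsert-kafka)
  ...
);

-- Each order uses the rate that was valid at its order_time,
-- not whatever rate is latest when the row is processed.
SELECT o.order_id,
       o.price,
       o.currency,
       r.conversion_rate,
       o.price * r.conversion_rate AS converted_price,
       o.order_time
FROM orders AS o
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;
```

Flink holds each order until the rates table's watermark passes `order_time`, so the rate version used for that timestamp is settled before the result is emitted.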
Summary
● Event time is essential for timely responses and analytics
● Watermark and windowing are the key concepts
● Flink SQL simplifies event time processing
Q&A
@sharon_rxie
