Notions of Time
Aljoscha Krettek
aljoscha@apache.org
@aljoscha
How Apache Flink™ Handles Time and Windows
Adventures in Timespace
3
Why Windows*?
*not Microsoft Windows…
4
That’s why…
5
Streaming
Batch
6
In Streaming:
Arriving data never stops!
7
Solution:
Put elements into buckets,
these are called windows
8
Window (5 min)
Count #Hashtags
Just saw #Trump on
#CNN, super cool. :D
Trump: 2394
Cheese: 12984
Money: 42
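A minimal sketch of the bucketing idea, in plain Java rather than Flink. The class `HashtagWindowCounter` and its methods are hypothetical names invented for illustration. It assumes non-negative millisecond times and places each tweet into a 5-minute bucket before counting its hashtags.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Flink): counts hashtags per fixed-size window. */
final class HashtagWindowCounter {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    private final long windowSizeMillis;
    // window start (ms) -> hashtag -> count
    private final Map<Long, Map<String, Integer>> counts = new TreeMap<>();

    HashtagWindowCounter(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    /** Start of the window (bucket) that contains the given non-negative time. */
    long windowStart(long timeMillis) {
        return timeMillis - (timeMillis % windowSizeMillis);
    }

    /** Adds every hashtag found in the text to the bucket selected by timeMillis. */
    void add(String text, long timeMillis) {
        Map<String, Integer> bucket =
            counts.computeIfAbsent(windowStart(timeMillis), k -> new HashMap<>());
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            bucket.merge(m.group(1), 1, Integer::sum);
        }
    }

    /** Count of one hashtag in the window that starts at windowStartMillis. */
    int count(long windowStartMillis, String hashtag) {
        Map<String, Integer> bucket =
            counts.getOrDefault(windowStartMillis, Collections.emptyMap());
        return bucket.getOrDefault(hashtag, 0);
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L;
        HashtagWindowCounter counter = new HashtagWindowCounter(fiveMinutes);
        counter.add("Just saw #Trump on #CNN, super cool. :D", 60_000L);
        counter.add("More #Trump", 120_000L);
        counter.add("Later #Trump", 6 * 60_000L); // lands in the second window
        System.out.println(counter.count(0L, "Trump"));
        System.out.println(counter.count(fiveMinutes, "Trump"));
    }
}
```

Which clock supplies `timeMillis` is left open here on purpose; that choice is what the following slides are about.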
9
What I didn’t mention
• tweets have a timestamp,
their event time
• tweets from across the globe
arrive with delay
=> tweets with different
timestamps arrive out-of-order
Window (5 min)
Count #Hashtags
12:34 (13.10.2015):
Just saw #Trump on
#CNN, super cool. :D
Trump: 2394
Cheese: 12984
Money: 42
These arrive with
3 minutes slack
Form windows based
on processing time
of the machine.
Processing Time != Event Time
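The mismatch can be worked through with the numbers on the slide. This is a sketch only: `TimeMismatch` and `windowStart` are hypothetical names, and the windows are assumed to be aligned to the full hour with a size that divides 60 minutes.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical illustration of event time versus processing time. */
final class TimeMismatch {
    /** Start of the tumbling window containing t; assumes windowMinutes divides 60. */
    static LocalDateTime windowStart(LocalDateTime t, int windowMinutes) {
        return t.truncatedTo(ChronoUnit.MINUTES)
                .minusMinutes(t.getMinute() % windowMinutes);
    }

    public static void main(String[] args) {
        // Event time from the slide: 12:34 on 13.10.2015.
        LocalDateTime eventTime = LocalDateTime.of(2015, 10, 13, 12, 34);
        // The tweet reaches the machine with 3 minutes slack.
        LocalDateTime processingTime = eventTime.plusMinutes(3);

        System.out.println("event-time window starts at      "
            + windowStart(eventTime, 5));
        System.out.println("processing-time window starts at "
            + windowStart(processingTime, 5));
    }
}
```

Under these assumptions the tweet belongs to the window starting at 12:30 by event time, but a processing-time window counts it in the window starting at 12:35.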
10
11
Why do people use this?
• easy to implement
• low latency
• this is what systems give you
(Spark Streaming, Apex,
Samza, Storm)*
*not Google Cloud Dataflow
12
Let's look at a more
complex example.
13
Window (5 min)
Correlate Tweets
and News
something...
These still have 3 min slack.
These have 8 min slack.
12:33 (13.10.2015):
Donald Trump speaks
at Cheese conference.
Processing Time != Event Time
=> Mismatch in the
timespace continuum
15
Use cases
• out-of-order elements
• sources with delay
• recovery/fault-tolerance
• “catching up” with a stream
Who does it?
• Google Cloud Dataflow
• Apache Flink
16
How can we do this?
17
We need a
Global Clock
that runs on
event time
instead of
processing time.
18
[Figure: event-time clock values (0, 1, 2) shown at the sources and at the window operator]
• This is a source
• This is our window operator
• This is the current event time
• This is a watermark.
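One way to picture how a window operator could turn watermarks into an event-time clock is sketched below. This is not Flink's implementation: `WatermarkWindowOperator` is a hypothetical class, and it assumes that a watermark `t` means "no element with a timestamp of `t` or lower is still expected" and that the operator's clock is the minimum watermark over all of its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of an event-time window operator driven by watermarks. */
final class WatermarkWindowOperator {
    private final long windowSize;
    private final long[] inputWatermarks;            // latest watermark per input
    private long currentEventTime = Long.MIN_VALUE;  // the operator's event-time clock
    // window start -> number of buffered elements
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>();

    WatermarkWindowOperator(int numInputs, long windowSize) {
        this.windowSize = windowSize;
        this.inputWatermarks = new long[numInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE);
    }

    /** Buffers an element in the window selected by its non-negative timestamp. */
    void onElement(long timestamp) {
        openWindows.merge(timestamp - (timestamp % windowSize), 1, Integer::sum);
    }

    /**
     * Records a watermark from one input, advances the clock to the minimum
     * watermark over all inputs, and returns the start times of the windows
     * whose last timestamp is now covered by that clock.
     */
    List<Long> onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        currentEventTime = Arrays.stream(inputWatermarks).min().getAsLong();
        List<Long> fired = new ArrayList<>();
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + windowSize - 1 <= currentEventTime) {
            fired.add(openWindows.pollFirstEntry().getKey());
        }
        return fired;
    }

    long currentEventTime() {
        return currentEventTime;
    }

    public static void main(String[] args) {
        WatermarkWindowOperator op = new WatermarkWindowOperator(2, 4);
        op.onElement(1);
        op.onElement(2);
        op.onElement(5);
        System.out.println(op.onWatermark(0, 3)); // input 1 has sent nothing yet
        System.out.println(op.onWatermark(1, 4)); // clock is now min(3, 4)
    }
}
```

Taking the minimum keeps the clock from running ahead of the slowest source, so a window is only emitted once every input has passed it.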
19
Now, show me the API!
20
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Form windows based on the wall-clock time of the processing machine.
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    // 5-minute windows per hashtag
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());
Processing Time
21
Event Time
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Form windows based on the timestamps carried by the elements.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
// Attach event-time timestamps to the tweets.
text = text.assignTimestamps(new MyTimestampExtractor());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    // 5-minute windows per hashtag
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());
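The slides do not show `MyTimestampExtractor`. As a rough idea of the bookkeeping such a component might do, the following plain-Java sketch derives a watermark from the highest timestamp seen so far and an assumed upper bound on the slack. `SlackWatermarkGenerator` is a hypothetical class and does not implement any Flink interface.

```java
/** Hypothetical sketch: watermark = highest timestamp seen, minus the allowed slack. */
final class SlackWatermarkGenerator {
    private final long maxSlackMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    SlackWatermarkGenerator(long maxSlackMillis) {
        this.maxSlackMillis = maxSlackMillis;
    }

    /** Records the event timestamp of an arriving element. */
    void onElement(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    /**
     * Highest timestamp for which no further elements are expected, assuming
     * no element arrives more than maxSlackMillis behind the newest one.
     */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxTimestampSeen - maxSlackMillis - 1;
    }

    /** True if an element with this timestamp would arrive behind the watermark. */
    boolean isLate(long eventTimestampMillis) {
        return eventTimestampMillis <= currentWatermark();
    }

    public static void main(String[] args) {
        long threeMinutes = 3 * 60 * 1000L; // the 3 minutes slack from the tweet example
        SlackWatermarkGenerator gen = new SlackWatermarkGenerator(threeMinutes);
        gen.onElement(10 * 60 * 1000L);                 // tweet stamped at minute 10
        System.out.println(gen.currentWatermark());
        System.out.println(gen.isLate(7 * 60 * 1000L)); // exactly 3 minutes behind
        System.out.println(gen.isLate(6 * 60 * 1000L)); // 4 minutes behind
    }
}
```

The slack bound is a trade-off: a larger value tolerates more delay but holds windows open longer before they can be emitted.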
22
TL;DL*
• stream data is infinite
• windows are helpful
• event-time != processing time
• watermarks to the rescue
• Flink can do it
*too long, didn’t listen
flink.apache.org
@ApacheFlink
24
Tumbling Windows of 4 Seconds
[Figure: elements assigned by timestamp to the windows 0-3, 4-7, 8-11, 20-23, 24-27, and 32-35]
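The assignment behind "Tumbling Windows of 4 Seconds" can be sketched in a few lines. `TumblingWindows` is a hypothetical helper, and the arrival sequence in `main` is made up for illustration; it is not the data from the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: assigns timestamps (in seconds) to tumbling windows. */
final class TumblingWindows {
    /** Start of the window of the given size that contains the timestamp. */
    static long windowStart(long timestampSeconds, long sizeSeconds) {
        return Math.floorDiv(timestampSeconds, sizeSeconds) * sizeSeconds;
    }

    /** Groups timestamps by window start, keeping arrival order inside each window. */
    static Map<Long, List<Long>> assign(long[] timestamps, long sizeSeconds) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(windowStart(ts, sizeSeconds), k -> new ArrayList<>())
                   .add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        long[] arrivals = {1, 2, 5, 3, 9, 4, 21, 20, 8}; // made-up, out-of-order
        for (Map.Entry<Long, List<Long>> e : assign(arrivals, 4).entrySet()) {
            long start = e.getKey();
            System.out.println(start + "-" + (start + 3) + ": " + e.getValue());
        }
    }
}
```

Because the window is chosen from the timestamp alone, an element that arrives out of order still ends up in the same window as its neighbours in event time.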

Editor's Notes

  • #11: Slack is the amount of time by which elements arrive late.
  • #16: When catching up, for example with elements stored in Kafka, you still want correct windows based on the timestamps in the elements.