Robust Stream Processing with Apache Flink
Jamie Grier
@jamiegrier
jamie@data-artisans.com
Who am I?
• Director of Applications Engineering at data Artisans
• Previously working on streaming computation at Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for about a decade
• High-speed video, social media streaming, general frameworks for stream processing
Overview
• What is Apache Flink?
• What is Stateful Stream Processing?
• Windowed computation over streams
• Robust Time Handling (Event Time vs Processing Time)
• Robust Failure Handling
• Robust Planned Downtime Handling
• Robust Reprocessing
What is Apache Flink?
Apache Flink is an open source platform for distributed stream and batch data processing.
Stream Processing
[Diagram: Data Stream -> Your Code -> Data Stream]
Stateful Stream Processing
[Diagram: Data Stream -> Your Code (with State) -> Data Stream]
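A minimal sketch of what "stateful" means here, written in plain Python rather than against the Flink API. The function name `running_count` and the string events are made up for illustration. The point is only that the operator keeps state between events, so each output depends on everything seen so far for that key.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (key, count so far) for every incoming event."""
    state: Dict[str, int] = {}  # per-key state that survives between events
    for key in events:
        state[key] = state.get(key, 0) + 1
        yield key, state[key]


if __name__ == "__main__":
    for output in running_count(["a", "b", "a", "a", "b"]):
        print(output)
```

A stateless operator (a plain filter or map) could not produce these counts, because it sees each event in isolation.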
More Complex Example
[Diagram: Kafka, Files, RabbitMQ -> Filter -> Map -> Join / Sum -> InfluxDB, C*]
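The operator chain on this slide (Filter, Map, Sum) can be sketched in plain Python generators. This is an illustration under assumptions, not Flink code: the `name,value` line format and the function names are hypothetical, and the join step and the real sources and sinks are left out.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[str, float]


def keep_well_formed(lines: Iterable[str]) -> Iterator[str]:
    """Filter: drop lines that do not contain exactly one comma."""
    for line in lines:
        if line.count(",") == 1:
            yield line


def to_record(lines: Iterable[str]) -> Iterator[Record]:
    """Map: turn 'name,value' into (name, value); assumes value is numeric."""
    for line in lines:
        name, value = line.split(",")
        yield name.strip(), float(value)


def sum_by_key(records: Iterable[Record]) -> Dict[str, float]:
    """Sum: keep a running total per name."""
    totals: Dict[str, float] = {}
    for name, value in records:
        totals[name] = totals.get(name, 0.0) + value
    return totals


if __name__ == "__main__":
    lines = ["cpu,1.5", "garbage", "mem,3.0", "cpu,2.5"]
    print(sum_by_key(to_record(keep_well_formed(lines))))
```

Each stage consumes the previous stage's output one element at a time, which is the same shape as the dataflow in the diagram.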
Distributed and Parallel Deployment
[Diagram: Kafka, Files, RabbitMQ -> Filter -> Parse -> Join / Sum -> InfluxDB, C*, with each operator running as several parallel instances]
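One way to picture parallel deployment of a keyed operator: records are routed by key, so every record for a given key lands on the same parallel instance and that instance can keep the key's state locally. The sketch below is plain Python and does not reproduce Flink's actual partitioning scheme. `partition_for` and `run_parallel_sum` are hypothetical names, and CRC32 is used only because it is deterministic.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def partition_for(key: str, parallelism: int) -> int:
    """Pick the parallel instance for a key with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def run_parallel_sum(records: Iterable[Record], parallelism: int) -> List[Dict[str, int]]:
    """Simulate `parallelism` instances, each holding sums for its own keys."""
    instances: List[Dict[str, int]] = [{} for _ in range(parallelism)]
    for key, value in records:
        state = instances[partition_for(key, parallelism)]
        state[key] = state.get(key, 0) + value
    return instances


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("c", 5)]
    for index, state in enumerate(run_parallel_sum(data, 3)):
        print(index, state)
```

Because no key is split across instances, merging the per-instance results gives the same sums as a single sequential run.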
Robust Stream Processing
with Apache Flink
Code Example!
Windowing
Processing Time vs Event Time
Windowing in Processing Time
[Figure: events 0-9 shown along a Processing Time axis and an Event Time axis]
Windowing in Event Time
[Figure: events 0-9 shown along an Event Time axis]
Processing Time = Errors!
Event Time = Accuracy
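A small sketch of why the two notions of time give different answers. It is plain Python with made-up timestamps, not Flink's windowing API. Each event carries the time it happened (event time) and the time it reached the processor (processing time). Counting the same events in 10-second tumbling windows gives different results depending on which timestamp is used.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (event_time, processing_time) in seconds; values are hypothetical.
Event = Tuple[int, int]


def tumbling_counts(timestamps: Iterable[int], size: int) -> Dict[int, int]:
    """Count timestamps per tumbling window, keyed by window start."""
    return dict(Counter((t // size) * size for t in timestamps))


def sample_events() -> List[Event]:
    # Events at 8s and 9s are delayed and only arrive at 12s and 13s.
    return [(1, 2), (4, 5), (8, 12), (9, 13), (11, 14), (15, 16)]


if __name__ == "__main__":
    events = sample_events()
    print("event time:     ", tumbling_counts((e for e, _ in events), 10))
    print("processing time:", tumbling_counts((p for _, p in events), 10))
```

With event time, the two delayed events are counted in the window where they actually happened. With processing time they slip into the next window, so both windows report the wrong count. The sketch ignores the question of when an event-time window can be considered complete, which a real system has to answer (Flink uses watermarks for this).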
Failure Handling
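A rough, single-process sketch of the idea behind checkpoint-based recovery: periodically save the operator state together with the input position, and after a failure restore both and replay the input from that position. Flink's real mechanism coordinates snapshots across distributed operators and relies on replayable sources, so this is only an illustration. The function name and parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (offset of the next input element, copy of the state at that point)
Checkpoint = Tuple[int, Dict[str, int]]


def count_with_checkpoints(
    log: List[str], checkpoint_every: int, fail_at: Optional[int] = None
) -> Dict[str, int]:
    """Count keys in `log`, optionally simulating one crash at offset `fail_at`."""
    state: Dict[str, int] = {}
    checkpoint: Checkpoint = (0, {})
    offset = 0
    failed = False
    while offset < len(log):
        if fail_at is not None and offset == fail_at and not failed:
            # Simulated crash: drop in-memory state, restore the last
            # checkpoint, and rewind the input to the saved offset.
            failed = True
            offset, saved = checkpoint
            state = dict(saved)
            continue
        key = log[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, dict(state))  # snapshot state with its offset
    return state


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b"]
    print(count_with_checkpoints(log, checkpoint_every=2))
    print(count_with_checkpoints(log, checkpoint_every=2, fail_at=5))
```

Because the state and the input offset are saved together, replayed elements are applied to a state that has not seen them yet, so the final counts match a run without any failure.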
Downtime Handling
Data Reprocessing
We’re Hiring!
http://data-artisans.com/careers
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
Questions?
Thanks!

Jamie Grier - Robust Stream Processing with Apache Flink