Stephan Ewen
@stephanewen
The Stream Processor
as a Database
Apache Flink
2
Streaming technology is enabling the obvious:
continuous processing on data that is
continuously produced
Apache Flink Stack
3
Libraries
DataStream API (Stream Processing)  |  DataSet API (Batch Processing)
Runtime (Distributed Streaming Data Flow)
Streaming and batch as first-class citizens.
Programs and Dataflows
4
Source → Transformation → Transformation → Sink
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))

val events: DataStream[Event] = lines.map((line) => parse(line))

val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .apply(new MyAggregationFunction())

stats.addSink(new RollingSink(path))
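The snippet assumes an Event type, a Statistic type, and a parse helper that the talk does not show; a minimal, purely illustrative sketch of what they might look like:

case class Event(sensor: String, value: Double)
case class Statistic(sensor: String, avg: Double)

// Hypothetical parser for comma-separated "sensor,value" lines
def parse(line: String): Event = {
  val Array(sensor, value) = line.split(",")
  Event(sensor, value.toDouble)
}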
[Streaming dataflow: parallel Source [1], [2] → map() [1], [2] → keyBy()/window()/apply() [1], [2] → Sink [1]]
What makes Flink flink?
5
• Low latency, high throughput, well-behaved flow control (back pressure)
• Make more sense of data: works on real-time and historic data
• True streaming, event time, APIs, libraries
• Stateful streaming: globally consistent savepoints, exactly-once semantics for fault tolerance, windows & user-defined state
• Flexible windows (time, count, session, roll-your-own), Complex Event Processing
The (Classic) Use Case
Realtime Counts and Aggregates
6
(Real)Time Series Statistics
7
stream of events → real-time statistics
The Architecture
8
collect → log → analyze → serve & store
The Flink Job
9
case class Impressions(id: String, impressions: Long)

val events: DataStream[Event] =
  env.addSource(new FlinkKafkaConsumer09(…))

val impressions: DataStream[Impressions] = events
  .filter(evt => evt.isImpression)
  .map(evt => Impressions(evt.id, evt.numImpressions))

val counts: DataStream[Impressions] = impressions
  .keyBy("id")
  .timeWindow(Time.hours(1))
  .sum("impressions")
The Flink Job
10
[Dataflow: Kafka Source → filter() → map() → keyBy() → window()/sum() → Sink, with two parallel instances of each operator]
Putting it all together
11
Periodically (every second), flush the new aggregates to Redis.
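A minimal sketch of such a sink, assuming the Jedis Redis client and a hypothetical RedisAggregateSink class (the talk does not show the actual sink; the flush cadence itself comes from the windows and triggers upstream):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
import redis.clients.jedis.Jedis

// Hypothetical sink: writes the latest aggregate per key to Redis
class RedisAggregateSink(host: String, port: Int) extends RichSinkFunction[Impressions] {
  @transient private var jedis: Jedis = _

  override def open(parameters: Configuration): Unit = {
    jedis = new Jedis(host, port)  // one connection per parallel sink instance
  }

  override def invoke(value: Impressions): Unit = {
    jedis.set(value.id, value.impressions.toString)  // overwrite the latest count for this key
  }

  override def close(): Unit = {
    if (jedis != null) jedis.close()
  }
}

counts.addSink(new RedisAggregateSink("localhost", 6379))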
How does it perform?
12
Latency · Throughput · Number of Keys
Yahoo! Streaming Benchmark
13
[Chart: 99th percentile latency (sec), lower is better, vs. throughput (1,000 events/sec) for Storm 0.10, Flink 0.10, and Spark Streaming 1.5]
Extended Benchmark: Throughput
14
[Chart: throughput]
• 10 Kafka brokers with 2 partitions each
• 10 compute machines (Flink / Storm)
• Xeon E3-1230-V2@3.30GHz CPU (4 cores HT)
• 32 GB RAM (only 8GB allocated to JVMs)
• 10 GigE Ethernet between compute nodes
• 1 GigE Ethernet between Kafka cluster and Flink nodes
Scaling Number of Users
• The Yahoo! Streaming Benchmark has 100 keys only
  • Every second, only 100 keys are written to the key/value store
  • Quite few compared to many real-world use cases
• Tweet impressions: millions of keys per hour
  • Up to millions of keys updated per second
15
Number of Keys
Performance
16
[Chart: performance vs. number of keys]
The Bottleneck
17
Writes to the key/value store take too long
Queryable State
18
Queryable State
19
Queryable State
20
Optional, and only at the end of windows
Queryable State Enablers
• Flink has state as a first-class citizen (see the sketch below)
• State is fault tolerant (exactly-once semantics)
• State is partitioned (sharded) together with the operators that create/update it
• State is continuous (not mini-batched)
• State is scalable (e.g., embedded RocksDB state backend)
21
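A minimal sketch of what "state as a first-class citizen" looks like in the DataStream API, reusing the Impressions type from the job above (the running-count logic here is illustrative, not the exact job):

import org.apache.flink.streaming.api.scala._

// Per-key running total, kept in Flink's fault-tolerant, partitioned keyed state
val running: DataStream[Impressions] = impressions
  .keyBy(_.id)
  .mapWithState[Impressions, Long] { (evt, state: Option[Long]) =>
    val total = state.getOrElse(0L) + evt.impressions
    (Impressions(evt.id, total), Some(total))  // emit the updated count and store it back
  }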
Queryable State Status
• [FLINK-3779] / Pull Request #2051: Queryable State Prototype
• Design and implementation are still evolving
• Some experiments used earlier versions of the implementation
• Exact numbers may differ in the final implementation, but the order of magnitude is comparable
22
Queryable State Performance
23
Queryable State: Application View
24
The application is only interested in the latest real-time results.
Queryable State: Application View
25
The application requires both the latest real-time results and older results.
[Diagram: the Application queries a Query Service for current-time windows (real-time results) and a Database for past-time windows (older results)]
Apache Flink Architecture Review
26
Queryable State: Implementation
27
[Diagram: a Query Client issues queries of the form /job/operation/state-name/key.
(1) The client asks the Job Manager for the location of the "key-partition" for "operator" of "job".
(2) The State Location Server looks up the location in the ExecutionGraph, which tracks the deploy status of the operators.
(3) The location is returned to the client.
(4) The client queries the state-name and key directly on the Task Manager that holds the partition, where operators such as window()/sum() have registered their local state in a State Registry.]
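The prototype API (FLINK-3779) was still evolving at the time; the following is a rough, non-authoritative sketch of both sides of this interaction, written against the API shape that queryable state later settled on. The state name, host, port, and key below are assumptions:

// Job side: expose the keyed stream as queryable state, registered under a name
counts
  .keyBy(_.id)
  .asQueryableState("impression-counts")

// Client side (separate process): resolve the state location and fetch the value for one key
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.scala.createTypeInformation
import org.apache.flink.queryablestate.client.QueryableStateClient

val client = new QueryableStateClient("taskmanager-host", 9069)  // proxy host/port: assumptions
val resultFuture = client.getKvState(
  jobId,                                // JobID of the running job (obtained elsewhere)
  "impression-counts",                  // registered state name
  "some-ad-id",                         // key to look up
  createTypeInformation[String],
  new ValueStateDescriptor[Impressions]("impression-counts", classOf[Impressions]))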
Contrasting with key/value stores
28
Turning the Database Inside Out
• Cf. Martin Kleppmann's talks on re-designing data warehousing based on log-centric processing
• This viewpoint picks up some of those concepts
• Queryable State in Apache Flink = (Turning the DB inside out)++
29
Write Path in Cassandra (simplified)
30
From the Apache Cassandra docs
Write Path in Cassandra (simplified)
31
From the Apache Cassandra docs
The first step is a durable write to the commit log (as in all databases that offer strong durability).
The memtable is a re-computable view of the commit log actions (and the persistent SSTables).
Write Path in Cassandra (simplified)
32
From the Apache Cassandra docs
The first step is a durable write to the commit log (as in all databases that offer strong durability).
The memtable is a re-computable view of the commit log actions (and the persistent SSTables).
Replication to a quorum happens before the write is acknowledged.
Durability of Queryable state
33
[Diagram: operator state is periodically snapshotted]
Durability of Queryable state
34
The event sequence is the ground truth and is already durably stored in the log.
Queryable state is re-computable from a checkpoint plus the log.
Snapshot replication can happen in the background.
Performance of Flink's State
35
[Diagram: Source / filter() / map() → window()/sum(), backed by a state index (e.g., RocksDB)]
Events are persistent and ordered (per partition / key) in the log (e.g., Apache Kafka).
Events flow without replication or synchronous writes.
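A minimal configuration sketch for this setup; the checkpoint interval and checkpoint directory below are illustrative assumptions:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Draw a consistent snapshot of all operator state every 5 seconds
env.enableCheckpointing(5000)

// Keep operator state in an embedded RocksDB instance;
// snapshots are written to the given checkpoint directory
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"))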
Performance of Flink's State
36
Trigger checkpoint: a checkpoint barrier is injected at the sources and flows through the dataflow (Source / filter() / map() → window()/sum()).
Performance of Flink's State
37
Take state snapshot: RocksDB triggers a copy-on-write snapshot of the operator state.
Performance of Flink's State
38
Persist state snapshots: the snapshots are durably persisted asynchronously while the processing pipeline continues.
Conclusion
39
Takeaways
• Streaming applications are often not bound by the stream processor itself; cross-system interaction is frequently the biggest bottleneck
• Queryable state mitigates a big bottleneck: communication with external key/value stores to publish real-time results
• Apache Flink's sophisticated support for state makes this possible
40
Takeaways
Performance of Queryable State
• Data persistence is fast with logs (Apache Kafka): append-only writes and streaming replication
• Computed state is fast with local data structures and no synchronous replication (Apache Flink)
• Flink's checkpointing mechanism makes computed state persistent with low overhead
41
Go Flink!
42
• Low latency, high throughput, well-behaved flow control (back pressure)
• Make more sense of data: works on real-time and historic data
• True streaming, event time, APIs, libraries
• Stateful streaming: globally consistent savepoints, exactly-once semantics for fault tolerance, windows & user-defined state
• Flexible windows (time, count, session, roll-your-own), Complex Event Processing
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers
