MANCHESTER LONDON NEW YORK
Petr Zapletal @petr_zapletal
#ScalaDays
@cakesolutions
Distributed Real-Time Stream Processing:
Why and How
Agenda
● Motivation
● Stream Processing
● Available Frameworks
● Systems Comparison
● Recommendations
The Data Deluge
● New Sources and New Use Cases
● 8 Zettabytes (1 ZB = 1 trillion GB) created in 2015
● Stream Processing to the Rescue
Distributed Stream Processing
● Continuous processing, aggregation and analysis of unbounded data
Points of Interest
➜ Runtime and Programming Model
➜ Primitives
➜ State Management
➜ Message Delivery Guarantees
➜ Fault Tolerance & Low Overhead Recovery
➜ Latency, Throughput & Scalability
➜ Maturity and Adoption Level
➜ Ease of Development and Operability
Runtime and Programming Model
● The most important trait of a stream processing system
● Defines expressiveness, possible operations and their limitations
● Therefore determines the system's capabilities and use cases
Native Streaming
[Diagram: Source Operator → Processing Operators → Sink Operator; records processed one at a time]
Micro-batching
[Diagram: Receiver groups records into micro-batches → Processing Operators → Sink Operator; records processed in short batches]
Native Streaming
● Records are processed as they arrive
Pros
⟹ Expressiveness
⟹ Low-latency
⟹ Stateful operations
Cons
⟹ Throughput
⟹ Fault-tolerance is expensive
⟹ Load-balancing
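To make the model concrete, here is a minimal framework-free Scala sketch (all names hypothetical) of record-at-a-time processing: each record flows through the operator and reaches the sink the moment it arrives.

trait Operator[I, O] { def onRecord(record: I): Seq[O] }

// A processing operator hands its output downstream immediately.
class Splitter extends Operator[String, String] {
  def onRecord(line: String): Seq[String] = line.split(" ").toSeq
}

val source = Iterator("ScalaDays 2016", "Apache Flink") // stands in for an unbounded source
val split = new Splitter
source.foreach { line =>
  split.onRecord(line).foreach(println) // sink sees every word right away: low latency
}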
Micro-batching
● Splits incoming stream into small batches
Pros
⟹ High-throughput
⟹ Easier fault tolerance
⟹ Simpler load-balancing
Cons
⟹ Higher latency, depends on batch interval
⟹ Limited expressivity
⟹ Harder stateful operations
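For contrast, a framework-free Scala sketch (hypothetical, not any engine's API) of the micro-batching loop: records are buffered for a short interval and processed as one batch, trading latency for throughput.

val stream = Iterator("a b", "b c", "c d", "d e")
// grouped(2) stands in for "collect records for one batch interval"
stream.grouped(2).foreach { batch =>
  val counts = batch.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => w -> ws.size }
  println(s"batch result: $counts") // output appears once per batch, hence the added latency
}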
Programming Model
Compositional
⟹ Provides basic building blocks as operators or sources
⟹ Custom component definition
⟹ Manual topology definition & optimization
⟹ Advanced functionality often missing
Declarative
⟹ High-level API
⟹ Operators as higher-order functions
⟹ Abstract data types
⟹ Advanced operations like state management or windowing supported out of the box
⟹ Advanced optimizers
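A small framework-free illustration of the two styles on the same word count (the compositional builder API is invented for the example):

val lines = List("scala days", "scala 2016")

// Declarative: operators as higher-order functions over an abstract collection;
// the engine is free to plan and optimize execution.
val counts = lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => w -> ws.size }

// Compositional: name each component and wire the topology by hand,
// roughly what a Storm-like builder exposes (illustrative only):
//   builder.setSource("lines", linesSource)
//   builder.setOperator("split", splitFn).connectTo("lines")
//   builder.setOperator("count", countFn).connectTo("split")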
Apache Streaming Landscape
Storm
● Pioneer in large-scale stream processing
Trident
● Higher-level micro-batching system built atop Storm
Spark Streaming
● Unified batch and stream processing over a batch runtime
[Diagram: input data stream → Spark Streaming → batches of input data → Spark Engine → batches of processed data]
Samza
● Builds heavily on Kafka's log-based philosophy
[Diagram: partitioned input stream consumed by Task 1, Task 2, Task 3]
Apex
● Processes massive amounts of real-time events natively in Hadoop
[Diagram: DAG of operators; tuples flow along streams from operator to operator to the output stream]
Flink
[Diagram: stream data (Kafka, RabbitMQ, ...) and batch data (HDFS, JDBC, ...) both feed Flink]
● Native streaming & high-level API
System Comparison
                  Storm          Trident             Spark Streaming    Samza               Apex                Flink
Streaming Model   Native         Micro-batching      Micro-batching     Native              Hybrid              Native
API               Compositional  Compositional       Declarative        Compositional       Compositional*      Declarative
Guarantees        At-least-once  Exactly-once        Exactly-once*      At-least-once       Exactly-once        Exactly-once
Fault Tolerance   Record ACKs    Record ACKs         RDD Checkpointing  Log-based           Checkpointing       Checkpointing
State Management  Not built-in   Dedicated Operators Dedicated DStream  Stateful Operators  Stateful Operators  Stateful Operators
Latency           Very Low       Medium              Medium             Low                 Very Low            Low
Throughput        Low            Medium              High               High                High                High
Maturity          High           High                High               Medium              Medium              Low
Counting Words
Input: ScalaDays 2016 Apache Apache Spark Storm Apache Trident Flink Streaming Samza Scala 2016 Streaming
Output: (Apache, 3) (Streaming, 2) (2016, 2) (Spark, 1) (Storm, 1) (Trident, 1) (Flink, 1) (Samza, 1) (Scala, 1) (ScalaDays, 1)
Storm
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new Split(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
...
// inside the WordCount bolt:
Map<String, Integer> counts = new HashMap<String, Integer>();
public void execute(Tuple tuple, BasicOutputCollector collector) {
  String word = tuple.getString(0);
  Integer count = counts.containsKey(word) ? counts.get(word) + 1 : 1;
  counts.put(word, count);
  collector.emit(new Values(word, count));
}
Trident
public static StormTopology buildTopology(LocalDRPC drpc) {
  FixedBatchSpout spout = ...
  TridentTopology topology = new TridentTopology();
  TridentState wordCounts = topology.newStream("spout1", spout)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(),
                         new Count(), new Fields("count"));
  ...
}
Spark Streaming
val conf = new SparkConf().setAppName("wordcount")
val ssc = new StreamingContext(conf, Seconds(1))
val text = ...
val counts = text.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.print()
ssc.start()
ssc.awaitTermination()
Samza
class WordCountTask extends StreamTask {
  override def process(envelope: IncomingMessageEnvelope,
                       collector: MessageCollector, coordinator: TaskCoordinator) {
    val text = envelope.getMessage.asInstanceOf[String]
    val counts = text.split(" ")
      .foldLeft(Map.empty[String, Int]) {
        (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
      }
    collector.send(new OutgoingMessageEnvelope(new SystemStream("kafka", "wordcount"),
                                               counts))
  }
}
Apex
val input = dag.addOperator("input", new LineReader)
val parser = dag.addOperator("parser", new Parser)
val out = dag.addOperator("console", new ConsoleOutputOperator)
dag.addStream[String]("lines", input.out, parser.in)
dag.addStream[String]("words", parser.out, counter.data) // counter operator defined elsewhere

class Parser extends BaseOperator {
  @transient
  val out = new DefaultOutputPort[String]()
  @transient
  val in = new DefaultInputPort[String]()
  override def process(t: String): Unit = {
    for (w <- t.split(" ")) out.emit(w)
  }
}
Flink
val env = ExecutionEnvironment.getExecutionEnvironment
val text = env.fromElements(...)
val counts = text.flatMap ( _.split(" ") )
  .map ( (_, 1) )
  .groupBy(0)
  .sum(1)
counts.print()
env.execute("wordcount")
Fault Tolerance
Fault tolerance in streaming systems is inherently harder than in batch
Storm & Trident
Acks are delivered via a system-level bolt
[Diagram: bolts report acks for tuples {A}, {B} to a system-level Acker Bolt]
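The acker's bookkeeping rests on an XOR trick: every tuple id emitted into or acked out of a tuple tree is XORed into a single 64-bit value, which returns to zero exactly when the whole tree has been processed. A simplified Scala sketch of the idea (not Storm's actual API):

class AckerSketch {
  private var pending = Map.empty[Long, Long] // spout tuple id -> XOR of outstanding tuple ids

  def onEmit(root: Long, tupleId: Long): Unit =
    pending = pending.updated(root, pending.getOrElse(root, 0L) ^ tupleId)

  def onAck(root: Long, tupleId: Long): Unit = {
    val v = pending.getOrElse(root, 0L) ^ tupleId
    if (v == 0L) { pending -= root; println(s"tuple tree $root fully processed") }
    else pending = pending.updated(root, v)
  }
}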
Spark Streaming
Faster recovery by using multiple nodes for recomputation
[Diagram: tasks of a failed node are recomputed in parallel across the remaining nodes]
Samza
[Diagram: a checkpoint stores per-partition offsets in the input stream (partition 0: offset 6, partition 1: offset 4, partition 2: offset 8); each StreamTask consumes one partition]
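The recovery contract is easy to state in code; a sketch with hypothetical types: persist the next offset per partition, and on restart reread each input-log partition from its stored offset.

case class Checkpoint(offsets: Map[Int, Long]) // partition -> next offset to read

def resume(cp: Checkpoint, readFrom: (Int, Long) => Iterator[String]): Unit =
  cp.offsets.foreach { case (partition, offset) =>
    readFrom(partition, offset).foreach(msg => println(s"p$partition: $msg"))
  }

// Matching the slide: resume(Checkpoint(Map(0 -> 6, 1 -> 4, 2 -> 8)), logReader)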
Apex
[Diagram: stream of BeginWindow/EndWindow system events every t=0.5 s with periodic checkpoints; aggregated and sliding application windows are built from these system event windows]
Flink
[Diagram: checkpoint barriers n-1 and n flow with the data stream (older → newer records), splitting it into the parts belonging to checkpoints n-1, n and n+1]
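Flink's barrier snapshotting descends from the Chandy-Lamport algorithm; conceptually, a barrier travels with the records, so everything an operator has seen before barrier n belongs to checkpoint n. A toy Scala sketch of that invariant (not Flink's API):

sealed trait Event
case class Record(value: String) extends Event
case class Barrier(n: Long) extends Event

def run(stream: Iterator[Event], snapshotState: Long => Unit): Unit =
  stream.foreach {
    case Record(v) => println(s"processing $v") // normal per-record work
    case Barrier(n) => snapshotState(n) // state here reflects exactly the pre-barrier records
  }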
Managing State
f: (input, state) => (output, state’)
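The signature instantiated for word count, as a plain Scala illustration: a pure step function threaded over the stream with a fold, so the state' of one record becomes the state of the next.

val step: (String, Map[String, Int]) => ((String, Int), Map[String, Int]) =
  (word, state) => {
    val n = state.getOrElse(word, 0) + 1
    ((word, n), state.updated(word, n))
  }

List("apache", "scala", "apache").foldLeft(Map.empty[String, Int]) { (state, w) =>
  val (output, next) = step(w, state)
  println(output) // (apache,1) (scala,1) (apache,2)
  next
}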
Storm & Trident
[Diagram: state versions s1 → s2 updated between batches]
Spark Streaming
[Diagram: micro-batch processing; each Job reads the input stream and current state, emits the output stream and the updated state]
Samza
[Diagram: each Task holds local state, replicated to a changelog stream; input and output streams flow through the tasks]
Apex
[Diagram: operator with checkpointed state s1, s2]
Flink
[Diagram: partitioned state; operator state sharded by keys k1..k5 vs. local (non-partitioned) state, where the operator keeps its own state s1, s2, s3]
Counting Words Revisited
Input: ScalaDays 2016 Apache Apache Spark Storm Apache Trident Flink Streaming Samza Scala 2016 Streaming
Output: (Apache, 3) (Streaming, 2) (2016, 2) (Spark, 1) (Storm, 1) (Trident, 1) (Flink, 1) (Samza, 1) (Scala, 1) (ScalaDays, 1)
Trident
import storm.trident.operation.builtin.Count;

TridentTopology topology = new TridentTopology();
TridentState wordCounts =
  topology.newStream("spout1", spout)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                         new Fields("count"))
    .parallelismHint(6);
Spark Streaming
// Initial RDD input to trackStateByKey
val initialRDD = ssc.sparkContext.parallelize(List.empty[(String, Int)])
val lines = ...
val words = lines.flatMap(_.split(" "))
val wordDstream = words.map(x => (x, 1))
val trackStateFunc = (batchTime: Time, word: String, one: Option[Int], state: State[Int]) => {
  val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
  val output = (word, sum)
  state.update(sum)
  Some(output)
}
val stateDstream = wordDstream.trackStateByKey(
  StateSpec.function(trackStateFunc).initialState(initialRDD))
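(Side note: in the released Spark 1.6 API this operation surfaces as mapWithState; the snippet uses the pre-release trackStateByKey spelling.)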
Samza
class WordCountTask extends StreamTask with InitableTask {
  private var store: CountStore = _

  def init(config: Config, context: TaskContext) {
    this.store = context.getStore("wordcount-store").asInstanceOf[KeyValueStore[String, Integer]]
  }

  override def process(envelope: IncomingMessageEnvelope,
                       collector: MessageCollector, coordinator: TaskCoordinator) {
    val words = envelope.getMessage.asInstanceOf[String].split(" ")
    words.foreach { key =>
      val count: Integer = Option(store.get(key)).getOrElse(0)
      store.put(key, count + 1)
      collector.send(new OutgoingMessageEnvelope(new SystemStream("kafka", "wordcount"), (key, count)))
    }
  }
}
Apex
@ApplicationAnnotation(name="WordCount")
class Application extends StreamingApplication {
  override def populateDAG(dag: DAG, configuration: Configuration): Unit = {
    val input = dag.addOperator("input", new LineReader)
    val parser = dag.addOperator("parser", new Parser)
    val counter = dag.addOperator("counter", new UniqueCounter[String])
    val out = dag.addOperator("console", new ConsoleOutputOperator)
    dag.addStream[String]("lines", input.out, parser.in)
    dag.addStream[String]("words", parser.out, counter.data)
    dag.addStream[java.util.HashMap[String,Integer]]("counts", counter.count, out.input)
  }
}
Flink
// keyBy/mapWithState are DataStream operations, so this needs the streaming environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
val text = env.fromElements(...)
val words = text.flatMap ( _.split(" ") )
words.keyBy(x => x).mapWithState {
  (word, count: Option[Int]) =>
    {
      val newCount = count.getOrElse(0) + 1
      val output = (word, newCount)
      (output, Some(newCount))
    }
}
...
Performance
Hard to design an unbiased test; lots of variables
➜ Latency vs. Throughput
➜ Costs of Delivery Guarantees, Fault-tolerance & State Management
➜ Tuning
➜ Network operations, Data locality & Serialization
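For intuition (back-of-envelope, not a benchmark): with a 1-second batch interval a record waits on average about 0.5 s just to enter a batch, so end-to-end latency cannot drop much below half the interval plus processing time, while throughput gains come from amortizing per-record overhead across each batch.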
Project Maturity
When picking a framework, always consider its maturity
Project Maturity [Storm & Trident]
For a long time the de facto industry standard
Project Maturity [Spark Streaming]
The most trending Scala repository these
days and one of the engines behind Scala’s
popularity
Project Maturity [Samza]
Used by LinkedIn and also by tens of
other companies
Project Maturity [Apex]
Graduated very recently, adopted by a
couple of corporate clients already
Project Maturity [Flink]
Still an emerging project, but we can
see its first production deployments
Summary
                  Storm          Trident             Spark Streaming    Samza               Apex                Flink
Streaming Model   Native         Micro-batching      Micro-batching     Native              Hybrid              Native
API               Compositional  Compositional       Declarative        Compositional       Compositional*      Declarative
Guarantees        At-least-once  Exactly-once        Exactly-once*      At-least-once       Exactly-once        Exactly-once
Fault Tolerance   Record ACKs    Record ACKs         RDD Checkpointing  Log-based           Checkpointing       Checkpointing
State Management  Not built-in   Dedicated Operators Dedicated DStream  Stateful Operators  Stateful Operators  Stateful Operators
Latency           Very Low       Medium              Medium             Low                 Very Low            Low
Throughput        Low            Medium              High               High                High                High
Maturity          High           High                High               Medium              Medium              Low
General Guidelines
As always, it depends
➜ Evaluate particular application needs
➜ Programming model
➜ Available delivery guarantees
➜ Almost all non-trivial jobs have state
➜ Fast recovery is critical
Recommendations [Storm & Trident]
● Fits small and fast tasks
● Very low (tens of milliseconds) latency
● State & fault tolerance degrade performance significantly
● Potential update to Heron
○ Keeps the API; according to Twitter, better in every single way
○ Future open-sourcing is uncertain
Recommendations [Spark Streaming]
● Spark Ecosystem
● Data Exploration
● Latency is not critical
● Micro-batching limitations
Recommendations [Samza]
● Kafka is a cornerstone of your architecture
● Application requires large states
● Don't need exactly-once semantics
● Kafka Streams
Recommendations [Apex]
● Prefer compositional approach
● Hadoop
● Great performance
● Dynamic DAG changes
Recommendations [Flink]
● Conceptually great, fits most use cases
● Take advantage of batch processing capabilities
● Need functionality that is hard to implement in micro-batch
● Enough courage to use an emerging project
Dataflow and Apache Beam
Dataflow Model & SDKs
[Diagram: one pipeline, many runtimes; Direct Pipeline (local), Apache Flink and Apache Spark (local or cloud), Google Cloud Dataflow (cloud)]
Multiple modes: stream processing and batch processing from the same pipeline
Questions
MANCHESTER LONDON NEW YORK
@petr_zapletal @cakesolutions
347 708 1518
enquiries@cakesolutions.net
We are hiring
http://www.cakesolutions.net/careers