Apache Flink® Training
Intro
Flink v1.3 – 8.9.2017
Where we’re going today
▪ Stateful stream processing as a paradigm for continuous data
▪ Apache Flink is a sophisticated and battle-tested stateful stream processor with a comprehensive set of features
▪ Efficiency, management, and operational issues for state are taken very seriously
Stream Processing
Your code processes records one at a time: a long running computation, on an endless stream of input.
Distributed Stream Processing
● partitions input streams by some key in the data
● distributes computation across multiple instances
● each instance is responsible for some key range
Stateful Stream Processing
Your code maintains local variables and data structures across events:

var x = …
// update local variables/structures
if (condition(x)) {
  …
}
Stateful Stream Processing

var x = …
// update local variables/structures
if (condition(x)) {
  …
}

● embedded local state backend
● state co-partitioned with the input stream by key
About time ...
When should results be emitted?
● Control for determining when the computation has fully processed all required events
● It’s mostly about time, e.g. have I received all events for 3 - 4 pm?
● Did event B occur within 5 minutes of event A?
● Wall clock time is not correct. Event-time awareness is required.
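The event-time awareness mentioned above can be sketched with Flink's DataStream API. This is a minimal sketch, assuming a hypothetical Event(sensor, timestamp, value) type: timestamps are taken from the events themselves, and watermarks tell Flink how long to wait for out-of-order events before emitting a window's result.

```
// Sketch only: Event is a hypothetical type, not from the slides.
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(sensor: String, timestamp: Long, value: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val events: DataStream[Event] =
  env.fromElements(Event("a", 1000L, 1.0), Event("a", 2000L, 2.0))

val withTimestamps = events.assignTimestampsAndWatermarks(
  // tolerate events arriving up to 10 seconds out of order
  new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(10)) {
    override def extractTimestamp(e: Event): Long = e.timestamp
  })

// results for a 3-4pm window are emitted once the watermark passes 4pm
withTimestamps
  .keyBy(_.sensor)
  .timeWindow(Time.hours(1))
  .sum("value")
```

With this setup, "have I received all events for 3 - 4 pm?" is answered by the watermark, not by the wall clock.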
Traditional batch processing
● Continuously ingesting data
● Time-bounded batch files (e.g. 2017-06-13 10:00pm, 11:00pm, 2017-06-14 00:00am, 01:00am)
● Periodic batch jobs
Traditional batch processing (II)
● Consider computing a conversion metric (# of A → B per hour)
● What if the conversion crossed time boundaries?
  → carry intermediate results to the next batch
● What if events come out of order?
  → ???
The ideal way
A long running computation that accumulates state:
● a view of the “history” of the input stream
● counters, in-progress windows
● parameters of incrementally trained ML models, etc.
● state influences the output
● output depends on a notion of time
● outputs when results are complete
The ideal way (II)
A Stateful Stream Processor that handles large distributed state and time / order / completeness consistently, robustly, and efficiently:
● Stateful stream processing as a new paradigm to continuously process continuous data
● Produces accurate results
● Having results available in real-time (with low latency and high throughput) is a natural consequence of the model
● Processes both real-time and historic data using exactly the same application
Flink APIs and Runtime
Apache Flink Stack
● Libraries
● DataStream API (Stream Processing) and DataSet API (Batch Processing)
● Runtime: Distributed Streaming Data Flow
Streaming and batch as first class citizens.
Programming Model
Sources → Transformations / Computations (each with local state) → Sinks
Parallelism
Distributed Execution
Levels of abstraction
● Stream SQL — high-level language
● Table API (dynamic tables) — declarative DSL
● DataStream API (streams, windows) — stream processing & analytics
● Process Function (events, state, time) — low-level (stateful stream processing)
Process Function

class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { … }

    out.collect(…)   // emit events
    state.update(…)  // modify state

    // schedule a timer callback
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when event-/processing- time instant is reached
  }
}
Data Stream API

val lines: DataStream[String] = env.addSource(
  new FlinkKafkaConsumer010[String](…))

val events: DataStream[Event] = lines.map((line) => parse(line))

val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .apply(new MyAggregationFunction())

stats.addSink(new RollingSink(path))
Table API & Stream SQL
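A hedged sketch of what the Table API and Stream SQL look like in Flink 1.3, assuming a hypothetical stream of (sensor, value) readings; the table name "sensors" and field names are illustrative, not from the slides.

```
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// register a DataStream as a dynamic table with named fields
val readings: DataStream[(String, Double)] =
  env.fromElements(("a", 1.0), ("a", 2.0))
tEnv.registerDataStream("sensors", readings, 'sensor, 'value)

// Table API: declarative, composable relational operations
val avgBySensor = tEnv.scan("sensors")
  .groupBy('sensor)
  .select('sensor, 'value.avg as 'avgValue)

// Stream SQL: the same query expressed as standard SQL over the dynamic table
val avgBySensorSql = tEnv.sql(
  "SELECT sensor, AVG(`value`) FROM sensors GROUP BY sensor")
```

Both forms compile to the same runtime dataflow; the choice between them is a matter of taste and tooling.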
Deployment Options
Local Execution
▪ Starts a local Flink cluster
▪ All processes (Job Manager and Task Managers) run in the same JVM
▪ Behaves just like a regular cluster
▪ Local cluster can be started in your IDE!
▪ Very useful for developing and debugging
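A minimal sketch of local execution: createLocalEnvironment starts Flink inside the current JVM, which is effectively what happens when you run a job from your IDE. The job name and elements are illustrative.

```
import org.apache.flink.streaming.api.scala._

// start an in-JVM mini cluster with parallelism 2
val env = StreamExecutionEnvironment.createLocalEnvironment(2)

env.fromElements(1, 2, 3)
   .map(_ * 2)
   .print()

env.execute("local debug job")
```

The same program, unchanged, can later be submitted to a remote cluster.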
Remote Execution
▪ Submit a job to a remotely running cluster (the client sends the job to the Job Manager, which distributes work across Task Managers)
▪ Monitor the status of a job
YARN Job Mode
▪ Brings up a Flink cluster in YARN to run a single job
▪ Better isolation than session mode
YARN Session Mode
▪ Starts a Flink cluster in YARN containers
▪ Multi-user scenario
▪ Resource sharing
▪ Easy to set up
Other Deployment Options
▪ Apache Mesos
• Either with or without DC/OS
▪ Amazon Elastic MapReduce
• Available in EMR 5.1.0
▪ Google Compute Engine
• Available via bdutil
▪ Docker / Kubernetes
Flink in the real world
Flink community
● Github
● 41 meetups
● 16,544 members
Powered by Flink
● Zalando, one of the largest ecommerce companies in Europe, uses Flink for real-time business process monitoring.
● King, the creators of Candy Crush Saga, uses Flink to provide data science teams with real-time analytics.
● Alibaba, the world's largest retailer, built a Flink-based system (Blink) to optimize search rankings in real time.
● Bouygues Telecom uses Flink for real-time event processing over billions of Kafka messages per day.
See more at flink.apache.org/poweredby.html
● Largest job has > 20 operators, runs on > 5,000 vCores in a 1,000-node cluster, processes millions of events per second
● Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
● 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
What is being built with Flink?
▪ First wave for streaming was the lambda architecture
  • Aid batch systems to be more real-time
▪ Second wave was analytics (real-time and lag-time)
  • Based on distributed collections, functions, and windows
▪ The next wave is much broader: a new architecture for event-driven applications
A complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation)
Flink Forward 2016
Flink Forward 2017
San Francisco
• 10-11 April 2017
• The first Flink Forward event outside of Berlin
• Talks are online at sf.flink-forward.org/
Berlin
• 11-13 September 2017
• Over 350 attendees last year
• Registration opening soon!
http://training.data-artisans.com/