Apache Flink: Stream & Batch Processing Features

Apache Flink is a framework for stream and batch processing that offers state management, event-time processing, and exactly-once state consistency. It can be deployed on resource managers such as YARN and Kubernetes and, when configured for high availability, has no single point of failure; fault tolerance comes from periodic checkpointing. Flink scales to large state and high-throughput applications and powers many production stream processing jobs. It also supports savepoints, which are consistent snapshots of application state.

Uploaded by

Neha Khatri

Flink Basics

Features

Apache Flink is an excellent choice for developing and running many different types of applications due to its extensive feature set.
• Support for both stream and batch processing.
• Sophisticated state management.
• Event-time processing semantics.
• Exactly-once consistency guarantees for state.
• Flink can be deployed on various resource providers such as YARN and Kubernetes, and also as a stand-alone cluster on bare-metal hardware.
• Configured for high availability, Flink does not have a single point of failure. Fault-tolerance is achieved by periodically writing
checkpoints to a remote persistent storage.
• Flink has been proven to scale to thousands of cores and terabytes of application state, delivers high throughput and low latency, and
powers some of the world’s most demanding stream processing applications.
• An outstanding Flink feature for event-driven applications is the savepoint. A savepoint is a consistent image of application state that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or rescaled, or multiple versions of an application can be started for A/B testing.
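• Supplier<Stream<Integer>> str = () -> Stream.of(1, 2, 3, 4); // needs java.util.function.Supplier and java.util.stream.Stream
• long count = str.get().map(x -> x * 2).count(); // each get() yields a fresh stream; the mapping function is a placeholder; count is 4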
Execution Environment
• Flink integrates with all common cluster resource managers, such as:
• Hadoop YARN
• Kubernetes
• Flink can also be set up to run as a stand-alone cluster.

• When deploying a Flink application, Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls, which eases the integration of Flink into many environments.
Kubernetes: in brief
• Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

• Containers
• Technology for packaging an application along with its runtime
dependencies.

• Workloads
• Understand Pods, the smallest deployable compute object in Kubernetes,
and the higher-level abstractions that help you to run them.
• YARN (MRv2) compared with MRv1:
• NodeManager == TaskTracker.
• ApplicationMaster == JobTracker, but with only half of the MRv1 JobTracker's responsibility: it drives job execution and scheduling, takes status updates from the NodeManagers, and requests resources from the ResourceManager, so it sits in between the two.
• The resource allocation part of the MRv1 JobTracker is now assigned to the ResourceManager in MRv2.
Flink Application Structure

1. Streams
2. State
3. Time
Streams are a fundamental aspect of stream processing. However, streams can have different characteristics that
affect how a stream can and should be processed. Flink is a versatile processing framework that can handle any
kind of stream.

•Bounded and unbounded streams: Streams can be unbounded or bounded, i.e., fixed-sized data sets. Flink
has sophisticated features to process unbounded streams, but also dedicated operators to efficiently process
bounded streams.

•Real-time and recorded streams: All data are generated as streams. There are two ways to process the data: processing it in real time as it is generated, or persisting the stream to a storage system (e.g., a file system or object store) and processing it later. Flink applications can process recorded or real-time streams.
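
The bounded/unbounded distinction can be illustrated with plain java.util.stream, which is not Flink's API but uses the same vocabulary. This is a minimal sketch; the class name StreamKinds and its methods are made up for the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: java.util.stream, not Flink's DataStream API.
// StreamKinds and its method names are hypothetical.
class StreamKinds {

    // A bounded stream: a fixed-size data set that ends on its own.
    static long countBounded() {
        return Stream.of(1, 2, 3, 4).count();
    }

    // An unbounded stream: Stream.iterate never terminates by itself,
    // so the consumer has to decide how much of it to read.
    static List<Integer> firstOfUnbounded(int n) {
        return Stream.iterate(1, x -> x + 1)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countBounded());       // 4
        System.out.println(firstOfUnbounded(3));  // [1, 2, 3]
    }
}
```

A whole-input operation such as count() only makes sense on the bounded stream; on the unbounded one it would never return, which is why unbounded processing relies on incremental operators instead.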
State

Only applications that apply transformations on individual events do not require state.

Any application that runs basic business logic needs to remember events or intermediate results to access them at a
later point in time, for example when the next event is received or after a specific time duration.
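
The difference between a stateless transformation and a stateful one can be sketched in plain Java. This is not Flink's state API; the class RunningCount and its methods are hypothetical names used only for the illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: plain Java, not Flink's state API.
// RunningCount and its method names are hypothetical.
class RunningCount {
    // State: one counter per key, remembered between events.
    private final Map<String, Long> countsByKey = new HashMap<>();

    // Stateless: the result depends only on the current event.
    static String normalize(String event) {
        return event.trim().toLowerCase(Locale.ROOT);
    }

    // Stateful: the result depends on every earlier event with the same key.
    long onEvent(String key) {
        return countsByKey.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        RunningCount counter = new RunningCount();
        System.out.println(counter.onEvent("a"));  // 1
        System.out.println(counter.onEvent("a"));  // 2
    }
}
```

Here normalize could run on any worker without memory of the past, while onEvent only gives the right answer if the map survives between events, which is the part a stream processor has to manage and protect.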
•Features of state:

•Exactly-once state consistency: Flink’s checkpointing and recovery algorithms guarantee the consistency of
application state in case of a failure. Hence, failures are transparently handled and do not affect the correctness of an
application.

•Very Large State: Flink is able to maintain application state of several terabytes in size due to its asynchronous and
incremental checkpoint algorithm.

•Scalable Applications: Flink supports scaling of stateful applications by redistributing the state to more or fewer
workers.
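
The idea behind exactly-once state consistency can be sketched with a toy model: a checkpoint saves the operator state together with the input position it reflects, and recovery restores both and replays the input from there. Flink's real mechanism is distributed and asynchronous, so this single-operator example is only a sketch of the idea; the class CheckpointedSum and all of its members are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a toy model of checkpoint-and-replay, not Flink's algorithm.
// CheckpointedSum and its members are hypothetical names.
class CheckpointedSum {
    // A checkpoint pairs the operator state with the source position it reflects.
    static final class Checkpoint {
        final int offset;
        final long sum;
        Checkpoint(int offset, long sum) { this.offset = offset; this.sum = sum; }
    }

    private int offset = 0;  // next source position to read
    private long sum = 0;    // operator state

    // Consumes source elements from the current offset up to (not including) the bound.
    void process(List<Integer> source, int upToExclusive) {
        while (offset < upToExclusive) {
            sum += source.get(offset);
            offset++;
        }
    }

    Checkpoint checkpoint() { return new Checkpoint(offset, sum); }

    void restore(Checkpoint cp) { offset = cp.offset; sum = cp.sum; }

    long sum() { return sum; }

    // Simulates a failure: progress made after the checkpoint is lost, then the
    // state is restored and the source is replayed from the saved offset.
    static long runWithFailure(List<Integer> source, int checkpointAt, int failAt) {
        CheckpointedSum op = new CheckpointedSum();
        op.process(source, checkpointAt);
        Checkpoint cp = op.checkpoint();
        op.process(source, failAt);  // work that the failure discards
        CheckpointedSum recovered = new CheckpointedSum();
        recovered.restore(cp);
        recovered.process(source, source.size());
        return recovered.sum();
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(runWithFailure(source, 2, 4));  // 15
    }
}
```

Because state and offset are saved as one unit, every element contributes to the final sum exactly once, even though elements 3 and 4 were read twice. The sketch assumes a source that can be re-read from a given position.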
Time
Time is another important ingredient of streaming applications. Most event streams have inherent time semantics
because each event is produced at a specific point in time.

Moreover, many common stream computations are based on time, such as window aggregations, sessionization, pattern detection, and time-based joins. An important aspect of stream processing is how an application measures time, i.e., the difference between event time and processing time.
Flink provides a rich set of time-related features.

•Event-time Mode: Applications that process streams with event-time semantics compute results based on the timestamps of the events. Event-time processing therefore allows for accurate and consistent results regardless of whether recorded or real-time events are processed.

•Watermark Support: Flink employs watermarks to reason about time in event-time applications. Watermarks are also a flexible mechanism to trade off the latency and completeness of results.

•Late Data Handling: When processing streams in event-time mode with watermarks, it can happen that a computation
has been completed before all associated events have arrived. Such events are called late events. Flink features multiple
options to handle late events, such as rerouting them via side outputs and updating previously completed results.

•Processing-time Mode: In addition to its event-time mode, Flink also supports processing-time semantics, which perform computations as triggered by the wall-clock time of the processing machine. The processing-time mode can be suitable for certain applications with strict low-latency requirements that can tolerate approximate results.
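
How event time, watermarks, and late events fit together can be sketched with a toy tumbling-window counter in plain Java. It is not Flink's window operator: the class EventTimeWindows, its fields, and the rule that the watermark trails the highest timestamp seen by a fixed bound are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a toy event-time window counter, not Flink's window operator.
// All names are hypothetical.
class EventTimeWindows {
    private final long windowSize;
    private final long maxOutOfOrderness;  // assumed bound on out-of-order arrival
    private long watermark = Long.MIN_VALUE;
    private final TreeMap<Long, Integer> open = new TreeMap<>();  // window start -> count
    final List<String> results = new ArrayList<>();
    final List<Long> lateEvents = new ArrayList<>();  // stands in for a side output

    EventTimeWindows(long windowSize, long maxOutOfOrderness) {
        this.windowSize = windowSize;
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    void onEvent(long timestamp) {
        // Assign the event to a window by its own timestamp, not by arrival time.
        long start = Math.floorDiv(timestamp, windowSize) * windowSize;
        if (start + windowSize <= watermark) {
            lateEvents.add(timestamp);  // its window has already been emitted
        } else {
            open.merge(start, 1, Integer::sum);
        }
        // The watermark trails the highest timestamp seen so far.
        watermark = Math.max(watermark, timestamp - maxOutOfOrderness);
        // Emit every open window whose end is at or before the watermark.
        while (!open.isEmpty() && open.firstKey() + windowSize <= watermark) {
            Map.Entry<Long, Integer> e = open.pollFirstEntry();
            results.add("[" + e.getKey() + "," + (e.getKey() + windowSize) + ")=" + e.getValue());
        }
    }

    public static void main(String[] args) {
        EventTimeWindows w = new EventTimeWindows(10, 2);
        for (long t : new long[] {1, 4, 3, 12, 9}) {
            w.onEvent(t);
        }
        System.out.println(w.results);     // [[0,10)=3]
        System.out.println(w.lateEvents);  // [9]
    }
}
```

Event 3 arrives after event 4 but is still counted, because the watermark had not yet passed its window. Event 9 arrives after the watermark reached 10, so it is routed to the late list instead. A larger maxOutOfOrderness would have caught it at the cost of emitting the window later, which is the latency versus completeness trade-off.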
Layered APIs
Flink provides three layered APIs. Each API offers a different trade-off
between conciseness and expressiveness and targets different use cases.
