Presented By:
Kundan Kumar
Software Consultant
Stateful Stream Processing with Apache Flink
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
Punctuality
Respect KnolX session timings. Please
do not join a session more than
five minutes after its
start time.
Feedback
Make sure to submit constructive
feedback for all sessions, as it is
very helpful for the presenter.
Mute
Be on mute until you have
questions or concerns.
Agenda
01 What is stateful stream processing
02 Flink's take on stateful stream processing
03 Demo
What is Stateful Stream Processing?
Streaming and Stream Processing:
Stream processing is the processing of data in motion, or in other words, computing on data directly
as it is produced or received.
The systems that receive and send the data streams and execute the application or analytics logic are
called stream processors.
Stateful Stream Processing:
Stateful stream processing is a subset of stream processing in which the computation maintains
contextual state. This state is used to store information derived from the previously-seen events.
In other words, state is shared between events (the entities in the stream), so
past events can influence how current events are processed.
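The difference between stateless and stateful processing can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Flink code: the function names and the sample events are hypothetical.

```python
from collections import defaultdict


def stateless_double(events):
    """Stateless: each output depends only on the current event."""
    return [value * 2 for _, value in events]


def stateful_running_total(events):
    """Stateful: each output depends on the current event and on earlier ones."""
    totals = defaultdict(int)  # the "state": one running total per key
    out = []
    for key, value in events:
        totals[key] += value  # past events influence the current result
        out.append((key, totals[key]))
    return out


# Hypothetical sample events of the form (key, value).
events = [("alice", 5), ("bob", 3), ("alice", 2)]
doubled = stateless_double(events)
running = stateful_running_total(events)
print(doubled)
print(running)
```

The second "alice" event produces 7 rather than 2 because the running total remembers the earlier event with the same key.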
Flink's take on stateful stream processing
Flink in a nutshell:
● Apache Flink is a Big Data framework and distributed processing engine for stateful
computations over unbounded and bounded data streams.
➢ A Flink application may consume real-time data from streaming sources such as
message queues or distributed logs, like Apache Kafka or Kinesis.
➢ Flink can also consume bounded, historic data from a variety of data sources.
➢ The streams of results being produced by a Flink application can be sent to a wide
variety of systems that can be connected as sinks.
➢ Fast, in-memory, scalable, large-state, fault-tolerant, event-time, exactly-once.
[Figure: dataflow from Source through Transformations to Sink]
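The source, transformation, and sink stages can be modelled with Python generators. This is a minimal sketch of the dataflow shape only. It does not use the Flink API, and every function name here is made up for illustration.

```python
def source(lines):
    """Source: emits records one at a time."""
    for line in lines:
        yield line


def tokenize(stream):
    """Transformation: split each line into lowercase words."""
    for line in stream:
        for word in line.lower().split():
            yield word


def keep_long_words(stream, min_len=4):
    """Transformation: drop words shorter than min_len."""
    for word in stream:
        if len(word) >= min_len:
            yield word


def sink(stream, collected):
    """Sink: writes results to an external system (here, a list)."""
    for record in stream:
        collected.append(record)


collected = []
sink(keep_long_words(tokenize(source(["Flink processes streams", "of data"]))), collected)
print(collected)
```

Each stage pulls records lazily from the previous one, so records flow through the pipeline one by one instead of being materialised as a whole batch.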
➢ Programs in Flink are inherently parallel and distributed.
➢ During execution, a stream has one or more stream partitions, and each
operator has one or more operator subtasks.
➢ Flink facilitates stateful operations.
➢ The handling of the current event can depend on the accumulated effect of all the events that
came before it.
➢ The set of parallel instances of a stateful operator is effectively a sharded key-value
store. Each parallel instance is responsible for handling events for a specific group of
keys, and the state for those keys is kept locally.
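The "sharded key-value store" view can be sketched as follows. This is an assumption-laden model: Flink's real key assignment goes through key groups, whereas the CRC32-modulo routing below is a simple stand-in, and the `Subtask` class and `route` function are hypothetical names.

```python
import zlib


class Subtask:
    """One parallel instance of a stateful operator with its own local state."""

    def __init__(self, index):
        self.index = index
        self.counts = {}  # local state, only for the keys routed to this subtask

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return key, self.counts[key]


def route(key, parallelism):
    """Pick the subtask for a key with a stable hash (stand-in for key groups)."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


parallelism = 3
subtasks = [Subtask(i) for i in range(parallelism)]
for key in ["a", "b", "a", "c", "b", "a"]:
    subtasks[route(key, parallelism)].process(key)

for subtask in subtasks:
    print(subtask.index, subtask.counts)
```

Because routing is deterministic, all events for a given key reach the same subtask, so that subtask can keep the key's state locally without coordinating with the others.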
States in Flink
➢ Operator State: State maintained per operator instance on the stream. A special type of
state used mainly in source and sink implementations.
➢ Keyed State: State maintained per key on the stream. Each key has its own state, so keyed
state behaves like an embedded key-value store.
➢ Broadcast State: A special type of operator state in which the records of one stream are
broadcast to all downstream tasks that need access to those records.
➢ Queryable State: A feature that allows client APIs to query a job's state from outside Flink.
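As one illustration of these state types, the broadcast pattern can be sketched in plain Python: a low-volume "rules" stream is delivered to every parallel instance, while ordinary events go to only one instance. The class, the rule name, and the thresholds are hypothetical, and this is not the Flink broadcast state API.

```python
class ParallelInstance:
    """One downstream task holding its own copy of the broadcast state."""

    def __init__(self):
        self.rules = {}    # broadcast state: identical on every instance
        self.matches = []  # results produced by this instance

    def on_broadcast(self, rule_name, threshold):
        self.rules[rule_name] = threshold

    def on_event(self, value):
        for name, threshold in self.rules.items():
            if value > threshold:
                self.matches.append((name, value))


instances = [ParallelInstance(), ParallelInstance()]


def broadcast(rule_name, threshold):
    """Deliver a rule record to every parallel instance."""
    for instance in instances:
        instance.on_broadcast(rule_name, threshold)


broadcast("high", 100)      # every instance now knows the rule
instances[0].on_event(150)  # ordinary events reach a single instance
instances[1].on_event(50)
instances[1].on_event(120)
print([inst.matches for inst in instances])
```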
Stateful streaming application in Flink
State Backends
1. Memory state backend:
➢ This is the default backend Flink uses when nothing is configured (as of Flink 1.12).
➢ Keeps the working state in memory, on each task manager's JVM heap.
➢ This backend should never be used for production jobs.
➢ It stores the backup of the state (the checkpoint) in the job
manager's memory, which puts unnecessary pressure on the job manager's operational
stability.
2. File system backend:
➢ This backend is similar to the memory state backend, except that it stores the backup on a
filesystem rather than in the job manager's memory.
➢ The filesystem can be the task manager's local filesystem or a durable store such as
HDFS/S3.
3. RocksDB backend:
➢ This backend stores the data in RocksDB, an embedded key-value store developed at Facebook.
➢ RocksDB maintains an in-memory table (the memtable) along with bloom
filters, so reading recent data is very fast.
➢ Each task manager maintains its own local RocksDB files, and the backup of this state is
checkpointed to a durable store such as HDFS/S3.
➢ This is the only backend that supports incremental checkpointing, i.e. backing up
only the data modified since the last checkpoint rather than the complete data.
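The idea behind incremental checkpointing can be sketched at the level of individual keys. This is a simplified analogy: with RocksDB, Flink works with the store's files rather than per-key deltas, deletions are ignored here, and `IncrementalStore` and `restore` are hypothetical names.

```python
class IncrementalStore:
    """Key-value state that tracks which keys changed since the last checkpoint."""

    def __init__(self):
        self.data = {}
        self.dirty = set()  # keys modified since the last checkpoint

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)

    def full_checkpoint(self):
        """Back up the complete state."""
        self.dirty.clear()
        return dict(self.data)

    def incremental_checkpoint(self):
        """Back up only the entries modified since the last checkpoint."""
        delta = {key: self.data[key] for key in self.dirty}
        self.dirty.clear()
        return delta


def restore(base, deltas):
    """Rebuild state from a full checkpoint plus later incremental ones."""
    state = dict(base)
    for delta in deltas:
        state.update(delta)
    return state


store = IncrementalStore()
store.put("a", 1)
store.put("b", 2)
base = store.full_checkpoint()
store.put("b", 3)
store.put("c", 4)
delta = store.incremental_checkpoint()
restored = restore(base, [delta])
print(delta, restored)
```

The delta contains only the two entries written after the full checkpoint, which is why incremental checkpoints tend to be much smaller when most of a large state is unchanged.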
Checkpointing
Checkpoint: A specific marked point in each input stream from which the stream can be
replayed. Flink implements checkpoints by persisting the state of all stateful operators,
periodically saving that state to a reliable storage system.
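A checkpoint pairs a replay position in the input with the operator state at that position. The following plain-Python sketch shows why restoring both together gives a correct result after a failure. The `run` function, the simulated failure, and the sample data are all hypothetical, and the single running sum stands in for real operator state.

```python
def run(source, state, offset, checkpoint_every, fail_at=None):
    """Sum the records of `source` starting at `offset`.

    Returns ((offset, state), finished). On a simulated failure the last
    checkpoint is returned, otherwise the final position and state.
    """
    checkpoint = (offset, state)
    while offset < len(source):
        if fail_at is not None and offset == fail_at:
            return checkpoint, False  # crash: progress since the checkpoint is lost
        state += source[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, state)  # persist position and state together
    return (offset, state), True


source = [1, 2, 3, 4, 5, 6, 7]

# First attempt crashes before processing the record at offset 5.
(ckpt_offset, ckpt_state), finished = run(source, 0, 0, checkpoint_every=3, fail_at=5)

# Recovery: restore the state and replay the input from the checkpointed offset.
(final_offset, final_state), recovered = run(source, ckpt_state, ckpt_offset, checkpoint_every=3)
print(ckpt_offset, ckpt_state, final_offset, final_state)
```

The records at offsets 3 and 4 are processed twice overall, but their effect appears in the final state only once, because the restored state predates them.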
Stream Barriers: Lightweight stream markers with unique IDs. Flink injects them into the
input streams, and they flow in line with the records.
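Barrier alignment for an operator with two inputs can be sketched as below. This is a toy model under several assumptions: there is a single barrier with no ID, the inputs are read round-robin, and the operator state is one running sum. The name `aligned_operator` is hypothetical.

```python
from collections import deque

BARRIER = "BARRIER"


def aligned_operator(channel_a, channel_b):
    """Sum records from two inputs and snapshot the sum once barriers align."""
    total = 0
    snapshot = None
    queues = [deque(channel_a), deque(channel_b)]
    blocked = [False, False]  # a channel is blocked after its barrier arrives
    while any(queues):
        progressed = False
        for i, queue in enumerate(queues):
            if not queue or blocked[i]:
                continue  # records behind a barrier wait for alignment
            item = queue.popleft()
            progressed = True
            if item == BARRIER:
                blocked[i] = True
                if all(blocked):
                    snapshot = total          # state reflects pre-barrier records only
                    blocked = [False, False]  # resume both channels
            else:
                total += item
        if not progressed:
            break  # a barrier never arrived on the other channel
    return total, snapshot


total, snapshot = aligned_operator([1, BARRIER, 10, 11], [3, 4, 5, BARRIER, 20])
print(total, snapshot)
```

The first channel's barrier arrives early, so its later records (10 and 11) are held back until the second channel's barrier arrives. The snapshot therefore contains exactly the records that preceded the barrier on both inputs.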
Checkpointing mechanism
Aligned Checkpointing
Unaligned Checkpointing
Demo
Q/A
References
1. https://flink.apache.org
2. https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/stateful-stream-processing.html#unaligned-checkpointing
3. Book: Learning Apache Flink by Tanmay Deshpande
Thank You!
