© 2019 Ververica
Automatic State Cleanup in Apache Flink
Deep Dive into State Time-To-Live (TTL)
Andrey Zagrebin, Software Engineer and Apache Flink Committer
Flink Forward Europe 2019
Agenda for the talk
• Assumptions about the audience
• State TTL feature
  ─ Why?
  ─ What is it?
  ─ How to use it?
• Tech deep dive
  ─ General idea: State Wrappers
  ─ Background Cleanup
    • Concurrent background process
    • Incremental Heap Backend cleanup
    • TTL Compaction filter for RocksDB
• Future roadmap & Useful links
Assumptions about the audience
• Familiar with Apache Flink and its Keyed Stateful Processing
• OR think of the Flink State as a local KV store with single-threaded access
• Two types of Flink State storage, backed by
  ─ in-memory Java object map (heap, non-serialized)
  ─ RocksDB embedded KV store (native memory + local drive, serialized)
State TTL feature
(since Flink 1.6, improved in 1.8)
State TTL feature: Motivation
• Save space: do not store what is not used
• Data privacy: access for a limited amount of time
Clean up?
• Implement some cleanup on the application level,
  e.g. using Flink Timers, which are separate state
  Trade-off: storage size implications!
• OR make Flink take care of the cleanup under the hood
State TTL feature: Workflow
[Diagram: timeline of a key/value entry in Flink state]
• Create / update: the key/value is written to Flink state and the TTL timer starts
• Unexpired: reads return the value (… use state …)
• Expires: once the TTL elapses, the entry is expired (… forget state …)
• Flink purges the expired state; a later update re-creates the entry
State TTL feature: Example

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Configure state TTL
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(1)) // Set TTL
    // When to restart TTL? OnCreateAndWrite or OnReadAndWrite
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    // Get expired state if still there? YES for cached, NO for GDPR
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .cleanupIncrementally(10, false) // Activate automatic background cleanup
    .build();

// Create state as usual
ValueStateDescriptor<Long> lastUserLoginDescriptor =
    new ValueStateDescriptor<>("login", Long.class);
lastUserLoginDescriptor.enableTimeToLive(ttlConfig); // Enable TTL

// Inside a rich function, e.g. in open()
ValueState<Long> lastUserLogin =
    getRuntimeContext().getState(lastUserLoginDescriptor);

lastUserLogin.update(value); // Create/update for key
// do something during the day
lastUserLogin.value(); // -> value (use unexpired)
// do nothing for more than a day
lastUserLogin.value(); // -> null (oops.. expired, not there anymore)

* For collections (List or Map), TTL applies per entry
Tech Deep Dive
General idea: State Wrappers
[Diagram: User State -> TTL Wrapper -> Normal Flink State]
• The user works with a TTL state wrapper around normal Flink state
• The wrapper stores each user value together with a timestamp (TS): KEY -> (USER VALUE, TS)
• The TTL state serializer wraps the user state serializer and adds the TS
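
The wrapper idea above can be sketched in plain Java, without any Flink dependency. This is only an illustration under stated assumptions: the names `TtlWrapperSketch`, `TtlValue` and `TtlState` are hypothetical, a `HashMap` stands in for normal Flink state, the clock is injected so that expiration can be simulated, and the behaviour mimics OnCreateAndWrite with NeverReturnExpired. It is not Flink's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

final class TtlWrapperSketch {

    /** User value stored together with the timestamp (TS) of its last write. */
    static final class TtlValue<V> {
        final V userValue;
        final long lastUpdateTs;

        TtlValue(V userValue, long lastUpdateTs) {
            this.userValue = userValue;
            this.lastUpdateTs = lastUpdateTs;
        }
    }

    /** Hypothetical TTL wrapper; the HashMap plays the role of normal Flink state. */
    static final class TtlState<K, V> {
        private final Map<K, TtlValue<V>> original = new HashMap<>();
        private final long ttlMillis;
        private final LongSupplier clock;

        TtlState(long ttlMillis, LongSupplier clock) {
            this.ttlMillis = ttlMillis;
            this.clock = clock;
        }

        /** Create/update: wrap the user value with the current timestamp. */
        void update(K key, V value) {
            original.put(key, new TtlValue<>(value, clock.getAsLong()));
        }

        /** Read: unwrap the value, but never return it once it has expired. */
        V get(K key) {
            TtlValue<V> wrapped = original.get(key);
            if (wrapped == null) {
                return null;
            }
            if (wrapped.lastUpdateTs + ttlMillis <= clock.getAsLong()) {
                original.remove(key); // drop the expired entry on access
                return null;
            }
            return wrapped.userValue;
        }

        int storedEntries() {
            return original.size();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // simulated clock in milliseconds
        TtlState<String, Long> lastLogin = new TtlState<>(1000L, () -> now[0]);
        lastLogin.update("alice", 42L);
        System.out.println(lastLogin.get("alice")); // read while unexpired
        now[0] = 1000L;                             // TTL has elapsed
        System.out.println(lastLogin.get("alice")); // read after expiration
    }
}
```

Note that in this sketch an expired entry is removed only when it is read again; entries that are never accessed stay in the map, which is the gap the background cleanup strategies are meant to close.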
Background Cleanup
[Diagram: a concurrent cleanup process iterates over the TTL state, checks "Expired?" and drops entries, while regular state access is single-threaded]
Clean up concurrently? Needs synchronization!
● Complicated
● Performance implications
Incremental Heap Backend cleanup (since Flink 1.8)
[Diagram: within the single processing thread, a global state iterator advances step by step over the KEY/VALUE entries of the heap state; for each visited entry: check "Expired?", drop if so]
All state entries are periodically cleaned up IF the state is being accessed
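
A minimal plain-Java sketch of this incremental strategy follows. It rests on several assumptions: `IncrementalCleanupSketch`, `HeapState` and `Stamped` are hypothetical names, `cleanupSize` mirrors the first argument of `cleanupIncrementally(10, false)`, and a `ConcurrentHashMap` is used only because its iterators tolerate modifications between steps. Access stays single-threaded, as on the slide, and Flink's own heap state map is implemented differently.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class IncrementalCleanupSketch {

    /** User value plus the timestamp of its last write. */
    static final class Stamped {
        final long value;
        final long timestamp;

        Stamped(long value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static final class HeapState {
        private final Map<String, Stamped> entries = new ConcurrentHashMap<>();
        private final long ttlMillis;
        private final int cleanupSize;
        private final LongSupplier clock;
        // Global iterator shared by all accesses; it remembers where cleanup stopped.
        private Iterator<Map.Entry<String, Stamped>> globalIterator;

        HeapState(long ttlMillis, int cleanupSize, LongSupplier clock) {
            this.ttlMillis = ttlMillis;
            this.cleanupSize = cleanupSize;
            this.clock = clock;
        }

        void update(String key, long value) {
            entries.put(key, new Stamped(value, clock.getAsLong()));
            cleanupStep();
        }

        Long get(String key) {
            Stamped stamped = entries.get(key);
            Long result = null;
            if (stamped != null && !isExpired(stamped)) {
                result = stamped.value;
            }
            cleanupStep();
            return result;
        }

        /** Each state access advances the global iterator by at most cleanupSize entries. */
        private void cleanupStep() {
            if (globalIterator == null || !globalIterator.hasNext()) {
                globalIterator = entries.entrySet().iterator(); // start a new pass
            }
            for (int i = 0; i < cleanupSize && globalIterator.hasNext(); i++) {
                if (isExpired(globalIterator.next().getValue())) {
                    globalIterator.remove();
                }
            }
        }

        private boolean isExpired(Stamped stamped) {
            return stamped.timestamp + ttlMillis <= clock.getAsLong();
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};
        HeapState state = new HeapState(1000L, 2, () -> now[0]);
        for (int i = 0; i < 5; i++) {
            state.update("key-" + i, i);
        }
        now[0] = 2000L; // everything is expired now
        while (state.size() > 0) {
            state.get("some-other-key"); // any access drives the cleanup forward
            System.out.println("entries left: " + state.size());
        }
    }
}
```

The design choice illustrated here is that cleanup piggybacks on regular state access in the same thread, so no synchronization is needed; the cost is that nothing is cleaned up while the state is idle.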
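TTL Compaction filter for RocksDB (since Flink 1.8)
[Diagram: RocksDB write path and compaction, JVM side vs. native C++ side connected via JNI]
• Updates go to the Memtable, which is flushed to disk as immutable SSTables (SSTable 1, SSTable 2, ...)
• Compaction merges the SSTables into a compacted SSTable
• Flink (JVM) configures the Flink TTL compaction filter in FRocksDB (native C++) via JNI
• The filter asks the JVM for the current time via JNI
• During compaction, the filter iterates over the state and is applied per entry: check "Expired?", drop if so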
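
The compaction-time cleanup can be modelled in plain Java as follows. This is a simplified sketch and not RocksDB or FRocksDB code: `CompactionFilterSketch`, `Stamped`, `CompactionFilter`, `ttlFilter` and `compact` are hypothetical names, sorted `TreeMap`s stand in for SSTables, and a `LongSupplier` stands in for the JNI call that asks the JVM for the current time.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

final class CompactionFilterSketch {

    /** Serialized user value is represented by a String, stored with its timestamp. */
    static final class Stamped {
        final String value;
        final long timestamp;

        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Decides, per entry, whether compaction should drop it. */
    interface CompactionFilter {
        boolean shouldDrop(String key, Stamped entry);
    }

    /** TTL filter: an entry is dropped once its timestamp plus TTL has passed. */
    static CompactionFilter ttlFilter(long ttlMillis, LongSupplier currentTime) {
        return (key, entry) -> entry.timestamp + ttlMillis <= currentTime.getAsLong();
    }

    /** Merges two sorted tables (newer entries win) and applies the filter per entry. */
    static TreeMap<String, Stamped> compact(
            Map<String, Stamped> older, Map<String, Stamped> newer, CompactionFilter filter) {
        TreeMap<String, Stamped> compacted = new TreeMap<>(older);
        compacted.putAll(newer);
        compacted.entrySet().removeIf(e -> filter.shouldDrop(e.getKey(), e.getValue()));
        return compacted;
    }

    public static void main(String[] args) {
        TreeMap<String, Stamped> ssTable1 = new TreeMap<>();
        ssTable1.put("a", new Stamped("a-old", 0L));
        ssTable1.put("b", new Stamped("b-old", 0L));
        TreeMap<String, Stamped> ssTable2 = new TreeMap<>();
        ssTable2.put("b", new Stamped("b-new", 900L));
        ssTable2.put("c", new Stamped("c-new", 950L));

        TreeMap<String, Stamped> compacted =
                compact(ssTable1, ssTable2, ttlFilter(1000L, () -> 1500L));
        System.out.println("keys after compaction: " + compacted.keySet());
    }
}
```

As in the incremental heap strategy, cleanup piggybacks on work that happens anyway, here the compaction, so expired entries disappear only when the files containing them are compacted.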
Future roadmap
• Event time support: FLINK-12005
• Support queryable state with TTL
• Flink Timer-based cleanup strategy
• FRocksDB -> RocksDB + Flink extensions (WIP)
Useful links
• The latest blogpost about the feature up to Flink 1.8
• FLIP-25: TTL design discussion
• JIRA Issue FLINK-3089
• User documentation for the State TTL
THANK YOU!
QUESTIONS?
www.ververica.com @VervericaData info@ververica.com

Time-to-live: How to Perform Automatic State Cleanup in Apache Flink - Andrey Zagrebin, Ververica
