Building Real-Time Streaming Pipelines With Apache Flink & PyFlink - by Yousef Yousefi - Medium
Building Real-Time Streaming Pipelines with Apache Flink & PyFlink | by Yousef Yousefi | ... https://medium.com/@usefusefi/building-real-time-streaming-pipelines-with-apache-flink-pyfl...
This guide takes a deep dive into advanced Flink optimizations using
PyFlink (the Python API for Apache Flink):
Optimizing event-time processing with Watermarks
Tuning RocksDB for large-scale stateful streaming
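from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.state_backend import EmbeddedRocksDBStateBackend

env = StreamExecutionEnvironment.get_execution_environment()
env.set_state_backend(EmbeddedRocksDBStateBackend())  # Use RocksDB for state management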
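from pyflink.common import Duration, WatermarkStrategy
from pyflink.common.watermark_strategy import TimestampAssigner

class EventTimestampAssigner(TimestampAssigner):
    def extract_timestamp(self, value, record_timestamp):
        return value["timestamp"]  # Flink expects epoch milliseconds

# Tolerate events that arrive up to 10 seconds out of order
watermark_strategy = WatermarkStrategy \
    .for_bounded_out_of_orderness(Duration.of_seconds(10)) \
    .with_timestamp_assigner(EventTimestampAssigner())

stream = env.add_source(kafka_source).assign_timestamps_and_watermarks(watermark_strategy)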
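To make the 10-second bound concrete, here is a minimal pure-Python sketch of how a bounded-out-of-orderness watermark advances, with no Flink dependency. The class name `BoundedWatermarkTracker` and the sample timestamps are assumptions made for illustration. The formula (highest timestamp seen, minus the bound, minus 1 ms) is meant to mirror what Flink's built-in generator emits.

```python
class BoundedWatermarkTracker:
    """Illustrative model of a bounded-out-of-orderness watermark (not a Flink API)."""

    def __init__(self, max_out_of_orderness_ms):
        self.bound_ms = max_out_of_orderness_ms
        self.max_timestamp_ms = None  # highest event time seen so far

    def observe(self, timestamp_ms):
        # Only a new maximum moves the watermark forward.
        if self.max_timestamp_ms is None or timestamp_ms > self.max_timestamp_ms:
            self.max_timestamp_ms = timestamp_ms

    def current_watermark(self):
        if self.max_timestamp_ms is None:
            return None  # no events seen yet
        return self.max_timestamp_ms - self.bound_ms - 1

    def is_late(self, timestamp_ms):
        # An event is behind the watermark if its time is at or before it.
        watermark = self.current_watermark()
        return watermark is not None and timestamp_ms <= watermark


tracker = BoundedWatermarkTracker(max_out_of_orderness_ms=10_000)
for ts in (100_000, 105_000, 103_000):
    tracker.observe(ts)

print(tracker.current_watermark())  # 94999
print(tracker.is_late(96_000))      # False
print(tracker.is_late(90_000))      # True
```

The out-of-order event at 103,000 ms does not move the watermark backward. A larger bound tolerates more disorder but delays window results by the same amount.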
Key Points:
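from pyflink.common import Time, Types
from pyflink.datastream.functions import ReduceFunction
from pyflink.datastream.window import TumblingEventTimeWindows

class SumTransactions(ReduceFunction):
    def reduce(self, a, b):
        # Combine two events for the same user by adding their amounts
        return {"user_id": a["user_id"], "amount": a["amount"] + b["amount"]}

windowed_stream = stream \
    .key_by(lambda event: event["user_id"], key_type=Types.STRING()) \
    .window(TumblingEventTimeWindows.of(Time.minutes(5))) \
    .reduce(SumTransactions())

windowed_stream.print()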
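The windowed reduce above can be pictured with a small pure-Python sketch that needs no Flink cluster. It assumes epoch-aligned windows with zero offset and timestamps in milliseconds. The helper names `window_start` and `sum_per_window` and the sample events are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling windows


def window_start(timestamp_ms, size_ms=WINDOW_MS):
    """Start of the epoch-aligned tumbling window containing timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def sum_per_window(events, size_ms=WINDOW_MS):
    """Sum `amount` per (user_id, window start), like a keyed window reduce."""
    totals = defaultdict(float)
    for event in events:
        key = (event["user_id"], window_start(event["timestamp"], size_ms))
        totals[key] += event["amount"]
    return dict(totals)


events = [
    {"user_id": "u1", "timestamp": 10_000, "amount": 20.0},
    {"user_id": "u1", "timestamp": 250_000, "amount": 5.0},
    {"user_id": "u1", "timestamp": 310_000, "amount": 7.5},
    {"user_id": "u2", "timestamp": 299_999, "amount": 1.0},
]
print(sum_per_window(events))
```

Each event lands in exactly one window, so the first two `u1` events are summed together while the third starts a new window at 300,000 ms. Unlike this batch-style sketch, Flink emits a window's result only once the watermark passes the end of that window.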
The example pipeline runs from Kafka through Flink to Elasticsearch.
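from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# `transactions` is the input DataStream; `detect_fraud` is a predicate returning True for suspicious events
fraud_alerts = transactions \
    .filter(detect_fraud)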
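The body of `detect_fraud` is not shown here, so the following is only a hypothetical stand-in to illustrate the shape of a filter predicate. The threshold value and the assumption that each transaction is a dict with an `amount` field are illustrative and not taken from the article's own rule.

```python
FRAUD_AMOUNT_THRESHOLD = 10_000.0  # hypothetical cutoff, tune for real data


def detect_fraud(transaction):
    """Return True when a transaction looks suspicious (illustrative rule only)."""
    amount = transaction.get("amount")
    if amount is None:
        return False  # incomplete records are not flagged in this sketch
    return amount > FRAUD_AMOUNT_THRESHOLD


# Plain-Python stand-in for the stream, to show the predicate's behavior
sample_transactions = [
    {"user_id": "u1", "amount": 42.0},
    {"user_id": "u2", "amount": 25_000.0},
]
sample_alerts = list(filter(detect_fraud, sample_transactions))
print(sample_alerts)
```

Because the predicate is a plain function that returns a boolean, it can be unit-tested without a running Flink job. A production rule would likely combine several signals instead of a single amount threshold.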
What’s happening?
taskmanager.memory.process.size: 8GB
taskmanager.numberOfTaskSlots: 4
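As a rough sanity check on these two settings, the sketch below works out how much process memory each slot could claim at most. It is a simplification: the process size also covers JVM overhead and metaspace, so the memory a task can actually use per slot will be lower. The helper `parse_memory_gb` is hypothetical and handles only gigabyte values.

```python
def parse_memory_gb(value):
    """Parse sizes such as '8GB' or '8g' into whole gigabytes (sketch: GB only)."""
    text = value.strip().lower()
    for suffix in ("gb", "g"):
        if text.endswith(suffix):
            return int(text[: -len(suffix)])
    raise ValueError(f"unsupported memory size: {value!r}")


config = {
    "taskmanager.memory.process.size": "8GB",
    "taskmanager.numberOfTaskSlots": "4",
}

process_gb = parse_memory_gb(config["taskmanager.memory.process.size"])
slots = int(config["taskmanager.numberOfTaskSlots"])
per_slot_gb = process_gb / slots  # upper bound on each slot's share

print(per_slot_gb)  # 2.0
```

With these values each of the 4 slots gets at most about 2 GB, which is a useful starting point when deciding whether large RocksDB state will fit.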
In Summary
Apache Flink is an industry-leading real-time processing engine, capable of
handling millions of events per second with low-latency stateful
processing. This article covered:
Stateful processing with RocksDB
Advanced event-time processing with Watermarks
Optimizing Flink performance
Building a Kafka → Flink → Elasticsearch fraud detection pipeline