#DevoxxFR
Apache Kafka
Patterns / AntiPatterns
Florent Ramière @framiere
Jean-Louis Boudart @jlboudart
1
2
Problem?
3
Silos explained by the data gravity concept
As data accumulates (builds mass) there is a greater
likelihood that additional services and applications will
be attracted to this data.
This is the same effect gravity has on objects around a
planet. As the mass or density increases, so does the
strength of gravitational pull.
4
With
5
How
6
Store & ETL process
Publish & subscribe
In short
7
From a simple idea
8
From a simple idea
9
with great properties!
• Scalability
• Retention
• Durability
• Replication
• Security
• Resiliency
• Throughput
• Ordering
• Exactly-once semantics
• Transactions
• Idempotency
• Immutability
• …
10
So goooooood
11
What could potentially go wrong?
12
13
…which is true for any data system
14
Not thinking about durability
15
Data durability
If you didn’t think about it… it’s not durable!
16
17
18
19
And you might have lost data!
20
Data durability
By default, Kafka does not wait for a disk flush before acknowledging a write.
Durability is achieved through replication.
21
22
23
24
25
Data durability
It depends on your configuration...
26
27
28
29
30
Data durability
acks=1 (default value before Kafka 3.0): good for latency
acks=all (default value since Kafka 3.0): good for durability
31
32
acks=all
The leader will wait for the full set of in-sync replicas to acknowledge the record.
33
34
35
36
min.insync.replicas
The minimum number of replicas that must acknowledge a write when the producer uses acks=all.
Default is 1.
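To make the trade-off concrete, here is a small arithmetic sketch. It assumes producers use acks=all and that unclean leader election is disabled; `IsrMath` and its methods are illustrative names, not a Kafka API. With replication factor `rf` and `min.insync.replicas = m`, every acknowledged record exists on at least `m` replicas, so it survives the loss of `m - 1` of them, while writes keep being accepted as long as no more than `rf - m` replicas have dropped out of the in-sync set.

```java
/**
 * Back-of-the-envelope figures for a topic written with acks=all.
 * Illustrative helper only (not a Kafka API); assumes unclean leader election is disabled.
 */
final class IsrMath {
    private IsrMath() {}

    /** Minimum number of replicas holding each acknowledged record. */
    static int guaranteedCopies(int replicationFactor, int minInsyncReplicas) {
        validate(replicationFactor, minInsyncReplicas);
        return minInsyncReplicas;
    }

    /** Replica losses an acknowledged record is guaranteed to survive. */
    static int toleratedLossesForDurability(int replicationFactor, int minInsyncReplicas) {
        return guaranteedCopies(replicationFactor, minInsyncReplicas) - 1;
    }

    /** Replicas that can leave the ISR before acks=all sends start failing (NotEnoughReplicas errors). */
    static int toleratedLossesForWrites(int replicationFactor, int minInsyncReplicas) {
        validate(replicationFactor, minInsyncReplicas);
        return replicationFactor - minInsyncReplicas;
    }

    private static void validate(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("expected 1 <= min.insync.replicas <= replication factor");
        }
    }
}
```

With the common `rf = 3`, `m = 2` setting, both `toleratedLossesForDurability(3, 2)` and `toleratedLossesForWrites(3, 2)` return 1. With the default `m = 1`, writes survive two replica losses, but an acknowledged record may live on a single replica.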
37
38
Data durability while producing?
Tune it with the acks and min.insync.replicas parameters.
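The two knobs live in different places: `acks` is a producer setting, while `min.insync.replicas` (like the replication factor) is set on the topic or the broker. A minimal sketch, assuming a cluster of at least three brokers; the keys are real Kafka configuration names, written as plain strings so the example compiles without `kafka-clients`, and the class name is hypothetical.

```java
import java.util.Map;
import java.util.Properties;

/** Durability-first settings; plain string keys so the sketch compiles without kafka-clients. */
final class DurableSettings {
    /** Three copies of every partition (assumes at least three brokers). */
    static final short REPLICATION_FACTOR = 3;

    private DurableSettings() {}

    /** Producer side: a send succeeds only once the whole in-sync replica set has the record. */
    static Properties producer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", "all");
        return props;
    }

    /** Topic side: reject acks=all writes when fewer than two replicas are in sync; never elect a stale leader. */
    static Map<String, String> topicConfigs() {
        return Map.of(
                "min.insync.replicas", "2",
                "unclean.leader.election.enable", "false");
    }
}
```

With `kafka-clients` on the classpath, the topic side would typically be applied through the admin client, e.g. `new NewTopic("orders", 6, DurableSettings.REPLICATION_FACTOR).configs(DurableSettings.topicConfigs())`, where the topic name and partition count are placeholders.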
39
Defaults
The default values are optimized for availability & latency.
If durability is more important, tune it!
40
Deploying across multiple datacenters?
41
42
Multi-DC
It’s quite complicated...
It’s easy to get it wrong on many levels.
It could be a 3-hour talk.
43
Multi-DC
Disaster recovery for multi-datacenter deployments
44
What about the consumers?
45
Consumers
Consumers can read only committed data, i.e. records replicated to all in-sync replicas.
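"Committed" here is defined by the high watermark: the leader only exposes offsets that every in-sync replica has already copied. A toy model of that rule, with our own names, ignoring leader epochs and how followers learn the high watermark:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

/** Toy model of a partition's high watermark (not broker code). */
final class HighWatermark {
    private HighWatermark() {}

    /** High watermark = smallest log end offset among the in-sync replicas (leader included). */
    static long of(Map<String, Long> isrLogEndOffsets) {
        return isrLogEndOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("the ISR always contains at least the leader"));
    }

    /** Offsets a consumer may fetch: from the log start offset up to, but excluding, the high watermark. */
    static List<Long> visibleOffsets(long logStartOffset, Map<String, Long> isrLogEndOffsets) {
        return LongStream.range(logStartOffset, of(isrLogEndOffsets)).boxed().collect(Collectors.toList());
    }
}
```

With the leader at log end offset 10 and its followers at 8 and 9, `of` returns 8: offsets 0 to 7 are readable, while 8 and 9 wait for the slowest in-sync follower.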
46
47
Think about data durability and
decide on the best trade-off for you
48
Throughput, latency, durability, availability
Optimizing your Apache
Kafka deployment
49
Focusing only on the happy path
50
51
52
53
retries
It will cause the client to resend any record whose send fails with a potentially transient error.
Default value: 0 (Integer.MAX_VALUE since Kafka 2.1, bounded by delivery.timeout.ms)
54
55
retries
Use built-in retries!
Bump it from 0 to infinity!
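Assuming a client from Kafka 2.1 or later (the release that added `delivery.timeout.ms`), "infinite" retries are normally bounded by an overall delivery timeout rather than by a retry count. A minimal sketch of the producer settings as plain string keys; the class name is hypothetical, and 120000 ms and 100 ms are the documented defaults of `delivery.timeout.ms` and `retry.backoff.ms`.

```java
import java.util.Properties;

/** Producer retry settings as plain string keys (Kafka 2.1+ client assumed). */
final class RetrySettings {
    private RetrySettings() {}

    static Properties producer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Retry transient failures (leader moves, timeouts...) without a count limit...
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // ...and bound the whole attempt instead: after 2 minutes the send is reported as failed.
        props.put("delivery.timeout.ms", "120000");
        // Wait between two attempts.
        props.put("retry.backoff.ms", "100");
        return props;
    }
}
```

When the timeout expires, the failure surfaces through the send callback or the returned `Future`, which is where it still has to be handled.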
56
retries
But you are exposed to a different kind of issue…
57
58
enable.idempotence
When set to 'true', the producer will ensure that exactly one copy of each message is written.
Default value: false (true since Kafka 3.0)
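Retries can introduce duplicates: if the broker writes a record but the acknowledgement is lost, the retry writes it a second time. The idempotent producer avoids this by tagging what it sends with a producer id and per-partition sequence numbers that the broker checks. The class below is a deliberately simplified model of that check (one partition, one record per sequence number), not broker code; `ToyPartitionLog` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model of one partition on a broker. When idempotent, it remembers the last
 * sequence number appended per producer id and drops retried duplicates.
 * Real brokers track this per producer id and partition, for record batches.
 */
final class ToyPartitionLog {
    private final boolean idempotent;
    private final List<String> records = new ArrayList<>();
    private final Map<Long, Integer> lastSequenceByProducer = new HashMap<>();

    ToyPartitionLog(boolean idempotent) {
        this.idempotent = idempotent;
    }

    /** Returns true if the record was appended, false if it was recognised as a duplicate. */
    boolean append(long producerId, int sequence, String value) {
        if (idempotent) {
            int last = lastSequenceByProducer.getOrDefault(producerId, -1);
            if (sequence <= last) {
                return false; // a retry of something already written: acknowledge, but do not append again
            }
            if (sequence != last + 1) {
                throw new IllegalStateException("out-of-order sequence " + sequence + ", expected " + (last + 1));
            }
            lastSequenceByProducer.put(producerId, sequence);
        }
        records.add(value);
        return true;
    }

    List<String> records() {
        return List.copyOf(records);
    }
}
```

Replaying the same `(producerId, sequence)` pair twice leaves two copies in a non-idempotent log and one copy in an idempotent one.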
59
60
61
Use built-in idempotency!
62
But it does not save you from:
- managing exceptions and failures
- developing idempotent consumers
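On the consumer side, idempotency is up to you. A minimal sketch, assuming every message carries a unique business id (for example in its key or a header; that id scheme and `IdempotentHandler` are hypothetical): the handler remembers processed ids and skips redeliveries.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Idempotent consumer sketch: skip messages whose id was already processed.
 * The in-memory set is for illustration; production code needs a durable store
 * updated atomically with the side effect.
 */
final class IdempotentHandler<T> {
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<T> sideEffect;

    IdempotentHandler(Consumer<T> sideEffect) {
        this.sideEffect = sideEffect;
    }

    /** Returns true if the message was processed, false if it was a redelivery and was skipped. */
    boolean handle(String messageId, T payload) {
        if (processedIds.contains(messageId)) {
            return false;
        }
        sideEffect.accept(payload); // if this throws, the id is not recorded and a redelivery is processed again
        processedIds.add(messageId);
        return true;
    }
}
```

Recording the id only after the side effect succeeds keeps at-least-once behaviour for failures while filtering out plain redeliveries.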
63
No idempotent consumer
64
65
At-least-once (default)
At-most-once
Exactly-once
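Which of these you get on the consumer side depends mostly on when offsets are committed relative to processing. The simulation below (hypothetical names, no Kafka involved) crashes the consumer once between the two steps: committing first loses the record being handled (at-most-once), processing first replays it (at-least-once).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simulates a single consumer crash to show how the position of the commit decides the delivery semantics.
 * "Processing" appends the record to a list; "committed" is the offset a restarted consumer resumes from.
 */
final class CommitOrderDemo {
    enum Semantics { AT_MOST_ONCE, AT_LEAST_ONCE }

    private CommitOrderDemo() {}

    /** Crashes once, between the commit step and the processing step, while handling offset crashAt. */
    static List<String> run(List<String> partition, int crashAt, Semantics semantics) {
        List<String> processed = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        int offset = committed;
        while (offset < partition.size()) {
            boolean crashNow = offset == crashAt && !crashed;
            if (semantics == Semantics.AT_MOST_ONCE) {
                committed = offset + 1;                   // 1. commit
                if (!crashNow) {
                    processed.add(partition.get(offset)); // 2. process (skipped by the crash)
                }
            } else {
                processed.add(partition.get(offset));     // 1. process
                if (!crashNow) {
                    committed = offset + 1;               // 2. commit (skipped by the crash)
                }
            }
            if (crashNow) {
                crashed = true;
                offset = committed;                       // restart from the last committed offset
            } else {
                offset++;
            }
        }
        return processed;
    }
}
```

For `[a, b, c]` with a crash on offset 1, `AT_MOST_ONCE` yields `[a, c]` and `AT_LEAST_ONCE` yields `[a, b, b, c]`. With periodic auto-commit, the replayed window can be larger than one record.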
66
67
68
69
70
71
72
Commit
Manually committing aggressively...
adds a huge workload on Apache Kafka.
73
74
Commit
Manually committing aggressively...
does not provide exactly-once semantics.
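A common middle ground is to commit in batches, after N records or after a time interval, whichever comes first. A minimal sketch of such a policy; `CommitPolicy` is hypothetical, and with `kafka-clients` the caller would invoke `commitAsync()` or `commitSync()` on its consumer whenever `recordProcessed` returns `true`.

```java
/**
 * Commit after maxRecords processed records or after maxIntervalMs, whichever comes first.
 * Illustrative helper: it only decides *when* to commit; the caller performs the commit.
 */
final class CommitPolicy {
    private final int maxRecords;
    private final long maxIntervalMs;
    private int uncommitted = 0;
    private long lastCommitMs;

    CommitPolicy(int maxRecords, long maxIntervalMs, long nowMs) {
        if (maxRecords < 1 || maxIntervalMs < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRecords = maxRecords;
        this.maxIntervalMs = maxIntervalMs;
        this.lastCommitMs = nowMs;
    }

    /** Call after processing each record; returns true when the caller should commit now. */
    boolean recordProcessed(long nowMs) {
        uncommitted++;
        if (uncommitted >= maxRecords || nowMs - lastCommitMs >= maxIntervalMs) {
            uncommitted = 0;
            lastCommitMs = nowMs;
            return true;
        }
        return false;
    }
}
```

Like any commit strategy, this stays at-least-once: records processed since the last commit are replayed after a crash. Auto-commit (`enable.auto.commit` with `auto.commit.interval.ms`, 5000 ms by default) is the purely time-based variant.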
75
Embrace at-least-once
76
Rely on Kafka Streams
with exactly-once!
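In Kafka Streams, exactly-once is a single setting. The sketch below only builds the `Properties`; the keys and the `exactly_once_v2` value are real Kafka Streams settings, while the class and method names are hypothetical. It assumes Kafka Streams 3.0 or later against brokers 2.5 or later; older Streams versions use `exactly_once`.

```java
import java.util.Properties;

/** Kafka Streams settings for exactly-once processing (string keys; class name is illustrative). */
final class StreamsEosSettings {
    private StreamsEosSettings() {}

    static Properties streams(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Kafka Streams 3.0+ with brokers 2.5+; older Streams versions use "exactly_once".
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }
}
```

With `kafka-streams` on the classpath, `StreamsConfig.PROCESSING_GUARANTEE_CONFIG` and `StreamsConfig.EXACTLY_ONCE_V2` name the same key and value. The guarantee covers the read-process-write cycle inside Kafka; side effects on external systems are not rolled back.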
77
No exception handling
78
79
Future<RecordMetadata> send(ProducerRecord<K, V> record);
80
Future<RecordMetadata> send(ProducerRecord<K, V> record,
Callback callback);
81
producer.send(record, (metadata, exception) -> {
});
82
Error handling
We don’t expect the unexpected until the unexpected is expected.
83
84
Error handling
A message cannot be processed
85
Error handling
A message cannot be processed
A message doesn’t have the expected schema
86
Retry
87
88
Infinite retry
89
90
Write to a dead letter queue and continue
91
92
Ignore and continue
93
94
No silver bullet
95
Handle the exceptions!
https://eng.uber.com/reliable-reprocessing/
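The retry, dead-letter and ignore strategies can be combined in a single loop: retry a bounded number of times, then either dead-letter or skip the record. A minimal sketch with no Kafka dependency; the dead letter queue is modelled as a list, which in practice would be a producer writing to a dedicated topic named by your own convention, and all names are hypothetical. Setting `maxAttempts` very high approximates infinite retry, with its drawback: one poison message blocks everything behind it in the partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Bounded retries followed by a fallback (dead letter queue or skip).
 * The dead letter queue is modelled as a list; all names are illustrative.
 */
final class ErrorHandlingDemo {
    enum Fallback { DEAD_LETTER, SKIP }

    record Outcome(List<String> processed, List<String> deadLettered, List<String> skipped) {}

    private ErrorHandlingDemo() {}

    static Outcome process(List<String> values, Consumer<String> handler, int maxAttempts, Fallback fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<String> processed = new ArrayList<>();
        List<String> deadLettered = new ArrayList<>();
        List<String> skipped = new ArrayList<>();
        for (String value : values) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    handler.accept(value);
                    processed.add(value);
                    done = true;
                } catch (RuntimeException e) {
                    // Real code would log e and back off before the next attempt.
                }
            }
            if (!done) {
                // Retries exhausted: park the record or drop it, then move on to the next one.
                (fallback == Fallback.DEAD_LETTER ? deadLettered : skipped).add(value);
            }
        }
        return new Outcome(processed, deadLettered, skipped);
    }
}
```

A transient failure (a timeout) is absorbed by the retries, while a message with an unexpected schema fails every attempt and ends up in the fallback.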
96
No data governance
97
98
99
100
Governance
Changes in producers might impact consumers
101
Governance
Schema Registry
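Schema Registry rejects schema changes that break the configured compatibility rule (BACKWARD by default: consumers using the new schema must be able to read data written with the old one). The sketch below is a heavily simplified version of that check in the spirit of Avro schema resolution: a field added without a default, or a field whose type changes, is reported. It ignores type promotion, unions, nested records and aliases; `CompatibilityCheck` is a hypothetical name, not the registry's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Very simplified BACKWARD compatibility check: can a reader using newSchema decode data written with oldSchema?
 * Schemas are modelled as field name -> Field.
 */
final class CompatibilityCheck {
    record Field(String type, boolean hasDefault) {}

    private CompatibilityCheck() {}

    /** Returns the problems found; an empty list means "compatible" under these simplified rules. */
    static List<String> backward(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        List<String> problems = new ArrayList<>();
        newSchema.forEach((name, field) -> {
            Field old = oldSchema.get(name);
            if (old == null && !field.hasDefault()) {
                // Old data has no value for this field, so the reader needs a default.
                problems.add("new field '" + name + "' needs a default value");
            } else if (old != null && !old.type().equals(field.type())) {
                problems.add("field '" + name + "' changed type " + old.type() + " -> " + field.type());
            }
            // Fields only present in the old schema are simply ignored by the new reader.
        });
        return problems;
    }
}
```

Adding an optional `email` field with a default passes; changing `id` from `string` to `long` or adding a required field does not.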
102
103
104
Share schemas
105
Let bad citizens wander around
106
107
Leverage security, ACLs and quotas
Security
Authorization and ACLs
Enforcing Client Quotas
108
Installing prod on Sunday night
109
110
Configuration
If you use the default configuration…
you will have issues!
111
Please read the docs:
Running Kafka in Production
Running ZooKeeper in Production
112
Not configuring your OS
113
114
OS
At a minimum, tune the maximum number of open file descriptors and the mmap count (vm.max_map_count).
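As a rough sizing aid: the Kafka documentation notes that each log segment uses two memory map areas (its offset index and its time index), and the kernel default for `vm.max_map_count` is 65530. The sketch below (hypothetical names; the partition sizes are your own estimates) shows how quickly a busy broker exceeds it.

```java
/**
 * Rough vm.max_map_count sizing: each log segment needs two memory maps (offset index + time index).
 * Illustrative helper; the inputs are your own estimates.
 */
final class MmapEstimate {
    /** Kernel default for vm.max_map_count (some distributions raise it). */
    static final long LINUX_DEFAULT_MAX_MAP_COUNT = 65_530;

    private MmapEstimate() {}

    /** Segments needed to hold a partition's retained data (ceiling division). */
    static long segmentsPerPartition(long retainedBytesPerPartition, long segmentBytes) {
        if (segmentBytes <= 0) {
            throw new IllegalArgumentException("segmentBytes must be positive");
        }
        return Math.max(1, (retainedBytesPerPartition + segmentBytes - 1) / segmentBytes);
    }

    /** Memory map areas used by the log segments hosted on one broker. */
    static long mapAreas(long partitionsOnBroker, long segmentsPerPartition) {
        return partitionsOnBroker * segmentsPerPartition * 2;
    }
}
```

For example, 1,000 partitions retaining 50 GiB each with the default 1 GiB `segment.bytes` need 50 segments per partition and therefore 100,000 map areas, well above 65,530. The same documentation suggests at least 100,000 allowed file descriptors for the broker process as a starting point.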
115
Configure your OS
Running Kafka in Production
116
Disregarding Apache ZooKeeper
117
Not understanding ordering
118
No monitoring
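One metric worth watching first is consumer lag: for each partition, the log end offset minus the group's committed offset. A minimal sketch of the computation; fetching the offsets is left out (with `kafka-clients`, `Consumer#endOffsets` and `Consumer#committed` return them, and `kafka-consumer-groups.sh` with `--describe` prints the same figure), and treating a partition without a commit as offset 0 is a simplifying assumption.

```java
import java.util.Map;
import java.util.TreeMap;

/** Consumer lag per partition: log end offset minus committed offset (illustrative helper). */
final class ConsumerLag {
    private ConsumerLag() {}

    static Map<Integer, Long> perPartition(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        logEndOffsets.forEach((partition, endOffset) -> {
            // Simplification: no commit yet is treated as offset 0 (the real start depends on auto.offset.reset).
            long committed = committedOffsets.getOrDefault(partition, 0L);
            lag.put(partition, Math.max(0L, endOffset - committed));
        });
        return lag;
    }

    static long total(Map<Integer, Long> lagPerPartition) {
        return lagPerPartition.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

A lag that keeps growing on one partition while the others stay flat usually points at a stuck consumer or a hot key rather than at global under-capacity.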
119
Too many partitions
120
Not enough partitions
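A widely used rule of thumb sizes a topic from measured throughput: with a target throughput `t`, and per-partition throughputs `p` when producing and `c` when consuming, use at least `max(t / p, t / c)` partitions. A minimal sketch with hypothetical names; leave headroom for growth, since adding partitions later changes which partition each key maps to, and every extra partition has a cost on the brokers.

```java
/**
 * Rule of thumb: partitions >= max(t / p, t / c), where t is the target throughput and p / c are the
 * throughputs measured for a single partition when producing / consuming (same unit, e.g. MB/s).
 */
final class PartitionSizing {
    private PartitionSizing() {}

    static int minPartitions(double targetThroughput, double perPartitionProduce, double perPartitionConsume) {
        if (targetThroughput <= 0 || perPartitionProduce <= 0 || perPartitionConsume <= 0) {
            throw new IllegalArgumentException("throughputs must be positive");
        }
        double needed = Math.max(targetThroughput / perPartitionProduce, targetThroughput / perPartitionConsume);
        return (int) Math.ceil(needed);
    }
}
```

For example, a 100 MB/s target with 10 MB/s per partition for producing and 20 MB/s for consuming gives `minPartitions(100, 10, 20) == 10`: here the producer side is the bottleneck.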
121
Partition key choice
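The key decides the partition, and therefore both ordering and load distribution. The sketch below is modelled on how the Java client's default partitioner places keyed records (murmur2 hash of the serialized key, sign bit cleared, modulo the partition count); treat it as an illustration rather than a byte-for-byte reference, and note that records without a key follow different rules. Two consequences follow from the formula: a dominant key sends most traffic to one hot partition, and changing the partition count remaps existing keys.

```java
import java.nio.charset.StandardCharsets;

/** Key-to-partition mapping modelled on the Java client's default partitioner (illustrative). */
final class KeyPartitioning {
    private KeyPartitioning() {}

    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8); // what a StringSerializer would produce
        return (murmur2(bytes) & 0x7fffffff) % numPartitions;  // clear the sign bit, then modulo
    }

    /** 32-bit murmur2 hash. */
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        switch (length % 4) { // remaining 1 to 3 bytes; cases intentionally fall through
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }
}
```

Because the mapping is deterministic, all events for `customer-42` land on the same partition and keep their relative order, which is exactly what a good key choice relies on.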
122
Topics vs partitions
123
Calling external services in Kafka Streams
124
Questions

Editor's Notes

  • #5: This is how a company is built. Event-driven architecture isn’t new, but it is different enough now to be a new type of animal. If we’re successful, this will be a major data platform in companies and will redefine the architecture of a digital company. The people here will be part of making that happen.
  • #7: So what is a streaming platform? There is a set of core capabilities around data streams that you have to have. The first is the ability to publish and subscribe to streams of data. This has been around for a long time; messaging systems have been able to do it. What’s different now is the ability to store data, and to do it properly, in a replicated manner. The final capability is to be able to process these streams of data. Having started as a messaging system, Kafka has evolved over the years into a full-fledged distributed streaming platform that embodies the quintessential characteristics of this new category of infrastructure.