Introducing Apache Kafka®
Paul Brebner
Technical Evangelist February 2019
What is Kafka?
A distributed streams processing system
Messages are sent by distributed Producers to distributed Consumers via a distributed Kafka Cluster
Kafka Benefits
• Fast – high throughput and low latency
• Scalable – horizontally scalable with nodes and partitions
• Reliable – distributed and fault tolerant
• Durable – zero data loss, messages persisted to disk with immutable log
• Open Source – an Apache project
• Available as a Managed Service – on multiple cloud platforms
A visual introduction to Apache Kafka
To send messages from A to B
“A” is the Producer – sends a message, to
“B” the Consumer (recipient) of the message
Due to decline in “snail mail” direct deliveries
Instead … “Poste Restante”
“Poste Restante”?
• Not a post office in a restaurant
• General delivery (in the US)
• The mail is delivered to a post office, which holds it for you until you call
Consumers “poll” for messages by visiting the
Poste Restante counter at the post office
Kafka topics act like a Post Office
Benefits include
• Disconnected delivery – consumer doesn’t need to be available to receive messages
• Less effort for the messaging service – only has to deliver to a few locations, not every consumer
• Can scale better and handle more complex delivery semantics!
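The “disconnected delivery” idea can be sketched in a few lines of Python. This is a toy model only, not Kafka client code: the PostOffice class, the topic name "letters" and the max_messages parameter are all hypothetical. It shows a producer leaving messages while no consumer is running, and a consumer collecting them later by polling from its own offset.

```python
from collections import defaultdict


class PostOffice:
    """Toy stand-in for a broker: one append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, message):
        # The producer only delivers to the topic, never to a consumer directly.
        self.topics[topic].append(message)

    def poll(self, topic, offset, max_messages=10):
        # The consumer asks for messages starting at the offset it has reached.
        return self.topics[topic][offset:offset + max_messages]


office = PostOffice()
office.send("letters", "hello")        # no consumer is running yet
office.send("letters", "world")

offset = 0                             # the consumer turns up later and polls
batch = office.poll("letters", offset)
offset += len(batch)
print(batch)                           # ['hello', 'world']
print(office.poll("letters", offset))  # [] (nothing new since the last poll)
```

Because the consumer tracks its own offset, the post office never needs to know who is collecting or when.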
Scalability? Many consumers for a topic?
A single counter introduces delays
More counters increase concurrency
Kafka topics have one or more partitions (“counters”)
• Partitions increase consumer concurrency
• Increase throughput
• Reduce latency
What’s a Kafka message?
A Record – like a letter (here addressed to Santa, North Pole)
• Topic: the destination (“North Pole”)
• Timestamp, offset, partition: the “Postmark”. Time semantics are flexible: time of event creation, ingestion, or processing.
• Key (optional): maps to a partition and refines the destination. Send to Santa, not just any Elf.
• Value: the contents (a byte array)
Kafka Producers and Consumers need a serializer and de-serializer to write and read the key and value
• Kafka doesn’t look at the value
• Consumer can read the value
• And try to make sense of the message
• What will Santa be delivering?!
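As a rough illustration of the record anatomy above, here is a minimal Python sketch. The Record dataclass, its field names, and the JSON-based serialize/deserialize helpers are assumptions made for this example; they are not Kafka's client classes. The point is that the key and value travel as bytes, so both ends must agree on how to encode and decode them.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    topic: str                       # the destination
    value: bytes                     # the contents, opaque to the broker
    key: Optional[bytes] = None      # optional, refines the destination
    timestamp_ms: int = 0            # part of the "postmark"
    partition: Optional[int] = None  # filled in once the record is stored
    offset: Optional[int] = None     # filled in once the record is stored


def serialize(obj) -> bytes:
    # Hypothetical serializer: JSON text encoded as UTF-8 bytes.
    return json.dumps(obj).encode("utf-8")


def deserialize(data: bytes):
    # Must mirror serialize(), otherwise the consumer cannot make sense of the bytes.
    return json.loads(data.decode("utf-8"))


letter = Record(
    topic="north-pole",
    key=serialize("Santa"),
    value=serialize({"wish": "train set"}),
    timestamp_ms=int(time.time() * 1000),
)
print(deserialize(letter.value))     # {'wish': 'train set'}
```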
Next…
Delivery Semantics
Do we care
if the message arrives?
Yes! Guaranteed delivery is desirable
But homing pigeons got lost or eaten
Send multiple pigeons
One pigeon may make it, even if others are eaten or lost
How does Kafka guarantee delivery?
[Diagram: a Producer sends message M1 to a cluster of three brokers (Broker 1, Broker 2, Broker 3)]
• A message (M1) is written to a broker (Broker 2)
• The message is always persisted to disk, which makes it resilient to loss of power
• The message is also replicated on multiple “brokers”, which makes it resilient to loss of most servers
• The Producer gets an acknowledgement once the message is persisted and replicated (configurable)
• Multiple brokers and partitions give increased read availability and concurrency
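The write, replicate and acknowledge steps can be sketched as a toy model. The Broker class and send function below are hypothetical and heavily simplified: real Kafka followers fetch from the partition leader, and acks=all waits only for the in-sync replicas. The three acks values ("0", "1", "all") are the ones the Kafka producer configuration accepts.

```python
class Broker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []          # stands in for the on-disk log
        self.alive = True

    def append(self, message):
        if self.alive:
            self.log.append(message)
            return True
        return False


def send(message, leader, followers, acks="all"):
    """Return True if the producer would be acknowledged (toy model)."""
    if acks == "0":
        leader.append(message)
        return True            # fire and forget: no confirmation awaited
    leader_ok = leader.append(message)
    # Simplification: replicate to every follower immediately.
    follower_ok = [f.append(message) for f in followers]
    if acks == "1":
        return leader_ok       # only the leader's write is confirmed
    return leader_ok and all(follower_ok)  # acks == "all"


b1, b2, b3 = Broker(1), Broker(2), Broker(3)
acked = send("M1", leader=b2, followers=[b1, b3], acks="all")

b2.alive = False               # lose the broker that took the write
survivors = [b.broker_id for b in (b1, b2, b3) if b.alive and "M1" in b.log]
print(acked, survivors)        # True [1, 3]
```

The trade-off is the usual one: acks="all" is the slowest setting but is the only one here that confirms the copies exist before the producer moves on.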
The 2nd aspect of delivery semantics:
Who gets the messages?
How many times are messages delivered?
Delivery Semantics - Kafka is “pub-sub”
- Loosely coupled
- Producers and consumers don’t know about each other
[Diagram: a Producer, two topics (“Parties” and “Work”), and four Consumers]
Which consumers get which messages (filtering) is topic based
• Consumers subscribe to the topic “Parties”
• Publishers send messages to topics
• Consumers only receive messages from subscribed topics: they poll to receive messages from “Parties”
• Consumers not subscribed to “Work” don’t receive any “Work” messages
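Topic-based filtering can be sketched as follows. The Cluster and Consumer classes here are hypothetical stand-ins rather than the Kafka client API; the sketch only shows that a consumer's poll returns messages from the topics it subscribed to and nothing else.

```python
from collections import defaultdict


class Cluster:
    def __init__(self):
        self.topics = defaultdict(list)   # topic name -> list of messages

    def send(self, topic, message):
        self.topics[topic].append(message)


class Consumer:
    def __init__(self, cluster):
        self.cluster = cluster
        self.positions = {}               # subscribed topic -> next offset to read

    def subscribe(self, topic):
        self.positions.setdefault(topic, 0)

    def poll(self):
        received = []
        for topic, offset in self.positions.items():
            new = self.cluster.topics[topic][offset:]
            received.extend(new)
            self.positions[topic] = offset + len(new)
        return received


cluster = Cluster()
party_goer = Consumer(cluster)
party_goer.subscribe("Parties")

cluster.send("Parties", "Pool party on Friday")
cluster.send("Work", "Timesheets are due")

first = party_goer.poll()
second = party_goer.poll()
print(first)    # ['Pool party on Friday']
print(second)   # []
```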
Partitions and Consumer Groups
• Enable sharing of work across consumers
• Duplicate message delivery: each message is delivered to each subscribed consumer group
[Diagram: a Producer, Topic “Parties” with Partition 1, Partition 2, … Partition n, and two Consumer Groups of Consumers]
Consumers subscribed to a topic are allocated partitions
They will only get messages from their allocated partitions.
Consumers share work within groups
Consumers in the same group share the work around
Each consumer gets only a subset of messages
Messages are duplicated across consumer groups
Multiple groups enable message broadcasting
Each consumer group receives a copy of each message.
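A small sketch of how partitions might be shared out within a group and duplicated across groups. The assign function is a plain round-robin allocation written for this example, not one of Kafka's actual assignor implementations, and the group and consumer names are made up.

```python
def assign(partitions, consumers):
    """Spread partitions over the consumers of ONE group, round-robin style."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
groups = {"group-a": ["a1", "a2"], "group-b": ["b1"]}

# Each group gets its own complete allocation of all partitions.
plan = {g: assign(partitions, members) for g, members in groups.items()}


def receivers(partition):
    """Exactly one consumer per group owns any given partition."""
    return sorted(
        c for g in plan.values() for c, owned in g.items() if partition in owned
    )


print(plan["group-a"])   # {'a1': [0, 2], 'a2': [1, 3]}
print(plan["group-b"])   # {'b1': [0, 1, 2, 3]}
print(receivers(1))      # ['a2', 'b1']
```

A message written to partition 1 is therefore read once by group-a (via a2) and once by group-b (via b1): shared within a group, broadcast across groups.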
Key
Partition based delivery
Which messages are delivered to which consumers?
If a message has a key, then Kafka uses partition based delivery.
Messages with the same key are always sent to the same partition and therefore to the same consumer in each group.
And the order is guaranteed within that partition.
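No Key
Round robin delivery
If the key is null, then Kafka uses round robin delivery: each message is delivered to the next partition.
(This was the default for keyless records when this talk was given. Since Kafka 2.4 the default partitioner instead sticks to one partition per batch, which still spreads keyless records across partitions over time.)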
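The two delivery rules can be sketched as a single partitioner function. This is an illustration under assumptions: choose_partition and NUM_PARTITIONS are hypothetical names, and zlib.crc32 is used as a stand-in hash (Kafka's own default partitioner hashes the serialized key with murmur2).

```python
import itertools
import zlib

NUM_PARTITIONS = 11
_next = itertools.count()   # running counter for keyless records


def choose_partition(key):
    if key is None:
        # No key: hand out partitions in turn (round robin).
        return next(_next) % NUM_PARTITIONS
    # Key present: the same key bytes always hash to the same partition.
    return zlib.crc32(key) % NUM_PARTITIONS


keyless = [choose_partition(None) for _ in range(3)]
keyed = [choose_partition(b"Cool pool party") for _ in range(3)]
print(keyless)              # [0, 1, 2]
print(len(set(keyed)))      # 1 (all three landed on the same partition)
```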
Time for an Example, with 2 consumer groups.
Consumer Group = Nerds
Multiple consumers
Consumer Group = Hairy
Single consumer
Case 1: No Key (round robin)
[Diagram: the Producer sends M1, M2, etc. to Topic “Parties” (Partition 1, Partition 2, … Partition n); Group “Nerds” (Consumer 1 (Bill), Consumer 2 (Paul), … Consumer n) and Group “Hairy” (Consumer 1 (Chewy)) are subscribed to “Parties”]
Each message (M1, M2, etc.) is sent to the next partition.
All consumers allocated to that partition (one per group) will receive the message when they next poll.
Here’s what happens (the producer and topic are not shown, so you have to imagine them):
1. Both groups subscribe to topic “parties” (11 partitions, so 1 consumer per partition).
2. Producer sends record <key=null, value=“Cool pool party - Invitation”> to the “parties” topic (no key).
3. Bill and Chewbacca receive a copy of the invitation and plan to attend.
4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to the “parties” topic.
5. Paul and Chewbacca receive the cancellation.
Paul gets the message this time because delivery is round robin, and ignores it as he didn’t get the invitation. Bill wastes his time trying to go to a cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and stay at home to do some hacking. Chewy is the only consumer in his group so he gets all messages, and plans something fun instead…
Case 2: If there is a Key
[Diagram: same setup as Case 1, but each key is hashed to a partition; M1 and M2 go to one partition and M3 to another]
A key is hashed to a partition, and a message with that key is always sent to that partition.
Assume there are 3 messages, and messages 1 and 2 are hashed to the same partition.
Here’s what happens with a key: the key is the “title” of the message (e.g. “Cool pool party”)
Same setup as before:
1. Both groups subscribe to topic “parties” (11 partitions).
2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to the “parties” topic.
3. As before, Bill and Chewbacca receive a copy of the invitation and plan to attend.
4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to the “parties” topic.
5. Bill and Chewbacca receive the cancellation (the same consumers this time, as the key is identical).
6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to the “parties” topic.
7. Paul and Chewy receive the invitation.
Paul receives the Halloween invitation because the key is different and the record is sent to the partition that Paul is allocated to.
Chewy is the only consumer in his group so he gets every record no matter which partition it’s sent to.
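Both cases of the party example can be replayed with a short simulation. Everything here is an assumption made for illustration: the allocation of Bill to partition 0 and Paul to partition 1, the placeholder names for the other Nerds, and zlib.crc32 standing in for Kafka's key hash. With a real hash there is no telling which Nerd owns the partition for a given key, so the keyed case only checks that both records with the same key reach the same consumer.

```python
import zlib

NUM_PARTITIONS = 11

# Assumed allocation: Nerds has one consumer per partition, Hairy has only Chewy.
nerds = {0: "Bill", 1: "Paul", **{p: f"Nerd {p + 1}" for p in range(2, NUM_PARTITIONS)}}
hairy = {p: "Chewy" for p in range(NUM_PARTITIONS)}


def round_robin(i, key):
    return i % NUM_PARTITIONS            # i-th record goes to the i-th partition


def by_key(i, key):
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def deliver(records, partitioner):
    """Return (value, Nerds receiver, Hairy receiver) for each (key, value) record."""
    out = []
    for i, (key, value) in enumerate(records):
        p = partitioner(i, key)
        out.append((value, nerds[p], hairy[p]))
    return out


case1 = deliver([(None, "Invitation"), (None, "Cancelled")], round_robin)
case2 = deliver(
    [("Cool pool party", "Invitation"), ("Cool pool party", "Cancelled")], by_key
)
print(case1)   # [('Invitation', 'Bill', 'Chewy'), ('Cancelled', 'Paul', 'Chewy')]
print(case2[0][1] == case2[1][1])   # True: same key, same Nerd both times
```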
Example Kafka Use Cases
Real-time data pipeline
Real-time data pipeline features:
• Ingestion of multiple heterogeneous sources
• Sending data to multiple heterogeneous sinks
• Acts as a buffer to smooth out load spikes
• Enables use cases which reprocess data (e.g. disaster recovery)
Anomaly Detection Pipeline
Real-time Event processing pipeline:
• Simple event driven applications (If X then Y…)
• May write and read from other data sources (e.g. Cassandra)
• New Events sent back to Kafka or to other systems
• E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
Kafka Streams Processing (Kongo IoT Blog series)
Streams processing features:
• Complex streams processing (multiple events and streams)
• Time, windows, and transformations
• Uses Kafka Streams API, includes state store
• Visualization of the streams topology
• Continuously computes the loads for trucks and checks if they are overloaded.
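The kind of windowed computation described above can be sketched in plain Python. This does not use the Kafka Streams API, and it is not the Kongo implementation: the event shape, the one-minute tumbling window and the 1000 kg limit are all hypothetical. It sums the weight loaded onto each truck per window and reports the windows that exceed the limit.

```python
from collections import defaultdict

WINDOW_MS = 60_000      # hypothetical tumbling window size
MAX_LOAD_KG = 1000      # hypothetical truck limit


def overloaded(events):
    """events: iterable of (timestamp_ms, truck_id, weight_kg) load events."""
    totals = defaultdict(int)
    for ts, truck, kg in events:
        window_start = ts - ts % WINDOW_MS   # tumbling window this event falls in
        totals[(truck, window_start)] += kg
    return sorted(k for k, total in totals.items() if total > MAX_LOAD_KG)


events = [
    (1_000, "truck-1", 600),
    (30_000, "truck-1", 500),    # same window: 1100 kg in total
    (61_000, "truck-1", 300),    # next window
    (2_000, "truck-2", 900),
]
print(overloaded(events))        # [('truck-1', 0)]
```

In a streams application the totals dictionary would correspond to the state store, updated continuously as events arrive rather than computed over a finished list.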
LinkedIn - Before Kafka (BK)
A real example from LinkedIn, who developed Kafka.
Before Kafka they had spaghetti integration of monolithic applications.
To accommodate growing membership and increasing site complexity, they migrated from a monolithic
application infrastructure to one based on microservices, which made the integration even more complex!
After Kafka (AK)
Rather than maintaining and scaling each pipeline individually, they invested in the
development of a single, distributed pub-sub platform - Kafka was born.
The main benefit was better Service decoupling and independent scaling.
The End (of the introduction)
Find out more
Apache Kafka: https://kafka.apache.org/
Instaclustr blogs
• Mix of Cassandra, Spark, Zeppelin and Kafka
https://www.instaclustr.com/paul-brebner/
• Kafka introduction
https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/
https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2-
• Kongo – Kafka IoT logistics application blog series
https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/
• Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series
https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka-
Instaclustr’s Managed Kafka (Free trial)
https://www.instaclustr.com/solutions/managed-apache-kafka/
Ad

Recommended

PDF
Apache Kafka Architecture & Fundamentals Explained
confluent
 
PDF
Fundamentals of Apache Kafka
Chhavi Parasher
 
PPTX
Introduction to Apache Kafka
Jeff Holoman
 
PPTX
Apache Kafka
Saroj Panyasrivanit
 
ODP
Stream processing using Kafka
Knoldus Inc.
 
PDF
1. Brochure Isolateur Composite Ines max
Amzil Yassine
 
PDF
Getting Started with Kubernetes
VMware Tanzu
 
PPTX
Kafka presentation
Mohammed Fazuluddin
 
PPTX
Kafka 101
Clement Demonchy
 
PPTX
Introduction to Apache Kafka
AIMDek Technologies
 
PDF
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
PPTX
kafka
Amikam Snir
 
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
PDF
From Zero to Hero with Kafka Connect
confluent
 
PPTX
Kafka 101
Aparna Pillai
 
PPTX
Apache Kafka
emreakis
 
PDF
Apache Kafka Introduction
Amita Mirajkar
 
PDF
Apache Kafka - Martin Podval
Martin Podval
 
PPTX
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
PDF
Introduction to apache kafka
Dimitris Kontokostas
 
PDF
An Introduction to Apache Kafka
Amir Sedighi
 
PPTX
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
PDF
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
PPTX
Apache kafka
Kumar Shivam
 
PDF
Introduction to Apache Kafka
Shiao-An Yuan
 
PDF
Apache kafka
NexThoughts Technologies
 
PDF
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
PPTX
Kafka101
Aparna Pillai
 

More Related Content

What's hot (20)

PPTX
Kafka 101
Clement Demonchy
 
PPTX
Introduction to Apache Kafka
AIMDek Technologies
 
PDF
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
PPTX
kafka
Amikam Snir
 
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
PDF
From Zero to Hero with Kafka Connect
confluent
 
PPTX
Kafka 101
Aparna Pillai
 
PPTX
Apache Kafka
emreakis
 
PDF
Apache Kafka Introduction
Amita Mirajkar
 
PDF
Apache Kafka - Martin Podval
Martin Podval
 
PPTX
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
PDF
Introduction to apache kafka
Dimitris Kontokostas
 
PDF
An Introduction to Apache Kafka
Amir Sedighi
 
PPTX
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
PDF
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
PPTX
Apache kafka
Kumar Shivam
 
PDF
Introduction to Apache Kafka
Shiao-An Yuan
 
PDF
Apache kafka
NexThoughts Technologies
 
Kafka 101
Clement Demonchy
 
Introduction to Apache Kafka
AIMDek Technologies
 
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
From Zero to Hero with Kafka Connect
confluent
 
Kafka 101
Aparna Pillai
 
Apache Kafka
emreakis
 
Apache Kafka Introduction
Amita Mirajkar
 
Apache Kafka - Martin Podval
Martin Podval
 
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
Introduction to apache kafka
Dimitris Kontokostas
 
An Introduction to Apache Kafka
Amir Sedighi
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Jean-Paul Azar
 
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Introduction to Kafka Streams
Guozhang Wang
 
Apache kafka
Kumar Shivam
 
Introduction to Apache Kafka
Shiao-An Yuan
 

Similar to A visual introduction to Apache Kafka (14)

PDF
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
PPTX
Kafka101
Aparna Pillai
 
PDF
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
PDF
Stream Processing Metamorphosis - A Kafka's tale
João Vazão Vasques
 
PPTX
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
PPSX
Animated: Components of Kafka
Arvind Singh Rawat
 
PPTX
04-Kafka.pptx
MannMehta13
 
PPTX
04-Kafka.pptx
AdityaGanguly12
 
PPTX
From a kafkaesque story to The Promised Land
Ran Silberman
 
PDF
Exactly once delivery is a harsh mistress - DevOps Days TLV
Natan Silnitsky
 
PDF
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
DevOpsDays Tel Aviv
 
PDF
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Natan Silnitsky
 
PPTX
Kafka.pptx
Tarun techme
 
PDF
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo
 
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Kafka101
Aparna Pillai
 
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
Stream Processing Metamorphosis - A Kafka's tale
João Vazão Vasques
 
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
Animated: Components of Kafka
Arvind Singh Rawat
 
04-Kafka.pptx
MannMehta13
 
04-Kafka.pptx
AdityaGanguly12
 
From a kafkaesque story to The Promised Land
Ran Silberman
 
Exactly once delivery is a harsh mistress - DevOps Days TLV
Natan Silnitsky
 
Exactly Once Delivery is a Harsh Mistress - Natan Silnitsky
DevOpsDays Tel Aviv
 
Exactly Once Delivery with Kafka - JOTB2020 Mini Session
Natan Silnitsky
 
Kafka.pptx
Tarun techme
 
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo
 
Ad

More from Paul Brebner (20)

PPTX
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
PDF
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
PDF
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
PDF
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
PDF
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
PDF
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
PDF
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
PDF
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
PDF
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
PDF
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
PDF
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
PDF
A Visual Introduction to Apache Kafka
Paul Brebner
 
PDF
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
PDF
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Paul Brebner
 
PDF
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
PDF
Grid middleware is easy to install, configure, secure, debug and manage acros...
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
PPTX
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
PPTX
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Streaming More For Less With Apache Kafka Tiered Storage
Paul Brebner
 
30 Of My Favourite Open Source Technologies In 30 Minutes
Paul Brebner
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Architecting Applications With Multiple Open Source Big Data Technologies
Paul Brebner
 
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
Paul Brebner
 
Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers
Paul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Paul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
Paul Brebner
 
A Visual Introduction to Apache Kafka
Paul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Paul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Paul Brebner
 
Ad

Recently uploaded (20)

PDF
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
PDF
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
 
PDF
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
PPTX
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
PPTX
"How to survive Black Friday: preparing e-commerce for a peak season", Yurii ...
Fwdays
 
PDF
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
DOCX
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
PDF
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
PDF
Lessons Learned from Developing Secure AI Workflows.pdf
Priyanka Aash
 
PDF
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
PPTX
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
PDF
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
PDF
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
PPTX
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
PDF
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
PPTX
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
PDF
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Priyanka Aash
 
PDF
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
PDF
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
PDF
Quantum AI: Where Impossible Becomes Probable
Saikat Basu
 
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
Cracking the Code - Unveiling Synergies Between Open Source Security and AI.pdf
Priyanka Aash
 
ReSTIR [DI]: Spatiotemporal reservoir resampling for real-time ray tracing ...
revolcs10
 
CapCut Pro Crack For PC Latest Version {Fully Unlocked} 2025
pcprocore
 
"How to survive Black Friday: preparing e-commerce for a peak season", Yurii ...
Fwdays
 
10 Key Challenges for AI within the EU Data Protection Framework.pdf
Priyanka Aash
 
Daily Lesson Log MATATAG ICT TEchnology 8
LOIDAALMAZAN3
 
WebdriverIO & JavaScript: The Perfect Duo for Web Automation
digitaljignect
 
Lessons Learned from Developing Secure AI Workflows.pdf
Priyanka Aash
 
A Constitutional Quagmire - Ethical Minefields of AI, Cyber, and Privacy.pdf
Priyanka Aash
 
Security Tips for Enterprise Azure Solutions
Michele Leroux Bustamante
 
Coordinated Disclosure for ML - What's Different and What's the Same.pdf
Priyanka Aash
 
EIS-Webinar-Engineering-Retail-Infrastructure-06-16-2025.pdf
Earley Information Science
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
cnc-processing-centers-centateq-p-110-en.pdf
AmirStern2
 
UserCon Belgium: Honey, VMware increased my bill
stijn40
 
Oh, the Possibilities - Balancing Innovation and Risk with Generative AI.pdf
Priyanka Aash
 
Quantum AI Discoveries: Fractal Patterns Consciousness and Cyclical Universes
Saikat Basu
 
9-1-1 Addressing: End-to-End Automation Using FME
Safe Software
 
Quantum AI: Where Impossible Becomes Probable
Saikat Basu
 

A visual introduction to Apache Kafka

  • 1. Introducing Apache Kafka® Paul Brebner Technical Evangelist February 2019
  • 2. What is Kafka? Message flow Distributed streams Processing System Messages sent by Distributed Producers to Distributed Consumers Consumers via Distributed Kafka Cluster Cluster
  • 3. Kafka Benefits  Fast  Scalable  Reliable  Durable  Open Source  Managed Service Kafka benefits
  • 4. Kafka Benefits  Fast – high throughput and low latency  Scalable – horizontally scalable with nodes and partitions  Reliable – distributed and fault tolerant  Durable - zero data loss, messages persisted to disk with immutable log  Open Source – An Apache project  Available as a Managed Service - on multiple cloud platforms Kafka benefits
  • 8. To send messages from A to B
  • 9. “A” is the Producer – sends a message, to
  • 10. “B” the Consumer (recipient) of the message
  • 11. Due to decline in “snail mail” direct deliveries
  • 12. Due to decline in “snail mail” direct deliveries
  • 13. Instead … “Poste Restante”
  • 14. “Poste Restante”? • Not a post office in a restaurant • General delivery (in the US) • The mail is delivered to a post office, they hold it for you until you call
  • 15. Consumers “poll” for messages by visiting the Poste Restante counter at the post office
  • 16. Kafka topics act like a Post Office
  • 17. Benefits include • Disconnected delivery – consumer doesn’t need to be available to receive messages • Less effort for the messaging service – only has to deliver to a few locations not every consumer • Can scale better and handle more complex delivery semantics!
  • 19. A single counter introduces delays
  • 20. More counters increases concurrency
  • 21. Kafka topics have >= 1 Partitions (“counters”) • Partitions increase consumer concurrency • Increase throughput • Reduce latency
  • 22. Santa North Pole What’s a Kafka message? A Record – like a letter
  • 24. Santa North PoleTopic Timestamp, offset, partition The “Postmark”
  • 25. Santa North PoleTopic Timestamp, offset, partition The “Postmark” Time semantics are flexible time of event creation, ingestion, or processing.
  • 26. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional)
  • 27. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Key (optional) Refines the destination Send to Santa not just any Elf
  • 28. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Value is contents (byte array)
  • 29. Santa North PoleTopic Timestamp, offset, partition Key -> Partition (optional) Kafka Producers and Consumers need a serializer and de-serializer to write & read key and value
  • 30. • Kafka doesn’t look at the value • Consumer can read value • And try to make sense of the message • What will Santa be delivering?!
  • 31. Next… Delivery Semantics Do we care if the message arrives?
  • 32. Yes! Guaranteed delivery is desirable
  • 33. But homing pigeons got lost or eaten
  • 35. One pigeon may make it Eaten Lost
  • 40. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1 M1M1 The message is also replicated on multiple “brokers”
  • 41. Producer Broker 1 Broker 2 Broker 3 M1 M1 M1M1 Which makes it resilient to loss of most servers
  • 42. Producer Broker 1 Broker 2 Broker 3 Acknowledgement M1 M1 M1 Producer gets acknowledgement once the message is persisted and replicated (configurable)
  • 43. Consumer Broker 1 Broker 2 Broker 3 M1 M1 M1 Multiple Brokers and Partitions  increased read availability and concurrency
  • 44. Producer Consumer Consumer Consumer Consumer The 2nd aspect of delivery semantics: Who gets the messages? How many times are messages delivered?
  • 45. Producer Consumer Consumer Consumer Consumer Delivery Semantics - Kafka is “pub-sub” - Loosely coupled - Producers and consumers don’t know about each other
  • 46. Producer Topic “Parties” Consumer Consumer Consumer Consumer Which consumers get which messages (filtering), is topic based Topic “Work”
  • 48. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Publishers send messages to topics Send Topic “Work”Send
  • 49. Producer Topic “Parties” Consumer Consumer Consumer Consumers Subscribed to Topic “Parties” Consumer Consumers only receive messages from subscribed topics Consumers Poll To receive messages from ”Parties” Topic “Work” Consumers not subscribed to “Work” Don’t receive any “Work” messages
  • 50. Partitions and Consumer Groups Enable sharing of work across consumers
  • 51. Duplicate message delivery Each message is delivered to each subscribed consumer group
  • 52. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Consumers subscribed to topic are allocated partitions They will only get messages from their allocated partitions.
  • 53. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Consumer Consumer Consumers share work within groups Consumers in the same group share the work around Each consumer gets only a subset of messages
  • 54. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Consumer Group Consumer Group Consumer Consumer Consumer Consumer Messages are duplicated across Consumer groups Multiple groups enable message broadcasting Messages are duplicated across groups, each consumer group receives a copy of each message.
  • 55. Key Partition based delivery Which messages are delivered to which consumers? If a message has a key, then Kafka uses Partition based delivery. Messages with the same key are always sent to the same partition and therefore the same consumer. And the order is guaranteed.
  • 56. No Key Round robin delivery If the key is null, then Kafka uses round robin delivery Each message is delivered to the next partition
  • 57. Time for an Example, with 2 consumer groups.
  • 58. Consumer Group = Nerds Multiple consumers
  • 59. Consumer Group = Nerds Multiple consumers Consumer Group = Hairy Single consumer
  • 60. Producer Partition 1 Partition 2 Partition n Topic “Parties” C1 Group “Nerds” Group “Hairy” Consumer 1 (Bill) Consumer 2 (Paul) Consumer n Consumer 1 (Chewy) Consumers Subscribed to “Parties” No Key Round Robin M1 M2 etc M1 M1 M1 M2 M2 M2 Case 1: No Key Message (M1, M2, etc) sent to the next partition All consumers allocated to that partition will receive a message when they poll next.
  • 61. Here’s what happens (not showing producer or topics, have to imagine them)
  • 62. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). Subscribe to “Parties” Subscribe to “Parties”
  • 63. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party – Invitation” <key=null, value=“Cool pool party - Invitation”> to “parties” topic (no key)
  • 64. 1. Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition). 2. Producer sends record “Cool pool party - Invitation”> to “parties” topic 3. Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 65. 4. Producer sends another record “Cool pool party – Cancelled” <key=null, value=“Cool pool party - Cancelled”> to “parties” topic
  • 66. 4. Producer sends another record <key=null, value=“Cool pool party - Cancelled”> to “parties” topic 5. Paul and Chewbacca receive the cancellation. Paul gets the message this time as it’s round robin, ignores it as he didn’t get the invitation. Bill wastes his time trying to go to cancelled party. The rest of the gang aren’t surprised at not receiving any party invites and stay at home to do some hacking. Chewy is only consumer in his group so gets all messages, plans something fun instead…
  • 68. Case 2: If there is a Key (key hashed to partition). [Diagram: Producer; Topic “Parties” with Partition 1, Partition 2, Partition n; Group “Nerds” with Consumer 1 (Bill), Consumer 2 (Paul), Consumer n; Group “Hairy” with Consumer 1 (Chewy); all consumers subscribed to “Parties”.] A key is hashed to a partition, and a Message with that key is always sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to the same partition.
  • 69. Here’s what happens with a key: key is “title” of the message (e.g. “Cool pool party”) Same set up as before: 1. Both Groups subscribe to Topic “parties” (11 partitions).
  • 70. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic
  • 71. 1. Both Groups subscribe to Topic “parties” (11 partitions). 2. Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic 3. As before Bill and Chewbacca receive a copy of the invitation and plan to attend
  • 72. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic
  • 73. 4. Producer sends another record <key=“Cool pool party”, value=“Cancelled”> to “parties” topic 5. Bill and Chewbacca receive the cancellation (same consumers this time, as identical key)
  • 74. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to “parties” topic
  • 75. 6. Producer sends another record <key=“Horrible Halloween party”, value=“Invitation”> to “parties” topic 7. Paul and Chewy receive the invitation. Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated to. Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to.
  • 79. Real-time data pipeline Real-time data pipeline features: • Ingestion of multiple heterogeneous sources • Sending data to multiple heterogeneous sinks • Acts as a buffer to smooth out load spikes • Enables use cases which reprocess data (e.g. disaster recovery)
  • 80. Anomaly Detection Pipeline Real-time Event processing pipeline: • Simple event driven applications (If X then Y…) • May write and read from other data sources (e.g. Cassandra) • New Events sent back to Kafka or to other systems • E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  • 81. Kafka Streams Processing (Kongo IoT Blog series) Streams processing features: • Complex streams processing (multiple events and streams) • Time, windows, and transformations • Uses Kafka Streams API, includes state store • Visualization of the streams topology • Continuously computes the loads for trucks and checks if they are overloaded.
  • 82. LinkedIn - Before Kafka (BK) A real example from LinkedIn, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex!
  • 83. After Kafka (AK) Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform - Kafka was born. The main benefit was better Service decoupling and independent scaling.
  • 84. The End (of the introduction) - Find out more Apache Kafka: https://kafka.apache.org/ Instaclustr blogs • Mix of Cassandra, Spark, Zeppelin and Kafka https://www.instaclustr.com/paul-brebner/ • Kafka introduction https://insidebigdata.com/2018/04/12/developing-deeper-understanding-apache-kafka-architecture/ https://insidebigdata.com/2018/04/19/developing-deeper-understanding-apache-kafka-architecture-part-2- • Kongo – Kafka IoT logistics application blog series https://www.instaclustr.com/instaclustr-kongo-iot-logistics-streaming-demo-application/ • Anomaly detection with Kafka and Cassandra (and Kubernetes), current blog series https://www.instaclustr.com/anomalia-machina-1-massively-scalable-anomaly-detection-with-apache-kafka- Instaclustr’s Managed Kafka (Free trial) https://www.instaclustr.com/solutions/managed-apache-kafka/
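
The two cases walked through in slides 56 to 75 (records with no key versus records with a key) can be sketched as a toy partitioner. This is a minimal illustration in plain Python, not the Kafka client API: the `ToyPartitioner` class and its `partition_for` method are hypothetical names, and `zlib.crc32` is only a stand-in for the hash Kafka applies to keys.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 11  # the "parties" topic in the walk-through has 11 partitions


class ToyPartitioner:
    """Toy model of partition selection; not the Kafka client API."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = count()  # round-robin counter for records without a key

    def partition_for(self, key):
        if key is None:
            # No key: each record goes to the next partition in turn.
            return next(self._next) % self.num_partitions
        # Key present: hash it, so equal keys always map to the same partition.
        # crc32 is an assumption made for this sketch; Kafka uses its own hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = ToyPartitioner(NUM_PARTITIONS)

# Case 1: no key. The invitation and the cancellation land on different partitions.
invite_p = partitioner.partition_for(None)
cancel_p = partitioner.partition_for(None)

# Case 2: the key is the party title. Both records land on the same partition.
keyed_invite_p = partitioner.partition_for("Cool pool party")
keyed_cancel_p = partitioner.partition_for("Cool pool party")

print(invite_p, cancel_p)                # prints: 0 1
print(keyed_invite_p == keyed_cancel_p)  # prints: True
```

In the no-key case the cancellation reaches a different partition, and so a different consumer in the Nerds group, which is why Bill never hears about it. With the title as the key, both records follow the same path.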

Editor's Notes

  • #3: What is Kafka? Kafka is a distributed streams processing system. It allows distributed producers to send messages to distributed consumers via a Kafka cluster.
  • #4: Kafka benefits: Fast – high throughput and low latency. Scalable – horizontally scalable, just add nodes and partitions. Reliable – distributed and fault tolerant. Zero data loss – messages are persisted to disk with immutable log. Open Source – an Apache project. Available as a Managed Service on multiple cloud platforms.
  • #6: Back to the intro Kafka overview diagram, it’s a bit monochrome and boring. This talk will be more colourful and it’s going to be an extended story…
  • #8: Let’s build a modern day fully electronic postal service – the Kafka Postal Service
  • #9: What does a postal service do? It sends messages from A to B (animated with sounds, click!)
  • #10: A is a producer, sends a message…
  • #11: To B, the consumer.
  • #12: Actually, no. Due to the decline in “snail mail” volumes, direct deliveries have been cancelled.
  • #14: Instead we have “Poste Restante”. It’s not a post office in a restaurant. It’s called general delivery (in the US). The mail is delivered to a post office, and they hold it for you until you call for it.
  • #15: Consumers poll for messages by visiting the counter at the post office.
  • #17: Kafka topics act like a Post Office. What are the benefits? Disconnected delivery – the consumer doesn’t need to be available to receive messages. Less effort for the messaging service – it only has to deliver to a few locations, not to a larger number of consumer addresses. It can scale better and handle more complex delivery semantics!
  • #19: First let’s see how it scales. What if there are many consumers for a topic?
  • #20: A single counter introduces delays and limits concurrency
  • #21: More counters increases concurrency and can reduce delays
  • #22: Kafka Topics have 1 or more Partitions. Partitions function like multiple counters and enable high concurrency.
  • #23: Before looking at delivery semantics what does a message actually look like? In Kafka a message is called a Record and is like a letter.
  • #24: The topic is the destination.
  • #25: The “Postmark” includes a timestamp, offset in the topic, and the partition it was sent to. Time semantics are flexible, either the time of event creation, ingestion, or processing.
  • #27: There’s also a thing called a Key, which is optional. It refines the destination so it’s a bit like the rest of the address. We want this letter sent to Santa, not just any Elf.
  • #29: The value is the contents (just a byte array). Kafka Producers and consumers need to have a shared serializer and de-serializer for both the key and value.
  • #31: Kafka doesn’t look inside the value, but the Producer and Consumer can, and the Consumer can try and make sense of the message… I wonder what Santa will be delivering?
  • #32: Next let’s look at delivery semantics. For example, do we care if the message actually arrives or not?
  • #33: Yes we do! Guaranteed message delivery is desirable. Homing pigeons got lost or eaten, so we need to send the message with multiple pigeons.
  • #37: How does Kafka guarantee delivery? A Message (M1) is written to a broker (2)
  • #39: and the message is always persisted to disk.
  • #40: This makes it resilient to loss of power.
  • #41: The message is also replicated on multiple “brokers”; 3 is typical.
  • #42: This makes it resilient to the loss of most servers.
  • #43: Finally, the producer gets an acknowledgement once the message is persisted and replicated (the number of acknowledgements is configurable, as is whether sending is sync or async). The message is now available from more than one broker in case some fail. This also increases the read concurrency as partitions are spread over multiple brokers.
  • #45: Now let’s look at the 2nd aspect of delivery semantics. Who gets the messages and how many times are messages delivered? Kafka is “pub-sub”. It’s loosely coupled: producers and consumers don’t know about each other.
  • #47: Filtering, or which consumers get which messages, is topic based. Publishers send messages to topics. Consumers subscribe to topics of interest, e.g. parties. When they poll they only receive messages sent to those topics.
  • #51: Just a few more details and we can see how this works. Partitions and consumer groups enable sharing of work across multiple consumers, the more partitions a topic has the more consumers it supports
  • #52: Kafka supports delivery of the same message to multiple consumers. Kafka doesn’t throw messages away as soon as they are delivered, so the same message can easily be delivered to multiple consumer groups.
  • #53: Consumers subscribed to the “parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions.
  • #54: This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages.
  • #55: Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message.
  • #56: Which messages are delivered to which consumers? The final aspect of delivery semantics is to do with message keys. If a message has a key, then Kafka uses partition-based delivery. Messages with the same key are always sent to the same partition and therefore to the same consumer in each group, and their order within that partition is guaranteed.
  • #57: If the key is null, then Kafka uses round robin delivery. Each message is delivered to the next partition.
  • #58: Let’s look at an example with 2 consumer groups. Nerds, which has multiple consumers, and Hairy which has a single consumer.
  • #61: Looking first at the case where there’s no key, each message (1, 2, etc) is sent to the next partition, and all consumers allocated to that partition will receive the message when they poll next.
  • #62: Here’s what actually happens. We’re not showing the producer or topic for simplicity. You’ll have to imagine them.
  • #63: Both Groups subscribe to Topic “parties” (11 partitions, so 1 consumer per partition).
  • #64: 2. Producer sends a record with the value “Cool pool party - Invitation” to the “parties” topic (there’s no key)
  • #65: 3. Bill and Chewbacca receive a copy of the invitation and plan to attend
  • #66: 4. Producer sends another record with the value “Cool pool party – Cancelled” to “parties” topic
  • #67: In the Nerds group, Paul gets the message this time as it’s round robin, and Chewy gets it as he’s the only consumer in his group. Paul ignores it as he didn’t get the original invite. Bill wastes his time trying to go. The rest of the gang aren’t surprised at not receiving any invites and stay home to do some hacking. Chewy plans something else fun instead...
  • #68: A visit to the hairdressers!
  • #69: How does it work if there is a Key? The key is hashed to a partition, and the Message is sent to that partition. Assume there are 3 messages, and messages 1 and 2 are hashed to same partition.
  • #70: Here’s what happens with a key, assuming that the key is the “title” of the message (“Cool pool party”). As before, both Groups subscribe to Topic “parties” (11 partitions). Producer sends record <key=“Cool pool party”, value=“Invitation”> to “parties” topic.
  • #72: As before, Bill and Chewbacca receive a copy of the invitation and plan to attend
  • #73: 4. Producer sends another record with the same key but a cancellation value to “parties” topic
  • #74: This time, Bill and Chewbacca receive the cancellation (the same consumers as the key is identical)
  • #75: Producer sends out another invitation to a Halloween party. The key is different this time.
  • #76: Paul receives the Halloween invitation as the key is different and the record is sent to the partition that Paul is allocated to. Chewy is the only consumer in his group so he gets every record no matter what partition it’s sent to.
  • #77: This time Chewy gets dressed up and goes to the party.
  • #78: Who did you imagine was producing the invitations? Maybe this fellow.
  • #80: Here’s a UML-like diagram with the main Kafka components. We’ve introduced Producers, Topics, Partitions, Consumer Groups and Consumers today. There’s a lot more to explore, including how Kafka provides replication, and the Connect and Streaming APIs.
  • #81: To finish up here are three important use cases for Kafka
  • #82: The Real-time Data pipeline features: ingestion of multiple heterogeneous sources; sending data to multiple heterogeneous sinks; acting as a buffer to smooth out load spikes; enabling use cases which reprocess data (e.g. disaster recovery).
  • #83: Real-time Event processing features: simple event driven applications (If X then Y…); may write and read from other data sources (e.g. Cassandra); new Events sent back to Kafka or to other systems. E.g. Anomaly Detection, check out my current blog series if you are interested in this example.
  • #84: Streams processing features: complex streams processing (multiple events and streams); time, windows, and transformations; Streams API, includes state store. This is a visualization of the streams topology from the streams processing pipeline from my previous blog series, Kongo, a Kafka IoT logistics application. It continuously computes the loads for trucks and checks if they are overloaded.
  • #85: Here’s a real example from LinkedIn, who developed Kafka. Before Kafka they had spaghetti integration of monolithic applications. To accommodate growing membership and increasing site complexity, they migrated from a monolithic application infrastructure to one based on microservices, which made the integration even more complex.
  • #86: After Kafka. Rather than maintaining and scaling each pipeline individually, they invested in the development of a single, distributed pub-sub platform. Thus, Kafka was born. The main benefit was better Service decoupling and independent scaling.
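
The consumer-group behaviour described in the notes for slides 51 to 55 (partitions shared out within a group, every group receiving its own copy) can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Kafka's API: `assign_partitions` and `receivers` are hypothetical helpers, the modulo assignment is a simplification of Kafka's configurable assignment strategies, and the `Nerd3` to `Nerd11` members are invented placeholders because the slides name only Bill, Paul and Chewy.

```python
from collections import defaultdict


def assign_partitions(consumers, num_partitions):
    """Spread partitions over a group's consumers (simple modulo assignment)."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


def receivers(groups, num_partitions, partition):
    """Return, for each group, the single consumer that owns `partition`."""
    result = {}
    for group_name, consumers in groups.items():
        assignment = assign_partitions(consumers, num_partitions)
        for consumer, owned in assignment.items():
            if partition in owned:
                result[group_name] = consumer
    return result


# Hypothetical membership: 11 Nerds (one per partition) and a single Hairy consumer.
groups = {
    "Nerds": ["Bill", "Paul"] + [f"Nerd{i}" for i in range(3, 12)],
    "Hairy": ["Chewy"],
}

print(receivers(groups, 11, partition=0))  # {'Nerds': 'Bill', 'Hairy': 'Chewy'}
print(receivers(groups, 11, partition=1))  # {'Nerds': 'Paul', 'Hairy': 'Chewy'}
```

Each record is read by exactly one consumer per group, so Chewy, alone in his group, owns every partition, while the Nerds split the 11 partitions between them.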