Kafka - Spark streaming - ESGI 2020.
Agenda
● What is Kafka?
● Motivations
● Kafka architecture
● Produce and consume messages
● Start Kafka

What is Kafka?
● Kafka
● Confluent

LinkedIn before Kafka
LinkedIn after Kafka
What is Kafka?
● MOM (Message Oriented Middleware)
● Used to publish and subscribe to streams of records
● It's scalable
● It's polyglot
● It's fast
What about today?
● Created by LinkedIn in 2009
● Open source since 2011
● Part of the Apache foundation
○ Very active community
○ Current version: 2.3.0
● Spinoff company Confluent created in 2014
○ Founded by Jay Kreps, Neha Narkhede and Jun Rao
○ Created the Confluent Platform
○ Valued at several billion dollars ($2.5 billion as of 2019-01-23)
What is Kafka?
● Message bus
○ Written in Scala
○ Heavily inspired by transaction logs
● Initially created at LinkedIn in 2010
○ Open sourced in 2011
○ Became an Apache top-level project in 2012
● Designed to support batch and real-time analytics
● Performs very well, especially at very large scale
What is Confluent?
● Founded in 2014 by the creators of Kafka
● Provides support, training, etc. for Kafka
● Provides the Confluent Platform
○ A suite of products to work with Kafka: produce messages, transform data, etc.
Motivations
● Traditional systems
● The importance of real time
● The birth of Kafka
Traditional systems
● In a traditional system, data is dispatched across several data stores
○ A database, HDFS, etc.
○ Each producer implements its own transformation logic and writes into the destination store
● Over time, the system grows
○ The codebase grows and becomes hard to maintain
Traditional systems
● At first, it is easy to connect several systems and data sources to databases
Traditional systems
● But eventually it becomes hard to maintain
The importance of real time
● Batch processing is traditional and well known
○ We use this approach with Spark
○ Every day, week, etc., I run my batch processing
● But it implies a strong restriction
○ I need to wait for the batch to finish before I can start analysing the data
The importance of real time
● Nowadays, it is really common to have real-time processing needs
○ Fraud detection
○ Recommender systems
○ Log monitoring
○ Real-time feeding of HDFS
○ etc.
Kafka
● Kafka was created to solve two issues
○ Simplify the architecture of data flows
○ Handle data streaming
● Kafka separates data production from data consumption
○ In traditional systems, both are usually tied together in one application
○ These are the "publish" / "subscribe" concepts
Kafka
● Kafka is designed to work in a cluster
● A cluster is a set of instances (nodes) that know each other
Kafka
● Once the data is in Kafka, it can be read by several different consumers
○ One consumer writing to HDFS, another applying an alerting process, etc.
● Increasing the number of consumers does not have any significant impact on performance
● A consumer can be added without touching the producer
The architecture
● Fundamentals
● Producing messages
● Partitioning
● Consuming messages
● Zookeeper
Fundamentals
● Data sent to Kafka takes the form of messages
○ Each message is a key / value pair
○ By default, messages do not have any schema
● Each message is written to a topic
○ A topic is a way to group messages
○ Conceptually very close to a message queue
● Topics can be created in advance or dynamically by the producers
Fundamentals
● The 4 key components of Kafka are
○ Producers
○ Brokers
○ Consumers
○ Zookeeper
The producer
● Its task is to send messages to the Kafka cluster
● A producer can be written in many programming languages
○ Java, C, Python, Scala, etc. In our case, it will be Scala
About messages
● A message is a key / value pair
● Keys and values can be of any type
○ You provide a serializer to tell the producer how to transform the data into a byte array
● The key is optional
○ It is used for partitioning (more on that soon)
○ Without a key, the message can be written to any partition
About partitioning
● Topics are split into partitions
● Each partition contains a subset of the topic's messages
● Kafka uses a hash of the key to choose the partition the message will be written to (see the sketch below)
● Partitions are dispatched over the whole cluster
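As a minimal sketch of this key-based routing (simplified: Kafka's actual default partitioner applies a murmur2 hash to the serialized key bytes):

  // Simplified sketch of key-based partition selection.
  // (The real DefaultPartitioner hashes the serialized key with murmur2.)
  def choosePartition(key: String, numPartitions: Int): Int =
    (key.hashCode & Int.MaxValue) % numPartitions

  choosePartition("euro", 3) // the same key always lands in the same partition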
The broker
● The broker is the heart of Kafka
● It receives messages and persists them
● Highly performant (can handle several million messages per second)
The broker
● A Kafka cluster usually contains several brokers
○ For development / testing purposes, we may work with only one
● Each broker handles one or several partitions
○ Partitions are dispatched over the whole cluster
The consumer
● It reads messages from Kafka
● Several consumers can read the same topic
○ Each consumer will receive all messages from the topic (default behaviour)
● It receives messages by pulling them from Kafka
○ Other products push messages to the consumers
○ The main advantage of pulling is that it does not overload the consumer (backpressure)
○ The consumer reads at its own pace
Zookeeper
● An Apache project
● A centralised configuration / coordination service
● Used by Kafka's internals
Global architecture
Kafka versus
● HDFS & RDBMS
● CAP Theorem
HDFS & RDBMS
● Kafka is similar to products like RabbitMQ
○ But RabbitMQ pushes messages to the consumers
● Kafka can be used as a database, by increasing the message retention duration
○ This is not its main purpose
○ It is hard to manipulate messages individually
● It is a kind of orchestrator: it feeds different services and different databases
○ Such as HDFS
HDFS
● Distributed file system
● Scales extremely well
○ Even when the cluster is composed of more than a thousand nodes
● Not so true for Cassandra or MongoDB
○ Beyond a certain number of nodes, performance decreases
CAP theorem
● Consistency, Availability, Partition tolerance
● A distributed system can satisfy at most 2 of these properties
○ An RDBMS is CA: it is not distributed, so partition tolerance does not apply
            C   A   P
Kafka       X   X
MongoDB     X       X
Cassandra       X   X
HDFS        X       X
Advanced architecture
● Partitions
● Commit log
● Consumer group and offset
● Replicas
Partitions
● Each topic is divided into one or several partitions
● Partitions are distributed over all the brokers in the cluster
Partitions
● With partitions, we can scale: data is no longer centralised but distributed
● Inside a given partition, data is read in the same order it was written; order is guaranteed within the partition
● On the other hand, from the point of view of the topic, there is no order guarantee between messages coming from different partitions
● This is why it is important to choose the right key when ordering matters
Commit log
● The data of each partition is persisted in a commit log
● Commonly implemented as a file in "append only" mode
○ Thus, data is immutable and reads / writes are highly efficient
● Also used by classical RDBMS
○ To trace all the changes that happen on tables
Consumer group
● Several consumers can consume together as a consumer group
○ They will not read the same messages from a given topic
○ They share the messages: a given message is read only once within the group
● Each consumer will read from one or several partitions
● Data from a partition will be read by only one consumer in the group
Consumer group
● Consumers in a group share the partitions; a single consumer consumes all the partitions
Consumer group
● The number of useful consumers is limited by the number of partitions
○ A useful consumer receives data
○ The others receive nothing; they just wait
Offset
● For each consumer group and each partition, Kafka keeps an offset (an integer)
● It is the position of the last element read by a given consumer group in a given partition
Offset
● When a consumer asks for a message, Kafka looks up the offset it holds for this consumer group (in each partition of the requested topic) and sends the corresponding message
● When a consumer gets a message, it commits it
● When a consumer commits, Kafka increments the offset for the given partition
● We can also ask Kafka to read from a specific offset, so the consumer can consume from wherever it wants (see the sketch below)
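As a sketch with the Java client (the topic name and offset are arbitrary, and `consumer` is assumed to be an already configured KafkaConsumer):

  import org.apache.kafka.common.TopicPartition

  // Read partition 0 of the "test" topic starting from offset 42 (arbitrary values)
  val partition = new TopicPartition("test", 0)
  consumer.assign(java.util.Collections.singletonList(partition))
  consumer.seek(partition, 42L)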
Replicas
● It is possible (and recommended) to replicate partitions
● Replicas are perfect copies of main partitions
[Diagram: Broker 1 hosts Topic-Partition-1 and Topic-Replica-2; Broker 2 hosts Topic-Partition-2 and Topic-Replica-1]
Replicas
● If a broker goes down, the replica becomes the leader partition, and thus we can still consume and produce messages
[Diagram: after the failure, the replica of Topic-Partition-1 hosted on Broker 2 is promoted and becomes the leader Topic-Partition-1]
Produce and consume
● Start Kafka
● Dependencies
● Produce
● Consume
Start Kafka
● Download Zookeeper and Kafka
○ https://www-us.apache.org/dist/zookeeper/current/zookeeper-3.4.12.tar.gz
○ https://www.apache.org/dyn/closer.cgi?path=/kafka/2.1.0/kafka_2.12-2.1.0.tgz
● bin/zookeeper-server-start.sh config/zookeeper.properties
● bin/kafka-server-start.sh ./config/server.properties
Console
● Kafka provides command line tools to manipulate topics, consume messages, etc.
● To create a topic
○ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Console
● To produce a message
○ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
● To consume a topic
○ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Scala dependencies
● The Kafka client is available as a standard Scala / Java dependency
○ One can use Maven or SBT
○ With Maven:
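For instance, a minimal dependency on the Kafka client (the version is an assumption, chosen to match the 2.x versions mentioned earlier):

  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.3.0</version>
  </dependency>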
Scala dependencies
● The Kafka client is available as a standard Scala / Java dependency
○ One can use Maven or SBT
○ With SBT:
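For instance, the equivalent SBT line (version again an assumption):

  libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.3.0"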
Produce
● To start, we need to instantiate a producer
Produce
● Then we need to configure the producer. There are 3 mandatory properties:
○ The address of at least one broker
○ The serializers for the key and the value
○ Serializers for common types are provided by Kafka, and we can also define our own
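A minimal sketch of the instantiation and configuration, assuming a broker on localhost:9092 and String keys and values:

  import java.util.Properties
  import org.apache.kafka.clients.producer.KafkaProducer

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092") // at least one broker
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)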
Produce
● Kafka provides a utility class to simplify the configuration
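This presumably refers to the ProducerConfig class, which exposes the property names as constants; a sketch:

  import org.apache.kafka.clients.producer.ProducerConfig

  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")
  props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer")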
Produce
● There are a lot of possible parameters
● Everything is documented
Produce
● To send a message
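A minimal sketch (the topic name, key and value are arbitrary):

  import org.apache.kafka.clients.producer.ProducerRecord

  val record = new ProducerRecord[String, String]("test", "myKey", "myValue")
  producer.send(record)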
Produce
● The call to producer.send() is asynchronous (non-blocking)
● It does not block the code
● To force a synchronous (blocking) call, we need to call producer.send().get()
Produce
● To get the result, there are two ways
● The call producer.send() returns a Future
○ Unfortunately, it is a Java Future, which is hard to use idiomatically in Scala
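A sketch of the blocking variant (the Future is a java.util.concurrent.Future[RecordMetadata]):

  // get() blocks until the broker acknowledges the write
  val metadata = producer.send(record).get()
  println(s"partition=${metadata.partition()} offset=${metadata.offset()}")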
Produce
● The method producer.send() can also take a function as a parameter: a callback
● When the call completes, the callback function is invoked
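A sketch using the client's Callback interface:

  import org.apache.kafka.clients.producer.{Callback, RecordMetadata}

  producer.send(record, new Callback {
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
      if (exception != null) exception.printStackTrace() // the send failed
      else println(s"written to partition ${metadata.partition()} at offset ${metadata.offset()}")
  })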
Consume
● As for the producer, we need to instantiate the consumer
Consume
● As for the producer, we need to configure the consumer
● One more parameter is mandatory: the group id
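A sketch of the instantiation and configuration (the group id and broker address are assumptions):

  import java.util.Properties
  import org.apache.kafka.clients.consumer.KafkaConsumer

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "my-group") // the mandatory group id
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)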
Consume
● Several other parameters can be set
● For example, the parameter enable.auto.commit tells the consumer whether to commit automatically; otherwise, committing has to be done manually
○ If the property is set to true, the consumer commits every auto.commit.interval.ms (5000 ms by default)
● By default, enable.auto.commit is set to true
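For example, to switch to manual commits:

  props.put("enable.auto.commit", "false") // we now have to commit ourselves
  // props.put("auto.commit.interval.ms", "5000") // only relevant when auto commit is on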
Consume
● Then, we need to subscribe to the topics we wish to consume
● Kafka will then dispatch the partitions between all the consumers of a given group
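A sketch (subscribe takes a Java collection of topic names; "test" is the topic created earlier):

  consumer.subscribe(java.util.Arrays.asList("test"))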
Consume
● Then we can fetch the results
● The call to poll is synchronous: if no message is available, the consumer waits for the duration given as a parameter before giving control back to the caller
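A minimal polling loop (using the Kafka 2.x signature that takes a java.time.Duration):

  import java.time.Duration
  import scala.collection.JavaConverters._

  while (true) {
    // Wait at most 100 ms for new messages
    val records = consumer.poll(Duration.ofMillis(100))
    for (record <- records.asScala)
      println(s"${record.key()} -> ${record.value()} (partition ${record.partition()}, offset ${record.offset()})")
  }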
Consume
● If we set the parameter enable.auto.commit to false, we have to commit manually; otherwise we will read the same messages indefinitely
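A sketch of a synchronous manual commit:

  // Commits the offsets of the records returned by the last poll()
  consumer.commitSync()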
Consume
● We can also commit asynchronously
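A sketch: commitAsync returns immediately, and can take a callback to inspect the result:

  import java.util.{Map => JMap}
  import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
  import org.apache.kafka.common.TopicPartition

  consumer.commitAsync(new OffsetCommitCallback {
    override def onComplete(offsets: JMap[TopicPartition, OffsetAndMetadata],
                            exception: Exception): Unit =
      if (exception != null) exception.printStackTrace() // the commit failed
  })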
Confluent ecosystem
● Schema Registry
○ Offers the possibility to apply schemas to messages
● Kafka Streams
○ High-level library (offering a DSL) to transform data between topics
○ Plays the role of the T in ETL
● Kafka Connect
○ Offers connectors to feed Kafka with data or move data from Kafka to other systems
■ There are connectors for HDFS, the local file system, Cassandra, etc.
○ Plays the role of the E in ETL if the connector is a source, and of the L if it is a sink
● etc.
Kafka Streams
● High-level API to consume and produce messages between topics
○ It is used to transform data
○ Kafka Streams also offers a low-level API; we will concentrate on the high-level one
● It is an alternative to
○ Spark Streaming
○ Apache Storm
○ Akka Streams
○ etc.
Kafka Streams
● Kafka Streams has 2 core concepts
● KStream
○ The topic is seen as a data flow, where every record is independent from the others
● KTable
○ Similar to a changelog: each record is seen as an update (depending on its key)
● For example, say a topic contains two elements, ("euro", 5) and ("euro", 1)
○ If I create a KStream on this topic and sum the values in euros, I get 6
○ If I create a KTable, I get 1 (see the sketch below)
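A sketch of the difference, using the kafka-streams-scala DSL (the topic name "euros" is made up; in a real topology you would build one view or the other, not both on the same topic):

  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.Serdes._

  val builder = new StreamsBuilder()

  // KStream view: every record counts, so the running sum for "euro" becomes 5 + 1 = 6
  val sums = builder.stream[String, Int]("euros").groupByKey.reduce(_ + _)

  // KTable view: each record overwrites the previous value for its key, so "euro" -> 1
  val latest = builder.table[String, Int]("euros")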
Kafka Streams
● Kafka Streams offers the usual high-level functions:
○ map
○ filter
○ groupByKey
○ count
○ etc.
Kafka Streams
● Simple example
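A comparable minimal topology, as a sketch (topic names are made up; same Scala DSL imports as above):

  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.Serdes._

  val builder = new StreamsBuilder()
  builder.stream[String, String]("input")
    .filter((_, value) => value.nonEmpty) // drop empty messages
    .mapValues(_.toUpperCase)             // transform each value
    .to("output")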
Kafka Streams
● Word count example
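A sketch of the classic word count, assuming the kafka-streams-scala DSL and made-up topic names:

  import java.util.Properties
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.Serdes._

  object WordCount extends App {
    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-example")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")

    val builder = new StreamsBuilder()
    builder.stream[String, String]("text-input")
      .flatMapValues(_.toLowerCase.split("\\W+")) // split each line into words
      .groupBy((_, word) => word)                 // re-key the stream by word
      .count()                                    // KTable of word -> count
      .toStream
      .to("word-counts")

    val streams = new KafkaStreams(builder.build(), props)
    streams.start()
    sys.addShutdownHook(streams.close())
  }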