Kafka

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming apps. It is used for use cases like metrics, log aggregation, and stream processing. The key components are brokers that run the Kafka cluster, producers that write data to topics, consumers that read data from topics, topics that act as categories for messages, and partitions that split topics across brokers for parallelism. Kafka uses APIs for publishing and processing streams of records and Zookeeper for coordination between brokers, producers, and consumers.


Apache Kafka

Sajan Kedia
Agenda
1. What is Kafka?
2. Use cases
3. Key components
4. Kafka APIs
5. How Kafka works?
6. Real-world examples
7. Zookeeper
8. Install & get started
9. Live Demo - Getting Tweets in Real Time & pushing them into a Kafka topic by a Producer
What is Kafka?
● Kafka is a distributed streaming platform:
○ publish-subscribe messaging system
■ A messaging system lets you send messages between processes, applications, and
servers.
○ Store streams of records in a fault-tolerant durable way.
○ Process streams of records as they occur.
● Kafka is used for building real-time data pipelines and streaming apps.
● It is horizontally scalable, fault-tolerant, fast, and runs in production in
thousands of companies.
● Originally developed at LinkedIn, and later open-sourced through Apache in 2011.
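The publish-subscribe idea above can be sketched in a few lines of plain Python (an in-memory stand-in, not Kafka's actual API): subscribers register interest in a topic, and every message published to that topic fans out to all of them.

```python
from collections import defaultdict

# Minimal in-memory publish-subscribe sketch (illustrative only; real Kafka
# persists messages to disk and distributes them across a broker cluster).
class PubSub:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan-out: every subscriber of the topic receives the message.
        for callback in self.subscribers[topic]:
            callback(message)

bus = PubSub()
received = []
bus.subscribe("logs", received.append)
bus.publish("logs", "app started")
print(received)  # -> ['app started']
```

Unlike this sketch, Kafka decouples the two sides: the broker stores the stream durably, so consumers do not have to be online at publish time.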
Use Cases
● Metrics − Kafka is often used for operational monitoring data. This involves
aggregating statistics from distributed applications to produce centralized feeds of
operational data.
● Log Aggregation Solution − Kafka can be used across an organization to collect logs
from multiple services and make them available in a standard format to multiple
consumers.
● Stream Processing − Popular frameworks such as Storm and Spark Streaming read
data from a topic, process it, and write the processed data to a new topic where it
becomes available for users and applications. Kafka’s strong durability is also very
useful in the context of stream processing.
Key Components of Kafka
● Broker
● Producers
● Consumers
● Topic
● Partitions
● Offset
● Consumer Group
● Replication
Broker
● Kafka runs as a cluster on one or more servers that can span multiple
datacenters.
● An instance of the cluster is a broker.
Producer & Consumer
Producer: It writes data to the brokers.

Consumer: It consumes data from brokers.

A Kafka cluster can run on multiple nodes.


Kafka Topic
● A Topic is a category/feed name to which messages are stored and published.
● If you wish to send a message, you send it to a specific topic, and if you wish
to read a message, you read it from a specific topic.
● Why we need topics: in the same Kafka cluster, data from many different
sources can arrive at the same time, e.g. logs, web activities, metrics, etc.
Topics make it possible to identify and separate these different streams of data.
● Producer applications write data to topics and consumer applications read
from topics.
Partitions
● Kafka topics are divided into a number of partitions, each of which contains
messages in an unchangeable (immutable) sequence.
● Each message in a partition is assigned and identified by its unique offset.
● A topic can also have multiple partition logs. This allows multiple
consumers to read from a topic in parallel.
● Partitions allow you to parallelize a topic by splitting the data in a particular
topic across multiple brokers.
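As a rough sketch of how a producer picks a partition for a keyed message: hash the key and take it modulo the partition count. (Kafka's real default partitioner hashes the key bytes with murmur2; Python's built-in `hash()` is used here purely as a stand-in.)

```python
# Illustrative stand-in for key-based partition selection; Kafka's actual
# default partitioner hashes the key bytes with murmur2, not Python's hash().
def pick_partition(key: str, num_partitions: int) -> int:
    return hash(key) % num_partitions

NUM_PARTITIONS = 3
# The same key always maps to the same partition, which is what preserves
# per-key ordering of messages within a partition.
assert pick_partition("user-42", NUM_PARTITIONS) == pick_partition("user-42", NUM_PARTITIONS)
```

This is why choosing a good key matters: all messages with one key land in one partition, in order, while different keys spread the load across brokers.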
Partition Offset
Offset: Messages in the partitions are each assigned a unique (per partition) and
sequential id called the offset

Consumers track their pointers via (offset, partition, topic) tuples
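The offset bookkeeping described above can be pictured as an append-only list, where a message's offset is simply its position in the partition (an in-memory illustration, not Kafka's on-disk log):

```python
# A partition as an append-only log: each appended message gets the next
# sequential offset (in-memory sketch; real Kafka stores log segments on disk).
class Partition:
    def __init__(self):
        self.log = []

    def append(self, message) -> int:
        self.log.append(message)
        return len(self.log) - 1  # the offset assigned to this message

p = Partition()
assert p.append("m0") == 0  # first message gets offset 0
assert p.append("m1") == 1  # offsets are sequential and never reused

# A consumer tracks its position with an (offset, partition, topic) tuple:
position = (2, 0, "clicks")  # next offset to read, partition id, topic name
```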


Consumer & Consumer Group
● Consumers can read messages starting from a specific offset and are allowed
to read from any offset point they choose.
● This allows consumers to join the cluster at any point in time.
● Consumers can join a group called a consumer group.
● A consumer group includes the set of consumer processes that are
subscribing to a specific topic.
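One way to picture how a topic's partitions get divided among the consumers of a group is a round-robin sketch (simplified; real Kafka supports several assignment strategies, such as range and round-robin):

```python
# Round-robin sketch of partition assignment within one consumer group.
# Each partition is consumed by exactly one consumer of the group.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

result = assign(partitions=[0, 1, 2, 3], consumers=["c1", "c2"])
print(result)  # -> {'c1': [0, 2], 'c2': [1, 3]}
```

Note that within a group, a partition goes to exactly one consumer; separate groups each receive the full stream independently.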
Replication
● In Kafka, replication is implemented at the partition level, which helps prevent data loss.
● The redundant unit of a topic partition is called a replica.
● Each partition usually has one or more replicas, meaning that partitions contain messages that are
replicated over a few Kafka brokers in the cluster. As we can see in the picture, the click-topic is
replicated to Kafka node 2 and Kafka node 3.
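The spreading of replicas over brokers can be illustrated with a small sketch: place each partition's copies on consecutive brokers, wrapping around the cluster. (Illustrative only; Kafka's real placement logic also balances leaders and can be rack-aware.)

```python
# Sketch: place `replication_factor` copies of a partition on consecutive
# brokers, wrapping around the cluster (simplified vs. real Kafka placement).
def place_replicas(partition_id, brokers, replication_factor):
    start = partition_id % len(brokers)
    return [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]

brokers = ["broker1", "broker2", "broker3"]
print(place_replicas(0, brokers, 2))  # -> ['broker1', 'broker2']
print(place_replicas(2, brokers, 2))  # -> ['broker3', 'broker1']
```

With a replication factor of 2, losing any single broker still leaves one full copy of every partition available.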
Kafka APIs
Kafka has four core APIs:

● The Producer API allows an application to publish a stream of records to one or more
Kafka topics.
● The Consumer API allows an application to subscribe to one or more topics and
process the stream of records.
● The Streams API allows an application to act as a stream processor, consuming an
input stream from one or more topics and producing an output stream to one or more
output topics, effectively transforming the input streams to output streams.
● The Connector API allows building and running reusable producers or consumers that
connect Kafka topics to existing applications or data systems. For example, a
connector to a relational database might capture every change to a table.
How Kafka Works?
● Producers write data to the topic.
● As a message record is written to a partition of the topic, its offset is
increased by 1.
● Consumers consume data from the topic. Each consumer reads data based
on the offset value.
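Putting the three bullets together, here is a toy end-to-end version of the flow (an in-memory stand-in for a topic, not the real Kafka client API): the producer appends, the offset grows by 1 per record, and a consumer reads from whatever offset it chooses.

```python
# Toy end-to-end flow: produce to a topic partition, offsets grow by 1,
# and a consumer reads from any offset it chooses (in-memory sketch only).
class Topic:
    def __init__(self, num_partitions=1):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, partition, record):
        self.partitions[partition].append(record)
        return len(self.partitions[partition]) - 1  # offset just written

    def consume(self, partition, offset):
        # Return every record from `offset` onward.
        return self.partitions[partition][offset:]

clicks = Topic()
clicks.produce(0, "click-A")  # offset 0
clicks.produce(0, "click-B")  # offset 1
clicks.produce(0, "click-C")  # offset 2
print(clicks.consume(0, 1))   # -> ['click-B', 'click-C']
```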
Real World Example
● Website activity tracking.
● Let’s take the example of Flipkart: when you visit Flipkart and perform any action like
a search, login, or click on a product, all of these events are captured.
● The tracking event creates a message stream, and based on the kind of
event it goes to a specific topic via the Kafka Producer.
● This kind of activity tracking often requires a very high volume of throughput,
since messages are generated for each action.
Steps
1. A user clicks on a button on website.
2. The web application publishes a message to partition 0 of the topic "click".
3. The message is appended to the partition’s commit log and the message offset is
incremented.
4. The consumer can pull messages from the click-topic and show monitoring
usage in real-time or for any other use case.
Another Example
Zookeeper
● ZooKeeper is used for managing and coordinating Kafka brokers.
● The ZooKeeper service is mainly used to notify producers and consumers about the
presence of a new broker in the Kafka system or the failure of a broker in the
Kafka system.
● Based on the notification received from ZooKeeper about the presence or
failure of a broker, producers and consumers make decisions and start
coordinating their tasks with another broker.
● The ZooKeeper framework was originally built at Yahoo!
How to install & get started?
1. Download Apache Kafka & ZooKeeper
2. Start the ZooKeeper server, then Kafka, and run a single broker
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties

3. Create a topic named test


> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test

4. Run the producer & send some messages


> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message

This is another message

5. Start a consumer
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message

This is another message


Live Demo
● Live Demo of Getting Tweets in Real Time by Calling Twitter API
● Pushing all the Tweets to a Kafka Topic by Creating Kafka Producer in Real
Time
● Code in Jupyter
Thanks :)

References Used:

● Research Paper - “Kafka: a Distributed Messaging System for Log Processing” : http://notes.stephenholiday.com/Kafka.pdf
● https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
● https://kafka.apache.org/
● https://www.cloudkarafka.com
