Kafka
Sajan Kedia
Agenda
1. What is Kafka?
2. Use cases
3. Key components
4. Kafka APIs
5. How Kafka works?
6. Real-world examples
7. ZooKeeper
8. Install & get started
9. Live Demo - Getting Tweets in Real Time & pushing them to a Kafka topic with a Producer
What is Kafka?
● Kafka is a distributed streaming platform:
○ publish-subscribe messaging system
■ A messaging system lets you send messages between processes, applications, and
servers.
○ Store streams of records in a fault-tolerant durable way.
○ Process streams of records as they occur.
● Kafka is used for building real-time data pipelines and streaming apps.
● It is horizontally scalable, fault-tolerant, fast, and runs in production at
thousands of companies.
● Originally developed at LinkedIn, and open sourced through Apache in 2011.
Use Cases
● Metrics − Kafka is often used for operational monitoring data. This involves
aggregating statistics from distributed applications to produce centralized feeds of
operational data.
● Log Aggregation Solution − Kafka can be used across an organization to collect logs
from multiple services and make them available in a standard format to multiple
consumers.
● Stream Processing − Popular frameworks such as Storm and Spark Streaming read
data from a topic, process it, and write the processed data to a new topic, where it
becomes available for users and applications. Kafka’s strong durability is also very
useful in the context of stream processing.
Key Components of Kafka
● Broker
● Producers
● Consumers
● Topic
● Partitions
● Offset
● Consumer Group
● Replication
Broker
● Kafka runs as a cluster on one or more servers that can span multiple
datacenters.
● Each server instance in the cluster is called a broker.
Producer & Consumer
Producer: It writes data to the brokers.
Consumer: It reads and processes data from the brokers.
Kafka APIs
● The Producer API allows an application to publish a stream of records to one or more
Kafka topics.
● The Consumer API allows an application to subscribe to one or more topics and
process the stream of records.
● The Streams API allows an application to act as a stream processor, consuming an
input stream from one or more topics and producing an output stream to one or more
output topics, effectively transforming the input streams to output streams.
● The Connector API allows building and running reusable producers or consumers that
connect Kafka topics to existing applications or data systems. For example, a
connector to a relational database might capture every change to a table.
How Kafka Works?
● Producers write data to the topic.
● As a message record is written to a partition of the topic, its offset is
increased by 1.
● Consumers consume data from the topic. Each consumer reads data based
on its offset value.
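These mechanics can be sketched in a few lines (an illustrative in-memory model, not the real broker): each partition is an append-only log, a record's offset is its position in that log, and each consumer remembers the next offset it should read.

```python
class Partition:
    """Append-only log: each record's offset is its index in the log."""
    def __init__(self):
        self.log = []

    def append(self, record):
        offset = len(self.log)   # next offset = current log length
        self.log.append(record)
        return offset            # the offset increases by 1 per record

class Consumer:
    """Reads records sequentially, tracking its own read position."""
    def __init__(self, partition):
        self.partition = partition
        self.position = 0        # next offset to read

    def poll(self):
        records = self.partition.log[self.position:]
        self.position = len(self.partition.log)
        return records

p = Partition()
print(p.append("msg-0"))  # 0
print(p.append("msg-1"))  # 1

c = Consumer(p)
print(c.poll())           # ['msg-0', 'msg-1']
p.append("msg-2")
print(c.poll())           # ['msg-2'] -- only records past the last read offset
```

Because the consumer, not the broker, tracks the position, many independent consumers can read the same partition at their own pace.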
Real World Example
● Website activity tracking.
● Take Flipkart as an example: when you visit Flipkart and perform any action such as
a search, a login, or a click on a product, all of these events are captured.
● Event tracking creates a message stream; based on the kind of event, the Kafka
Producer routes it to a specific topic.
● This kind of activity tracking often requires a very high volume of throughput,
since messages are generated for each user action.
Steps
1. A user clicks a button on the website.
2. The web application publishes a message to partition 0 of the topic "click".
3. The message is appended to that partition's commit log and the message offset is
incremented.
4. A consumer can pull messages from the click topic and show monitoring
usage in real time, or serve any other use case.
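The four steps above can be simulated end to end with a toy model (illustrative only; a real application would use a Kafka client library and the broker would manage the commit logs):

```python
# Topic "click" with two partition commit logs (plain lists).
click_topic = {0: [], 1: []}

def publish(topic, partition, message):
    """Steps 2-3: append the message to the partition's commit log;
    its offset is its position in that log."""
    log = topic[partition]
    log.append(message)
    return len(log) - 1

# Steps 1-2: a button click on the website becomes a message on partition 0.
offset = publish(click_topic, 0, {"user": "u1", "action": "button_click"})
print("stored at offset", offset)        # stored at offset 0

# Step 4: a consumer pulls messages from the click topic for monitoring.
for msg in click_topic[0]:
    print(msg["user"], msg["action"])    # u1 button_click
```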
Zookeeper
● ZooKeeper is used for managing and coordinating Kafka brokers.
● The ZooKeeper service mainly notifies producers and consumers about the
presence of a new broker in the Kafka system, or about the failure of a broker.
● Based on these notifications, producers and consumers decide how to act and
start coordinating their work with another broker.
● The ZooKeeper framework was originally built at Yahoo!
How to install & get started?
1. Download Apache Kafka & ZooKeeper
2. Start the ZooKeeper server, then Kafka, to run a single broker
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties
3. Create a topic
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
4. Start a producer & send a message
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
5. Start a consumer
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
References Used:
● Research Paper - “Kafka: a Distributed Messaging System for Log Processing” : http://notes.stephenholiday.com/Kafka.pdf
● https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
● https://kafka.apache.org/
● https://www.cloudkarafka.com