Introduction to Confluent Kafka Python Producer
Last Updated: 01 Dec, 2022
Apache Kafka is a publish-subscribe messaging system used for real-time data streams. Apache Kafka lets you send and receive messages between various microservices. In this article, we will see how to send JSON messages using Python and the Confluent-Kafka library. JavaScript Object Notation (JSON) is a standard text-based format for representing structured data. It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.
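As a quick illustration (a minimal sketch using only the standard library, with a record shaped like the ones we will produce later in this article), Python's json module converts between JSON text and native dictionaries:

```python
import json

# A sample record, similar to the employee records produced later
record = {"name": "Gal", "email": "[email protected]", "salary": "8345.55"}

payload = json.dumps(record)    # dict -> JSON text
restored = json.loads(payload)  # JSON text -> dict

assert restored == record
```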
Prerequisites:
- Good knowledge of Kafka basic concepts (e.g. topics, brokers, partitions, offsets, producers, consumers, etc.).
- Good knowledge of Python basics (pip install <package>, writing Python methods).
Solution:
A Kafka Python producer has different syntax and behavior depending on the Kafka library we are using, so the first step is choosing the right Kafka library for our Python program.
Popular Kafka Libraries for Python:
While working on Kafka automation with Python, we have 3 popular library choices:
- PyKafka
- Kafka-python
- Confluent Kafka
Each of these libraries has its own pros and cons, so we will have to choose based on our project requirements.
Step 1: Choosing the right Kafka Library
If we are using Amazon MSK clusters, we can build our Kafka framework using PyKafka or Kafka-python (both are open source and very popular for Apache Kafka). If we are using Confluent Kafka clusters, we have to use the Confluent Kafka library, as we get library support for Confluent-specific features like ksqlDB, REST Proxy, and Schema Registry.
We will use the Confluent Kafka library for the Python Kafka producer, as it can handle both Apache Kafka clusters and Confluent Kafka clusters.
We need Python 3.x and pip already installed. We can execute the below command to install the library on our system.
pip install confluent-kafka
Step 2: Kafka Authentication Setup.
Unlike most Kafka Python tutorials available on the Internet, we will not work on localhost. Instead, we will connect to a remote Kafka cluster with SSL authentication. To connect to Kafka clusters, we generally get one JKS file and one password for that JKS file from the infra support team. This JKS file works fine with Java/Spring, but not with Python.
So our job is to convert this JKS file into the appropriate format (as expected by the Python Kafka library).
For the Confluent Kafka library, we need to convert the JKS file into PKCS12 format in order to connect to remote Kafka clusters.
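As a rough sketch of that conversion (the file names and keystore password below are placeholders, and it assumes a JDK's keytool is available on the PATH), the keytool command can be assembled and invoked from Python:

```python
import subprocess

jks_password = "yourjksPassword"  # placeholder: fetch from a vault/API in real code

# keytool ships with the JDK; -importkeystore converts between keystore formats
cmd = [
    "keytool", "-importkeystore",
    "-srckeystore", "kafka.client.keystore.jks",  # hypothetical input file name
    "-destkeystore", "certkey.p12",
    "-deststoretype", "PKCS12",
    "-srcstorepass", jks_password,
    "-deststorepass", jks_password,
]

# subprocess.run(cmd, check=True)  # uncomment to run; requires keytool on PATH
```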
To learn more visit the below pages:
- How to convert JKS to PKCS12?
- How to receive messages using Confluent Kafka Python Consumer
Step 3: Confluent Kafka Python Producer with SSL Authentication.
We will use the same PKCS12 file that was generated during the JKS-to-PKCS12 conversion step mentioned above.
Python3
from uuid import uuid4
from confluent_kafka import Producer

jsonString1 = """ {"name":"Gal", "email":"[email protected]", "salary": "8345.55"} """
jsonString2 = """ {"name":"Dwayne", "email":"[email protected]", "salary": "7345.75"} """
jsonString3 = """ {"name":"Momoa", "email":"[email protected]", "salary": "3345.25"} """

jsonv1 = jsonString1.encode()
jsonv2 = jsonString2.encode()
jsonv3 = jsonString3.encode()


def delivery_report(errmsg, msg):
    """
    Reports the failure or success of a message delivery.
    Args:
        errmsg (KafkaError): The error that occurred while producing the message.
        msg (Message): The message that was produced.
    Note:
        In the delivery report callback, Message.key() and Message.value()
        will be in the binary format as encoded by any configured serializers,
        not the same objects that were passed to produce().
        If you wish to pass the original objects for key and value to the
        delivery report callback, we recommend a bound callback or lambda
        where you pass the objects along.
    """
    if errmsg is not None:
        print("Delivery failed for Message: {} : {}".format(msg.key(), errmsg))
        return
    print('Message: {} successfully produced to Topic: {} Partition: [{}] at offset {}'.format(
        msg.key(), msg.topic(), msg.partition(), msg.offset()))


# Change your Kafka topic name here. For this example, let's assume our Kafka
# topic has 3 partitions ==> 0, 1, 2, and we are producing messages uniformly
# to all partitions.
kafka_topic_name = "kf.topic.empdev"

# We are sending the messages as byte arrays.
# If we want to read the same messages from a Java consumer program,
# we can configure KEY_DESERIALIZER_CLASS_CONFIG = ByteArrayDeserializer.class
# and VALUE_DESERIALIZER_CLASS_CONFIG = ByteArrayDeserializer.class

mysecret = "yourjksPassword"
# You can call a remote API to get the JKS password instead of hardcoding it like above.

print("Starting Kafka Producer")

conf = {
    'bootstrap.servers': 'm1.msk.us-east.aws.com:9094, m2.msk.us-east.aws.com:9094, m3.msk.us-east.aws.com:9094',
    'security.protocol': 'SSL',
    'ssl.keystore.password': mysecret,
    'ssl.keystore.location': './certkey.p12'
}

print("connecting to Kafka topic...")
producer1 = Producer(conf)

# Trigger any available delivery report callbacks from previous produce() calls
producer1.poll(0)

try:
    # Asynchronously produce messages. The delivery report callback will be
    # triggered from poll() above, or flush() below, when a message has been
    # successfully delivered or has failed permanently.
    producer1.produce(topic=kafka_topic_name, key=str(uuid4()), value=jsonv1, on_delivery=delivery_report)
    producer1.produce(topic=kafka_topic_name, key=str(uuid4()), value=jsonv2, on_delivery=delivery_report)
    producer1.produce(topic=kafka_topic_name, key=str(uuid4()), value=jsonv3, on_delivery=delivery_report)

    # Wait for any outstanding messages to be delivered and delivery report
    # callbacks to be triggered.
    producer1.flush()
except Exception as ex:
    print("Exception happened:", ex)

print("\nStopping Kafka Producer")
Sample output of the above code:
Starting Kafka Producer
connecting to Kafka topic...
Message: b'4acef7b3-dx55-5f89-b69r-18b3188f919z' successfully produced to Topic: kf.topic.empdev Partition: [1] at offset 43211
Message: b'98xff6y4-crl5-gfgx-dq1r-k3z5122h611v' successfully produced to Topic: kf.topic.empdev Partition: [2] at offset 43210
Message: b'rus3v9xx-0bd9-astn-mrtn-yyz1920evl6r' successfully produced to Topic: kf.topic.empdev Partition: [0] at offset 43211
Stopping Kafka Producer
Conclusion:
We now have an idea of how to publish JSON messages to a Kafka topic using Python, so we can extend this code as per our project needs and continue modifying and developing our Kafka automation framework. We can also route messages to a specific Kafka partition based on some condition instead of distributing them equally across all partitions. To explore the Confluent Kafka Python library further, we can visit: Confluent Docs
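As a sketch of that partition-routing idea (a hypothetical helper, assuming the 3-partition topic from the example above; Producer.produce() accepts an optional partition argument):

```python
import zlib

NUM_PARTITIONS = 3  # matches the 3-partition topic assumed in the example


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from a key hash, so equal keys land on the same partition."""
    # crc32 gives a hash that is stable across Python runs (unlike built-in hash())
    return zlib.crc32(key.encode()) % num_partitions


# Then, instead of letting the default partitioner decide:
# producer1.produce(topic=kafka_topic_name, key=k, value=v,
#                   partition=partition_for(k), on_delivery=delivery_report)
```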