Apache Kafka

Apache Kafka is a distributed streaming platform that allows for publishing and subscribing to streams of records. It includes key concepts like topics (to categorize streams), brokers (to host partitions of topics), producers (to publish data to topics), and consumers (to read data from topics). Transformation Hub uses Kafka as its underlying streaming platform, allowing ArcSight components to receive event streams from SmartConnectors while smoothing event loads. It distributes events to topics based on the replication factor configured during setup to provide redundancy and prevent data loss if brokers fail.


Apache Kafka Essentials

Important Terms

 Brokers
 Producers
 Consumers
 Kafka Cluster
 Topics
 Partitions
What is a Topic?

A topic is a category or feed of records that you want to keep track of. It is kept as its own log, split into
partitions spread across brokers, which gives you a resilient, sustainable system.

 Category / Feed
 Separate log
 Partitioned
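The bullets above can be sketched as a tiny model of a topic: a set of append-only partition logs, where each appended record gets a monotonically increasing offset within its partition. This is a simplified illustration, not Kafka's actual storage format.

```python
# Minimal model of a Kafka topic: a dict of partition -> append-only log.
# Offsets are per-partition, not global across the topic.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = {p: [] for p in range(num_partitions)}

    def append(self, partition, record):
        """Append a record and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1  # offset of the record just written

topic = Topic("arcsight2", num_partitions=2)
print(topic.append(0, "event-a"))  # offset 0 in partition 0
print(topic.append(0, "event-b"))  # offset 1 in partition 0
print(topic.append(1, "event-c"))  # offset 0 in partition 1
```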

What is a Broker?

A broker is a logical container (a bucket) for partitions of topics. Each broker handles many topics, and
together the brokers give you resiliency.

 Logical
 Many Topics
 Resiliency
 Leader per topic
 Followers

If Broker 3 goes down, leadership of its topics moves to another broker. In this example the leaders move
to Broker #1, which becomes overloaded.
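The failover scenario can be simulated: each partition has an ordered replica list (leader first), and when the leader's broker dies, leadership passes to the first surviving follower. This is a simplified stand-in for Kafka's controller-driven leader election.

```python
def elect_leaders(assignments, dead_broker):
    """assignments: partition -> ordered list of replica brokers (leader first).
    Returns partition -> new leader after dead_broker fails."""
    leaders = {}
    for partition, replicas in assignments.items():
        survivors = [b for b in replicas if b != dead_broker]
        leaders[partition] = survivors[0] if survivors else None  # None = offline
    return leaders

# Broker 3 leads partitions 0 and 1; Broker 1 is the follower for both.
assignments = {0: [3, 1], 1: [3, 1], 2: [2, 1]}
print(elect_leaders(assignments, dead_broker=3))
# -> {0: 1, 1: 1, 2: 2}: Broker 1 inherits both of Broker 3's partitions
#    and can become overloaded.
```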
What is a Producer?

 Publishes data to topics
 Uses a partitioner to choose the partition
 Replication completes before the message can be read
 Focus on throughput
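The partitioner maps a record key to a partition so that the same key always lands on the same partition. The sketch below uses CRC-32 purely for illustration; Kafka's default partitioner actually uses a murmur2 hash, and records without a key are spread across partitions.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and map it onto a partition; same key -> same partition.
    return zlib.crc32(key) % num_partitions

p = partition_for(b"device-42", 5)
assert partition_for(b"device-42", 5) == p  # deterministic routing
assert 0 <= p < 5
```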

Producer Configuration

 Message durability (acks)
 Ordering / retries (chronological order)
 Batching and compression
 Queuing limits (unsent messages)
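As a concrete illustration, these bullets map onto producer settings roughly like the dict below. The key names follow the kafka-python `KafkaProducer` keyword arguments; the values are placeholders, not recommendations.

```python
# Producer settings corresponding to the bullets above
# (kafka-python KafkaProducer keyword names; values are placeholders).
producer_config = {
    # Message durability: 0 = fire-and-forget, 1 = leader only,
    # "all" = wait for all in-sync replicas
    "acks": "all",
    # Ordering / retries: retries preserve order only if in-flight
    # requests are capped at 1
    "retries": 5,
    "max_in_flight_requests_per_connection": 1,
    # Batching and compression: trade a little latency for throughput
    "batch_size": 16384,
    "linger_ms": 10,
    "compression_type": "gzip",
    # Queuing limits: memory available to buffer unsent messages
    "buffer_memory": 33554432,
}
```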

What is a consumer?

 Read data from topics
 Organized into consumer groups
 Partitions divided among the consumers in a group
 Rebalanced when consumers join or leave
o Coordination moved from ZooKeeper to a broker-side group coordinator
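How partitions get divided among a group's consumers can be sketched with a simple round-robin assignment. This is simplified: Kafka's real assignors (range, round-robin, sticky) also handle multiple topics and incremental rebalancing.

```python
def assign_partitions(partitions, consumers):
    """Divide a topic's partitions among a group's consumers, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign_partitions(range(5), ["c1", "c2"]))
# -> {'c1': [0, 2, 4], 'c2': [1, 3]}: each partition is owned by
#    exactly one consumer in the group.
```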

Consumer Configuration

 Group ID (consumer group name)
 Session timeout (default 30 sec)
 Heartbeat (with the consumer coordinator and ZooKeeper)
 Auto commit of offsets (default every 5 sec)
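In kafka-python `KafkaConsumer` terms, those settings look roughly like this; the group name is hypothetical and the values simply mirror the defaults mentioned above.

```python
# Consumer settings corresponding to the bullets above
# (kafka-python KafkaConsumer keyword names; values are placeholders).
consumer_config = {
    "group_id": "arcsight-consumers",  # consumer group name (hypothetical)
    "session_timeout_ms": 30000,       # session timeout (30 sec, as above)
    "heartbeat_interval_ms": 3000,     # heartbeats to the group coordinator
    "enable_auto_commit": True,        # commit offsets automatically
    "auto_commit_interval_ms": 5000,   # every 5 sec (the default noted above)
}
```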
Hardware Recommendations (Medium Size Machines)

Enough memory to hold data while it is written to disk, about 32 GB

24 cores per machine

Multiple fast drives for best performance

Adding and Removing Topics

Create a topic:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --create --topic arcsight2 --replication-factor 2 --partitions 2

Add partitions:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --alter --topic arcsight2 --replication-factor 2 --partitions 5
(Keep in mind that you can increase the number of partitions, but you cannot reduce it.)

List topics:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --list

Monitoring Kafka

Kafka-monitor

Mirror Maker

Kafka Auditing Options

Kafka Audit (Roll your own - Mirrormaker)

Chaperone (Uber)

Confluent Control Center (Commercial components)


Kafka In Transformation Hub

 This Kafka-based platform allows ArcSight components like Logger, ESM, and Investigate to
receive the event stream, while smoothing event spikes, and functioning as an extended cache

 Once configured with a Transformation Hub destination, the SmartConnector sends events to
Transformation Hub's Kafka cluster, which can then further distribute events to real-time
analysis and data warehousing systems

 Performance impact due to leader acks is a known Kafka behavior. Exact details of the impact
will depend on your specific configuration, but could reduce the event rate by half or more

 For CEF topics, SmartConnector (version 7.7 or later) encodes its own IP address as metadata in
the Kafka message for consumers that require that information, such as Logger Device Groups

 It can take up to 24 hours for the broker nodes to balance partitions among the Kafka
consumers. Check the Transformation Hub Kafka Manager Consumers page to confirm
all consumers are consuming from the topic.

 Transformation Hub is designed with support for third-party tools. You can create a standard
Kafka consumer and configure it to subscribe to Transformation Hub topics. By doing this you
can pull Transformation Hub events into your own data lake.

 Custom consumers must use Kafka client libraries of version 0.11 or later

 Events are sent in standard CEF (CEF text) and binary (exclusively for ESM consumption).
Any software application that can consume from Kafka and understand CEF text can process
events.

 TLS performance impact is a known Kafka behavior. Exact details of the impact will depend on
your specific configuration, but could reduce the event rate by half or more

 Transformation Hub provides the open-source Transformation Hub Kafka Manager to help you
monitor and manage its Kafka services

 Kafka Manager: https://github.com/yahoo/kafka-manager.

 For each Kafka broker node, the license check result is logged both in the Kafka pod log and in
the file /opt/arcsight/k8s-hostpath-volume/th/autopass/license.log. If there is a valid
license, the log will include the text: TH licensed capacity: <eps number>

 If a worker node is uninstalled, data remains on the node by default under
/opt/arcsight/k8s-hostpath-volume/th/kafka

 When setting up a Transformation Hub, you can specify the number of copies (replicas) of
each topic Transformation Hub should distribute

Kafka automatically distributes each event in a topic to the number of broker nodes indicated by the
topic replication level specified during the Transformation Hub configuration. While replication does
decrease throughput slightly, ArcSight recommends that you configure a replication factor of at least
2. You need at least one node for each replica. For example, a topic replication level of 5 requires at
least five nodes; one replica would be stored on each node.

A topic replication level of 1 means that only one broker will receive that event. If that broker goes
down, that event data will be lost. However, a replication level of 2 means that two broker nodes will
receive that same event. If one goes down, the event data would still be present on the other, and
would be restored to the first broker node when it came back up. Both broker nodes would have to go
down simultaneously for the data to be lost. A topic replication level of 3 means that three broker
nodes will receive that event. All three would have to go down simultaneously for the event data to be
lost.
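The reasoning above reduces to a simple rule: an event is lost only if every broker holding one of its replicas is down at the same time. A minimal sketch:

```python
def event_lost(replica_brokers, down_brokers):
    """True only if ALL brokers holding a replica of the event are down."""
    return set(replica_brokers) <= set(down_brokers)

# Replication level 2: the event is stored on brokers 1 and 2.
assert event_lost({1, 2}, down_brokers={1}) is False    # one copy survives
assert event_lost({1, 2}, down_brokers={1, 2}) is True  # both replicas gone
# Replication level 1: a single broker failure loses the event.
assert event_lost({1}, down_brokers={1}) is True
```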

Commands you can use:

# kubectl get nodes -L kafka


# kubectl get pods -n arcsight-installer-xxxx | grep kafka-manager
