Apache Kafka
Important Terms
Brokers
Producers
Consumers
Kafka Cluster
Topics
Partitions
What is a Topic?
A topic is a category or feed of events that you want to keep track of. Each topic is split into partitions
distributed across brokers, which gives you a resilient, scalable system
Category / Feed
Separate log
Partitioned
What is a Broker?
A broker is a server that stores partitions of topics; think of it as a container or bucket for topic data. Brokers
handle many topics, and a cluster of brokers gives you that resiliency.
Logical
Many Topics
Resiliency
Leader per partition
Followers
Producer Configuration
What is a consumer?
Consumer Configuration
Enough memory to hold the data while it is written to disk; about 32 GB
Create a topic:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --create --topic arcsight2 --replication-factor 2 --partitions 2
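To confirm the topic was created as intended, you can describe it. This is a sketch assuming the same ZooKeeper pod and namespace as the create command above:

```shell
# Show partitions, replication factor, leader, and in-sync replicas for the topic
kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- \
  kafka-topics --zookeeper localhost:2181 --describe --topic arcsight2
```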
Add Partitions:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --alter --topic arcsight2 --partitions 5
(Keep in mind that you can increase the number of partitions but you cannot reduce it; the replication factor cannot be changed with --alter)
List of topics:
# kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- kafka-topics --zookeeper localhost:2181 --list
Monitoring Kafka
Kafka-monitor
MirrorMaker
Chaperone (Uber)
This Kafka-based platform allows ArcSight components like Logger, ESM, and Investigate to
receive the event stream while smoothing event spikes and functioning as an extended cache
Once configured with a Transformation Hub destination, the SmartConnector sends events to
Transformation Hub's Kafka cluster, which can then further distribute events to real-time
analysis and data warehousing systems
Performance impact due to leader acks is a known Kafka behavior. Exact details of the impact
will depend on your specific configuration, but could reduce the event rate by half or more
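The leader-ack trade-off is driven by the producer's acks setting. A minimal sketch of the relevant producer property follows; the value shown is illustrative, not an ArcSight-recommended configuration:

```properties
# producer.properties (illustrative)
# acks=0   fire and forget (fastest; events can be lost)
# acks=1   wait for the partition leader only (default trade-off)
# acks=all wait for all in-sync replicas (most durable; slowest)
acks=all
```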
For CEF topics, SmartConnector (version 7.7 or later) encodes its own IP address as metadata
in the Kafka message for consumers that require that information, such as Logger Device
Groups
It can take up to 24 hours for the broker nodes to balance the partitions among the Kafka
consumers. Check the Transformation Hub Kafka Manager Consumers page to confirm that
all consumers are consuming from the topic.
Transformation Hub is designed with support for third-party tools. You can create a standard
Kafka consumer and configure it to subscribe to Transformation Hub topics. By doing this you
can pull Transformation Hub events into your own data lake.
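As a minimal sketch of such a third-party consumer, Kafka's stock console consumer can subscribe to a Transformation Hub topic. The broker host and topic name below are placeholders for illustration; production deployments typically also require TLS and client-authentication options:

```shell
# Subscribe to a Transformation Hub topic and print events to stdout
# (<th-broker-host> and the topic name are assumptions for your environment)
kafka-console-consumer --bootstrap-server <th-broker-host>:9092 \
  --topic th-cef --from-beginning
```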
Custom consumers must use Kafka client libraries of version 0.11 or later
Events are sent in standard CEF (CEF text) and in binary (exclusively for ESM consumption).
Any software application that can consume from Kafka and understands CEF text can process
events.
TLS performance impact is a known Kafka behavior. Exact details of the impact will depend on
your specific configuration, but could reduce the event rate by half or more
Transformation Hub provides the open-source Transformation Hub Kafka Manager to help you
monitor and manage its Kafka services
For each Kafka broker node, the license check result is logged both in the Kafka pod log and in
the file /opt/arcsight/k8s-hostpath-volume/th/autopass/license.log. If there is a valid
license, the log will include the text: TH licensed capacity: <eps number>
If a worker node is uninstalled, data remains on the node by default under
/opt/arcsight/k8s-hostpath-volume/th/kafka
When setting up a Transformation Hub, you can specify the number of copies (replicas) of
each topic that Transformation Hub should distribute
Kafka In Transformation Hub
Kafka automatically distributes each event in a topic to the number of broker nodes indicated by the
topic replication level specified during the Transformation Hub configuration. While replication does
decrease throughput slightly, ArcSight recommends that you configure a replication factor of at least
2. You need at least one node for each replica. For example, a topic replication level of 5 requires at
least five nodes; one replica would be stored on each node.
A topic replication level of 1 means that only one broker will receive that event. If that broker goes
down, that event data will be lost. However, a replication level of 2 means that two broker nodes will
receive that same event. If one goes down, the event data would still be present on the other, and
would be restored to the first broker node when it came back up. Both broker nodes would have to go
down simultaneously for the data to be lost. A topic replication level of 3 means that three broker
nodes will receive that event. All three would have to go down simultaneously for the event data to be
lost.
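You can see where each partition's replicas live with kafka-topics --describe. The output below is illustrative (broker IDs will differ in your cluster); with a replication level of 2, each partition lists two broker IDs under Replicas:

```shell
kubectl exec th-zookeeper-0 -n arcsight-installer-y0tyk -- \
  kafka-topics --zookeeper localhost:2181 --describe --topic arcsight2
# Illustrative output for replication factor 2:
# Topic: arcsight2  Partition: 0  Leader: 1001  Replicas: 1001,1002  Isr: 1001,1002
# Topic: arcsight2  Partition: 1  Leader: 1002  Replicas: 1002,1001  Isr: 1002,1001
```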