UNIT 5
Chapter 9
1. Explain the architecture of a streaming system with technology choices.
Architecture of a streaming system with technology choices
Technologies Used:
Java 1.8 for implementation
Apache Maven 3.3.9 for building the project
JavaScript for the web-based frontend
All components use open-source tools, mostly Apache projects
Architecture Components:
1. Collection Tier (Netty):
This tier uses Netty to connect to the Meetup streaming RSVP API and collect real-time
RSVP data from the source.
2. Message Queuing Tier (Apache Kafka):
After data is collected, it is passed to Apache Kafka, which acts as the message broker to
queue and stream data to the next stages.
3. Analysis Tier (Apache Storm):
The queued data is consumed by Apache Storm, which performs real-time analysis such
as filtering, aggregating, or transforming the data.
4. In-Memory Data Store (Apache Kafka):
The results from the analysis tier are written back to Kafka, which is reused here in place of a
dedicated in-memory data store so the results can be served quickly.
5. Data Access Tier (Netty):
This tier, also using Netty, serves the processed and stored data to the frontend
application.
6. Web Browser (JavaScript):
The final output is displayed to users via a web browser interface built using JavaScript,
providing a real-time data visualization experience.
2. Explain the collection service data flow with a neat diagram.
When building our collection service, we want to take into consideration the following
capabilities:
Managing the connection to the Meetup API
Ensuring that we don’t lose data
Integrating with the message queuing tier
Behind the scenes, the collection service collects, logs, and delivers live RSVP messages from
Meetup to Apache Kafka, a message queue system that helps us handle large-scale data safely.
Step by step, the flow looks like this:
1. Connect: Our service opens a WebSocket connection to Meetup’s RSVP API. This lets
us listen to RSVP events in real-time, like tuning into a live radio channel.
2. Create Client Handler: Once connected, we spin up a client handler — a component
that knows how to process each incoming message.
3. Initialize Logging and Kafka Producer: The handler prepares the message logger (so
nothing gets lost) and sets up the Kafka producer, which will be responsible for
forwarding the data.
4. Receive Messages: Now, as people RSVP, messages flow in through the WebSocket.
We catch them in real-time.
5. Record Message: Before doing anything else, we log the message using a
HybridMessageLogger. This acts like a safety net, making sure we don't lose data
even if Kafka fails.
6. Send Message to Producer: The client handler sends the message to the RSVP producer,
which prepares and forwards it to Kafka.
7. Produce Message in Kafka: Kafka receives the message and stores it in a queue so that
downstream services can pick it up.
8. Acknowledge and Clean Up:
Once Kafka confirms successful delivery:
• We remove the message from our logs (it’s safely stored now).
• If Kafka fails to confirm, we move the message to a "failed" list for future retries.
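Putting steps 3 to 8 together, the handler logic can be sketched in Java roughly as follows. This is
only a sketch: the HybridMessageLogger interface and its method names (addMessage, markSuccess,
markFailed) are simplified stand-ins for the real logger, and the producer settings assume a local
Kafka broker on localhost:9092.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Simplified stand-in for the hybrid message logger (illustrative method names).
interface HybridMessageLogger {
    void addMessage(String id, String message);   // record before sending (step 5)
    void markSuccess(String id);                  // remove once Kafka acknowledges (step 8)
    void markFailed(String id);                   // move to the "failed" list for retries (step 8)
}

public class RsvpClientHandler {
    private static final String TOPIC = "meetup-raw-rsvps";
    private final KafkaProducer<String, String> producer;
    private final HybridMessageLogger logger;

    public RsvpClientHandler(HybridMessageLogger logger) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);   // step 3: set up the Kafka producer
        this.logger = logger;                         // step 3: prepare the message logger
    }

    // Step 4: called for every RSVP message that arrives over the WebSocket.
    public void onMessage(String messageId, String rsvpJson) {
        logger.addMessage(messageId, rsvpJson);       // step 5: record the message first
        ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, messageId, rsvpJson);
        producer.send(record, (metadata, exception) -> {   // steps 6-7: forward to Kafka
            if (exception == null) {
                logger.markSuccess(messageId);        // step 8: delivery confirmed, clean up
            } else {
                logger.markFailed(messageId);         // step 8: keep it for a future retry
            }
        });
    }
}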
3. Explain the step-by-step procedure for installing and configuring Kafka
Downloading and installing Apache Kafka:
Download Kafka version 0.10.0.1 from the official Apache Kafka site, then extract the archive:
$> wget http://www-us.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
$> tar -xvf kafka_2.11-0.10.0.1.tgz
Four key Kafka concepts:
Producer – Sends messages to Kafka.
Consumer – Receives and processes messages from Kafka.
Broker – A Kafka server that manages message storage and delivery.
Topic – A logical channel where messages are published and from where consumers read.
Topics help in organizing the data stream.
Starting Kafka, Apache ZooKeeper, and Creating a Topic
Change directory to the Kafka installation folder.
Start Apache ZooKeeper (required by Kafka for metadata storage).
Start the Kafka server (broker).
Create a topic called meetup-raw-rsvps with 1 partition and replication factor of 1.
$> cd kafka_2.11-0.10.0.1 1
$> bin/zookeeper-server-start.sh -daemon config/zookeeper.properties 2
$> bin/kafka-server-start.sh -daemon config/server.properties 3
$> bin/kafka-topics.sh --zookeeper localhost:2181 --create \ 4
--topic meetup-raw-rsvps --partitions 1 --replication-factor 1
Created topic "meetup-raw-rsvps". 5
Explanation of steps:
1. Navigate to the Kafka directory.
2. Start ZooKeeper.
3. Start the Kafka broker.
4. Create the topic.
5. Confirmation message indicating successful topic creation.
Verifying the Created Topic:
Use the --list command to confirm that the topic was successfully created.
$> bin/kafka-topics.sh --zookeeper localhost:2181 --list 1
meetup-raw-rsvps 2
Explanation:
1. Command to list all Kafka topics.
2. Expected output showing the meetup-raw-rsvps topic.
Kafka is now fully set up and ready to receive messages from your collection service.
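As an optional sanity check (not one of the original steps), Kafka's console producer can be used to
publish a test message to the new topic; anything typed at its prompt is written to meetup-raw-rsvps
and will appear in any consumer of that topic:
$> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic meetup-raw-rsvps
This assumes the broker is running with its default listener on localhost:9092.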
4. Explain how to integrate the collection service and Kafka (OR)
Describe how to build and run the collection service in a streaming system.
After setting up Kafka and the collection service, the next step is to integrate them and
verify that data is flowing correctly.
To test the integration, a Kafka console consumer is started in one console window to listen for
messages on the topic.
In a second console window, the collection service is built and run; it connects to the
Meetup API and begins sending live data to Kafka. Successful integration is confirmed when
messages appear in the consumer.
Running the Kafka console consumer
$> bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
--topic meetup-raw-rsvps
When the Kafka consumer is started, no output appears initially because the topic has not
received any messages yet — this is expected behavior.
The next step is to start the collection service, which will begin sending data to the Kafka
topic, allowing the consumer to display incoming messages.
Building and running the collection service
$> cd $EXAMPLE_CODE_HOME/Chapter9/collection-service 1
$> mvn clean package 2
$> java -jar target/collection-service-0.0.1.jar 3
WebSocket Client connected and ready to consume RSVPs!
1. Navigate to the directory where the source code for the collection service is located.
2. Use Maven to build the project and generate the necessary artifacts.
3. Run the generated JAR file to start the collection service.
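The same verification can also be done from Java instead of the console consumer. A minimal sketch,
assuming the Kafka 0.10 Java client is on the classpath and the broker is on the default
localhost:9092 (the group id rsvp-check is an arbitrary placeholder):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.util.Collections;
import java.util.Properties;

public class RsvpConsoleCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rsvp-check");          // any unused consumer group id will do
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("meetup-raw-rsvps"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());   // each value is one raw RSVP JSON message
                }
            }
        }
    }
}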
5. Explain the step-by-step procedure for installing Storm and preparing Kafka
Apache Kafka is a message queue — it stores messages and passes them along for further
processing. Here, we’re setting up a topic where RSVP messages will be stored.
Installing Apache Storm:
Step 1: Download Apache Storm
This step fetches the Apache Storm software package from its official website. You need
this to run and build real-time data processing applications.
Step 2: Decompress the Archive
Running the tar -xvzf command unpacks the downloaded archive. It extracts all the
required files and folders needed to configure and run Apache Storm.
Setting Up Kafka for the Analysis Topic:
Step 1: Navigate to Kafka Installation Directory
This step places you inside Kafka’s installation folder so you can run Kafka-related
commands properly from the terminal.
Step 2: Create a Kafka Topic
This command creates a topic named meetup-topn-rsvps. Topics in Kafka are used to store and
organize streams of messages.
• It connects to ZooKeeper (Kafka’s coordination service) on port 2181.
• It sets up the topic with 1 partition and 1 replica, which is enough for testing or local
setups.
Step 3: Verify the Topic is Created
After topic creation, Kafka returns a confirmation message. This ensures the topic is
ready to receive data (like RSVP messages from Meetup).
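Put together, these steps correspond to commands like the following, mirroring the Kafka setup from
question 3. The Storm release number and download URL shown here are only placeholders; use whichever
version you actually downloaded from the Apache Storm site.
$> wget http://www-us.apache.org/dist/storm/apache-storm-1.0.2/apache-storm-1.0.2.tar.gz
$> tar -xvzf apache-storm-1.0.2.tar.gz
$> cd kafka_2.11-0.10.0.1
$> bin/kafka-topics.sh --zookeeper localhost:2181 --create \
--topic meetup-topn-rsvps --partitions 1 --replication-factor 1
Created topic "meetup-topn-rsvps".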
6. Explain how to build the top N Storm topology
In Apache Storm, a topology is a directed acyclic graph (DAG) made up of spouts and bolts.
Spouts pull data from sources and emit it as tuples, while bolts process this data through filtering,
aggregation, or storage.
For analyzing top N Meetup RSVPs, a multi-bolt topology can be used to structure the logic
efficiently.
Multi-bolt approach to top N
1. Apache Kafka: Source of raw data.
2. Kafka Spout: Reads from the raw topic.
3. Rolling Count Bolt:
Performs local counting on its share of the stream (tuples are routed to it by fields grouping).
Can be parallelized for performance.
4. Intermediate Ranking Bolt:
Maintains partial top-N rankings from each count bolt.
Works in parallel.
5. Total Ranking Bolt:
Receives data via global grouping from all intermediate bolts.
Merges partial rankings to produce the final Top-N result.
6. Kafka Bolt: Publishes final Top-N list to Kafka.
Use case: Best for high-volume streaming data needing scalable, distributed ranking. Handles
Top-N computation efficiently in stages.
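A rough wiring sketch of this multi-bolt topology in Java is shown below. It assumes Storm 1.x package
names and that a Kafka spout, the counting/ranking bolts, and a Kafka bolt already exist (the bolt
names follow storm-starter conventions and are illustrative, not the book's exact classes); the stream
field names and parallelism hints are placeholders.

import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class TopNTopologyWiring {
    // Wires the components together; the actual spout/bolt implementations are passed in.
    public static TopologyBuilder wire(IRichSpout kafkaSpout,
                                       IRichBolt rollingCountBolt,
                                       IRichBolt intermediateRankingsBolt,
                                       IRichBolt totalRankingsBolt,
                                       IRichBolt kafkaBolt) {
        TopologyBuilder builder = new TopologyBuilder();

        // 2. Kafka spout reads raw RSVPs from the meetup-raw-rsvps topic.
        builder.setSpout("rsvp-spout", kafkaSpout, 1);

        // 3. Rolling count bolts count locally; fields grouping routes all tuples
        //    with the same group field to the same bolt task.
        builder.setBolt("rolling-count", rollingCountBolt, 4)
               .fieldsGrouping("rsvp-spout", new Fields("group"));

        // 4. Intermediate ranking bolts keep partial top-N lists in parallel.
        builder.setBolt("intermediate-rankings", intermediateRankingsBolt, 4)
               .fieldsGrouping("rolling-count", new Fields("obj"));

        // 5. A single total ranking bolt merges every partial ranking (global grouping).
        builder.setBolt("total-rankings", totalRankingsBolt, 1)
               .globalGrouping("intermediate-rankings");

        // 6. Kafka bolt publishes the final top-N list to the meetup-topn-rsvps topic.
        builder.setBolt("kafka-out", kafkaBolt, 1)
               .globalGrouping("total-rankings");

        return builder;
    }
}

The counting and intermediate ranking stages can be parallelized freely because the single
total-ranking bolt only ever receives small partial rankings, not the full stream.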
Topology using a streaming summary
1. Apache Kafka: Produces raw data to a topic.
2. Kafka Spout: Reads data from Kafka topic.
3. RSVP Summarizer Bolt:
Performs the summary computation (e.g., counting, filtering).
Needs a global grouping to consolidate data from all spout partitions.
4. Kafka Bolt: Writes the final Top-N summary back to Kafka.
Use case: Best for simple global summarization tasks with moderate data volumes. Everything is
centralized in one summarizer bolt.
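By contrast, the streaming-summary variant needs only one bolt between the spout and the Kafka bolt.
A minimal wiring sketch under the same assumptions as the previous example:

import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class SummaryTopologyWiring {
    public static TopologyBuilder wire(IRichSpout kafkaSpout,
                                       IRichBolt summarizerBolt,
                                       IRichBolt kafkaBolt) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("rsvp-spout", kafkaSpout, 1);
        // Global grouping sends every tuple to the single summarizer task,
        // which keeps the whole top-N summary in one place.
        builder.setBolt("rsvp-summarizer", summarizerBolt, 1)
               .globalGrouping("rsvp-spout");
        // The finished summary is written back to Kafka (meetup-topn-rsvps).
        builder.setBolt("kafka-out", kafkaBolt, 1)
               .globalGrouping("rsvp-summarizer");
        return builder;
    }
}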