
Scalable Big Data Pipeline: Design & Implementation
Processing Real-Time Transaction Data

Presented by:
Yusif Nuri
Ravan Bilalov
Ehtesham Husain
Project Overview

• Real-time data processing using Kafka
• Streamlit dashboard for visualization
• Dockerized environment for deployment
Technology Stack

- Apache Kafka
- Python (Kafka, Pandas, Streamlit)
- Streamlit & Plotly
- Docker & Docker Compose
Data Source

- OnlineRetail.csv dataset
- Contains real-world retail transactions
- Used for real-time streaming and analysis
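A minimal sketch of loading the dataset with Pandas; the column names and the ISO-8859-1 encoding are assumptions based on the standard UCI Online Retail release, not confirmed by the slides:

import pandas as pd

# The public Online Retail dataset usually requires a non-UTF-8 encoding.
df = pd.read_csv("OnlineRetail.csv", encoding="ISO-8859-1")

# Expected columns in the standard UCI schema: InvoiceNo, StockCode,
# Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country
print(df.columns.tolist())
print(df.head())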
Architecture Overview

• Producer ➝ Kafka ➝ Consumer ➝ Dashboard
• Kafka handles real-time data flow
• Consumer processes data for visualization
Producer (Kafka Producer)

• Reads data from OnlineRetail.csv
• Sends each row as a message to the Kafka topic 'retail_data'
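A minimal producer sketch using the kafka-python package. The broker address and the per-message delay are assumptions; the topic name 'retail_data' comes from the slides:

import json
import time

import pandas as pd
from kafka import KafkaProducer

# Serialize each row dict as JSON before sending to the broker
# (assumes a broker reachable at localhost:9092).
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("OnlineRetail.csv", encoding="ISO-8859-1")

for _, row in df.iterrows():
    # One CSV row becomes one Kafka message on the 'retail_data' topic.
    producer.send("retail_data", row.to_dict())
    time.sleep(0.1)  # throttle to simulate a live transaction stream

producer.flush()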
Kafka Broker

• Manages real-time data flow
• Ensures data consistency and scalability
Consumer (Kafka Consumer)

• Listens to the 'retail_data' topic
• Processes incoming messages for dashboard visualization
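A matching consumer sketch, again with kafka-python and an assumed broker at localhost:9092; the Quantity and UnitPrice fields are assumptions based on the standard Online Retail schema:

import json

from kafka import KafkaConsumer

# Subscribe to the 'retail_data' topic and decode each JSON message.
consumer = KafkaConsumer(
    "retail_data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    record = message.value  # one retail transaction as a dict
    # Derive a revenue figure for the dashboard: quantity x unit price.
    revenue = record["Quantity"] * record["UnitPrice"]
    print(record.get("Country"), revenue)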
Dashboard (Streamlit)

• Displays real-time transaction data
• Key features:
  • Live transaction table
  • Time-series sales chart
  • Top products sold
  • Sales distribution by country
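A minimal Streamlit sketch of two of the listed features. Reading the CSV directly stands in for the consumer feed here; how the consumer actually hands messages to the app is not specified in the slides:

import pandas as pd
import plotly.express as px
import streamlit as st

st.title("Real-Time Retail Transactions")

# Stand-in for the Kafka-fed data; assumes the standard Online Retail schema.
df = pd.read_csv("OnlineRetail.csv", encoding="ISO-8859-1").head(500)
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

# Live transaction table (most recent rows).
st.subheader("Live transaction table")
st.dataframe(df.tail(20))

# Sales distribution by country.
st.subheader("Sales distribution by country")
totals = df.groupby("Country", as_index=False)["Revenue"].sum()
st.plotly_chart(px.bar(totals, x="Country", y="Revenue"))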
Deployment using Docker

• Dockerfile: Defines the Python environment
• docker-compose.yaml: Manages the Kafka, Producer, Consumer, and Dashboard services (see the sketch below)
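A hedged docker-compose.yaml sketch of the services named above; the image names, ports, and build paths are assumptions, not the project's actual file:

version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0   # assumed image
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0       # assumed image
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"
  producer:
    build: ./producer                        # assumed build context
    depends_on: [kafka]
  consumer:
    build: ./consumer                        # assumed build context
    depends_on: [kafka]
  dashboard:
    build: ./dashboard                       # assumed build context
    ports:
      - "8501:8501"                          # Streamlit's default port
    depends_on: [consumer]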

Introduction & Prerequisites

• Objective: Securely upload and deploy files on AWS EC2.
• Requirements:
  • AWS account with EC2 access
  • SSH key pair (.pem file)
  • Running EC2 instance (Ubuntu/Amazon Linux)
  • Docker installed locally
Setting Up EC2 & Key Pair

• Creating a Key Pair:
  • Navigate to AWS Console → EC2 Dashboard
  • Click Key Pairs → Create Key Pair
  • Download the .pem file & set permissions:
    chmod 400 your-key.pem
• Launching an EC2 Instance:
  • Select an Ubuntu or Amazon Linux AMI
  • Choose an instance type (e.g., t2.micro for the free tier)
  • Attach the key pair
  • Configure security groups to allow SSH (port 22)
Uploading Files via SCP

• Find the EC2 Public IP:
  • Go to EC2 Dashboard → Copy Public IPv4 Address
• Connect via SSH:
  ssh -i your-key.pem ubuntu@your-ec2-public-ip
• Upload files (run from your local machine, not the SSH session):
  scp -i your-key.pem -r docker/ ubuntu@your-ec2-public-ip:~/docker-project
  • -i your-key.pem: uses the SSH key
  • -r docker/: uploads the docker/ folder recursively
  • ubuntu@your-ec2-public-ip:~/docker-project: remote destination
Deploying the Project & Next Steps

• Verify the upload:
  ssh -i your-key.pem ubuntu@your-ec2-public-ip
  cd ~/docker-project
  ls -l
• Install & start Docker on EC2:
  sudo apt update && sudo apt install -y docker.io
  sudo systemctl start docker
  sudo systemctl enable docker
• Run the project (if using docker-compose):
  cd ~/docker-project
  docker-compose up -d
Challenges Faced

• Kafka connection issues
• Data processing optimizations
• Real-time dashboard updates
Results & Insights

• Successfully implemented a real-time data pipeline
• Live dashboard for monitoring sales trends
• Potential improvements for scalability
Conclusion

- Gained experience in real-time data processing
- Future enhancements: add machine learning predictions for sales forecasting
