Building a Real-Time E-Commerce Data Pipeline with Kafka, Flink, PostgreSQL, and Elasticsearch

Shijun Ju · 9 min read · Apr 24, 2025

In the dynamic realm of e-commerce, the ability to process and analyze transaction data in real time is a game-changer. From tracking customer preferences to optimizing inventory, real-time insights empower businesses to stay agile and competitive. This article walks you through the creation of a robust real-time data pipeline using Apache Kafka, Apache Flink, PostgreSQL, and Elasticsearch, culminating in dynamic visualizations with Kibana. Designed and tested on Windows (with Ubuntu WSL), this pipeline showcases a seamless flow from data ingestion to actionable analytics.


Table of Contents
1. Introduction

2. Environment Setup

3. Generating and Streaming Data with Kafka

4. Processing Data with Apache Flink

5. Storing Data in PostgreSQL

6. Real-Time Analytics with Elasticsearch and Kibana

7. Conclusion

Introduction
Real-time data pipelines bridge the gap between data generation and
decision-making by processing information as it arrives. In e-commerce,
this capability translates to immediate insights into sales trends, customer
behavior, and operational efficiency. This project leverages a powerful stack
of open-source technologies:

• Apache Kafka: A distributed streaming platform for ingesting and distributing real-time data.

• Apache Flink: A stream processing framework offering low-latency, high-throughput data transformations.

• PostgreSQL: A reliable relational database for structured data storage and historical analysis.

• Elasticsearch: A search engine for indexing data, enabling fast queries and analytics.

• Kibana: A visualization tool for crafting interactive dashboards from Elasticsearch data.

The pipeline ingests simulated e-commerce transactions, processes them in real time, stores aggregates in PostgreSQL, and indexes them in Elasticsearch for visualization, all orchestrated with precision and scalability in mind.

Environment Setup

Overview
To bring this pipeline to life, we use Docker to manage services and Apache Flink for processing. This section outlines the setup process, ensuring all components are ready to communicate seamlessly.

Prerequisites

• Docker: For containerized services.

• Python 3.x: For data generation (with pip installed).

• Apache Flink 1.18.0: For stream processing.

• Maven: For building the Flink project.

• Java 11: For Flink development.

Docker Configuration
The `docker-compose.yml` file orchestrates five services critical to the pipeline (a minimal compose sketch follows the list):

• Zookeeper: Coordinates Kafka brokers for reliable messaging.

• Kafka (Broker): Streams data between producers and consumers.

• PostgreSQL: Stores processed data for warehousing.

• Elasticsearch (7.17.28): Indexes data for real-time search and analytics.

• Kibana (7.17.28): Visualizes data from Elasticsearch.
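
The full compose file isn't reproduced in the article; a minimal sketch consistent with the container names, ports, and credentials used later would look roughly like this (image tags are assumptions — adjust to your setup):

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0    # image tag assumed
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  broker:
    image: confluentinc/cp-kafka:7.4.0        # image tag assumed
    container_name: broker                    # matches "docker exec -it broker" below
    depends_on: [zookeeper]
    ports:
      - "9092:9092"                           # host listener used by the Flink job
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  postgres:
    image: postgres:14                        # version assumed
    container_name: postgres                  # matches "docker exec -it postgres" below
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.28
    ports:
      - "9200:9200"
    environment:
      discovery.type: single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.28
    depends_on: [elasticsearch]
    ports:
      - "5601:5601"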


Start the services:

docker-compose up -d

Verify Kafka is operational:


docker exec -it broker kafka-topics --list --bootstrap-server broker:29092

• Expect no topics initially.

Python Environment
Set up a virtual environment and install dependencies:

• Packages required: confluent-kafka==2.3.0, Faker==20.1.0, python-dateutil==2.8.2, simplejson==3.19.2, six==1.16.0 (collected into the requirements.txt shown below)
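
A requirements.txt pinning exactly those versions:

confluent-kafka==2.3.0
Faker==20.1.0
python-dateutil==2.8.2
simplejson==3.19.2
six==1.16.0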

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Apache Flink Setup


Download Flink 1.18.0 and configure flink-conf.yaml to speed up jobs by enabling more parallelism:

• taskmanager.numberOfTaskSlots: 4

• parallelism.default: 2

Start the Flink cluster (adjust based on your Flink folder):

/mnt/c/flink-1.18.0/bin/start-cluster.sh

Access the dashboard at `localhost:8081`.


Generating and Streaming Data with Kafka

Overview
Kafka acts as the pipeline’s ingestion layer, receiving simulated e-commerce
transactions from a Python script.

Data Generation
The `main.py` script generates random sales data with fields like transactionId, productName, productCategory, totalAmount, and transactionDate, publishing it to the “financial_transactions” topic in Kafka.
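
The full script isn't reproduced in the article; here is a condensed sketch of what main.py does, using the confluent-kafka producer and Faker. The field generators and value pools are illustrative assumptions, not the article's exact code — only the field names and topic come from the pipeline itself:

import json
import random
import time
from datetime import datetime, timezone

from confluent_kafka import Producer
from faker import Faker

fake = Faker()
producer = Producer({'bootstrap.servers': 'localhost:9092'})


def generate_transaction():
    # Shape mirrors the Transaction DTO consumed by the Flink job;
    # the value pools below are made up for illustration.
    price = round(random.uniform(10, 500), 2)
    quantity = random.randint(1, 10)
    return {
        'transactionId': fake.uuid4(),
        'productId': random.choice(['product1', 'product2', 'product3']),
        'productName': random.choice(['laptop', 'mobile', 'tablet', 'watch']),
        'productCategory': random.choice(['electronic', 'fashion', 'grocery']),
        'productPrice': price,
        'productQuantity': quantity,
        'productBrand': random.choice(['apple', 'samsung', 'sony']),
        'totalAmount': round(price * quantity, 2),
        'currency': random.choice(['USD', 'GBP']),
        'customerId': fake.user_name(),
        'transactionDate': datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%f'),
        'paymentMethod': random.choice(['credit_card', 'debit_card', 'online_transfer']),
    }


if __name__ == '__main__':
    while True:
        txn = generate_transaction()
        producer.produce('financial_transactions',
                         key=txn['transactionId'],
                         value=json.dumps(txn))
        producer.flush()
        time.sleep(1)  # throttle so the stream is easy to watch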

Run the script:

python main.py

Verification
Confirm data is streaming:

docker exec -it broker kafka-console-consumer --topic financial_transactions --bootstrap-server broker:29092

You should see JSON-formatted transaction records.
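
Each record mirrors the Transaction DTO defined later; an illustrative (made-up) example:

{"transactionId": "9f2b7c1e-4a61-4e0d-9b2a-3f8c1d7e5a10", "productId": "product2", "productName": "laptop", "productCategory": "electronic", "productPrice": 399.99, "productQuantity": 2, "productBrand": "sony", "totalAmount": 799.98, "currency": "USD", "customerId": "jsmith", "transactionDate": "2025-04-24T10:15:32.000", "paymentMethod": "credit_card"}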


Processing Data with Apache Flink

Overview
Flink processes the Kafka stream, transforming raw data into structured
outputs for storage and analytics.

Project Setup


Create a Maven project in IntelliJ IDEA:

• JDK: 11

• Archetype: org.apache.flink:flink-quickstart-java:1.18.0

• GroupId: FlinkKafkaEcommerce

Add dependencies in `pom.xml`:

• flink-connector-kafka: Connects Flink to Kafka.

• postgresql: JDBC driver for PostgreSQL.

• flink-sql-connector-elasticsearch7: Integrates with Elasticsearch.

• lombok: Simplifies Java boilerplate.

• flink-connector-jdbc: Enables JDBC sinks.

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka</artifactId>
        <version>3.0.1-1.18</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>42.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-sql-connector-elasticsearch7</artifactId>
        <version>3.0.1-1.17</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.24</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc</artifactId>
        <version>3.1.1-1.17</version>
    </dependency>
</dependencies>

Data Model
Define the Transaction DTO, reflecting the data generated in main.py:

package Dto;

import lombok.Data;


import java.sql.Timestamp;

@Data
public class Transaction {
    private String transactionId;
    private String productId;
    private String productName;
    private String productCategory;
    private double productPrice;
    private int productQuantity;
    private String productBrand;
    private double totalAmount;
    private String currency;
    private String customerId;
    private Timestamp transactionDate;
    private String paymentMethod;
}

Deserialization
Create a deserializer for Kafka messages:

package Deserializer;

import Dto.Transaction;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;


public class JSONValueDeserializationSchema implements DeserializationSchema<Transaction> {

    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public Transaction deserialize(byte[] bytes) throws IOException {
        // Map the raw JSON bytes from Kafka onto the Transaction DTO
        return objectMapper.readValue(bytes, Transaction.class);
    }

    @Override
    public boolean isEndOfStream(Transaction transaction) {
        return false; // unbounded stream: never signal end of input
    }

    @Override
    public TypeInformation<Transaction> getProducedType() {
        return TypeInformation.of(Transaction.class);
    }
}

Flink Job
The DataStreamJob class sets up the stream:

package FlinkKafkaEcommerce; // adjust based on your program

import Deserializer.JSONValueDeserializationSchema;
import Dto.Transaction;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;


import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataStreamJob {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String topic = "financial_transactions";

        KafkaSource<Transaction> source = KafkaSource.<Transaction>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics(topic)
                .setGroupId("flink-group")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new JSONValueDeserializationSchema())
                .build();

        // The source name ("Kafka source") is an arbitrary label shown in the Flink UI
        DataStream<Transaction> transactionStream = env.fromSource(source,
                WatermarkStrategy.noWatermarks(), "Kafka source");

        transactionStream.print();

        env.execute("E-Commerce Data Pipeline");
    }
}

Build:

mvn clean compile package


Run:

• On Windows, open an Ubuntu WSL terminal in IntelliJ IDEA, then run:


/mnt/c/<your-flink-folder>/bin/flink run -c FlinkKafkaEcommerce.DataStreamJob target/Flink-Kafka-E

Check the Flink dashboard (localhost:8081) for job status.


Storing Data in PostgreSQL

Overview


PostgreSQL stores raw transactions and aggregates like sales by category, day, and month.

Connection Setup
Define JDBC options:

private static final String jdbcUrl = "jdbc:postgresql://localhost:5432/postgres";
private static final String username = "postgres";
private static final String password = "postgres";

JdbcExecutionOptions execOptions = new JdbcExecutionOptions.Builder()
        .withBatchSize(1000)
        .withBatchIntervalMs(200)
        .withMaxRetries(5)
        .build();

JdbcConnectionOptions connOptions = new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
        .withUrl(jdbcUrl)
        .withDriverName("org.postgresql.Driver")
        .withUsername(username)
        .withPassword(password)
        .build();

Transactions Table
Store raw data:


transactionStream.addSink(JdbcSink.sink(
        "CREATE TABLE IF NOT EXISTS transactions (" +
                "transaction_id VARCHAR(255) PRIMARY KEY, " +
                "product_id VARCHAR(255), " +
                "product_name VARCHAR(255), " +
                "product_category VARCHAR(255), " +
                "product_price DOUBLE PRECISION, " +
                "product_quantity INTEGER, " +
                "product_brand VARCHAR(255), " +
                "total_amount DOUBLE PRECISION, " +
                "currency VARCHAR(255), " +
                "customer_id VARCHAR(255), " +
                "transaction_date TIMESTAMP, " +
                "payment_method VARCHAR(255)" +
                ")",
        // DDL needs no bound parameters, so the statement builder is a no-op
        (JdbcStatementBuilder<Transaction>) (preparedStatement, transaction) -> {},
        execOptions,
        connOptions
)).name("Create Transactions Table Sink");

transactionStream.addSink(JdbcSink.sink(
        "INSERT INTO transactions(transaction_id, product_id, product_name, product_category, product_price, " +
                "product_quantity, product_brand, total_amount, currency, customer_id, transaction_date, payment_method) " +
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) " +
                "ON CONFLICT (transaction_id) DO UPDATE SET " +
                "product_id = EXCLUDED.product_id, " +
                "product_name = EXCLUDED.product_name, " +
                "product_category = EXCLUDED.product_category, " +
                "product_price = EXCLUDED.product_price, " +
                "product_quantity = EXCLUDED.product_quantity, " +
                "product_brand = EXCLUDED.product_brand, " +
                "total_amount = EXCLUDED.total_amount, " +
                "currency = EXCLUDED.currency, " +
                "customer_id = EXCLUDED.customer_id, " +
                "transaction_date = EXCLUDED.transaction_date, " +
                "payment_method = EXCLUDED.payment_method",
        // Upsert keyed on transaction_id, so replayed records don't duplicate rows
        (JdbcStatementBuilder<Transaction>) (preparedStatement, transaction) -> {
            preparedStatement.setString(1, transaction.getTransactionId());
            preparedStatement.setString(2, transaction.getProductId());
            preparedStatement.setString(3, transaction.getProductName());
            preparedStatement.setString(4, transaction.getProductCategory());
            preparedStatement.setDouble(5, transaction.getProductPrice());
            preparedStatement.setInt(6, transaction.getProductQuantity());
            preparedStatement.setString(7, transaction.getProductBrand());
            preparedStatement.setDouble(8, transaction.getTotalAmount());
            preparedStatement.setString(9, transaction.getCurrency());
            preparedStatement.setString(10, transaction.getCustomerId());
            preparedStatement.setTimestamp(11, transaction.getTransactionDate());
            preparedStatement.setString(12, transaction.getPaymentMethod());
        },
        execOptions,
        connOptions
)).name("Insert into transactions table sink");

Aggregate Tables: Sales per Category, Sales per Day, and Sales per Month

Define the DTOs:

package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;
import java.sql.Date;

@Data
@AllArgsConstructor
public class SalesPerCategory {
    private Date transactionDate;
    private String category;
    private Double totalSales;
}

package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;
import java.sql.Date;

@Data
@AllArgsConstructor
public class SalesPerDay {
    private Date transactionDate;
    private Double totalSales;
}

package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class SalesPerMonth {
    private int year;
    private int month;
    private double totalSales;
}
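
Note that the upserts below rely on ON CONFLICT targets, which only work if the tables carry matching unique constraints. The article doesn't show the aggregate tables being created, so here is a hedged SQL sketch, with column types inferred from the DTOs, that you could run once in psql:

CREATE TABLE IF NOT EXISTS sales_per_category (
    transaction_date DATE,
    category VARCHAR(255),
    total_sales DOUBLE PRECISION,
    PRIMARY KEY (transaction_date, category)   -- matches ON CONFLICT (transaction_date, category)
);

CREATE TABLE IF NOT EXISTS sales_per_day (
    transaction_date DATE PRIMARY KEY,          -- matches ON CONFLICT (transaction_date)
    total_sales DOUBLE PRECISION
);

CREATE TABLE IF NOT EXISTS sales_per_month (
    year INTEGER,
    month INTEGER,
    total_sales DOUBLE PRECISION,
    PRIMARY KEY (year, month)                   -- matches ON CONFLICT (year, month)
);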

Add sinks to insert the aggregated data into the PostgreSQL tables:

import org.apache.flink.api.java.tuple.Tuple2;
import java.sql.Date;
import java.time.ZoneId;

// Sales per Category
transactionStream.map(
        transaction -> {
            Date transactionDate = new Date(transaction.getTransactionDate().getTime());
            String category = transaction.getProductCategory();
            double totalSales = transaction.getTotalAmount();
            return new SalesPerCategory(transactionDate, category, totalSales);
        }
).keyBy(salesPerCategory -> Tuple2.of(salesPerCategory.getTransactionDate(), salesPerCategory.getCategory()))
        .reduce((sales1, sales2) -> {
            // Running total per (date, category) key
            sales1.setTotalSales(sales1.getTotalSales() + sales2.getTotalSales());
            return sales1;
        }).addSink(JdbcSink.sink(
                "INSERT INTO sales_per_category(transaction_date, category, total_sales) " +
                        "VALUES (?, ?, ?) " +
                        "ON CONFLICT (transaction_date, category) DO UPDATE SET " +
                        "total_sales = EXCLUDED.total_sales",
                (JdbcStatementBuilder<SalesPerCategory>) (preparedStatement, salesPerCategory) -> {
                    preparedStatement.setDate(1, salesPerCategory.getTransactionDate());
                    preparedStatement.setString(2, salesPerCategory.getCategory());
                    preparedStatement.setDouble(3, salesPerCategory.getTotalSales());
                },
                execOptions,
                connOptions
        )).name("Insert into sales per category table");

// Sales per Day
transactionStream.map(
        transaction -> {
            Date transactionDate = new Date(transaction.getTransactionDate().getTime());
            double totalSales = transaction.getTotalAmount();
            return new SalesPerDay(transactionDate, totalSales);
        }
).keyBy(SalesPerDay::getTransactionDate)
        .reduce((sales1, sales2) -> {
            sales1.setTotalSales(sales1.getTotalSales() + sales2.getTotalSales());
            return sales1;
        }).addSink(JdbcSink.sink(
                "INSERT INTO sales_per_day(transaction_date, total_sales) " +
                        "VALUES (?, ?) " +
                        "ON CONFLICT (transaction_date) DO UPDATE SET " +
                        "total_sales = EXCLUDED.total_sales",
                (JdbcStatementBuilder<SalesPerDay>) (preparedStatement, salesPerDay) -> {
                    preparedStatement.setDate(1, salesPerDay.getTransactionDate());
                    preparedStatement.setDouble(2, salesPerDay.getTotalSales());
                },
                execOptions,
                connOptions
        )).name("Insert into sales per day table");

// Sales per Month
transactionStream.map(
        transaction -> {
            Timestamp ts = transaction.getTransactionDate();
            java.time.LocalDate localDate = ts.toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
            int year = localDate.getYear();
            int month = localDate.getMonthValue();
            double totalSales = transaction.getTotalAmount();
            return new SalesPerMonth(year, month, totalSales);
        }
).keyBy(salesPerMonth -> Tuple2.of(salesPerMonth.getYear(), salesPerMonth.getMonth()))
        .reduce((sales1, sales2) -> {
            sales1.setTotalSales(sales1.getTotalSales() + sales2.getTotalSales());
            return sales1;
        }).addSink(JdbcSink.sink(
                "INSERT INTO sales_per_month(year, month, total_sales) " +
                        "VALUES (?, ?, ?) " +
                        "ON CONFLICT (year, month) DO UPDATE SET " +
                        "total_sales = EXCLUDED.total_sales",
                (JdbcStatementBuilder<SalesPerMonth>) (preparedStatement, salesPerMonth) -> {
                    preparedStatement.setInt(1, salesPerMonth.getYear());
                    preparedStatement.setInt(2, salesPerMonth.getMonth());
                    preparedStatement.setDouble(3, salesPerMonth.getTotalSales());
                },
                execOptions,
                connOptions
        )).name("Insert into sales per month table");

Build and run, verify in PostgreSQL:

docker exec -it postgres psql -U postgres
\d
SELECT * FROM sales_per_category;


Generate more data (python main.py), then verify that the new data is streamed through Kafka and Flink and reflected in PostgreSQL.

You can also check running jobs in the Flink dashboard.


Real-Time Analytics with Elasticsearch and Kibana

Overview
Elasticsearch indexes transactions for fast querying, while Kibana visualizes
the data in real-time.


Elasticsearch Integration
Convert transactions to JSON:

package utils;

import Dto.Transaction;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonUtil {

    private static final ObjectMapper objectMapper = new ObjectMapper();

    public static String convertTransactionToJson(Transaction transaction) {
        try {
            return objectMapper.writeValueAsString(transaction);
        } catch (JsonProcessingException e) {
            e.printStackTrace();
            return null;
        }
    }
}

Add the sink to Elasticsearch:

import org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkBuilder;
import org.apache.flink.elasticsearch7.shaded.org.apache.http.HttpHost;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.action.index.IndexRequest;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.client.Requests;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.common.xcontent.XContentType;

transactionStream.sinkTo(
new Elasticsearch7SinkBuilder<Transaction>()
.setHosts(new HttpHost("localhost", 9200, "http"))
.setEmitter((transaction, runtimeContext, requestIndexer) -> {
String json = JsonUtil.convertTransactionToJson(transaction);
IndexRequest indexRequest = Requests.indexRequest()
.index("transactions")
.id(transaction.getTransactionId())
.source(json, XContentType.JSON);
requestIndexer.add(indexRequest);
})
.build()
).name("Elasticsearch Sink");

Run the job and verify:


• In Kibana Dev Tools:

GET transactions/_search
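
• Or from a terminal, using the standard Elasticsearch search API:

curl -X GET "http://localhost:9200/transactions/_search?pretty"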


Kibana Visualization
1. Index Pattern: In Kibana (localhost:5601), go to Management > Index
Patterns > Create, and set “transactions”.

2. Visualizations: Create:
- Line chart: Total sales over time.
- Donut chart: Product category distribution.
- Bar chart: Sales by brand.

3. Dashboard: Combine visualizations and adjust the refresh frequency (5s for this project).

Test real-time updates by running `python main.py` and observing the dashboard.

Conclusion


This real-time data pipeline harnesses Kafka’s streaming, Flink’s processing, PostgreSQL’s storage, and Elasticsearch’s analytics to deliver immediate insights into e-commerce data. It’s a scalable foundation for advanced features like predictive analytics or anomaly detection, showcasing the power of modern data engineering.

Lastly, a sincere shoutout to CodeWithYu for his excellent design, coding, and teaching on YouTube, from which I’ve learned a great deal.

Code: https://github.com/shj37/flink-kafka-elasticsearch-ecommerce-java

References
• CodeWithYu: https://www.youtube.com/watch?v=deepQRXnniM&ab_channel=CodeWithYu

• https://github.com/airscholar/FlinkCommerce

All credit to @CodeWithYu for the original project design and code.

