Building A Real-Time E-Commerce Data Pipeline With Kafka, Flink, PostgreSQL, and Elasticsearch - by Shijun Ju - Apr, 2025 - Medium
Building a Real-Time E-Commerce Data Pipeline with Kafka, Flink, PostgreSQL, and Elastic... https://medium.com/@jushijun/building-a-real-time-e-commerce-data-pipeline-with-kafka-fli...
Table of Contents
1. Introduction
2. Environment Setup
7. Conclusion
Introduction
Real-time data pipelines bridge the gap between data generation and
decision-making by processing information as it arrives. In e-commerce,
this capability translates to immediate insights into sales trends, customer
behavior, and operational efficiency. This project leverages a powerful stack
of open-source technologies:
Environment Setup
Overview
To bring this pipeline to life, we use Docker to manage services and Apache
Flink for processing. This section outlines the setup process, ensuring all
Prerequisites
Docker Configuration
The `docker-compose.yml` file orchestrates five services critical to the
pipeline:
docker-compose up -d
Python Environment
Set up a virtual environment and install dependencies:
• taskmanager.numberOfTaskSlots: 4
• parallelism.default: 2
/mnt/c/flink-1.18.0/bin/start-cluster.sh
Overview
Kafka acts as the pipeline’s ingestion layer, receiving simulated e-commerce
transactions from a Python script.
Data Generation
The `main.py` script generates random sales data with fields like
python main.py
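To make the shape of the generated messages concrete, here is a minimal JDK-only sketch of a generator in Java. It is not the article's main.py: the class name, the value pools, and the rule that totalAmount equals price times quantity are assumptions for illustration, and only a subset of fields is emitted. Field names follow the Transaction class defined later in this article.

```java
import java.time.Instant;
import java.util.Locale;
import java.util.Random;
import java.util.UUID;

// Hypothetical stand-in for main.py: builds one JSON sales record per call.
class TransactionGeneratorSketch {

    // Illustrative value pools; the real script's values are not shown in the article.
    static final String[] CATEGORIES = {"electronics", "fashion", "grocery"};
    static final String[] PAYMENT_METHODS = {"credit_card", "debit_card", "online_transfer"};

    // Assumption: the total is price times quantity, rounded to two decimals.
    static double totalAmount(double price, int quantity) {
        return Math.round(price * quantity * 100.0) / 100.0;
    }

    // Builds a JSON string by hand so the sketch needs no third-party library.
    static String randomTransactionJson(Random rnd) {
        double price = Math.round((10 + rnd.nextDouble() * 990) * 100.0) / 100.0;
        int quantity = 1 + rnd.nextInt(10);
        return String.format(Locale.ROOT,
                "{\"transactionId\":\"%s\",\"productCategory\":\"%s\","
                        + "\"productPrice\":%.2f,\"productQuantity\":%d,\"totalAmount\":%.2f,"
                        + "\"currency\":\"USD\",\"transactionDate\":\"%s\",\"paymentMethod\":\"%s\"}",
                new UUID(rnd.nextLong(), rnd.nextLong()),
                CATEGORIES[rnd.nextInt(CATEGORIES.length)],
                price, quantity, totalAmount(price, quantity),
                Instant.now(),
                PAYMENT_METHODS[rnd.nextInt(PAYMENT_METHODS.length)]);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 3; i++) {
            // A real producer would send this string to a Kafka topic instead of printing it.
            System.out.println(randomTransactionJson(rnd));
        }
    }
}
```

Printing stands in for the Kafka producer call here, so the sketch runs without a broker.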
Verification
Confirm data is streaming:
Overview
Flink processes the Kafka stream, transforming raw data into structured
outputs for storage and analytics.
Project Setup
• JDK: 11
• Archetype: org.apache.flink:flink-quickstart-java:1.18.0
• GroupId: FlinkKafkaEcommerce
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka</artifactId>
        <version>3.0.1-1.18</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>42.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-sql-connector-elasticsearch7</artifactId>
        <version>3.0.1-1.17</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.24</version>
        <scope>compile</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc</artifactId>
        <version>3.1.1-1.17</version>
    </dependency>
</dependencies>
Data Model
Define the Transaction DTO, which mirrors the records generated by main.py:
package Dto;

import lombok.Data;

import java.sql.Timestamp;

// Lombok's @Data generates getters, setters, equals, hashCode, and toString.
@Data
public class Transaction {
    private String transactionId;
    private String productId;
    private String productName;
    private String productCategory;
    private double productPrice;
    private int productQuantity;
    private String productBrand;
    private double totalAmount;
    private String currency;
    private String customerId;
    private Timestamp transactionDate;
    private String paymentMethod;
}
Deserialization
Create a deserializer for Kafka messages:
package Deserializer;

import Dto.Transaction;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;

// Deserializes JSON-encoded Kafka record values into Transaction objects.
public class JSONValueDeserializationSchema implements DeserializationSchema<Transaction> {
    // Jackson mapper reused for every record.
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public Transaction deserialize(byte[] bytes) throws IOException {
        return objectMapper.readValue(bytes, Transaction.class);
    }

    // The Kafka stream is unbounded, so no record marks its end.
    @Override
    public boolean isEndOfStream(Transaction transaction) {
        return false;
    }

    @Override
    public TypeInformation<Transaction> getProducedType() {
        return TypeInformation.of(Transaction.class);
    }
}
Flink Job
The DataStreamJob class sets up the stream:
import Deserializer.JSONValueDeserializationSchema;
import Dto.Transaction;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
transactionStream.print();
Build:
Run:
Overview
Connection Setup
Define JDBC options:
Transactions Table
Store raw data:
transactionStream.addSink(JdbcSink.sink(
        "CREATE TABLE IF NOT EXISTS transactions (" +
                "transaction_id VARCHAR(255) PRIMARY KEY, " +
                "product_id VARCHAR(255), " +
                "product_name VARCHAR(255), " +
                "product_category VARCHAR(255), " +
                "product_price DOUBLE PRECISION, " +
                "product_quantity INTEGER, " +
                "product_brand VARCHAR(255), " +
                "total_amount DOUBLE PRECISION, " +
                "currency VARCHAR(255), " +
                "customer_id VARCHAR(255), " +
                "transaction_date TIMESTAMP, " +
                "payment_method VARCHAR(255)" +
                ")",
        // The DDL has no parameters, so the statement builder is a no-op.
        (JdbcStatementBuilder<Transaction>) (preparedStatement, transaction) -> {},
        execOptions,
        connOptions
)).name("Create Transactions Table Sink");
transactionStream.addSink(JdbcSink.sink(
"INSERT INTO transactions(transaction_id, product_id, product_name, product_category, product_
"product_quantity, product_brand, total_amount, currency, customer_id, transaction_date, p
"VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) " +
"ON CONFLICT (transaction_id) DO UPDATE SET " +
"product_id = EXCLUDED.product_id, " +
"product_name = EXCLUDED.product_name, " +
"product_category = EXCLUDED.product_category, " +
"product_price = EXCLUDED.product_price, " +
"product_quantity = EXCLUDED.product_quantity, " +
"product_brand = EXCLUDED.product_brand, " +
"total_amount = EXCLUDED.total_amount, " +
package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;

import java.sql.Date;

@Data
@AllArgsConstructor
public class SalesPerCategory {
    private Date transactionDate;
    private String category;
    private Double totalSales;
}

package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;

import java.sql.Date;

@Data
@AllArgsConstructor
public class SalesPerDay {
    private Date transactionDate;
    private Double totalSales;
}

package Dto;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class SalesPerMonth {
    private int year;
import org.apache.flink.api.java.tuple.Tuple2;
import java.sql.Date;
import java.time.ZoneId;
preparedStatement.setDouble(3, salesPerCategory.getTotalSales());
},
execOptions,
connOptions
)).name("Insert into sales per category table");
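The aggregate tables hold one running total per key. The job's own operators are not fully reproduced here, so the following is only a JDK-only sketch of the idea, assuming each aggregate is a sum of totalAmount grouped by date and category, by date, or by year and month. The class, the Sale stand-in, and the method names are hypothetical, and LocalDate is used in place of java.sql.Date for brevity.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of keyed sums; this is not Flink code.
class SalesAggregationSketch {

    // Minimal stand-in for the Transaction fields the aggregations need.
    static final class Sale {
        final LocalDate date;
        final String category;
        final double totalAmount;

        Sale(LocalDate date, String category, double totalAmount) {
            this.date = date;
            this.category = category;
            this.totalAmount = totalAmount;
        }
    }

    // Sales per category: key is "date|category", value is the summed totalAmount.
    static Map<String, Double> salesPerCategory(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                s -> s.date + "|" + s.category,
                TreeMap::new,
                Collectors.summingDouble(s -> s.totalAmount)));
    }

    // Sales per day: key is the calendar date.
    static Map<LocalDate, Double> salesPerDay(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                s -> s.date,
                TreeMap::new,
                Collectors.summingDouble(s -> s.totalAmount)));
    }

    // Sales per month: key is "yyyy-MM".
    static Map<String, Double> salesPerMonth(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                s -> String.format("%d-%02d", s.date.getYear(), s.date.getMonthValue()),
                TreeMap::new,
                Collectors.summingDouble(s -> s.totalAmount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale(LocalDate.of(2025, 4, 1), "electronics", 100.0),
                new Sale(LocalDate.of(2025, 4, 1), "electronics", 50.0),
                new Sale(LocalDate.of(2025, 4, 2), "fashion", 25.0));
        System.out.println(salesPerCategory(sales));
        System.out.println(salesPerDay(sales));
        System.out.println(salesPerMonth(sales));
    }
}
```

In a streaming job the same totals are updated incrementally as records arrive rather than computed over a finished list; the batch form is used here only because it is easy to run and check.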
Generate more data (python main.py), then verify that the new records stream
through Kafka and Flink and are reflected in PostgreSQL.
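When checking row counts, keep in mind that the transactions insert uses ON CONFLICT (transaction_id) DO UPDATE, so a record that is delivered again overwrites its existing row instead of adding a duplicate. The sketch below imitates that behavior with an in-memory map; it is a hypothetical JDK-only illustration of upsert semantics, not code from the project, and it tracks a single column for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory imitation of an upsert keyed by transaction_id.
class UpsertSketch {

    // Stand-in for the transactions table: primary key -> total_amount.
    private final Map<String, Double> rows = new LinkedHashMap<>();

    // Mirrors INSERT ... ON CONFLICT (transaction_id) DO UPDATE: the last write wins.
    void upsert(String transactionId, double totalAmount) {
        rows.put(transactionId, totalAmount);
    }

    int rowCount() {
        return rows.size();
    }

    Double totalAmountOf(String transactionId) {
        return rows.get(transactionId);
    }

    public static void main(String[] args) {
        UpsertSketch table = new UpsertSketch();
        table.upsert("tx-1", 10.0);
        table.upsert("tx-2", 20.0);
        table.upsert("tx-1", 15.0); // replayed id: updates the row, adds none
        System.out.println(table.rowCount() + " rows, tx-1 = " + table.totalAmountOf("tx-1"));
    }
}
```

The practical consequence is that the table's row count grows only with new transaction ids, which is worth remembering when comparing it against the number of messages produced.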
Overview
Elasticsearch indexes transactions for fast querying, while Kibana visualizes
the data in real-time.
Elasticsearch Integration
Convert transactions to JSON:
package utils;
import Dto.Transaction;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.connector.elasticsearch.sink.Elasticsearch7SinkBuilder;
import org.apache.flink.elasticsearch7.shaded.org.apache.http.HttpHost;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.action.index.IndexRequest;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.client.Requests;
import org.apache.flink.elasticsearch7.shaded.org.elasticsearch.common.xcontent.XContentType;

transactionStream.sinkTo(
        new Elasticsearch7SinkBuilder<Transaction>()
                .setHosts(new HttpHost("localhost", 9200, "http"))
                .setEmitter((transaction, runtimeContext, requestIndexer) -> {
                    String json = JsonUtil.convertTransactionToJson(transaction);
                    // Using the transaction id as the document id makes re-indexing overwrite, not duplicate.
                    IndexRequest indexRequest = Requests.indexRequest()
                            .index("transactions")
                            .id(transaction.getTransactionId())
                            .source(json, XContentType.JSON);
                    requestIndexer.add(indexRequest);
                })
                .build()
).name("Elasticsearch Sink");
GET transactions/_search
Kibana Visualization
1. Index Pattern: In Kibana (localhost:5601), go to Management > Index
Patterns > Create and enter “transactions” as the pattern.
2. Visualizations: Create:
- Line chart: Total sales over time.
- Donut chart: Product category distribution.
- Bar chart: Sales by brand.
this project).
Conclusion
Code: https://github.com/shj37/flink-kafka-elasticsearch-ecommerce-java
References
• CodeWithYu: https://www.youtube.com/watch?v=deepQRXnniM&ab_channel=CodeWithYu
• https://github.com/airscholar/FlinkCommerce
All credit to @CodeWithYu for the original project design and code.
Written by Shijun Ju