Paper 1

This paper presents a novel hybrid deep learning model combining Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN) for anomaly detection in Industrial Internet of Things (IIoT) systems, achieving an accuracy of 94.94%. The study emphasizes the importance of model selection for effective anomaly detection and suggests future exploration of XGBoost with hybrid architectures. The proposed approach significantly enhances IIoT security by addressing the unique challenges of edge computing environments.
A Secure Hybrid Deep Learning Technique for Anomaly Detection in IIoT Edge Computing


Bharath Konatham¹, Tabassum Simra¹, Fathi Amsaad¹, Mohamed I Ibrahem¹, and Noor Zaman Jhanjhi¹
¹ Affiliation not available
Posted on 26 Jan 2024 — CC-BY 4.0 — [Link] — e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...

January 26, 2024

Abstract
The Industrial Internet of Things (IIoT) comprises smart sensors, actuators, and related technologies that extend IoT capabilities across industrial sectors. With the rapid development of connected technology and communications in industrial applications, IIoT networks and devices are increasingly deployed in less secure physical environments, which makes anomaly detection crucial for IIoT cybersecurity. This paper proposes a novel anomaly detection model for IIoT systems based on a hybrid deep learning (DL) approach that combines Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN) for anomaly detection in IoT edge computing. The proposed CNN+GRU model achieves an accuracy of 94.94%, underscoring the importance of careful model selection for IIoT anomaly detection. The paper suggests exploring XGBoost with hybrid CNN+GRU architectures as a future direction for high accuracy in complex IIoT contexts. Experimental results indicate a 96.41% accuracy, with strong results on false alarm rate (FAR), recall, precision, and F1-score. Based on these findings, we recommend that future researchers consider advanced hybrid architectures and improve efficiency by combining XGBoost with hybrid CNN+GRU models. This approach holds promise for significant contributions to the security and performance of IIoT systems.
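The abstract reports results in terms of accuracy, false alarm rate (FAR), recall, precision, and F1-score. As a reference for how these quantities relate, the following is a minimal sketch that computes them from confusion counts. It assumes a binary labelling in which "attack" is the positive class and "normal" the negative class, and it uses the common definition FAR = FP / (FP + TN); how the paper averages these metrics over its 15 attack classes is not stated here. The names `Confusion`, `confusion_from_labels`, and `detection_metrics` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts with 'attack' treated as the positive class (assumption)."""
    tp: int  # attacks correctly flagged as attacks
    fp: int  # normal samples wrongly flagged (false alarms)
    tn: int  # normal samples correctly passed as normal
    fn: int  # attacks missed (predicted normal)


def confusion_from_labels(y_true, y_pred, positive=1):
    """Tally TP/FP/TN/FN from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def _safe_div(num, den):
    # Treat 0/0 as 0.0 so a class with no samples does not raise an error.
    return num / den if den else 0.0


def detection_metrics(c):
    """Accuracy, precision, recall, F1-score and false alarm rate (FAR)."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        # FAR (assumed definition): fraction of normal samples raised as alarms.
        "far": _safe_div(c.fp, c.fp + c.tn),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = attack, 0 = normal
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(detection_metrics(confusion_from_labels(y_true, y_pred)))
```

For the labels in the `__main__` block there are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so accuracy, precision, recall, and F1-score are all 0.75 and FAR is 0.25. FAR complements recall: a detector can reach high recall simply by flagging everything, which FAR would expose.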


Bharath Konatham, Tabassum Simra, Student Members, IEEE;
Fathi Amsaad, Mohamed I. Ibrahem, and Noor Zaman Jhanjhi, Senior Members, IEEE

Index Terms: Cybersecurity, Anomaly Detection, Industrial Internet of Things (IIoT), Edge Computing, Deep Learning.

I. INTRODUCTION

Anomaly detection plays a crucial role across various domains, including network security, financial systems, and industrial operations [1]. Its primary objective is to identify unexpected or abnormal behavior that deviates from established patterns, enabling prompt intervention and the maintenance of system integrity. As the digital landscape becomes increasingly data-rich, traditional rule-based and statistical methods [2] struggle to uncover anomalies effectively. Anomalies in edge IIoT data are unforeseen or irregular patterns in the data collected from edge devices, and detecting them is paramount for ensuring system reliability, security, and optimal performance [3].

The scale of data generation in the digital era has propelled the rise of deep learning in IoT systems [4]–[7]. Deep learning handles extensive datasets better than conventional machine learning techniques, which makes it well suited to analysis in IoT contexts. Its capacity to generate data representations dynamically [8] and to integrate seamlessly with IoT ecosystems [9] makes it a valuable asset. Consider a smart home scenario in which IoT devices interact autonomously to form a fully intelligent dwelling [4]. This synergy has prompted researchers to explore advanced deep learning models to address the challenges of anomaly detection [10], [11].

Figure 1 shows an overview of the industrial anomaly detection paradigm, in which data generated by IoT devices is preprocessed and then fed into pre-trained deep learning models for anomaly identification. Recent research has focused on deep learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNN), to improve the accuracy and efficiency of anomaly detection. These models are adept at capturing intricate patterns and temporal correlations within data, which improves the efficacy of anomaly identification [12].

This study undertakes a comprehensive evaluation of well-established deep learning models, including CNN, GRU, and LSTM, and of hybrid models such as CNN+GRU, Autoencoder+CNN, Autoencoder+LSTM, and Autoencoder+GRU, alongside the gradient boosting algorithm XGBoost, to assess their proficiency in detecting anomalies within industrial IoT systems [13], [14].

Anomaly detection is vital across diverse domains, including network security, finance, and industrial systems, for identifying deviations from expected patterns or abnormal behavior. In edge-based Internet of Things (IoT) systems, anomalies manifest as unexpected or irregular patterns in data collected from edge devices. Detecting these anomalies is pivotal for preserving system reliability, security, and optimal performance.

In response to the challenges posed by growing data complexity, deep learning has emerged as a cornerstone of research in IoT systems; because they can handle extensive datasets and capture intricate patterns, deep learning methods are well suited to analysis in IoT contexts. As these methods generate data representations and integrate seamlessly into IoT ecosystems, they offer promising avenues for anomaly detection. This study investigates anomaly detection in Industrial IoT (IIoT) systems by evaluating a range of deep learning models (GRU, LSTM, and CNN) and hybrid variations of them. By comparing the performance of standalone models and hybrid combinations, we aim to uncover their strengths, limitations, and capabilities in anomaly detection within IIoT data.

This paper contributes to advancing edge IIoT security and anomaly detection knowledge. Through rigorous evaluation and comparison of various deep learning models, we provide insights into their performance and applicability. Our study enhances the existing body of knowledge by illuminating the strengths and weaknesses of CNN, GRU, CNN+GRU, and

Trusted and Secure


IIoT channel

IIoT Pakets
1 2 3 4

Fig. 1: Anomaly detection in IIoT Applications

LSTM models. It offers a comprehensive understanding of DL- learning, specifically a CNN model, has been presented in
based techniques for IIoT anomaly detection. Moreover, this previous work [15]. The outcomes of this approach reveal its
research lays the groundwork for future investigations into ad- effectiveness in adequately addressing local and global pat-
vanced techniques and hybrid models. These models leverage terns within time series data. This capability enables efficient
the diverse strengths of different deep learning architectures, analysis and anomaly detection across a spectrum of IIoT
potentially leading to more effective solutions for anomaly applications.
detection in IIoT systems. A Long Short-Term Memory (LSTM) ML approach is
The next sections of the paper are organized as follows. Sec- proposed to address the training challenges in traditional Re-
tion 2 summarizes the existing research in anomaly detection current Neural Networks (RNNs) [16]. The proposed LSTM
in edge IIoT systems, focusing on the models under evaluation approach demonstrated higher Performance in the experiments
in our study. We highlight the strengths and limitations of than other recurrent network algorithms, such as real-time re-
these approaches to contextualize our work. Section 3 outlines current learning, back-propagation through time, and recurrent
the specific contributions our research offers, detailing the cascade correlation. As an enhanced LSTM-based approach,
novel aspects and insights derived from our comprehensive a new technique known as Encoder-Decoder architecture is
evaluation of deep learning models. Section 4 explains our introduced for anomaly detection in time series data, present-
research methodology, encompassing the dataset used, the se- ing [17]. An Electrocardiogram (ECG), leveraging DL-based
lection and configuration of models, and the evaluation metrics Short-Term Memory (DLSTM) networks, is proposed [18].
employed to assess their Performance. Section 5 presents our This research highlights the effectiveness of LSTM networks
experimental setup, results, and an in-depth analysis of each in anomaly detection within time series data, demonstrating
model’s Performance. This section sheds light on the compar- adaptability across diverse domains and datasets.
ative efficacy of the evaluated models. Section 6 summarizes Another robust approach for anomaly detection is discussed
our findings, discusses the implications of our research, and in [19]. This method incorporates complete principal com-
provides recommendations for future investigations, paving the ponent analysis (PCA) into training deep autoencoders to
way for advancements in IIoT anomaly detection. It showcases enhance anomaly detection. This integration improves the
the potential of DL techniques for detecting IIoT Anomalies, model’s resilience to outliers and Performance in detecting
offering a solid foundation for further progress in this crucial anomalies. An enhanced research effort proposes a multivari-
domain. ate anomaly detection technique utilizing Generative Adver-
sarial Networks (GANs) and Gated Recurrent Units (GRUs)
II. RELATED WORK in their MAD-GAN framework, as outlined in [20]. This
Anomaly detection in edge IIoT (Industrial Internet of innovative approach combines the strengths of GANs and
Things) systems has garnered considerable interest among GRUs to effectively learn the underlying structure of time
researchers due to the increasing need to guarantee the depend- series data and accurately detect anomalies.
ability and safety of these systems. Researchers have investi- A new IIoT anomaly detection model, ESN-AE (Echo State
gated numerous ML-based methods to address this challenge. Network - Autoencoder), is introduced in recent research [21].
In this section, we examine existing research on detecting As highlighted in the documentation, the ESN-AE effectively
anomalies in edge IIoT systems, particularly emphasizing the combines neural networks with Echo State Networks (ESNs),
models assessed in this paper. making it particularly suitable for edge devices with resource
ML-based techniques employing CNNs find widespread constraints. Additionally, a composite autoencoder model tai-
application in anomaly detection within IIoT systems, offer- lored for anomaly detection in IIoT systems is put forward in
ing improved capabilities to capture spatial features in IoT another study [22]. Diverging from conventional autoencoders,
datasets. A data-driven fault diagnosis approach utilizing deep this model predicts and concurrently reconstructs input data,
3

leading to improved anomaly detection capabilities.

An unsupervised machine learning method has been introduced for anomaly detection across different time series datasets [23]. This method uses a deep neural model that combines CNNs and autoencoders to improve effectiveness across various real-world datasets, underscoring its potential for anomaly detection tasks.

In [24], a network intrusion detection system designed explicitly for imbalanced data is introduced. The proposed method combines XGBoost with a weighted loss function to address the challenges of imbalanced datasets.

Another research effort to enhance IIoT network intrusion detection is presented in [25]. This study introduces a deep hybrid learning model that integrates attention-based machine learning with a Fully Convolutional Neural Network (FCN), along with gradient boosting techniques (XGBoost and AdaBoost) and Long Short-Term Memory (LSTM). The results demonstrate the model's efficiency in identifying anomalies in the traffic data of IoT devices, with high performance in detecting various cybersecurity attacks. Although its primary focus is network intrusion detection, this approach could be adapted to other anomaly detection tasks, including those related to IoT.

An enhanced Intrusion Detection System (IDS) to secure IIoT applications was proposed by Douiba et al. [26]. The model uses decision tree (DT) algorithms and gradient boosting (GB), specifically the open-source CatBoost framework, for efficient IIoT anomaly detection. The IDS is evaluated on multiple datasets and achieves high precision, recall, and accuracy, highlighting its effectiveness in detecting and characterizing anomalies within IoT devices. Furthermore, a comprehensive survey of IIoT network anomaly detection techniques, including machine learning-based approaches, is presented in the review by Ahmed et al. [2]. In this survey, the authors discuss and compare different machine learning algorithms and their performance in anomaly detection, providing valuable insights into the field.

Detecting anomalies in encrypted Internet traffic has become a pivotal research area, given the increasing reliance on encrypted services to safeguard consumer privacy. In a recent study closely related to ours, hybrid deep learning techniques are applied to identify anomalies in encrypted network traffic [27]. That research applied deep learning models, including RNN, CNN, and LSTM, to different publicly available datasets. However, although valid, the approach is limited in how it combines models and relies on older datasets that do not fully capture the intricacies of contemporary cyber threats. In contrast, our research uses a more robust hybrid deep learning solution with the most recent dataset, providing a solution proven to be more accurate and practical in detecting current cybersecurity threats.

III. CONTRIBUTION OF RESEARCH

In this section, we outline the key contributions of our paper to the field of ML-assisted IIoT security.

A. Problem Statement

This study addresses the pressing need for robust anomaly detection in IIoT environments, where the convergence of diverse and dynamic data streams requires effective anomaly detection methods. The challenge lies in developing a model that can simultaneously capture spatial and temporal patterns to accurately distinguish between normal and malicious activities, ensuring the security and reliability of IIoT networks. The research aims to devise a better solution that combines Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN) to tackle these intricacies and enhance anomaly detection accuracy in Edge IIoT systems.

B. Novelty

In this section, we describe the novel contributions of our research in more detail. This research takes a holistic approach to anomaly detection tailored to Edge IoT environments. It addresses the unique challenges of this domain and offers robust and accurate detection of known and emerging threats, thereby advancing IIoT security. The main novel contributions of this work are the following:
1) Hybrid CNN+GRU Architecture: Our work introduces a hybrid CNN+GRU architecture for anomaly detection in Edge IoT environments. This approach capitalizes on the strength of Convolutional Neural Networks in extracting features, combined with Gated Recurrent Units for temporal sequence analysis. This fusion of spatial and temporal analysis techniques is a novel way to enhance the accuracy and robustness of intrusion detection in complex IIoT settings.
2) Tailored to Edge IoT: While many intrusion detection systems are designed for traditional network environments, our model is specifically tailored to the unique challenges of Edge IoT environments. This adaptation addresses the inherent limitations of resource-constrained Edge devices, making our approach especially relevant to emerging IoT applications at the network's edge.
3) Diverse Attack Type Evaluation: We evaluate the proposed model's performance across a broad spectrum of attack types. By assessing its ability to accurately detect both common attacks and specific instances of novel and evolving threats, our study advances hybrid DL-based IIoT anomaly detection.
4) Robust Normal Sample Detection: Besides identifying anomalies, our model robustly recognizes normal (benign) samples, a critical aspect of intrusion detection that is often overlooked in previous works. This capability minimizes false positives, further enhancing the practical relevance and real-world applicability of our research.
5) Practical Relevance: The practical relevance of our work is underscored by its potential to be deployed in real-world IIoT scenarios, where the accurate and timely detection of anomalies is paramount for maintaining system integrity and security. By addressing the pressing need for effective intrusion detection in Edge IoT, our research contributes to the advancement of IIoT security, making it highly relevant in today's evolving technological landscape.

C. Methodology or Approach

Our methodology is a carefully designed pipeline that accounts for the specific challenges and constraints of Edge IoT datasets. It uses a hybrid approach to extract spatial and temporal patterns efficiently. Extensive experimentation with a range of performance metrics ensures a thorough evaluation of the model's capabilities. Additionally, the comparison with XGBoost provides insight into the contributions and potential advantages of our approach for anomaly detection in Edge IoT environments. The main steps of the methodology are the following:
1) Data Preprocessing:
   a) Feature Extraction: Data preprocessing is a crucial step in any machine learning task. In our study, we tailored data preprocessing to Edge IoT datasets, extracting the relevant features from the raw data. Given the resource-constrained nature of Edge devices, we focused on the features that are essential for anomaly detection, for both efficiency and effectiveness.
   b) Dimension Reduction: To further optimize the model for resource-constrained Edge IoT environments, we employed dimension reduction techniques. These techniques reduce the complexity of the feature space while retaining important information. Dimension reduction not only conserves computational resources but also helps mitigate the curse of dimensionality, which is particularly pertinent to IoT data.
2) The Proposed Hybrid Convolutional Neural Network (CNN) and Gated Recurrent Units (GRU) Architecture:
   a) To capture both spatial and temporal patterns within the data, we designed a novel hybrid architecture that combines CNN and GRU.
   b) CNN Component: The CNN component focuses on spatial feature extraction. It excels at detecting translation-invariant patterns and features within the data, which is essential for capturing spatial characteristics in IoT sensor data.
   c) GRU Component: The GRU component, on the other hand, specializes in analyzing temporal sequences. It is well suited to capturing time-dependent patterns and behaviors in IoT data, which is crucial for understanding the dynamics of IoT environments.
3) Model Training and Evaluation:
   a) Extensive Experimentation: We conducted a rigorous experimental phase involving diverse attack scenarios to ensure that the DL model's performance was thoroughly evaluated under various real-world conditions and that it could effectively detect a wide range of potential threats.
   b) ML Metrics: To assess the quality of our hybrid model, we used a comprehensive set of performance metrics: accuracy, recall, precision, and F1-score. These measures test the model's ability to identify anomalies while minimizing false positives.
4) Comparison with XGBoost:
   a) As part of our methodology, we compared the performance of our hybrid CNN+GRU model with that of the gradient boosting algorithm XGBoost.
   b) This comparative analysis aimed to identify the strengths and weaknesses of the hybrid model for anomaly detection. By contrasting it with XGBoost, a well-established and widely used machine learning algorithm, we gained insight into the advantages our hybrid approach brings.

D. Impact and Significance

The significance of this research lies in its approach to Edge IoT anomaly detection through the development of a hybrid CNN+GRU model. Its potential to enhance cybersecurity in real-world deployments, its adaptability to evolving threats, and its practical applicability across industries underscore the impact of this work. It offers a valuable asset for safeguarding the increasingly interconnected and vital systems of the modern world.
1) Innovative Hybrid CNN+GRU Model:
   a) At the heart of this research lies the development of a hybrid CNN+GRU model specifically tailored for Edge IoT anomaly detection. This model represents a novel fusion of deep learning techniques, combining CNNs for spatial feature extraction and GRUs for temporal sequence analysis.
   b) Edge IoT environments often present complex, heterogeneous data streams that require a multifaceted approach for accurate anomaly detection. Our model addresses this challenge directly by integrating spatial and temporal analysis, offering a more holistic understanding of the data.
2) Enhanced Cybersecurity in Real-World IoT Deployments:
   a) A key outcome of this research is the model's ability to accurately detect common attack types as well as various novel and evolving threats. Our proposed solutions aim to advance knowledge and have implications for enhancing cybersecurity in real-world IoT deployments.
   b) As IoT adoption continues to increase across industries, the security of these interconnected systems becomes increasingly critical. Our model's capacity to identify emerging threats, combined with its ability to recognize regular traffic, offers a strong defense mechanism for safeguarding these deployments.
3) Practical Relevance and Industry Applications:
   a) Beyond its academic contribution, this research has practical relevance: its impact extends to a wide range of industry applications.
   b) For industries reliant on Edge IoT, such as manufacturing, healthcare, and utilities, this research provides a reliable and versatile tool for the early detection and prevention of cyber intrusions in applications where any disruption can have far-reaching consequences, including financial losses and threats to public safety.
4) Mitigating Evolving Security Threats:
   a) The constantly evolving nature of cybersecurity threats necessitates adaptable and robust solutions. Our hybrid CNN+GRU model is well suited to this dynamic landscape.
   b) By improving the accuracy and effectiveness of anomaly detection in Edge IoT environments, this research contributes to the ongoing battle against cyber threats. It helps organizations stay ahead of adversaries and proactively protect their critical systems and sensitive data.

E. Future Directions

Future work could explore the applicability of more advanced deep learning architectures, such as Transformers, to capture complex temporal relationships and patterns in Edge IoT data. Investigating ensemble techniques that combine multiple models could also enhance overall anomaly detection robustness.

IV. METHODOLOGY AND EXPERIMENTAL SETUP

A. Dataset Description

This study employed a comprehensive dataset for detecting anomalies within Industrial Internet of Things (IIoT) networks, as documented in [28]. The dataset encompasses a wide array of network traffic data, containing regular traffic and various attack scenarios such as Port Scanning, XSS, Ransomware, Fingerprinting, and MITM. To ensure the representativeness and reliability of the data, samples were collected from a real-world industrial environment featuring multiple devices and communication protocols commonly encountered in IIoT networks. Using this extensive dataset, we were able to test the quality of different DL models for detecting and mitigating cybersecurity threats within the IIoT domain [29]–[34].

Fig. 2: Distribution of samples after preprocessing the dataset

The dataset comprises 2,219,201 instances and 63 features, all collected to investigate and analyze cybersecurity threats within edge computing for IIoT applications. It includes attributes related to network traffic, protocol-specific parameters, and various attack types. The features have diverse data types, both numerical (float64) and categorical (object). Key features include network communication attributes like IP addresses ([Link] host, [Link] host), ARP protocol details ([Link], [Link]), ICMP protocol characteristics ([Link], [Link] le), HTTP protocol fields ([Link] length, [Link], [Link]),

and TCP/UDP protocol properties ([Link], [Link], [Link], [Link]).

Additionally, the dataset contains features associated with Domain Name System (DNS) queries ([Link], [Link]) and the Message Queuing Telemetry Transport (MQTT) protocol ([Link], [Link], [Link]). The dataset's target variable, labeled "Attack type", is a categorical attribute that represents 15 distinct classes of cybersecurity threats. These classes encompass various threats, including Distributed Denial of Service (DDoS), ransomware, man-in-the-middle (MITM) attacks, and port scanning. Before analysis, the dataset undergoes preprocessing steps, including eliminating unnecessary columns, handling missing and duplicate values, and shuffling the data order. Figure 2 provides a detailed breakdown of the attack types and the corresponding number of instances for each attack class before oversampling is applied.

For data transformation, categorical variables are one-hot encoded, while the target variable is label encoded. To tackle the class imbalance issue, we employ the RandomOverSampler method, which oversamples the minority classes by duplicating randomly selected existing instances (sampling with replacement) until each underrepresented class matches the sample count of the majority class. The resulting dataset has a more balanced distribution, so the models encounter minority-class instances more often during training.

We use the RandomOverSampler from the imbalanced-learn library to implement random oversampling. The rationale behind this approach is to give the model a more representative view of the minority classes, facilitating better anomaly detection within these less frequent categories. Replicating minority-class instances helps the model capture the distinctive patterns and characteristics of these classes, leading to improved overall performance and accuracy. However, oversampling techniques, including random oversampling, must be evaluated carefully to prevent issues such as overfitting or introduced biases. Alternative methods may need to be considered depending on the dataset's specific characteristics and research objectives.

Following oversampling, the dataset is split into training and testing sets. Feature scaling is then performed using MinMaxScaler, and the input data and target variables are reshaped to meet the input requirements of the deep learning models.

B. Proposed Hybrid CNN+GRU model

Our deep learning (DL) model leverages the strengths of both Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) for IIoT anomaly detection. This architectural fusion captures the spatial and temporal information inherent in the data, which suits the analysis of intricate sequences such as those encountered in industrial IoT applications.

In the CNN segment of the model, convolutional layers use filters to identify structures and patterns in the data, enabling the model to learn meaningful representations. By stacking multiple convolutional layers with increasing filter sizes, the model captures hierarchical representations of the input.

GRUs, by contrast, are a type of recurrent neural network (RNN) that incorporates gating mechanisms to selectively update and reset the internal state when modeling dependencies in the data. This allows the model to retain and propagate crucial information across time steps, capturing long-term dependencies in the sequence. The GRU layer within the model uses these gating mechanisms to learn and represent temporal patterns within the data.

Concatenating the outputs of the CNN and GRU branches fuses spatial and temporal features, giving the model a more comprehensive view of the data and enabling precise predictions. Algorithm 1 provides a high-level overview of the model. By harnessing the complementary strengths of CNNs and GRUs, the CNN+GRU architecture strikes a balance between capturing local spatial features and modeling temporal dynamics.

Through experimentation and evaluation, our work has substantiated the efficacy of the CNN+GRU model. It consistently achieved high accuracy, precision, recall, and F1-score in detecting anomalies within the industrial IoT dataset. Its ability to discern intricate spatial and temporal patterns allows it to accurately identify abnormal instances, facilitating proactive security measures in industrial IoT systems. Below, we provide pseudocode, and Figure 3 illustrates the architecture of this hybrid model.

Algorithm 1: Proposed Hybrid Deep Learning Architecture
   // Define the Convolutional Neural Network (CNN) branch
   cnn_input ← Input(shape = input_shape)
   cnn_layer ← Conv1D(activation = 'relu')(cnn_input)
   cnn_layer ← MaxPooling1D()(cnn_layer)
   cnn_layer ← Flatten()(cnn_layer)   // 3-D to 2-D, so it can be concatenated with the GRU output
   // Define the Gated Recurrent Unit (GRU) branch
   gru_input ← Input(shape = input_shape)
   gru_layer ← GRU(activation = 'tanh')(gru_input)   // returns only the final state (2-D)
   // Concatenate the outputs of the CNN and GRU branches
   concat_layer ← concatenate([cnn_layer, gru_layer])
   // Classification layer
   output_layer ← Dense(num_classes, activation = 'softmax')(concat_layer)
   // Combined CNN and GRU model
   model ← Model(inputs = [cnn_input, gru_input], outputs = output_layer)

C. Model Descriptions

1) 1D-CNN Model Overview: The 1D-CNN model proposed in this study uses the Keras Sequential API. It comprises multiple layers designed to learn valuable features from our dataset and generate predictions. This model commences
are employed to extract features from datasets. These layers with an input layer configured to accommodate data in a
7

shape aligned with the dataset's dimensions. Next, five Conv1D layers are added, each with a different number of filters and a ReLU activation function; these layers extract information from the data through filtering. Each convolutional layer is followed by a MaxPooling1D layer with a pool size of 2, which reduces the dimensionality of the data while retaining significant features. A Flatten layer then converts the feature maps into a one-dimensional vector. Finally, the model incorporates two dense layers: the first, with 64 neurons and a ReLU activation function, learns global features, while the second, with as many neurons as the dataset has classes, uses a softmax activation function to output class probabilities for the final prediction.

Fig. 3: The Architecture of the Proposed CNN+GRU Hybrid Deep Learning Model

Fig. 4: Autoencoder-based Models Architecture

2) GRU Model: The proposed GRU model is also built with the Keras Sequential API. It comprises two GRU layers, the first with 32 units and the second with 64 units. These GRU layers use sigmoid activations for their update and reset gates and a tanh activation for the candidate hidden state. The model also includes two dense layers with ReLU activation functions and 32 and 16 units, respectively. The output layer uses a softmax activation function with as many units as the dataset has classes. This GRU-based model captures both short- and long-term dependencies in sequential data, making it well suited to diverse classification tasks.

3) Overview of Hybrid CNN-GRU Model: The proposed model is a hybrid neural network that integrates Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to process and learn from sequential data. Built with the Keras Functional API, it has two distinct branches, the CNN branch and the GRU branch. The model's architecture is outlined as follows:
CNN Branch:
1) Input Layer: The model ingests data with dimensions
aligned to the dataset's structure.
2) Convolutional Layers: The CNN branch comprises two convolutional layers, with 64 and 128 filters respectively, each using a ReLU activation function.
3) MaxPooling Layers: Max-pooling layers with a pool size of 2 are placed between the convolutional layers to reduce the spatial dimensions and improve computational efficiency.
4) Flatten Layer: After the final max-pooling layer, a Flatten layer converts the 3D output of the convolutional layers into a 1D vector.
5) Dense Layer: The final layer of the CNN branch is a dense layer with 64 units and a ReLU activation function, enabling the model to learn higher-level features from the spatial data.
GRU Branch:
1) Input Layer: As in the CNN branch, the GRU branch's input layer accepts data of the same dimensions.
2) GRU Layer: The GRU layer has 32 units and uses a 'tanh' activation for the candidate hidden state and a 'sigmoid' activation for the update and reset gates. Recurrent dropout is disabled (set to 0), and the recurrent loop is not unrolled, which keeps memory use low. Bias terms are included in the update and reset gate computations, and the hidden state is reset for each new sequence.
3) Dense Layer: After the GRU layer, a dense layer with 32 units and a 'tanh' activation function is added, helping the model discern intricate patterns and features in the temporal data.
Integration of Branches: After the input data is processed by the CNN and GRU branches, their outputs are concatenated using a Concatenate layer. This combined representation contains the spatial and temporal features learned by both branches, enriching the final decision-making process.
Output Layer: The final layer is a dense layer with num_classes units and a softmax activation function. The softmax function produces a probability distribution over the classes, and the model predicts the class with the highest probability. Figure 3 illustrates the structure of the unified CNN-GRU model.

D. Autoencoder Models Overview

Hybrid Models: The proposed models are hybrid neural networks that combine an autoencoder with CNN, LSTM, or GRU networks to process and learn from the input data efficiently. Each model is built with the Keras Functional API and consists of two primary components: the encoder-decoder (autoencoder) module and the CNN, LSTM, or GRU module. The autoencoder is a basic convolutional autoencoder that uses convolutional layers for both encoding and decoding; these layers are well suited to capturing spatial patterns and features in the input data.
Encoder-Decoder (Autoencoder) Module: The encoder section of the autoencoder uses convolutional and max-pooling layers to extract essential features from the input data and reduce its dimensionality. The decoder section uses upsampling and convolutional layers to reconstruct the original input from the encoded representation. Figure 4 provides an architectural overview of the autoencoder models.
Model 1:
Encoder:
1) Input Layer: The model takes input with dimensions aligned with the dataset.
2) Convolutional Layers: Three convolutional layers with 32, 64, and 128 filters are used, each with a kernel size of 3 and ReLU activation.
3) MaxPooling Layers: Max-pooling layers with a pool size of 2 are placed between the convolutional layers to reduce dimensions.
Decoder:
1) Convolutional Layers: The decoder comprises three convolutional layers with 128, 64, and 32 filters, using ReLU activation.
2) UpSampling Layers: Upsampling layers with a size of 2 restore the spatial dimensions.
Autoencoder: The autoencoder combines the encoder and decoder models, taking the same input as the encoder and producing the output of the decoder.
Classifier:
1) The autoencoder serves as the initial layer of the classifier.
2) Conv1D and MaxPooling1D layers perform feature extraction and dimensionality reduction.
3) The output of the last MaxPooling1D layer is flattened.
4) Dense layers with ReLU activation process the features.
5) A final Dense layer with softmax activation provides class probabilities.
Model 2:
Classifier:
1) LSTM layers replace the Conv1D and MaxPooling1D layers for sequence processing.
2) The LSTM layers return sequences, extracting features from them.
3) A Dense layer with softmax activation is used for classification.
Model 3:
Classifier:
1) Instead of Conv1D and MaxPooling1D layers, a GRU (Gated Recurrent Unit) layer is employed for sequence processing.
2) The GRU layer comprises 32 units and uses 'tanh' activation for the candidate hidden state and 'sigmoid' activation for the update and reset gates.
3) A Dense layer with 'tanh' activation further processes the features.
4) The final Dense layer with softmax activation provides a class probability distribution for classification.
1) Overview of the LSTM Model: The proposed model is an LSTM-based classifier designed for time-series classification tasks. Its architecture comprises a sequence of LSTM layers followed by a dense output layer for classification. The key components of the model's architecture are outlined below:
1) LSTM Layer 1: The first LSTM layer serves as the input layer, so the input shape must be specified. It has 128 units and uses the 'tanh' activation function for transformations within the LSTM units. This layer is configured to return sequences, so it outputs a sequence of the same length for the next layer to use rather than only the output of the last timestep.
2) LSTM Layer 2: The second LSTM layer has 256 units and uses the 'tanh' activation function. Unlike the preceding layer, it does not return sequences and outputs only the final output of the LSTM sequence, which allows it to connect directly to a conventional dense layer.
3) Dense Layer (Output Layer): The final layer is a dense layer with as many units as there are classes in the classification task (num_classes). It uses a softmax activation function to produce a probability distribution over the classes, which makes it well suited to multi-class classification.
This model uses LSTM layers to process and learn from sequential data, providing an efficient and robust solution for time-series classification.

Fig. 5: CNN Loss
Fig. 6: CNN Accuracy
Fig. 7: GRU Loss
Fig. 8: GRU Accuracy

V. EVALUATION AND RESULTS

This section evaluates the performance of the models employed for anomaly detection in encrypted IoT traffic. It examines the outcomes in depth, highlighting the strengths and weaknesses of each model and comparing them on multiple performance metrics: loss, accuracy, recall, precision, F1-score, and False Alarm Rate (FAR). It also examines the convergence patterns of the models and notable observations or trends.

A. Assessing Model Performance

The XGBoost model is the top performer in accuracy, recall, precision, and F1 score. It attains an accuracy of 96.41%, correctly classifying most samples. A recall of 96.50% indicates its effectiveness in identifying actual positive instances, while a precision of 98.57% reflects a low rate of false positives. The F1-score of 96.03% indicates a good balance between precision and recall. In a related study that applied the CatBoost model to intrusion detection in IoT systems [26], the reported results indicated a training accuracy of 100% and a validation accuracy of 99.27%. Furthermore,
they achieved commendable values for precision (98.42%) and recall (98.78%), signifying outstanding performance in intrusion classification.

TABLE I: Performance of the models

Model             Loss     Accuracy  Recall   Precision  F1-Score  FAR      Training Time (min)  Training Epochs
CNN               0.10211  0.94839   0.92303  0.98384    0.95196   0.00108  18                   23
GRU               0.12445  0.93981   0.91766  0.97027    0.9428    0.002    35                   17
GRU+CNN           0.09985  0.9494    0.92288  0.98494    0.95239   0.001    97                   73
LSTM              0.12607  0.93939   0.91288  0.97455    0.94219   0.0017   36                   14
Autoencoder+GRU   1.26009  0.71425   0.71425  0.71425    0.71425   0.02041  12                   7
Autoencoder+CNN   0.11195  0.94695   0.92194  0.98104    0.95009   0.00127  32                   30
Autoencoder+LSTM  0.23309  0.91507   0.87954  0.97054    0.92207   0.0019   37                   12
XGBoost           0.0672   0.9641    0.965    0.9857     0.9603    0.0023   16                   25

Fig. 9: CNN+GRU Loss
Fig. 10: CNN+GRU Accuracy
Fig. 11: LSTM Loss
Fig. 12: LSTM Accuracy

Comparing the two models, both CatBoost and XGBoost attained high accuracy and classified intrusions well. The CatBoost model reported a slightly higher accuracy during training, but both models exhibited robust precision and recall. It is imperative to consider a variety of evaluation metrics, assess potential overfitting, and analyze the problem context before concluding that high accuracy alone signifies a superior model.

In our assessment of machine learning models for IoT security anomaly detection, the XGBoost algorithm leads, closely followed by the CNN+GRU model. The CNN+GRU model, which combines CNN and GRU layers, stands out in several respects. It achieves an accuracy of 94.94% and a recall of 92.29%, highlighting its ability to identify anomalies in IoT data. Its precision of 98.49% and F1-score of 95.24% further underscore its capacity to classify anomalies while minimizing false positives. Moreover, its low False Alarm Rate (FAR) of 0.001 (0.1%) shows that it avoids unnecessary alerts, a critical characteristic in practical applications. Integrating spatial and temporal features through the CNN and GRU layers plays a pivotal role in this model's performance. The CNN+GRU model demonstrates the synergy between these two architectural components and its adaptability to intricate, multi-dimensional data such as IoT traffic. These metrics establish the CNN+GRU model as a strong tool for protecting IoT environments against cyber threats.

Fig. 13: Autoencoder+CNN Loss
Fig. 14: Autoencoder+CNN Accuracy
Fig. 15: Autoencoder+GRU Loss
Fig. 16: Autoencoder+GRU Accuracy

Shifting the focus to the other models, namely CNN, GRU,
and LSTM, these models consistently achieve high accuracy, ranging from 93.94% to 94.84%. While they do not surpass XGBoost, these accuracy rates show that they classify most samples correctly. Their recall rates, ranging from 91.29% to 92.30%, indicate their efficacy in capturing actual positive instances, while their precision rates, ranging from 97.03% to 98.38%, show their ability to limit false positives. Their F1-scores, ranging from 94.22% to 95.20%, further indicate a good balance between precision and recall.
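
The metrics compared above can all be derived from a confusion matrix laid out as in Figures 21 and 22 (rows are actual labels, columns are predicted labels). The following NumPy sketch is illustrative only: the paper does not state which averaging it uses for multi-class precision, recall, and F1, nor how FAR is defined, so support-weighted averaging and a macro-averaged one-vs-rest false positive rate, FP / (FP + TN), are assumptions here. The function name metrics_from_confusion and the example matrix are hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Derive accuracy, averaged precision/recall/F1 and a false alarm rate
    from a square confusion matrix (rows = actual, columns = predicted).

    Assumptions (not stated in the paper): precision, recall and F1 are
    support-weighted averages of the per-class values, and FAR is the
    macro-averaged one-vs-rest false positive rate FP / (FP + TN).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)            # correctly classified samples per class
    support = cm.sum(axis=1)    # actual samples per class (row sums)
    predicted = cm.sum(axis=0)  # predicted samples per class (column sums)

    def safe_div(num, den):
        # Return 0 where the denominator is 0 (class never predicted or absent).
        return np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)

    precision = safe_div(tp, predicted)
    recall = safe_div(tp, support)
    f1 = safe_div(2 * precision * recall, precision + recall)

    fp = predicted - tp        # predicted as class i but actually another class
    tn = total - support - fp  # neither actually nor predicted as class i
    far = safe_div(fp, fp + tn)

    weights = support / total
    return {
        "accuracy": float(tp.sum() / total),
        "precision_weighted": float(weights @ precision),
        "recall_weighted": float(weights @ recall),
        "f1_weighted": float(weights @ f1),
        "far_macro": float(far.mean()),
    }


# Hypothetical 3-class confusion matrix, for illustration only.
cm = [[50, 2, 3],
      [4, 40, 1],
      [0, 5, 45]]
m = metrics_from_confusion(cm)
print(round(m["accuracy"], 4), round(m["recall_weighted"], 4))  # 0.9 0.9
```

Under this averaging, weighted recall reduces to the number of correct predictions divided by the total, so it always equals accuracy. An averaged F1 is computed per class before averaging, so it need not equal 2PR/(P + R) computed from the averaged precision P and recall R.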

Conversely, the Autoencoder+GRU model performs worst across all metrics, with an accuracy of 71.43%. This significant gap relative to the other models suggests a reduced capability to classify diverse attack types. A more in-depth analysis is needed to identify the causes of this poor performance and possible ways to improve the model.

Fig. 17: XGBoost Loss

It is essential to assess the strengths and limitations of each model thoroughly. XGBoost, with its gradient boosting algorithm, performs robustly across all metrics. Notably, it benefits greatly from oversampling the minority classes, making it a suitable choice for anomaly detection. Figure 22 depicts the confusion matrix of the XGBoost model, the top-performing model in our study. This matrix offers valuable insights into the model's classification capabilities across attack types: each row represents the actual labels, each column the predicted labels, and each entry the number of samples in that category. The XGBoost model classifies diverse attack types with high accuracy and precision, correctly identifying a substantial number of samples in categories such as DDoS TCP, DDoS HTTP, DDoS ICMP, MITM, Fingerprinting, DDoS UDP, Password, Port Scanning, Ransomware, SQL Injection, Uploading, Vulnerability Scanner, and XSS. Nonetheless, some misclassifications are evident in the confusion matrix. For example, the model has difficulty distinguishing samples belonging to the Backdoor category, resulting in a few misclassifications, and a small number of Normal samples are erroneously labeled as attacks. Overall, the XGBoost model classifies a broad spectrum of attack types with outstanding precision and recall, and its low false alarm rate reflects its ability to minimize false positives, supporting accurate anomaly detection in encrypted IoT traffic.

The deep learning models, including CNN, GRU, GRU+CNN, and LSTM, consistently achieve commendable accuracy and demonstrate their effectiveness in capturing anomalies. Notably, the CNN model performed well on the original dataset, whereas the other models improved significantly when the sample count of the smaller classes was increased through oversampling. This indicates that the CNN model's performance degrades slightly when the sample count of smaller classes is increased. The Autoencoder+GRU model, although it does not perform at the level of the others, still provides insight into the potential of combining autoencoder-based feature extraction with the GRU architecture.

Fig. 18: Autoencoder+LSTM Loss
Fig. 19: Autoencoder+LSTM Accuracy

In conclusion, the XGBoost model emerges as the top performer, while the CNN, GRU, CNN+GRU, and LSTM models consistently exhibit strong performance in anomaly detection. Table I and Figure 20 present an overview of the performance metrics for each model, while Figure 23 compares the models on each metric. The Autoencoder+GRU model leaves room for improvement and warrants further investigation. These findings contribute to a deeper understanding of model performance and can guide the selection of models for anomaly detection in encrypted IoT traffic.

The confusion matrix in Figure 21 shows how the CNN+GRU hybrid model categorizes the various types of attacks. Rows represent the actual values, columns the predicted values, and each entry the sample count for that category. The CNN+GRU model correctly identifies most attack types, classifying a significant portion of samples from categories such as Vulnerability scanning, DDoS TCP, Uploading, DDoS HTTP, Ransomware, DDoS ICMP, XSS, MITM, Fingerprinting, DDoS UDP, Password, SQL injection, and Port Scanning. However, the model does make some misclassifications: some samples in the Backdoor category are misclassified, indicating difficulty in distinguishing this class, and a few Normal samples are misclassified as other types of attacks.

When comparing the XGBoost and CNN+GRU hybrid models in anomaly detection, it becomes apparent that each model has distinct strengths for different attack types. The XGBoost model detects Normal and MITM
attacks, demonstrating high accuracy and a low false positive rate. However, it has difficulty classifying Backdoor and Password attacks accurately, indicating room for improvement. In contrast, the CNN+GRU hybrid model achieves outstanding precision in detecting Backdoor, DDoS TCP, and DDoS UDP attacks, highlighting its potential for handling network-based attacks. It also achieves high accuracy in identifying DDoS HTTP and XSS attacks, showing its proficiency in recognizing web-based threats. Nevertheless, the CNN+GRU model may require further fine-tuning to address misclassifications of Fingerprinting and Password attacks. Overall, both models show promising results: the CNN+GRU hybrid model excels at detecting network-based and web-based attacks, while the XGBoost model detects Normal traffic and MITM attacks with high precision.

Fig. 20: Performance Comparison
Fig. 21: Confusion matrix for CNN+GRU model
Fig. 22: Confusion Matrix of XGBoost model

B. Convergence of the Models

The convergence behavior of the models plays a vital role in assessing their performance, offering insights into the training process, convergence speed, stability, and overall efficiency. In this study, we investigated the convergence of several models, including CNN, GRU, LSTM, CNN+GRU, Autoencoder+CNN, Autoencoder+GRU, Autoencoder+LSTM, and XGBoost.

Figures 5–19 show each model's training progress, illustrating the trends in loss and accuracy across epochs. We now discuss each model in turn.

For the CNN model (Figure 5 and Figure 6), the loss decreases gradually while the accuracy increases as the number of epochs grows. The CNN architecture converged relatively quickly, with an average per-epoch runtime of 17.5 minutes and 23 training epochs.
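
The epoch counts reported in this section (e.g., 23 epochs for the CNN) imply some stopping rule, but the paper does not state one. As an assumption only, the sketch below shows a common patience rule, similar in spirit to the patience and min_delta arguments of the Keras EarlyStopping callback: training is treated as converged once the monitored loss has failed to improve by at least min_delta for patience consecutive epochs. The function convergence_epoch, its defaults, and the loss history are hypothetical.

```python
def convergence_epoch(losses, patience=3, min_delta=1e-3):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Hypothetical patience rule: stop once the loss has not improved on the
    best value seen so far by at least `min_delta` for `patience`
    consecutive epochs. If that never happens, stop at the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss - min_delta:
            # Meaningful improvement: record it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(losses)


# Illustrative (made-up) validation-loss history, not data from this study.
history = [0.90, 0.50, 0.30, 0.25, 0.2495, 0.26, 0.27]
print(convergence_epoch(history))  # (4, 7)
```

In the example, the improvement at epoch 5 is smaller than min_delta, so epochs 5 to 7 count as three epochs without improvement and training stops at epoch 7, with epoch 4 as the best epoch.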

The CNN's rapid convergence suggests that the model efficiently learned features from the dataset. Its ability to capture spatial information through convolutional layers, together with its simplicity, contributed to this swift convergence.

Fig. 23: Individual Comparison of metrics for different models

Likewise, the GRU model (Figure 7 and Figure 8) shows an overall decreasing loss trend, although training and validation accuracy fluctuate during training. Despite these fluctuations, the GRU model converged efficiently, with an average per-epoch runtime of 35 minutes and 17 training epochs. The GRU architecture, a type of recurrent neural network (RNN), excels at capturing temporal dependencies in sequential data. The model learned the temporal patterns in encrypted IoT traffic and converged within a reasonable number of epochs. The observed accuracy fluctuations may indicate sensitivity to specific data patterns.

The CNN+GRU model, as depicted in Figure 9 and Figure 10, displays a consistently decreasing loss curve paired with a corresponding increase in accuracy. This behavior indicates steady convergence toward an optimal solution, underscoring the efficacy of merging the CNN and GRU architectures for anomaly detection. The CNN+GRU hybrid model, which leverages the strengths of both architectures, took longer to converge than the individual models: 97 minutes and 73 training epochs. This longer convergence time can be attributed to the combined complexity of both architectures and the model's need to extract spatial and temporal features simultaneously. Nevertheless, despite the longer convergence, the hybrid model achieved the best performance among the neural network models, highlighting the effectiveness of integrating CNN and GRU.

Conversely, the LSTM model follows a different trajectory, as shown in Figure 11 and Figure 12. While the loss decreases gradually over time, there is a notable dip in accuracy during the initial epochs. This pattern may signify a slower convergence rate or difficulty in capturing temporal dependencies in the data. The LSTM model converged slightly more slowly than CNN and GRU, requiring an average per-epoch runtime of 36 minutes and 14 training epochs to reach convergence. The LSTM's proficiency in modeling long-term dependencies makes it suitable for intricate sequential data. However, the added complexity of the LSTM architecture and the long sequences in encrypted IoT traffic likely contributed to the extended convergence time.

The autoencoder-based models, namely Autoencoder+CNN, Autoencoder+GRU, and Autoencoder+LSTM, displayed distinctive training trajectories. As depicted in Figure 13 and Figure 14, the Autoencoder+CNN model's loss first decreased gradually over a few epochs, then declined sharply, and finally decreased gradually until convergence. Conversely, Figure 15 and Figure 16 show that the Autoencoder+GRU model maintained a consistent loss
curve without significant fluctuations. Interestingly, the Autoencoder+LSTM model (Figure 18 and Figure 19) converged rapidly within six epochs, followed by a noticeable spike in loss values, indicative of overfitting. These models exhibited diverse convergence behaviors because their objective is to learn a compressed representation of the data through an unsupervised learning approach. The convergence time and the number of epochs required varied with the complexity of the autoencoder architecture and the optimization of the reconstruction error. Generally, the convergence time was shorter than that of the deep learning models, with per-epoch runtimes ranging from 12 to 37 minutes and 7 to 30 training epochs.

In contrast to the autoencoder-based models, the XGBoost model exhibited a smooth convergence pattern, as depicted in Figure 17. Its loss consistently diminished over the epochs, mirroring the convergence behavior observed in the CNN+GRU model and underscoring the effectiveness of the XGBoost algorithm in gradually reducing the loss and fine-tuning the model. XGBoost, known for its proficiency with tabular data and classification tasks, converged efficiently, with a per-epoch runtime of 16 minutes and a total of 25 training epochs. XGBoost uses gradient boosting to optimize the objective function, resulting in rapid convergence and high predictive accuracy.

Figure 24 shows the number of epochs and the corresponding training time required by each model, shedding light on their computational efficiency. Models with shorter training times are generally more efficient and demand fewer computational resources. A shorter training duration is especially advantageous for large datasets or during hyperparameter tuning.

Overall, the training curves show that both the CNN+GRU and XGBoost models have relatively stable and favorable convergence trends: the CNN+GRU model consistently reduces its loss, while the XGBoost model shows a smooth and steady decline in loss values. These models can be regarded as the top performers in convergence and optimization. Conversely, the autoencoder-based models exhibit diverse convergence behaviors and may warrant further investigation to improve their performance and mitigate overfitting.

In summary, the training curves offer valuable insights into the convergence tendencies of the models. The convergence behaviors varied, influenced by each model's architecture and the complexity of the task. The CNN and GRU models converged comparatively quickly, whereas the LSTM, the hybrid CNN+GRU model, and XGBoost required more time to reach convergence. However, both the CNN+GRU and XGBoost models displayed desirable convergence patterns. In contrast, the autoencoder-based models exhibited distinctive patterns that warrant further investigation. Understanding these convergence characteristics aids in evaluating the train-

VI. CONCLUSION

In our exploration of deep learning models for edge IIoT anomaly detection, we assessed various architectures, including CNN, GRU, CNN+GRU, LSTM, XGBoost, and autoencoder-based models. The top-performing model is XGBoost, which achieves high accuracy (96.41%), precision (98.57%), recall (96.50%), and F1 score (96.03%). The CNN+GRU hybrid model closely follows, with an accuracy of 94.94%. This hybrid model combines CNN's spatial feature extraction with GRU's temporal sequence modeling, which proves effective in capturing both local patterns and long-term dependencies in the data.

However, the autoencoder-based models perform worse than their counterparts, most markedly Autoencoder+GRU, with an accuracy of 71.43% and correspondingly limited precision, recall, and F1 scores. This suggests that the unsupervised autoencoder approach may need to be revised to identify anomalies accurately in complex edge IIoT data. Further improvements are necessary for these models.

Consideration of computational efficiency is vital for real-world implementation. The CNN model is the most efficient, with faster convergence and shorter training times. In contrast, the CNN+GRU model demands more computational resources but excels in detection performance. Striking a balance between performance and computational efficiency is crucial for resource-constrained edge IIoT environments. Future work will explore optimization techniques, including data augmentation, to enhance the efficiency and performance of the CNN+GRU model.

ACKNOWLEDGEMENT

The authors would like to thank the Air Force Research Lab (AFRL) at the US Wright Patterson Air Force Base (WPAFB), Dayton, Ohio, for support under the Assured Digital Microelectronics Education and Training Ecosystem (ADMETE). This grant is awarded to Wright State University, Dayton, Ohio, USA, under grant number 047814256.

REFERENCES

[1] T. H. A. Musa and A. Bouras, "Anomaly detection: A survey," in Proceedings of Sixth International Congress on Information and Communication Technology, X.-S. Yang, S. Sherratt, N. Dey, and A. Joshi, Eds. Singapore: Springer Singapore, 2022, pp. 391–401.
[2] M. Ahmed, A. Naser Mahmood, and J. Hu, "A survey of network anomaly detection techniques," Journal of Network and Computer Applications, vol. 60, pp. 19–31, 2016.
[3] A. Diro, N. Chilamkurti, V.-D. Nguyen, and W. Heyne, "A comprehensive study of anomaly detection schemes in IoT networks using machine learning algorithms," Sensors, vol. 21, no. 24, 2021.
[4] H. Li, K. Ota, and M. Dong, "Learning IoT in edge: Deep learning for the internet of things with edge computing," IEEE Network, vol. 32, no. 1, pp. 96–101, 2018.
[5] S. Shadroo, A. M. Rahmani, and A. Rezaee, "The two-phase scheduling based on deep learning in the internet of things," Computer Networks, vol. 185, p. 107684, 2021.
[6] M. A. Rahman and M. S. Hossain, "An internet-of-medical-things-enabled edge computing framework for tackling COVID-19," IEEE Internet of Things Journal, vol. 8, no. 21, pp. 15847–15854, 2021.
[7] J. Xiong, S. Bharati, and P. Podder, "Machine and deep learning for IoT security and privacy: Applications, challenges, and future directions," Security and Communication Networks, vol. 2022, p. 8951961, 2022.
[8] F. Liang, W. Yu, X. Liu, D. Griffith, and N. Golmie, “Toward edge-
ing stability of the models and pinpointing areas with potential based deep learning in industrial internet of things,” IEEE Internet of
for improvement. Things Journal, vol. 7, no. 5, pp. 4329–4341, 2020.
16

Fig. 24: Training time and epoch for different models


[24] T.-T.-H. Le, Y. E. Oktian, and H. Kim, “Xgboost for imbalanced
multiclass classification-based industrial internet of things intrusion
[9] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and detection systems,” Sustainability, vol. 14, no. 14, 2022.
K. Mizutani, “State-of-the-art deep learning: Evolving machine intel- [25] M. Shahin, F. F. Chen, A. Hosseinzadeh, H. Bouzary, and R. Rashidifar,
ligence toward tomorrow’s intelligent network traffic control systems,” “A deep hybrid learning model for detection of cyber attacks in industrial
IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2432– IoT devices,” The International Journal of Advanced Manufacturing
2455, 2017. Technology, vol. 123, no. 5, pp. 1973–1983, 2022.
[10] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A [26] M. Douiba, S. Benkirane, A. Guezzaz, and M. Azrour, “An improved
survey,” 2019. anomaly detection model for iot security using decision tree and gradient
boosting,” The Journal of Supercomputing, vol. 79, no. 3, pp. 3392–
[11] J. Schmidhuber, “Deep learning in neural networks: An overview,”
3411, 2023.
Neural Networks, vol. 61, pp. 85–117, 2015.
[27] K. Hayat, T. Bakhshi, and B. Ghita, “Anomaly detection in encrypted
[12] I. Ullah and Q. H. Mahmoud, “Design and development of rnn anomaly internet traffic using hybrid deep learning,” Security and Communication
detection model for iot networks,” IEEE Access, vol. 10, pp. 62 722– Networks, vol. 2021, p. 5363750, 2021.
62 750, 2022.
[28] M. A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, and H. Janicke,
[13] W. Wu, C. Song, J. Zhao, and Z. Xu, “Physics-informed gated recurrent “Edge-iiotset: A new comprehensive realistic cyber security dataset of
graph attention unit network for anomaly detection in industrial cyber- iot and iiot applications for centralized and federated learning,” IEEE
physical systems,” Information Sciences, vol. 629, pp. 618–633, 2023. Access, vol. 10, pp. 40 281–40 306, 2022.
[14] R. Kale, Z. Lu, K. W. Fok, and V. L. L. Thing, “A hybrid deep learning [29] Adeyemo Victor Elijah, Azween Abdullah, NZ JhanJhi, Mahadevan
anomaly detection framework for intrusion detection,” 2022, pp. 137– Supramaniam and Balogun Abdullateef O, “Ensemble and Deep-
142. Learning Methods for Two-Class and Multi-Attack Anomaly Intrusion
[15] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural Detection: An Empirical Study” International Journal of Advanced
network-based data-driven fault diagnosis method,” IEEE Transactions Computer Science and Applications(IJACSA), 10(9), 2019.
on Industrial Electronics, vol. 65, no. 7, pp. 5990–5998, 2018. [Link]
[16] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural [30] Ghosh, G., Verma, S., Jhanjhi, N. Z., & Talib, M. N. (2020, December).
Computation, vol. 9, no. 8, pp. 1735–1780, 1997. Secure surveillance system using chaotic image encryption technique. In
[17] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and IOP conference series: materials science and engineering (Vol. 993, No.
G. Shroff, “Lstm-based encoder-decoder for multi-sensor anomaly de- 1, p. 012062). IOP Publishing.
tection,” 2016. [31] Almusaylim, Z. A., Zaman, N., & Jung, L. T. (2018, August). Proposing
[18] S. Chauhan and L. Vig, “Anomaly detection in ecg time signals via a data privacy aware protocol for roadside accident video reporting
deep long short-term memory networks,” in 2015 IEEE International service using 5G in Vehicular Cloud Networks Environment. In 2018
Conference on Data Science and Advanced Analytics (DSAA), 2015, 4th International conference on computer and information sciences
pp. 1–7. (ICCOINS) (pp. 1-5). IEEE.
[19] C. Zhou and R. C. Paffenroth, “Anomaly detection with robust deep [32] Shahid, H., Ashraf, H., Javed, H., Humayun, M., Jhanjhi, N. Z., &
autoencoders,” ser. KDD ’17. New York, NY, USA: Association for AlZain, M. A. (2021). Energy optimised security against wormhole
Computing Machinery, 2017, p. 665–674. attack in iot-based wireless sensor networks. Comput. Mater. Contin,
[20] D. Li, D. Chen, B. Jin, L. Shi, J. Goh, and S. Ng, “Madgan: Multivariate 68(2), 1967-81.
anomaly detection for time series data with generative adversarial [33] Sennan, S., Somula, R., Luhach, A. K., Deverajan, G. G., Alnumay, W.,
networks: 703–716,” 2019. Jhanjhi, N. Z., ... & Sharma, P. (2021). Energy efficient optimal parent
[21] F. De Vita, G. Nocera, D. Bruneo, and S. K. Das, “A novel echo state selection based routing protocol for Internet of Things using firefly
network autoencoder for anomaly detection in industrial iot systems,” optimization algorithm. Transactions on Emerging Telecommunications
IEEE Transactions on Industrial Informatics, vol. 19, no. 8, pp. 8985– Technologies, 32(8), e4171.
8994, 2023. [34] Hussain, S. J., Ahmed, U., Liaquat, H., Mir, S., Jhanjhi, N. Z., &
[22] F. Lin, C. Wang, B. Wang, H. Liu, and H. Qu, “Anomaly detection for Humayun, M. (2019, April). IMIAD: intelligent malware identification
industrial control system based on autoencoder neural network,” Wireless for android platform. In 2019 International Conference on Computer
Communications and Mobile Computing, vol. 2020, p. 8897926, 2020. and Information Sciences (ICCIS) (pp. 1-6). IEEE.
[23] C. Zhang, D. Song, Y. Chen, X. Feng, C. Lumezanu, W. Cheng, J. Ni,
B. Zong, H. Chen, and N. V. Chawla, “A deep neural network for
unsupervised anomaly detection and diagnosis in multivariate time series
data,” Proceedings of the AAAI Conference on Artificial Intelligence,
vol. 33, no. 01, pp. 1409–1416, Jul. 2019.
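The results report XGBoost’s training cost as a per-epoch runtime multiplied by an epoch count, and Fig. 24 compares such figures across models. The minimal Python sketch below shows one way per-epoch wall-clock times could be collected. `time_epochs`, `train_one_epoch`, and the injectable `clock` are hypothetical names chosen for illustration; this is not the authors’ measurement code.

```python
import time
from typing import Callable, List, Tuple


def time_epochs(
    train_one_epoch: Callable[[int], None],
    n_epochs: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[float], float]:
    """Run `train_one_epoch` n_epochs times and record each epoch's duration.

    Hypothetical helper: `train_one_epoch` stands in for whatever performs one
    pass over the training data (or one boosting round for a tree ensemble).
    Returns (per-epoch durations, total duration) in the clock's units.
    """
    per_epoch: List[float] = []
    for epoch in range(n_epochs):
        start = clock()
        train_one_epoch(epoch)
        per_epoch.append(clock() - start)
    return per_epoch, sum(per_epoch)


if __name__ == "__main__":
    # Simulated clock: each fake epoch "takes" 16 minutes (960 s), mirroring
    # the per-epoch runtime reported for XGBoost; no real training happens.
    state = {"t": 0.0}

    def fake_clock() -> float:
        return state["t"]

    def fake_epoch(_epoch: int) -> None:
        state["t"] += 16 * 60

    durations, total = time_epochs(fake_epoch, 25, clock=fake_clock)
    print(f"{len(durations)} epochs, total {total / 60:.0f} min")  # 25 epochs, total 400 min
```

For the XGBoost figures reported above, the total is 16 min × 25 epochs = 400 min, roughly 6.7 hours. Injecting the clock keeps the helper testable without real training; by default, `time.perf_counter` measures elapsed wall-clock time.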
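The results attribute XGBoost’s rapid convergence to gradient boosting. To make that mechanism concrete, the following is a deliberately simplified sketch of plain first-order gradient boosting with one-feature decision stumps and squared-error loss. It is not XGBoost, which additionally uses second-order gradient statistics, regularized tree construction, and many systems-level optimizations. All names here (`Stump`, `fit_stump`, `gradient_boost`, `predict`) and the learning rate of 0.3 are illustrative assumptions, not the configuration used in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """A depth-1 regression tree on a single numeric feature."""
    threshold: float
    left_value: float   # output when x <= threshold
    right_value: float  # output when x > threshold

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs: Sequence[float], targets: Sequence[float]) -> Stump:
    """Least-squares stump: try each split point and keep the lowest squared error."""
    best, best_sse = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # splitting at the max value would leave the right side empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best, best_sse = Stump(t, lv, rv), sse
    if best is None:  # all x identical: fall back to a constant predictor
        mean = sum(targets) / len(targets)
        best = Stump(float("inf"), mean, mean)
    return best


def gradient_boost(
    xs: Sequence[float], ys: Sequence[float], n_rounds: int = 25, learning_rate: float = 0.3
) -> Tuple[float, List[Stump], List[float]]:
    """Plain gradient boosting for squared error; returns (base, stumps, per-round MSE)."""
    base = sum(ys) / len(ys)  # F_0: the constant that minimises squared error
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    losses: List[float] = []
    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2, the negative gradient with respect to F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for x, p in zip(xs, preds)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, losses


def predict(base: float, stumps: Sequence[Stump], x: float, learning_rate: float = 0.3) -> float:
    """Sum the shrunken stump outputs on top of the constant base prediction."""
    return base + learning_rate * sum(s.predict(x) for s in stumps)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]  # toy data, not from the paper
    base, stumps, losses = gradient_boost(xs, ys)
    print([round(v, 4) for v in losses[:3]])
```

Each stump is a least-squares fit to the current residuals. As a result, the per-round training loss in `losses` never increases for a learning rate between 0 and 2, which gives in miniature the kind of smooth, steady decline the training curves describe for XGBoost.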

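The discussion of training curves contrasts the stable, steadily decreasing losses of CNN+GRU and XGBoost with the more varied Autoencoder curves, and it flags overfitting as a concern. One simple way to make such judgments repeatable is a heuristic check over per-epoch training and validation losses, sketched below in Python. The function name `diagnose_curves`, the 3-epoch window, and the 1% tolerance are arbitrary assumptions, not criteria taken from the paper.

```python
from typing import Dict, Sequence, Union


def diagnose_curves(
    train_loss: Sequence[float],
    val_loss: Sequence[float],
    window: int = 3,
    tol: float = 0.01,
) -> Dict[str, Union[int, bool]]:
    """Heuristic convergence/overfitting check on per-epoch loss curves.

    - best_epoch: 0-based index of the lowest validation loss.
    - converged: relative change in validation loss over the last `window`
      epochs is below `tol` (the curve has flattened out).
    - overfitting: validation loss ends above its best value while training
      loss kept falling after that epoch.
    """
    if len(train_loss) != len(val_loss):
        raise ValueError("train_loss and val_loss must have the same length")
    if len(val_loss) <= window:
        raise ValueError("need more epochs than the convergence window")

    # min() returns the first index among ties, i.e. the earliest best epoch.
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    overfitting = (
        val_loss[-1] > val_loss[best_epoch] and train_loss[-1] < train_loss[best_epoch]
    )
    start = val_loss[-window - 1]
    rel_change = abs(start - val_loss[-1]) / max(abs(start), 1e-12)
    return {
        "best_epoch": best_epoch,
        "converged": rel_change < tol,
        "overfitting": overfitting,
    }
```

As a synthetic example, consider a validation curve that bottoms out at epoch 3 and then climbs while training loss keeps falling. The check flags that curve as overfitting, and `best_epoch` then marks a natural early-stopping checkpoint.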
Bharath Reedy Konatham (Student Member, IEEE) received a Master of Science in Computer Science from Wright State University in spring 2023. His research interests include cybersecurity in web applications, applications of ML in IoT, and security of autonomous vehicles.

Tabassum Simra is currently pursuing a master’s degree in computer engineering at Wright State University. Her research interests include the application of federated learning to the cybersecurity of the Internet of Things, machine learning, and embedded systems.

Fathi Amsaad (Senior Member, IEEE) is an Assistant Professor of Computer Science and Engineering at Wright State University, Dayton, Ohio, USA. He received a Bachelor’s degree in Computer Science from the University of Benghazi, Libya, in 2002, a dual Master’s degree in Computer Science/Computer Engineering from the University of Bridgeport, CT, USA, in 2011/2012, and a Ph.D. in Engineering with emphases in Computer Science and Engineering from the University of Toledo, OH, USA, in 2017. He has supervised, or is currently supervising, many graduate students, including Tabassum Simra. He established the Semiconductor Microelectronics Assurance, Resilience, and Trust (SMART) Cybersecurity Research Lab at 490 Joshi Research Center, Computer Science and Engineering Department, Wright State University. At the SMART Cybersecurity Research Lab, Dr. Amsaad leads a research team of several graduate students (Master’s and Ph.D.), a Postdoctoral Researcher, and a Research Assistant Professor. His research interests include Assured and Trusted Digital Microelectronics, Secure Heterogeneous Integration and Advanced Packaging, Blockchain-enabled Federated Learning, IoT Hardware Security, Machine/Deep Learning for Cybersecurity, AI Distributed Cloud Computing, Secure AI Hardware Accelerators, and Resilient Circuit Design (Memory/Microprocessor/ASICs/FPGAs). Dr. Amsaad’s research is funded by both government and industry, including AFRL, AFOSR, Intel, NSA, and the Ohio Department of Education. He has participated in several collaborative research proposals that have led to a cumulative sum of about $33 million (including all partners along with Wright State University). He has served as an Organizer, Program Chair, Technical Program Committee member, Guest Editor, and Reviewer Board member for several international conferences and journals. In addition to his research activities, Dr. Amsaad has established teaching experience in hardware security, IoT and embedded systems security, distributed computing, digital systems, network administration, and security curricula.

Mohamed I. Ibrahem received the B.S. and M.S. degrees in Electrical Engineering (electronics and communications) from Benha University, Cairo, Egypt, in 2014 and 2018, respectively, and the Ph.D. degree in electrical and computer engineering from Tennessee Tech University, USA, in 2021. He is an Assistant Professor at the School of Computer and Cyber Sciences, Augusta University, USA, and also holds the position of Assistant Professor at Benha University, Egypt. Dr. Ibrahem received the Eminence Award for the Doctor of Philosophy Best Paper from Tennessee Tech University, USA. His research interests include machine learning, cryptography and network security, and privacy-preserving schemes for smart grid communication and AMI networks.

Prof. Dr. Noor Zaman Jhanjhi (N.Z. Jhanjhi) is a senior Professor, academician, researcher, and scientist in Computer Science, specializing in cybersecurity. He is currently a Professor at the School of Computer Science, Taylor’s University, Malaysia. As Program Director for the Postgraduate Research Degree Programmes in Computer Science and Director of the Center for Smart Society (CSS5), he has played a pivotal role in shaping the educational and research landscape at Taylor’s University and in fostering an environment conducive to cutting-edge research and academic excellence. He has been recognized as one of the world’s top 2% scientists, ranks among the top five computer science researchers in Malaysia, and was named an Outstanding Faculty Member by MDEC Malaysia in 2022. His publications appear in highly indexed journals (SCIE/WoS/ISI/SCI/Scopus), with a collective research impact factor exceeding 900 points, and he has edited or authored over 45 research books published by Springer, Taylor and Francis, Wiley, IntechOpen, IGI Global USA, and others. He has supervised and co-supervised many postgraduate students, more than 37 of whom have graduated under his guidance, and has served as an external Ph.D./Master’s thesis examiner for several universities worldwide, evaluating over 60 theses. His editorial roles include Associate Editor and Editorial Assistant positions for journals such as PeerJ Computer Science, CMC Computers, Materials & Continua, CSSE, and Frontier in Communication and Networks, and he received the Outstanding Associate Editor award from IEEE Access. He has completed more than 40 internationally funded research grants, has delivered keynote and invited talks at over 60 international conferences, has chaired various conference sessions, and has a decade of accreditation experience with ABET, NCAAA, and NCEAC.
