Paper 1
Abstract
The Industrial Internet of Things (IIoT) network involves smart sensors, actuators, and technologies extending IoT capabilities
across industrial sectors. With the rapid development in connected technology and communications in industrial applications,
IIoT networks and devices are increasingly integrated into less secure physical environments. Anomaly detection in IIoT is
crucial for cybersecurity. This paper proposes a novel anomaly detection model for IIoT systems, leveraging a hybrid deep
learning (DL) model. The hybrid DL approach combines Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN)
for anomaly detection in IoT edge computing. The proposed CNN+GRU model achieves a notable 94.94% accuracy, underscoring
the importance of careful model selection for IIoT anomaly detection. The paper suggests exploring XGBoost with hybrid
CNN+GRU architectures as a future direction for high accuracy in complex IIoT contexts. The experimental results indicate a
96.41% accuracy, excelling in metrics like false alarm rate (FAR), recall, precision, and F1-score. Based on these findings,
we recommend that future researchers consider advanced hybrid architectures and enhance efficiency using XGBoost with hybrid
CNN+GRU. This approach holds promise for significant contributions to the security and performance of IIoT systems.
Index Terms—Cybersecurity, Anomaly Detection, Industrial Internet of Things (IIoT), Edge Computing, Deep Learning.
I. INTRODUCTION
Anomaly detection plays a crucial role across various domains, including network security, financial systems, and industrial operations [1]. Its primary objective is to identify unexpected or abnormal behavior that deviates from established patterns, facilitating prompt intervention and the maintenance of system integrity. As the digital landscape becomes increasingly data-rich, traditional rule-based and statistical methods [2] face challenges in effectively uncovering anomalies. The manifestation of anomalies in edge IIoT data refers to unforeseen or irregular patterns observed within data collected from edge devices. Detecting such anomalies in edge IoT data is paramount for ensuring system reliability, security, and optimal performance [3].
The dynamic scale of data generation in the digital era has propelled the ascendancy of deep learning in IoT systems [4]–[7]. Deep learning’s ability to handle extensive datasets surpasses conventional machine learning techniques, rendering it apt for analysis within IoT contexts. Its capacity to dynamically generate data representations [8] and its seamless integration with IoT ecosystems [9] position it as a valuable asset. Consider a smart home scenario wherein IoT devices autonomously interact, creating a fully intelligent dwelling [4]. This synergy has prompted researchers to explore advanced deep learning models to address the challenges of anomaly detection [10], [11].
Figure 1 shows an overview of the industrial anomaly detection paradigm, wherein IoT device-generated data undergoes preprocessing and is subsequently input into pre-trained deep learning models for anomaly identification. Recent research has directed attention toward deep learning models, including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Convolutional Neural Networks (CNN), to improve the accuracy and efficiency of anomaly detection. These models are adept at capturing intricate patterns and temporal correlations within data, thus improving anomaly identification [12].
This study undertakes a comprehensive evaluation of well-established deep learning models, including CNN, GRU, LSTM, and hybrid models such as CNN+GRU, Autoencoder+CNN, Autoencoder+LSTM, and Autoencoder+GRU, alongside the gradient boosting algorithm XGBoost, to ascertain their proficiency in detecting anomalies within industrial IoT systems [13], [14].
Anomaly detection is vital across diverse domains, including network security, finance, and industrial systems, to identify deviations from expected patterns or abnormal behavior. Within the context of edge-based Internet of Things (IoT) systems, anomalies manifest as unexpected or irregular patterns within data collected from edge devices. Detecting these anomalies is pivotal for preserving system reliability, security, and optimal performance.
In response to the challenges posed by growing data complexity, deep learning has emerged as a cornerstone of research in IoT systems; with the capacity to handle extensive datasets and capture intricate patterns, deep learning methods are well-suited for analysis within IoT contexts. As these methods generate data representations and integrate seamlessly into IoT ecosystems, they offer promising avenues for anomaly detection. This study delves into anomaly detection in Industrial IoT (IIoT) systems by evaluating a range of deep learning models, including GRU, LSTM, and CNN, as well as hybrid variations of them. By comparing the performance of standalone models and hybrid combinations, we aim to uncover their strengths, limitations, and capabilities in anomaly detection within IIoT data.
This paper contributes to advancing edge IIoT security and anomaly detection knowledge. Through rigorous evaluation and comparison of various deep learning models, we provide insights into their performance and applicability. Our study enhances the existing body of knowledge by illuminating the strengths and weaknesses of CNN, GRU, CNN+GRU, and
[Figure: diagram labeled “IIoT Packets” with stages 1 to 4]
LSTM models. It offers a comprehensive understanding of DL-based techniques for IIoT anomaly detection. Moreover, this research lays the groundwork for future investigations into advanced techniques and hybrid models. These models leverage the diverse strengths of different deep learning architectures, potentially leading to more effective solutions for anomaly detection in IIoT systems.
The remaining sections of the paper are organized as follows. Section II summarizes the existing research in anomaly detection in edge IIoT systems, focusing on the models under evaluation in our study. We highlight the strengths and limitations of these approaches to contextualize our work. Section III outlines the specific contributions our research offers, detailing the novel aspects and insights derived from our comprehensive evaluation of deep learning models. Section IV explains our research methodology, encompassing the dataset used, the selection and configuration of models, and the evaluation metrics employed to assess their performance. Section V presents our experimental setup, results, and an in-depth analysis of each model’s performance. This section sheds light on the comparative efficacy of the evaluated models. Section VI summarizes our findings, discusses the implications of our research, and provides recommendations for future investigations, paving the way for advancements in IIoT anomaly detection. It showcases the potential of DL techniques for detecting IIoT anomalies, offering a solid foundation for further progress in this crucial domain.
II. RELATED WORK
Anomaly detection in edge IIoT (Industrial Internet of Things) systems has garnered considerable interest among researchers due to the increasing need to guarantee the dependability and safety of these systems. Researchers have investigated numerous ML-based methods to address this challenge. In this section, we examine existing research on detecting anomalies in edge IIoT systems, particularly emphasizing the models assessed in this paper.
ML-based techniques employing CNNs find widespread application in anomaly detection within IIoT systems, offering improved capabilities to capture spatial features in IoT datasets. A data-driven fault diagnosis approach utilizing deep learning, specifically a CNN model, has been presented in previous work [15]. The outcomes of this approach reveal its effectiveness in addressing local and global patterns within time series data. This capability enables efficient analysis and anomaly detection across a spectrum of IIoT applications.
A Long Short-Term Memory (LSTM) approach is proposed to address the training challenges in traditional Recurrent Neural Networks (RNNs) [16]. The proposed LSTM approach demonstrated higher performance in the experiments than other recurrent network algorithms, such as real-time recurrent learning, back-propagation through time, and recurrent cascade correlation. As an enhanced LSTM-based approach, a new technique known as the Encoder-Decoder architecture is introduced for anomaly detection in time series data [17]. An approach for Electrocardiogram (ECG) signals, leveraging DL-based Long Short-Term Memory (DLSTM) networks, is proposed in [18]. This research highlights the effectiveness of LSTM networks in anomaly detection within time series data, demonstrating adaptability across diverse domains and datasets.
Another robust approach for anomaly detection is discussed in [19]. This method incorporates complete principal component analysis (PCA) into training deep autoencoders to enhance anomaly detection. This integration improves the model’s resilience to outliers and its performance in detecting anomalies. A further research effort proposes a multivariate anomaly detection technique utilizing Generative Adversarial Networks (GANs) and Gated Recurrent Units (GRUs) in the MAD-GAN framework, as outlined in [20]. This approach combines the strengths of GANs and GRUs to effectively learn the underlying structure of time series data and accurately detect anomalies.
A new IIoT anomaly detection model, ESN-AE (Echo State Network - Autoencoder), is introduced in recent research [21]. As reported in that work, the ESN-AE effectively combines neural networks with Echo State Networks (ESNs), making it particularly suitable for edge devices with resource constraints. Additionally, a composite autoencoder model tailored for anomaly detection in IIoT systems is put forward in another study [22]. Diverging from conventional autoencoders, this model predicts and concurrently reconstructs input data,
world IIoT scenarios, where the accurate and timely detection of anomalies is paramount for maintaining system integrity and security. By addressing the pressing need for effective intrusion detection in Edge IoT, our research contributes to the advancement of IIoT security, making it highly relevant in today’s evolving technological landscape.
C. Methodology or Approach
Our methodology encompasses a carefully crafted pipeline that considers the specific challenges and constraints of Edge IoT datasets. It leverages a hybrid security approach to extract spatial and temporal patterns efficiently. The extensive experimentation and use of performance metrics ensure a thorough evaluation of the model’s capabilities. Additionally, the comparison with XGBoost provides valuable insights into our approach’s novel contributions and potential advantages in anomaly detection within Edge IoT environments. The main points of the methodology are as follows:
1) Data Preprocessing:
   a) Feature Extraction: Data preprocessing is a crucial step in any machine learning task. In our study, we performed data preprocessing specifically tailored for Edge IoT datasets. This involved the extraction of relevant features from the raw data. Given the resource-constrained nature of Edge devices, we focused on extracting features that are essential for anomaly detection, ensuring efficiency and effectiveness.
   b) Dimension Reduction: To further optimize the model for resource-constrained Edge IoT environments, we employed dimension reduction techniques. These techniques help reduce the feature space’s complexity while retaining important information. Dimension reduction not only conserves computational resources but also aids in mitigating the curse of dimensionality, which is particularly pertinent in IoT data.
2) The Proposed Hybrid Convolutional Neural Network (CNN) and Gated Recurrent Units (GRU) Architecture:
   a) To capture both spatial and temporal patterns within the data, we designed a novel hybrid architecture that combines CNN and GRU.
   b) CNN Component: The CNN component focuses on spatial feature extraction. It excels at detecting patterns and features within the data that are invariant to translation, which is essential for capturing spatial characteristics in IoT sensor data.
   c) GRU Component: The GRU component, on the other hand, specializes in analyzing temporal sequences. It is well-suited for capturing time-dependent patterns and behaviors in IoT data, which is crucial for understanding the dynamics of IoT environments.
3) Model Training and Evaluation:
   a) Extensive Experimentation: We conducted a rigorous experimental phase involving diverse attack scenarios to ensure that the DL model’s performance is thoroughly evaluated under various real-world conditions and that it can effectively detect a wide range of potential threats.
   b) ML Metrics: To assess the quality of our hybrid model, we used a comprehensive set of performance metrics. These included accuracy, recall, precision, and F1-score. These performance measures are needed to test the model’s capability to identify anomalies while minimizing false positives.
4) Comparison with XGBoost:
   a) As part of our methodology, we compared the performance of our hybrid CNN+GRU model with that of the gradient-boosting algorithm XGBoost.
   b) This comparative analysis aimed to identify the strengths and weaknesses of the hybrid model concerning anomaly detection. By contrasting it with XGBoost, a well-established and widely used machine learning algorithm, we gained insights into the unique advantages our hybrid approach brings to the table.
D. Impact and Significance
The significance of this research lies in its pioneering approach to Edge IoT anomaly detection through the development of a hybrid CNN+GRU model. Its potential to enhance cybersecurity in real-world deployments, its adaptability to evolving threats, and its practical applicability across industries underscore the far-reaching impact of this work. It serves as a beacon of innovation in IoT security, providing a valuable asset for safeguarding our modern world’s increasingly interconnected and vital systems.
1) Innovative Hybrid CNN+GRU Model:
   a) At the heart of this research lies the development of a hybrid CNN+GRU model specifically tailored for Edge IoT anomaly detection. This model represents a novel fusion of deep learning techniques, combining CNNs for spatial feature extraction and GRUs for temporal sequence analysis.
   b) The significance of this innovation cannot be overstated. Edge IoT environments often present complex, heterogeneous data streams that require a multifaceted approach for accurate anomaly detection. Our model addresses this challenge head-on by seamlessly integrating spatial and temporal analysis, offering a more holistic understanding of the data.
2) Enhanced Cybersecurity in Real-World IoT Deployments:
   a) One of the most striking outcomes of this research is the model’s ability to accurately detect common attack types and various novel and evolving threats. Our proposed solutions aim to advance knowledge and have implications for enhancing cybersecurity in real-world IoT deployments.
   b) As IoT adoption continues to increase across industries, the security of these interconnected systems becomes increasingly critical. Our model’s capacity to identify emerging threats, combined with its ability to distinguish regular traffic, offers a formidable defense mechanism for safeguarding these deployments.
3) Practical Relevance and Industry Applications:
   a) Beyond academic achievement, the practical relevance of this research cannot be overstated. Its impact extends to a wide range of industry applications.
   b) For industries reliant on Edge IoT, such as manufacturing, healthcare, and utilities, this research provides a reliable and versatile tool for early detection and prevention of cyber intrusions in the many industry applications where any disruption can have far-reaching consequences, including financial losses and threats to public safety.
4) Mitigating Evolving Security Threats:
   a) The constantly evolving nature of cybersecurity threats necessitates adaptable and robust solutions. Our hybrid CNN+GRU model is well-suited to this dynamic landscape.
   b) By continuously improving the accuracy and effectiveness of anomaly detection in Edge IoT environments, this research contributes to the ongoing battle against cyber threats. It empowers organizations to stay ahead of adversaries and proactively protect their critical systems and sensitive data.
E. Future Directions
Future directions of this work could explore the applicability of more advanced deep learning architectures, such as Transformers, to capture complex temporal relationships and patterns in Edge IoT data. Investigating ensemble techniques that combine multiple models could enhance overall anomaly detection robustness.
IV. METHODOLOGY AND EXPERIMENTAL SETUP
A. Dataset Description
This study employed a comprehensive dataset to detect anomalies within Industrial Internet of Things (IIoT) networks, as documented in [28]. This dataset encompasses a wide array of network traffic data, containing regular traffic and various attack scenarios such as Port Scanning, XSS, Ransomware, Fingerprinting, and MITM. Data samples are collected from a real-world industrial environment, featuring multiple devices and communication protocols commonly encountered in IIoT networks, to ensure the representativeness and reliability of the data. Using this extensive dataset, we were able to effectively test the quality of different DL models for detecting and mitigating cybersecurity threats within the IIoT domain [29]–[34].
The dataset comprises 2,219,201 instances and 63 features, all meticulously collected to investigate and analyze cybersecurity threats within edge computing for IIoT applications. It encompasses a wide array of information, including attributes related to network traffic, protocol-specific parameters, and various attack types. These features exhibit diverse data types, encompassing numerical (float64) and categorical (object) data. Key features include network communication attributes like IP addresses ([Link] host, [Link] host), ARP protocol details ([Link], [Link]), ICMP protocol characteristics ([Link], [Link] le), HTTP protocol fields ([Link] length, [Link], [Link]),
and TCP/UDP protocol properties ([Link], [Link], [Link], [Link]).
Additionally, the dataset contains features associated with domain name system (DNS) queries ([Link], [Link]) and the Message Queuing Telemetry Transport (MQTT) protocol ([Link], [Link], [Link]). The dataset’s target variable, labeled “Attack type,” is a categorical attribute that represents 15 distinct classes of cybersecurity threats. These classes encompass various threats, including Distributed Denial of Service (DDoS), ransomware, man-in-the-middle (MITM) attacks, and port scanning. Before analysis, the dataset undergoes preprocessing steps, including eliminating unnecessary columns, addressing missing and duplicate values, and randomizing the data order. Figure 2 provides a detailed breakdown of the attack types and the corresponding number of instances for each attack class before applying oversampling techniques.
For data transformation, categorical variables are subjected to one-hot encoding, while the target variable undergoes label encoding. To tackle the class imbalance issue, we employ the RandomOverSampler method, which oversamples the minority classes. This technique adds instances to the underrepresented classes, by randomly resampling existing ones with replacement, until they match the sample count of the majority class. The dataset attains a more balanced distribution through these additional instances, allowing machine learning models to gain insights from a broader range of instances.
We utilize the RandomOverSampler from the imbalanced-learn library to implement random oversampling. The rationale behind this approach is to provide the model with a more representative view of the minority classes, facilitating enhanced anomaly detection within these less frequent categories. The additional instances enable the model to capture the distinctive patterns and characteristics specific to the minority classes, leading to improved overall performance and accuracy. However, it is crucial to exercise caution when employing oversampling techniques, including random ones, as they must be carefully evaluated to prevent potential issues like overfitting or introducing biases. Alternative methods may need to be considered depending on the dataset’s specific characteristics and research objectives.
Following oversampling, the dataset is split into two main parts: training and testing sets. Subsequently, feature scaling is performed using MinMaxScaler, and the input data and target variables are reshaped to meet the prerequisites of deep learning models.
B. Proposed Hybrid CNN+GRU model:
Our deep learning (DL) model leverages the strengths of both Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to excel in IIoT anomaly detection. This architectural fusion effectively captures the spatial and temporal information inherent in datasets, making it suitable for analyzing intricate sequences, as encountered in industrial IoT applications.
In the CNN segment of the model, convolutional layers are employed to extract features from datasets. These layers use filters to identify structures and patterns from datasets, enabling the model to acquire meaningful representations. The proposed model adeptly captures hierarchical representations of the input by stacking multiple layers of the DL network with increasing filter sizes.
Conversely, GRUs, a type of recurrent neural network (RNN), incorporate gating mechanisms that selectively update and reset their internal states, which are dedicated to modeling data dependencies. This functionality allows the model to retain and propagate crucial information across time steps, capturing long-term dependencies in the sequence. The GRU layer within the model utilizes these gating mechanisms to learn and represent temporal patterns within the data.
Combining CNN and GRU models by concatenating their output layers permits the fusion of both spatial and temporal features. This fusion affords comprehensive data comprehension, enabling the model to make precise predictions. Algorithm 1 provides a high-level overview of the model. By harnessing the complementary strengths of CNNs and GRUs, the CNN+GRU architecture strikes a balance between capturing local spatial features and modeling temporal dynamics.
Through rigorous experimentation and evaluation, our work has substantiated the efficacy of the CNN+GRU model. It has consistently performed with high accuracy, precision, recall, and F1-score in detecting anomalies within the industrial IoT dataset. The ability to discern intricate spatial and temporal patterns empowers it to accurately identify abnormal instances, facilitating proactive security measures in industrial IoT systems. Below, we provide pseudocode, and Figure 3 illustrates the architecture of this hybrid model.
Algorithm 1: Proposed Hybrid Deep Learning Architecture
  Define the Convolutional Neural Network (CNN) model
    cnn_input ← Input(shape = input_shape)
    cnn_layer ← Conv1D('relu')(cnn_input)
    cnn_layer ← MaxPooling1D(cnn_layer)
  Define the Gated Recurrent Unit (GRU) model
    gru_input ← Input(shape = input_shape)
    gru_layer ← GRU('tanh')(gru_input)
  Concatenate the outputs of the CNN and GRU models
    concat_layer ← concatenate([cnn_layer, gru_layer])
  Classification layer
    output_layer ← Dense(num_classes, 'softmax')(concat_layer)
  Combined CNN and GRU Model
    model ← Model(inputs = [cnn_input, gru_input], outputs = output_layer)
[Fig. 3: The Architecture of the Proposed CNN+GRU Hybrid Deep Learning Model. Blocks: input layers; CNN layers (5 CNN1D, 5 Maxpool layers); 2 GRU layers; hidden layers; dense layer.]
[Figure: Data, Preprocessing, Sampling, Train/Test; Encoder (CNN1D, Maxpool layer), Compressed Representation, Decoder (CNN1D, Upsampling layers); CNN, GRU, LSTM.]
C. Model Descriptions
1) 1D-CNN Model Overview: The 1D-CNN model proposed in this study uses the Keras Sequential API. It comprises multiple layers designed to learn valuable features from our dataset and generate predictions. This model commences with an input layer configured to accommodate data in a shape aligned with the dataset’s dimensions. Subsequently, five Conv1D layers are added, each equipped with a distinct number of filters and a ReLU activation function. These layers are instrumental in extracting information from the dataset via filtering. Following each CNN layer, a MaxPooling1D layer with a size of 2 is applied to reduce data dimensionality and capture significant features. Subsequently, a Flatten layer converts the feature maps into a one-dimensional vector. Furthermore, the model incorporates two dense layers: the first dense layer, comprising 64 neurons and a ReLU activation function, focuses on learning global features, while the second dense layer, featuring the same number of neurons as the dataset’s class count, utilizes a softmax activation function to yield class probabilities, enabling the final prediction.
2) GRU Model: The proposed GRU model is constructed using the Keras Sequential API. It encompasses two GRU layers, with the initial layer comprising 32 units and the subsequent layer comprising 64. These GRU layers utilize a blend of tanh and sigmoid activation functions to update and reset gates. The model also integrates two dense layers, employing ReLU activation functions, with 32 and 16 units, respectively. The final output layer uses a softmax activation function with a number of units aligned with the dataset’s class count. This GRU-based model effectively captures short- and long-term dependencies within sequential data, which makes it well-suited for diverse classification tasks.
3) Overview of Hybrid CNN-GRU Model: The proposed model represents a hybrid neural network, integrating Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to proficiently process and learn from sequential data. Constructed using the Keras Functional API, it is designed with two distinct branches: the CNN branch and the GRU branch. The model’s architectural framework is outlined as follows:
CNN Branch:
1) Input Layer: The model ingests data with dimensions
aligned to the dataset’s structure.
2) Convolutional Layers: The CNN branch comprises two convolutional layers, with 64 and 128 filters respectively, each utilizing a ReLU activation function.
3) MaxPooling Layers: Max-pooling layers with a pool size of 2 are placed between the convolutional layers to reduce spatial dimensions and enhance computational efficiency.
4) Flatten Layer: Following the final max-pooling layer, a Flatten layer converts the 3D output from the convolutional layers into a 1D vector.
5) Dense Layer: The concluding layer within the CNN branch is a dense layer with 64 units and a ReLU activation function, enabling the model to grasp higher-level features derived from the spatial data.
GRU Branch:
1) Input Layer: Similar to the CNN branch, the GRU branch’s input layer accommodates data of the same dimensions.
2) GRU Layer: The GRU layer, featuring 32 units, employs a 'tanh' activation function for gate updates and a 'sigmoid' activation function for reset gates. Recurrent dropout is disabled (set to 0), and the layer avoids unrolling the recurrent loop for efficiency. Bias terms are integrated into the update and reset gate computations, and the hidden states reset after each sequence.
3) Dense Layer: After the GRU layer, a dense layer with 32 units and a 'tanh' activation function is added, helping the model discern intricate patterns and features from temporal data.
Integration of Branches: Upon processing input data through the CNN and GRU branches, the outputs undergo concatenation using the concatenate layer. This amalgamated representation encompasses spatial and temporal features gleaned from both branches, enriching the final decision-making process.
Output Layer: The final layer consists of a dense layer with num_classes units and a softmax activation function. The softmax function furnishes a probability distribution across classes, enabling the model to make a final prediction based on the highest probability. Figure 3 illustrates the structure of the unified CNN-GRU model.
D. Autoencoder Models Overview
Hybrid Models: The proposed models are hybrid neural networks combining an autoencoder with CNN, LSTM, and GRU networks to process and learn from input data efficiently. Each model is designed using the Keras Functional API and consists of two primary components: the encoder-decoder (autoencoder) module and the CNN, LSTM, or GRU modules. The employed autoencoder type is a basic convolutional autoencoder, utilizing convolutional layers for both encoding and decoding. These layers excel in capturing spatial patterns and features in the input data.
Encoder-Decoder (Autoencoder) Module: The encoder section of the autoencoder employs convolutional layers with decreasing filters to extract essential features from the input data and reduce its dimensionality. The decoder section uses upsampling and convolutional layers to reconstruct the original input from the encoded representation. Figure 4 provides an architectural overview of the autoencoder models.
Model 1:
Encoder:
1) Input Layer: The model takes input dimensions aligned with the dataset.
2) Convolutional Layers: Three convolutional layers with 32, 64, and 128 filters are used, each with a kernel size of 3 and ReLU activation.
3) MaxPooling Layers: Max-pooling layers with a pool size of 2 are placed between convolutional layers to reduce dimensions.
Decoder:
1) Convolutional Layers: The decoder comprises three convolutional layers with 128, 64, and 32 filters, using ReLU activation.
2) UpSampling Layers: Upsampling layers with a size of 2 restore spatial dimensions.
Autoencoder: The autoencoder combines the encoder and decoder models, taking the same input as the encoder and producing the output of the decoder.
Classifier:
1) The autoencoder serves as the initial classifier layer.
2) Conv1D and MaxPooling1D layers perform feature extraction and dimensionality reduction.
3) The last MaxPooling1D output is flattened.
4) Dense layers with ReLU activation process features.
5) A final Dense layer with softmax activation provides class probabilities.
Model 2:
Classifier:
1) LSTM layers replace Conv1D and MaxPooling1D layers for sequence processing.
2) LSTM layers return sequences, extracting features from them.
3) A Dense layer with softmax activation is used for classification.
Model 3:
Classifier:
1) Instead of Conv1D and MaxPooling1D layers, a GRU (Gated Recurrent Unit) layer is employed for sequence processing.
2) The GRU layer comprises 32 units and utilizes 'tanh' activation for gate updates and 'sigmoid' activation for reset gates.
3) A Dense layer with 'tanh' activation further processes features.
4) The final Dense layer with softmax activation provides a class probability distribution for classification.
1) Overview of the LSTM Model: The proposed model is an LSTM-based classifier designed specifically for time series classification tasks. This architecture comprises a sequence of LSTM layers followed by dense layers for the classification task. The key components of the model’s architecture are outlined below:
they achieved commendable values for precision (98.42%) and recall (98.78%),
signifying outstanding performance in intrusion classification.
Comparing the two models, it becomes evident that CatBoost and XGBoost attained
impressive accuracy rates and excelled in classifying intrusions. The CatBoost model
reported a slightly higher accuracy during training, but both models exhibited robust
precision and recall scores. It is imperative to consider a variety of evaluation
metrics, assess potential overfitting, and analyze the problem context before
concluding that high accuracy alone signifies a superior model.
In our comprehensive assessment of various machine learning models designed for IoT
security anomaly detection, the XGBoost algorithm leads the way, closely followed by
the CNN+GRU model. The CNN+GRU model, which combines CNN and GRU layers, stands out
in multiple aspects. Notably, it demonstrates an accuracy rate of 94.94% and a recall
rate of 92.28%, highlighting its proficiency in accurately identifying anomalies,
particularly in the nuanced context of IoT data. Additionally, a precision rate of
98.49% and an F1-score of 95.24% further underscore its capacity to categorize
anomalies while effectively minimizing false positives. Moreover, the very low False
Alarm Rate (FAR) of 0.001% signifies the model’s skill in avoiding unnecessary alerts,
a critical characteristic in practical applications. Integrating spatial and temporal
features through the incorporation of CNN and GRU layers plays a pivotal role in the
exceptional performance of this model. The CNN+GRU model demonstrates the synergy
between these two architectural components and showcases its adaptability to
intricate, multi-dimensional datasets such as IoT traffic. These performance metrics
firmly establish the CNN+GRU model as a potent tool for fortifying IoT environments
against cyber threats.
Shifting the focus to the other models, namely CNN, GRU,
Fig. 21: Confusion matrix for CNN+GRU model
Fig. 22: Confusion matrix of XGBoost model
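The metrics reported for these models (accuracy, precision, recall, F1-score, and FAR) can all be derived from the entries of a confusion matrix such as those in Fig. 21 and Fig. 22. The following is a minimal sketch for the binary (normal vs. anomalous) case. It assumes the common definition FAR = FP / (FP + TN), which may differ from the exact definition used in the paper; the function name binary_metrics and the counts in the usage example are hypothetical and are not taken from the paper's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common detection metrics from binary confusion-matrix counts.

    tp: anomalies correctly flagged      fp: normal samples wrongly flagged
    tn: normal samples correctly passed  fn: anomalies that were missed
    Assumption: FAR is taken as the false positive rate, FP / (FP + TN).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "far": far,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = binary_metrics(tp=90, fp=5, tn=895, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

For a multi-class confusion matrix, the same counts can be obtained per class in a one-vs-rest fashion and then averaged; whether the paper uses macro or weighted averaging is not stated in this section.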
Such a finding suggests that the CNN model efficiently learned features from the
dataset and quickly achieved convergence. The model’s ability to capture spatial
information through convolutional layers and its simplicity contributed to its swift
convergence.
Likewise, the GRU model (Figure 7 and Figure 8) shows an overall decreasing loss
trend. However, it is worth noting some training and validation accuracy fluctuations
during the training process. Despite these fluctuations, the GRU model exhibited
efficient convergence, with an average per-epoch runtime of 35 minutes and 17 training
epochs. The GRU architecture, belonging to the family of recurrent neural networks
(RNNs), excels at capturing temporal dependencies in sequential data. The model
successfully learned the temporal patterns in encrypted IoT traffic, leading to
convergence within a reasonable number of epochs. The observed accuracy fluctuations
may indicate the model’s sensitivity to specific data patterns.
The CNN+GRU model, as depicted in Figure 9 and Figure 10, displays a consistently
decreasing loss curve paired with a corresponding increase in accuracy. This behavior
indicates a steady convergence towards the optimal solution, underscoring the efficacy
of merging CNN and GRU architectures for anomaly detection. The CNN+GRU hybrid model,
leveraging the strengths of both architectures, exhibited a longer convergence time
than the individual models. It took 97 minutes and 73 training epochs to achieve
convergence. The longer convergence time can be attributed to the combined complexity
of both architectures and the model’s need to extract spatial and temporal features
simultaneously. Nevertheless, despite the longer convergence duration, the hybrid
model demonstrated superior performance, highlighting the effectiveness of integrating
both CNN and GRU.
Conversely, the LSTM model follows a different trajectory, as shown in Figure 11 and
Figure 12. While the loss decreases gradually over time, there is a notable dip in
accuracy during the initial epochs. This pattern may signify a slower convergence rate
or difficulties in capturing temporal dependencies in the data. The LSTM model
exhibited a slightly slower convergence rate than CNN and GRU, necessitating an
average per-epoch runtime of 36 minutes and 14 training epochs to reach convergence.
The LSTM’s proficiency in modeling long-term dependencies makes it suitable for
handling intricate sequential data. However, the added complexity of the LSTM
architecture and the extended sequence length present in encrypted IoT traffic likely
contributed to the extended convergence duration.
The Autoencoder-based models, namely Autoencoder+CNN and Autoencoder+GRU, displayed
distinctive patterns in their training trajectories. As depicted in Figure 13 and
Figure 14, the Autoencoder+CNN model initially exhibited a gradual reduction in loss
over a few epochs, followed by a sharp decline, and eventually settled into a gradual
decrease until convergence. Conversely, Figure 15 and Figure 16 demonstrate that the
Autoencoder+GRU model maintained a consistent loss
Mohamed I. Ibrahem received the B.S. and M.S.
degrees in Electrical Engineering (electronics and
communications) from Benha University, Cairo,
Egypt in 2014 and 2018, respectively, and the Ph.D.
degree in electrical and computer engineering from
Tennessee Tech. University, USA, in 2021. He is
an Assistant Professor at the School of Computer
and Cyber Sciences, Augusta University, USA. He
also holds the position of Assistant Professor at
Benha University, Egypt. Dr. Ibrahem received the
Eminence Award for the Doctor of Philosophy Best
Paper from Tennessee Tech. University, USA. His research interests include
machine learning, cryptography and network security, and privacy-preserving
schemes for smart grid communication and AMI networks.