Theoretical Background
Vakula Rani J
Department of Computer Application, CMR Institute of Technology (Affiliated to Visvesvaraya Technological
University), Bangalore, India.
Received: 22 June 2024 / Revised: 05 October 2024 / Accepted: 15 October 2024 / Published: 30 October 2024
Abstract – The Internet of Things (IoT) represents a network of interconnected devices, enabled by technology that facilitates seamless communication between devices and the cloud. The adoption of IoT and its unique features expose these systems and devices to various intrusions. Traditional security methods are inadequate for securing IoT, which calls for a reevaluation of existing security protocols. While IoT devices come with built-in security features such as encryption and authentication, they require more advanced techniques to ensure robust system protection. Machine learning (ML) has emerged as a vital tool in enhancing IoT security, proving effective in mitigating cybersecurity risks and improving the intelligence of security systems. This survey provides a comprehensive overview of IoT systems, with a focus on their security aspects, including features, architectures, protocols, and associated risks. It also highlights recent algorithmic advancements, emphasizing the pivotal role of ML in strengthening IoT security. Furthermore, it categorizes attacks on IoT systems, offering a systematic understanding of vulnerabilities, and identifies relevant datasets to support future research efforts.
Index Terms – IoT Security, Machine Learning (ML), Deep Learning (DL), IoT Applications, Security, Attacks, Datasets, Cyber-Attacks, Challenges, IoT Layers.
1. INTRODUCTION
The Internet of Things (IoT) consists of interconnected physical objects that communicate through software, sensors, and network connectivity to share and collect data. Its primary objective is to enable autonomous interaction between devices, creating a smart, interconnected environment that profoundly impacts people's lives. IoT is applied in various fields, including intelligent homes, autonomous vehicles, gene therapy, and medical advancements. However, its inherent characteristics also pose significant security and privacy challenges, making IoT systems vulnerable to attacks such as impersonation and intrusion. The enormous amount of data produced by IoT platforms requires secure transmission and analysis to prevent privacy breaches. Despite its many benefits, IoT introduces security challenges due to its unsupervised operation, reliance on wireless networks, and inability to support complex security systems. Addressing these challenges requires comprehensive strategies that account for the unique requirements of IoT environments. Modifications to current security frameworks for information and wireless networks are essential to develop robust IoT security solutions that accommodate the global accessibility, resource limitations, and lossy network characteristics of IoT. Traditional defense mechanisms such as encryption, authentication, access control, network security, and application security face limitations and are often inadequate for IoT systems.
However, these security mechanisms can be enhanced to satisfy the distinct needs of the IoT ecosystem. Advanced techniques, such as ML and DL, can be utilized for data analysis, enabling the identification of normal and abnormal behaviors based on interactions among IoT devices. By leveraging data from IoT components, it becomes possible to detect malicious behavior early by analyzing typical interaction patterns.
The motivation behind this survey is to provide academics and researchers with an extensive understanding of how ML and DL methodologies can address security challenges in IoT environments, particularly focusing on mitigating attacks. These techniques play a vital role in forecasting new attacks by analyzing patterns from previous ones, thereby aiding in the detection of unknown threats. Furthermore, recent literature lacks a thorough examination of the capabilities of ML and DL in securing IoT systems, especially in handling emerging threats and scaling to real-world applications. This paper aims to fill that gap by systematically reviewing recent advancements, applications, and limitations of ML and DL in IoT security. Figure 1 depicts the significance of ML/DL in the IoT environment.
The contributions of this survey are as follows:
• A comprehensive study of ML and DL methods for IoT security, covering their advantages, disadvantages, solutions to security challenges, and applications.
• An analysis of current surveys on ML/DL, categorizing research papers from 2018 to 2024.
• An in-depth classification of attacks on IoT layers, including the principles, weaknesses, and objectives of each layer.
• An evaluation of diverse datasets in IoT security, providing insights into their benefits and drawbacks.
• A presentation of possible research challenges in ML/DL for IoT security, along with a discussion of future trends.
The remainder of the survey is organized as follows: Section 2 provides a brief overview of the IoT system. Section 3 reviews ML and DL methods, while Section 4 analyzes existing surveys on ML and DL by examining studies from 2018 to the present. Section 5 discusses the classification of attacks on IoT layers. Section 6 introduces datasets in the IoT system, and Section 7 discusses research challenges, future trends, and related discussions on ML/DL. Finally, Section 8 concludes the paper.
2. OVERVIEW OF THE IOT SYSTEM
This section provides an overview of IoT systems, covering the characteristics, architecture, protocols, and vulnerabilities that raise significant security concerns.
2.1. Characteristics of IoT
The following attributes are vital for the efficient design, deployment, and management of IoT systems, as identified in [2], [6]:
Actuating and Sensing: IoT devices contain sensors to gather environmental data and may include actuators to carry out actions based on this data, such as adjusting thermostat settings.
Scalability: The ability to handle substantial volumes of data effectively is crucial in IoT systems, enabling insightful analysis and decision-making.
Safety: Concerns about the security of personal data have emerged with the rise of IoT devices, highlighting the need for measures to prevent unauthorized access and data breaches.
Interoperability: Given the many sources of IoT devices and the variety of communication protocols in use, interoperability is essential to ensure seamless and efficient collaboration between different devices.
Energy Efficiency: Many IoT devices depend on batteries or limited energy sources, which emphasizes the importance of energy-efficient designs to extend device operation without frequent recharging.
Real-Time Operations: The capability for processing and responding in real time is vital for IoT applications, whether for managing autonomous vehicles or monitoring vital infrastructure.
Network Connectivity: As the number of IoT devices increases, maintaining connectivity becomes more challenging. Solutions such as cloud services and gateways help optimize network performance.
Remote Monitoring and Control: A key advantage of IoT is the capacity to remotely monitor and control devices. Users can access and manage IoT devices from any location with internet access, providing comfort and flexibility.
Cost-Effectiveness: As IoT adoption increases, the cost of IoT devices and associated technologies has decreased. Cost-effectiveness is a vital factor in the extensive adoption of IoT across various industries and applications.
These characteristics form the foundation for addressing security concerns and designing effective IoT systems.
2.2. IoT Architecture and Protocols
IoT architecture refers to the framework that defines the interactions and relationships between the various components of an IoT system. It includes devices (things), communication protocols, cloud services, and applications that work together to gather, process, and act upon data. Different studies offer various classifications of IoT architecture, with some [7, 8] identifying three essential layers, while others [9], [10, 11] categorize it into three, four [12], or five layers. In this study, we present the three-layer approach: Perception layer, Network layer, and Application layer.
2.2.1. Perception layer
The perception layer deals with the physical connectivity and hardware components of the system [13, 14]. It includes devices, sensors, actuators, and technologies that enable connectivity to the network. The key elements of the physical layer in IoT are shown in Figure 2. This layer also incorporates protocols for performing specific tasks, as illustrated in Figure 3.
2.2.2. Network layer
In an IoT system, the network layer is crucial for facilitating device connectivity and enabling data exchange across networks [15, 16]. It aligns with the network layer of the OSI (Open Systems Interconnection) model and is responsible for routing packets between devices across different networks. Various protocols in this layer handle transmitting IP datagrams from the source to the target network.
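As a minimal sketch of the routing decision described above, the following Python snippet uses the standard-library ipaddress module to decide whether a datagram can be delivered on the local network or must be forwarded to a gateway. The function name next_hop, the example prefix, and the gateway address are illustrative assumptions and are not taken from the surveyed works.

```python
import ipaddress

def next_hop(local_net: str, destination: str, gateway: str) -> str:
    """Return the address the datagram should be handed to next."""
    network = ipaddress.ip_network(local_net)
    dest = ipaddress.ip_address(destination)
    # On-link destinations are delivered directly; everything else
    # is forwarded to the (assumed) default gateway.
    if dest in network:
        return str(dest)
    return gateway

# Hypothetical home network with a single gateway.
LOCAL_NET = "192.168.1.0/24"
GATEWAY = "192.168.1.1"

local = next_hop(LOCAL_NET, "192.168.1.42", GATEWAY)  # same network
remote = next_hop(LOCAL_NET, "203.0.113.7", GATEWAY)  # different network

# Address width in bits for each protocol version.
v4_bits = ipaddress.ip_address("192.168.1.42").max_prefixlen
v6_bits = ipaddress.ip_address("2001:db8::1").max_prefixlen
```

Here max_prefixlen reports the address width: 32 bits for an IPv4 address and 128 bits for an IPv6 address.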
Key protocols include:
• IPv4 (Internet Protocol Version 4): The most widely deployed internet protocol, using a 32-bit address scheme to identify devices in a network.
• IPv6 (Internet Protocol Version 6): The latest version of the internet protocol, which uses a 128-bit address scheme and is the successor to IPv4.
• 6LoWPAN (IPv6 over Low-Power Wireless Personal Area Networks): A protocol that allows IPv6 packets to be transmitted over low-power, low-rate wireless networks, commonly used in IoT devices. It is designed to tackle the challenges of connecting devices with limited power, processing capabilities, and memory.
• RPL (Routing Protocol for Low-Power and Lossy Networks): A protocol designed for routing data in low-power IoT networks.
• LoRaWAN (Long Range Wide Area Network): A low-power, long-range protocol designed for wireless battery-operated devices.
Selecting a network layer protocol depends on the particular needs and limitations of the IoT implementation, considering factors like interoperability, security, scalability, and power efficiency.
2.2.3. Application Layer
This layer is the top layer, responsible for delivering specific functionality and services to various IoT applications [17]. It enables communication between devices, applications, and services within the IoT ecosystem. Figure 4 demonstrates the three layers and the protocols used in each layer. Various application layer protocols are employed to facilitate interoperability, data sharing, and communication between IoT systems and devices:
• HTTP (Hypertext Transfer Protocol): Used for web communication. Devices can send and receive data over the internet using HTTP or its secure version, HTTPS. It is commonly used for web-based communication and RESTful APIs.
• CoAP (Constrained Application Protocol): A lightweight protocol designed for constrained networks and devices. It is often used where a simple and efficient communication protocol is needed.
• WebSocket: A protocol that provides full-duplex communication over a single TCP connection, enabling real-time message exchange between client and server.
• MQTT (Message Queuing Telemetry Transport): A lightweight messaging protocol designed for low-bandwidth, high-latency, or unreliable networks. It operates on top of TCP/IP and is frequently used in IoT for its publish-subscribe model, making it suitable for scenarios with intermittent connectivity.
• XMPP (Extensible Messaging and Presence Protocol) and AMQP (Advanced Message Queuing Protocol): Additional protocols used for message-oriented communication in IoT systems.
These layers and protocols work together to ensure the efficient operation and communication of IoT systems, with each layer addressing specific functional and technical requirements.
2.3. Internet of Things Vulnerabilities
IoT devices have become more prevalent in every facet of everyday life, providing simplicity and automation [18]. However, they also introduce significant security challenges and vulnerabilities. Below are some common IoT vulnerabilities [19], [20]:
• Inadequate Authentication and Authorization: Numerous IoT devices come with default usernames and passwords that are often left unchanged by users, making them vulnerable to unauthorized access. The absence of Two-Factor Authentication (2FA) also makes it easier for attackers to gain unauthorized access to IoT devices.
• Poorly Implemented Encryption: Some IoT devices transmit data without adequate encryption, making it susceptible to interception and manipulation by attackers. The use of weak encryption methods further increases the risk of compromising sensitive data.
• Outdated Software and Firmware: When manufacturers fail to release regular firmware updates, devices remain susceptible to known exploits. Additionally, some devices are unable to install updates, leaving them exposed to security vulnerabilities.
• Privacy Concerns: Inadequate privacy protections may result in the exposure of sensitive user information, leading to data leaks. Moreover, manufacturers may gather and retain more user data than necessary, increasing the risk of privacy breaches due to insufficient user data management.
• Limited User Awareness: Many users are unaware of the hazards related to IoT devices, which can lead to insufficient security practices, such as neglecting to change default settings or failing to apply security measures.
• Inadequate Physical Security: A lack of tamper protection can result in unauthorized physical access to IoT devices, compromising their security. Devices without proper physical security measures are vulnerable to being manipulated or stolen.
These vulnerabilities emphasize the necessity of developing robust security practices and educating users about potential risks in IoT systems.
2.4. Internet of Things Applications
IoT applications span multiple industries and fields, offering innovative solutions to improve efficiency, connectivity, and automation. Below are some of the prominent IoT applications:
2.4.1. Smart Home Automation
In intelligent homes, appliances like refrigerators, televisions, doors, and heating systems can be automated and remotely controlled [1]. Users can customize door settings, maintain cameras, manage home security systems, and control appliances such as air conditioners and heaters. Energy consumption can also be optimized by automating tasks like lighting and temperature management. Examples include smart thermostats that adjust temperature and humidity based on user preferences and energy-efficient lighting systems that are remotely controlled. Integrated cameras, sensors, and alarms enable remote surveillance and form part of smart security systems.
2.4.2. Smart Cities
Urban areas leverage IoT devices like meters, lights, and sensors to collect and analyze data, which is used to improve public utilities, infrastructure, and services. Smart city technologies aim to simplify daily tasks, enhance efficiency, and address public safety, traffic management, and environmental sustainability issues. Examples include smart meters for effective energy management and connected vehicles [21]. Traffic management systems, such as intelligent traffic lights and parking systems, reduce congestion, while waste management solutions use sensors to optimize collection routes.
2.4.3. Smart Transportation
Smart transportation integrates IoT and other advanced technologies to boost the sustainability, safety, and efficiency of transportation networks. It relies on interconnected sensors and data from mobile devices, GPS, accelerometers, and weather sensors to optimize urban traffic and freight scheduling, improve road safety, and reduce delivery times [22].
2.4.4. Smart Vehicles
Smart vehicles, or intelligent cars, are equipped with AI-controlled computer systems that relieve drivers of routine driving tasks. This technology aims to improve highway safety by reducing the driver's decision-making burden. Key features include telematics for data collection and transmission, fleet management for monitoring vehicle operations, and the integration of IoT technologies for autonomous vehicles.
2.4.5. Smart Agriculture
Precision agriculture, or smart agriculture, utilizes IoT sensors to monitor crop health, irrigation, and soil conditions. Wearable technologies and sensors are also used for cattle monitoring, tracking the health and behavior of livestock. These technologies enhance productivity, sustainability, and efficiency in farming [1].
2.4.6. Smart Healthcare
IoT devices are widely utilized in healthcare for remote patient monitoring and timely interventions. For instance, sensors can be implanted to observe glucose levels in diabetic patients and send alerts when levels become critical. Wearable devices track health indicators and communicate data to medical professionals. Additionally, smart pill dispensers help monitor drug adherence, and asset tracking systems in hospitals manage medical supplies and equipment.
2.4.7. Smart Environment
Smart environmental technologies use data-driven strategies to monitor and improve both built and natural environments. These innovations address environmental issues, promote sustainability, and enhance quality of life. Air quality sensors measure pollutants, providing real-time data for managing environmental health, while water sensors monitor the condition of natural bodies of water to detect pollution.
2.4.8. Smart Grid
This domain is the next generation of energy infrastructure, enhanced with IoT connectivity and communication technologies to improve resource utilization. It enables more efficient electricity distribution, real-time monitoring, and disaster prevention. Smart grids also detect energy spikes and device malfunctions, helping to enhance reliability and reduce power transmission costs. In Table 1, the applications, principles, and weaknesses of IoT in various domains are summarized.
2.5. Internet of Things Critical Attacks
An IoT attack refers to a breach of an IoT system, targeting devices, networks, data, or users. Cybercriminals exploit these vulnerabilities to steal data or gain control over automated systems, threatening their functionality. Due to the inherent weaknesses in the IoT environment, it remains constantly exposed to cyberattacks. These attacks can be categorized as either active or passive, and they are still under investigation, as researchers have not yet developed definitive solutions to fully protect IoT systems. This subsection discusses the most critical active and passive attacks in the IoT environment. The objectives and details of each attack are outlined in Tables 2 and 3. Below are the primary types of IoT attacks.
2.5.1. Passive Internet of Things Attacks
Passive IoT attacks involve unauthorized monitoring, eavesdropping, or information gathering without actively interfering with the communication or functionality of IoT devices [23]. These attacks are often subtle and aim to collect sensitive information for malicious purposes. Defending against passive IoT attacks requires strong encryption, secure communication protocols, monitoring network traffic for anomalies, and using intrusion detection systems. Below are common types of passive IoT attacks:
• Eavesdropping
Eavesdropping in IoT refers to the unauthorized monitoring and interception of communication between an IoT device and a network. In this type of attack, the attacker covertly listens to the data or messages being transmitted to acquire sensitive information, such as credentials or personal data, without actively disrupting communication [24].
• Traffic Analysis Attack
This attack involves the illegal monitoring and analysis of network traffic to gain insights into patterns, behavior, or private data shared between IoT devices. Unlike other attacks that exploit device or network vulnerabilities, traffic analysis focuses on passive observation of data transmission [25].
• Passive Device Fingerprinting
Passive device fingerprinting in IoT refers to identifying and profiling IoT devices on a network without actively engaging with them. This involves observing and analyzing network traffic, characteristics, and patterns generated by devices to create a unique fingerprint or signature. The data can be used for goals such as targeted attacks, reconnaissance, or unauthorized access [26].
• Radio Frequency (RF) Snooping
RF snooping in IoT involves the unauthorized interception and analysis of radio frequency signals emitted by IoT devices. These attacks exploit wireless communication channels used by IoT devices to exchange information, potentially leading to the extraction of sensitive data, device identification, or remote control of targeted devices [27].
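To make the idea behind traffic analysis and passive device fingerprinting in Section 2.5.1 concrete, the sketch below summarizes an observed packet trace by two metadata features (mean packet size and mean inter-arrival time) and matches it against known device profiles. It is a simplified illustration under stated assumptions: the feature choice, the profile values, and the names fingerprint, closest_profile, and PROFILES are hypothetical and do not come from the cited studies. No payload is inspected, which is why encryption alone does not prevent this kind of observation.

```python
from statistics import mean

def fingerprint(packets):
    """Summarize a trace of (timestamp_seconds, size_bytes) tuples sorted by time."""
    sizes = [size for _, size in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    return (mean(sizes), mean(gaps) if gaps else 0.0)

def closest_profile(observed, profiles):
    """Return the label of the known profile nearest to the observed fingerprint."""
    def distance(p, q):
        # Relative differences keep bytes and seconds on comparable scales.
        return sum(abs(a - b) / max(abs(b), 1e-9) for a, b in zip(p, q))
    return min(profiles, key=lambda label: distance(observed, profiles[label]))

# Hypothetical profiles: (mean packet size in bytes, mean gap in seconds).
PROFILES = {
    "thermostat": (60.0, 30.0),
    "camera": (1200.0, 0.04),
}

# Hypothetical passively captured trace: large packets at short intervals.
trace = [(0.00, 1180), (0.04, 1210), (0.08, 1250), (0.12, 1190)]
observed = fingerprint(trace)
label = closest_profile(observed, PROFILES)
```

The same features can serve a defender: a monitoring system that knows the expected profile of each device can flag traces that drift away from it.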
Application: Smart Environment
Principles: Encompasses diverse IoT applications such as forest fire detection, monitoring snow levels in high-altitude areas, landslide avoidance, early earthquake detection, and pollution monitoring.
Weaknesses:
- False negatives (FN) and false positives (FP) may have disastrous consequences for such IoT applications.
- Integrating diverse technologies, protocols, and devices can be complex, leading to challenges in maintaining and managing the system.
- Privacy breaches.
Application: Smart Grids
Principles: A bi-directional power network that facilitates the transmission of both electricity and data using digital communications technologies.
Weaknesses:
- Physical and cyber-attacks; criticality of data delivery latency.
- The smart grid may be more vulnerable to cyberattacks, equipment malfunctions, and system failures due to its reliance on digital technologies.
- Intelligent grids rely heavily on digital communication and data transfer, making them vulnerable to cyberattacks and hacking.
Application: Smart Healthcare
Principles: Enhancing the quality of care delivered. Enhancing patient health outcomes. Minimizing healthcare expenses.
Weaknesses:
- Gathering and storing sensitive health data poses serious security threats.
- Insufficient cost-effective, intelligent, and precise medical sensors.
- Lack of a standard IoT system architecture.
- Handling high data volumes and the challenge of interoperability.
- Requires robust privacy measures to protect patients' personal health information.
Application: Smart Transportation
Principles: Decreased traffic congestion leads to improved air quality, less wasted time, and decreased energy consumption.
Weaknesses:
- The software of the control system could be compromised by hackers.
- Issues with data privacy and the possibility of misuse or illegal access.
- High implementation cost.
- Energy requirements.
Application: Smart Vehicles
Principles: Covers intelligent vision for safe driving, intelligent monitoring of unsafe driving, intelligent detection of faults in automobile power and transmission systems, intelligent vehicle navigation and transportation systems, and intelligent vehicle-assisted technology.
Weaknesses:
- Vulnerability to hacking.
- Accomplishing a high level of safety in autonomous vehicles is challenging.
- High energy consumption.
- Data privacy risks.
Application: Smart Agriculture
Principles: Managing farms with the use of sophisticated information and communication technology to raise product quantity and quality while reducing the amount of human work necessary.
Weaknesses:
- Vulnerable to cyber threats.
- Farmers' privacy concerns.
- Lack of technical skills.
- Affected by adverse weather conditions.
providing insights to help IoT developers in managing hazards and security flaws for improved protection. It also presented alternative five-layer and seven-layer IoT architectures alongside the current three-layer design. Modern approaches to enhancing IoT device security include leveraging machine learning, edge computing, fog computing, and blockchain technology while also addressing unresolved research issues. Author [45] presented a complete categorization for authentication and access (AA) in IoT networks, evaluating various elements of AA using conventional and ML-driven approaches to assess their potential to enhance IoT ecosystem security and identify research areas. The topic of IoT architecture in the context of AA schemes was also covered, focusing on different risks and attacks at each IoT layer. IoT applications utilizing machine learning algorithms for AA were examined for their requirements and existing challenges.
Study [46] analyzed recently proposed models, protocols, and encryption techniques for securing IoT networks, highlighting the latest security trends. It discussed the classification of IoT attacks and provided an updated analysis of protocols and standards proposed for IoT systems. Study [47] reviewed current IoT security issues related to potential future attacks, identifying concerns associated with IoT integration with cloud and blockchain technologies, changes in cryptography due to quantum computing, and the growth of artificial intelligence. Study [48] compiled information on reported security vulnerabilities, their classification, and remedies proposed to address IoT security challenges.
Study [49] identified major security concerns and anticipated challenges within the IoT ecosystem, guiding authentication methods and addressing various threats. Study [50] provided a concise overview of security issues across different IoT protocol layers, along with preliminary simulation findings. Through our review of these studies, we summarize them in Table 4, focusing on discussions about IoT security and the limitations of these studies.
3. REVIEW ON MACHINE LEARNING AND DEEP LEARNING
Traditional security mechanisms have proven inadequate in tackling the security challenges related to IoT. Therefore, researchers and experts must explore more efficient mechanisms to confront the security risks that threaten this technology and, consequently, human lives. For this reason, modern methods related to artificial intelligence (AI) have been investigated and shown to be capable of combating cyberattacks, such as hacking devices and cracking passwords. Due to their distinct problem-solving approaches, learning algorithms have found widespread adoption in various real-world applications. The emergence of low-computation-cost algorithms, combined with the availability of vast datasets and the development of novel methods, has contributed to the current advancements in learning algorithms, commonly referred to as ML and DL.
ML is an area of AI focused on developing systems that learn or enhance their performance based on the data they consume, while DL is a subfield of ML. AI is an umbrella term that refers to systems or devices that simulate human intelligence. ML, DL, and AI are often discussed together, and the terms are sometimes used interchangeably; however, they do not represent the same concept. It is vital to note that while ML and DL methods are forms of AI, not all AI encompasses ML or DL.
ML enables machines to learn independently, without human guidance, to perform tasks. It deduces a model for solving future problems by extracting specific patterns from data [51]. This field emerged from scientists' aspirations to create autonomous systems that infer without human intervention, moving beyond the previous reliance on direct commands. In today's world, ML is pervasive in various sectors. Whether we interact with banks, shop online, or use social media, ML algorithms play a crucial role in ensuring our experiences are efficient, seamless, and secure.
The technologies surrounding ML are evolving rapidly. Conventional ML methods rely on engineered features, while DL methods represent advancements in learning techniques that utilize multiple non-linear processing layers for feature abstraction and transformation, aiding in pattern analysis. Therefore, the aim of this review of ML and DL is to provide readers with a comprehensive understanding of both. In this section, we will first examine ML techniques from an IoT security perspective, discussing their pros and cons, along with solutions for IoT security. Next, we will review DL algorithms, their advantages and disadvantages, and their applications in addressing IoT security challenges.
3.1. Machine Learning (ML) Techniques
In this subsection, we provide an overview of ML methods that have proven effective in detecting and mitigating cyberattacks in IoT-based environments. ML involves training a computer to achieve a performance criterion by using previous or sample data [52]. ML algorithms create a mathematical model that aids in generating predictions or decisions using training data and previous data samples, without the need for explicit programming. ML merges computer science and statistics to develop prediction models, with a fundamental aspect being the development and use of algorithms that derive knowledge from past data. Providing more data generally improves the performance of ML algorithms.
ML techniques are suitable for IoT devices with resource constraints, as they can detect various IoT attacks early by observing network behavior [53]. ML methods can be broadly classified into two categories: Supervised Machine Learning (SML) and Unsupervised Machine Learning (USML). This subsection addresses common ML techniques, such as PCA, K-means clustering, Decision Trees (DT), Support Vector Machines (SVM), Naive Bayes (NB), k-Nearest Neighbors (KNN), Random Forest (RF), Association Rule (AR), and Ensemble Learning (EL), along with their pros and cons in IoT security.
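The difference between the two categories named in Section 3.1 can be illustrated with a small sketch. It applies a supervised method (k-nearest neighbors, which needs labeled samples) and an unsupervised one (k-means with two clusters, which does not) to the same toy feature. The feature (requests per minute), the sample values, and the function names knn_predict and two_means are assumptions made for illustration; two_means also assumes the input holds at least two distinct values.

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """Supervised: train is a list of (feature, label); vote among the k nearest."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    lo, hi = min(values), max(values)  # initial centroids
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return lo, hi

# Hypothetical feature: requests per minute observed from a device.
labeled = [(5, "normal"), (7, "normal"), (6, "normal"),
           (120, "attack"), (150, "attack"), (135, "attack")]
unlabeled = [rate for rate, _ in labeled]

prediction_high = knn_predict(labeled, 140)  # classified from labeled neighbors
prediction_low = knn_predict(labeled, 8)
centroids = two_means(unlabeled)             # structure found without labels
```

The supervised model returns a class name because it was given labels; the unsupervised model only returns two cluster centers, and deciding which cluster represents an attack is left to the analyst.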
Another study [60] presented and implemented a sequential detection architecture for an ML-based botnet attack detection system, utilizing J48 DT, ANN, and NB, which showed a higher performance score in creating a lightweight, high-performing detection system.
3.1.3. Random Forest (RF)
RF is an SML method in which multiple DTs are built and combined to form a forest, creating a robust and accurate prediction model with better overall outcomes. In RF, trees are randomly constructed and trained, and the class is selected by voting. The method's performance generally improves as the number of trees increases, leading to higher classification accuracy and prediction reliability. RF is widely utilized in IoT security operations, such as anomaly detection, due to its strong classification capabilities. Recent advancements in RF utilize ensemble methods that combine multiple DTs to enhance classification accuracy and robustness in detecting anomalies in IoT communications. Additionally, feature importance analysis allows RF to identify critical features in high-dimensional IoT data, improving the model's interpretability [61]. For instance, a study [62] employed RF and other ML techniques to detect and prevent DoS attack traffic arriving from smart home LAN devices. RF achieved 99% accuracy and precision compared with the other ML algorithms in the proposed methods. Another study [63] suggested a method to classify Advanced Persistent Threat (APT) malware in IoT networks using SMOTE-RF, which is trained to address imbalanced and multi-classification issues. The suggested method attained an accuracy rate of 80%.
3.1.4. Naive Bayes (NB)
NB is an SML algorithm based on Bayes' theorem and is commonly utilized for solving classification problems. NB makes predictions based on the probability of an event occurring given the prior data [64]. In IoT security, NB is employed to forecast attacks based on historical data and is particularly effective in detecting network layer anomalies. It has benefited from variants such as Gaussian Naive Bayes, which allows it to handle continuous data more efficiently in IoT applications. This enhancement makes NB suitable for real-time intrusion detection that requires quick classifications [65]. A study in [66] proposed intrusion detection methods based on Naive Bayes, noting that the Bayes classifier is particularly well-suited for intrusion detection systems (IDS) due to its high classification speed. Another study [67] presented an IDS model based on a two-layer dimension reduction and a two-tier classification module, built to detect malicious activities such as User-to-Root (U2R) and Remote-to-Local (R2L) attacks using NB and KNN. The model achieved a detection rate (DR) of 84.82% with a high false alarm rate (FAR) of 5.56%, while the two-tier model attained a DR of 83.24% and FAR of 4.83%.
Table 4 Existing Surveys in IoT Security
Ref. | IoT Characteristics | IoT Protocols | IoT Architectures | IoT Security Solution | IoT Challenges | Limitations
[41] | | | | | | Lacks a taxonomy and offers little discussion of attack detection schemes across IoT layers. Does not present security solution mechanisms; IoT vulnerabilities are not considered.
[42] | | | | | | IoT security measures are not adequately covered; focuses on the impact of IoT features on security and privacy without emphasizing IoT security requirements.
[43] | | | | | | IoT security requirements and mechanisms are disregarded. Does not discuss IoT applications.
[44] | | | | | | Lacks discussion of important IoT security requirements.
[45] | | | | | | Lacks classification and offers minimal discussion of ML techniques; moreover, IoT vulnerabilities are not specified.
resilient, robust classifier, which addresses the imbalanced nature of IoT security datasets. Various EL methods, such as stacking, boosting, and voting, can be applied in IDS strategies, enhancing their effectiveness [74].
In [75], the authors suggested a new smart ensemble-based IDS designed to be deployed at the IoT gateway. The method applied NB, SVC, and KNN classifiers and achieved high accuracy and performance when combined with EL techniques, exceeding 90% in accuracy compared to methods without EL. While ML techniques are effective in detecting cyber-attacks in IoT environments, they face challenges related to reliability, accuracy, and efficient labeling of data. These methods must adapt to the diverse data generated by IoT applications, but they also come with limitations.
Table 5 highlights the benefits and drawbacks of diverse ML techniques and their applicability to different types of attacks. Although many studies have proposed ML-based methods to mitigate IoT security concerns, there remain deficiencies in their findings.
Table 6 presents previous work on using ML to detect attacks in IoT environments [76-87].
3.2. Deep Learning (DL) Techniques
Recently, the incorporation of DL in IoT systems has gained significant attention as a research area. DL outperforms classical ML techniques, particularly when applied to large datasets, which is one of its primary advantages [88]. DL is the most advanced method for analyzing data to assess both benign and malicious behaviors of IoT components based on the interactions between devices within an IoT environment. By learning from past attacks, DL models can accurately predict future attacks. DL is a branch of ML that employs multiple non-linear processing layers to abstract and transform features in a discriminative or generative manner for pattern analysis. Because DL techniques can capture hierarchical representations in deep architectures, they are often referred to as hierarchical learning techniques [1].
Examples of discriminative DL methods include Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Hybrid DL methods include Autoencoders (AEs), Deep Belief Networks (DBNs), Restricted Boltzmann Machines (RBMs), Generative Adversarial Networks (GANs), and Ensembles of DL Networks (EDLNs).
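
The description of DL as a stack of non-linear processing layers can be made concrete with a minimal sketch. The layer sizes, weights, and input features below are hypothetical values chosen only for illustration; in practice the weights are learned from labeled traffic rather than written by hand.

```python
import math

def relu(values):
    """Non-linear activation applied element-wise in the hidden layers."""
    return [max(0.0, v) for v in values]

def sigmoid(z):
    """Squash a real number into (0, 1) so it can be read as a score."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each weight row produces one output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(features, layers):
    """Pass features through ReLU hidden layers and a single sigmoid output."""
    activation = features
    for weights, biases in layers[:-1]:
        activation = relu(dense(activation, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(activation, out_weights, out_biases)[0])

# Hypothetical, hand-picked parameters (not taken from any cited study).
LAYERS = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # 3 inputs -> 2 units
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),             # 2 units -> 2 units
    ([[1.2, -0.7]], [-0.1]),                             # 2 units -> 1 score
]

# Three made-up, already normalized traffic features.
score = forward([0.9, 0.1, 0.4], LAYERS)
```

Each hidden layer re-represents the output of the previous one, which is the hierarchical feature abstraction the text refers to; `score` can be thresholded to label a flow as benign or malicious.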
Table 5 Advantages and Disadvantages of ML Techniques
Technique | Advantages | Disadvantages | Application
SVM | Employs kernel mechanisms and can model non-linear decision boundaries. Known for its capacity to generalize and for being applicable to data with many feature attributes but few sample points. Well suited to data with numerous feature attributes. Uses less memory and storage. Highly scalable and performant. Suitable for IoT security because of its high classification accuracy. | Imbalanced samples degrade the performance of conventional SVM. Memory-sensitive, and choosing the best kernel can be difficult when modeling massive data sets. | Anomaly detection; IoT botnet detection; DoS/DDoS detection
NB | Used for practical problems such as text classification and spam detection. Highly scalable, fast, and robust. Appropriate for multi-stage classification and needs less data for classification. Handles high-dimensional data points. | Incapable of extracting valuable information from feature correlations and interactions. | Anomaly detection in IoT networks
RF | Applicable to data sets of any size, with flexible and relatively simple implementation. Suitable for modeling real-world situations. High accuracy and short prediction time. | Requires a longer training period than other supervised algorithms. When the number of trees exceeds a particular threshold, the algorithm becomes sluggish and less efficient for real-time classification tasks. | DoS, DDoS, Probe, R2L, and U2R attacks; intrusion anomalies; unauthorized IoT devices
K-NN | Simple to use. Reasonable accuracy in detecting U2R and R2L attacks. | Unsuitable for high-dimensional data and memory intensive. Does not perform well with very large data sets and is highly sensitive to outliers and missing values. | U2R, R2L, flooding, DoS, and DDoS attacks; intrusion and anomaly detection
K-means | Simple and flexible algorithm. Works well with unlabeled data. Used for anonymizing confidential data in an IoT system. | Less effective than SL methods, especially in detecting known attacks. Produces poor clusters if the clusters are not globular. | Anomaly detection; Sybil attacks in IoT
DT | Basic, simple to use, and transparent technique. | Demands large storage. DT-based approaches are easy to understand only if a few DTs are included. | DDoS; network traffic
PCA | Reduces data dimensionality and increases computational speed. Enhances the effectiveness of ML techniques by choosing features related to IoT attack detection. | Not robust to outliers, which affects its performance. Presupposes a linear relationship between two features, making it challenging to assess the correlation between the features. | Real-time detection in IoT systems
EL | Suitable for complex problems in IoT attack detection. Provides high performance. | Long training and testing phases. | Anomaly detection and botnet detection
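
The voting strategy that underlies the EL and RF rows of Table 5 can be sketched as follows. The three rule-based functions are hypothetical stand-ins for trained base classifiers (for example NB, SVC, and KNN), and the feature names and thresholds are assumptions made only for this illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def ensemble_predict(classifiers, sample):
    """Hard voting: every base classifier casts one vote for a label."""
    return majority_vote([classify(sample) for classify in classifiers])

# Hypothetical base classifiers; real ones would be trained models.
def packet_rate_rule(sample):
    return "attack" if sample["pkts_per_s"] > 1000 else "benign"

def payload_rule(sample):
    return "attack" if sample["avg_payload"] < 64 else "benign"

def port_rule(sample):
    return "attack" if sample["distinct_ports"] > 50 else "benign"

# A made-up flow record: two of the three rules flag it.
flow = {"pkts_per_s": 4200, "avg_payload": 40, "distinct_ports": 3}
label = ensemble_predict([packet_rate_rule, payload_rule, port_rule], flow)
```

Hard voting is only one of the EL strategies named in the text; stacking and boosting combine the base models differently and are not shown here.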
Table 6 Previous Studies on ML-Based IoT Attack Detection
Reference | Algorithm | Attacks | Shortcoming | Observation
[76] | DT, NB, RF, SVM | Malicious botnet | Inappropriate feature selection leads to misclassification of malicious traffic flows. | DT and RF achieved high performance; however, SVM and NB were slightly weaker.
[77] | KNN, DT, XGB, RF | APT malware | Limited performance measurement. | RF achieved high performance compared with the rest of the classifiers.
[78] | SVM, DT, NB, USML | DDoS | FAR is high. Specificity is low. | The classifiers' performance was evaluated; however, FAR and specificity need to be improved.
[79] | NB, C4.5, RF | Anomaly and intrusion | Time taken to build the model is high. | The classifiers obtained exceptional performance, but the time needs to be minimized.
[80] | EL, RF, DR, KNN | Botnet | Imbalanced dataset. Binary-class classification model. | The ensemble model achieved high performance; however, the computation time is high.
[81] | DT, XGB, LR | Botnet | Binary-class classification. Overfitting model. Testing accuracy on balanced data is low. | The model achieved high performance metrics with the EL classifier compared with the other two classifiers, but it needs to eliminate overfitting and increase test accuracy on the balanced dataset.
[82] | DT, RF, SVM | Injection attack | Classifier performance decreases as the number of features increases. | The classifiers achieved high performance metrics, except SVM, which performed worst. However, the model needs to adjust the number of selected features.
[83] | Voting, stacking | DDoS | Execution time is high. | The models achieved high performance, but execution time is high, especially for stacking.
[84] | DT, SVM, NB | Routing attack | Overfitting in a few classes. | The models achieved high performance metrics, but the model lacks clarification.
[85] | NB, LR, DT, KNN, RF | DDoS | Binary-class classification. Overfitting model. | The study contains two experiments; both achieved high performance, but only in binary classification.
[86] | RF, DT, XGB, GB | MiTM, DoS | Binary-class classification. | The classifiers achieved high performance metrics in detecting MiTM but only reasonable performance in detecting DoS.
[87] | ML | Black hole attack | Energy consumption is high. | Energy consumption increases as the number of nodes increases.
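
Several shortcomings listed in Table 6 are stated in terms of FAR and specificity. As a minimal sketch, assuming the usual confusion-matrix definitions and counts invented purely for illustration, these metrics can be computed as follows.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common IDS metrics from confusion-matrix counts.

    tp: attacks flagged as attacks     fn: attacks missed
    fp: benign flows flagged as attacks  tn: benign flows passed
    Assumes every denominator is non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),    # also called recall
        "false_alarm_rate": fp / (fp + tn),  # FAR
        "specificity": tn / (tn + fp),       # equals 1 - FAR
        "precision": tp / (tp + fp),
    }

# Illustrative counts only; they do not come from any cited study.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

Because specificity is the complement of FAR, a classifier reported with a high FAR necessarily has low specificity, which is why the two appear together in the shortcomings column.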
3.2.1. Convolutional Neural Network (CNN)
CNNs are a type of DL model frequently employed for image classification and recognition. They analyze input images and classify them into categories such as dogs, cats, lions, and tigers. Unlike other neural networks, CNNs process images as two-dimensional pixel arrays, focusing directly on the images rather than relying on manual feature extraction. CNNs consist of three types of layers: the input layer, which supplies inputs to the model (each neuron in this layer corresponds to a feature in the data); the hidden layers, of which there can be several; and the output layer, which converts the outputs of the hidden layers into probability scores for each class using a function such as sigmoid or softmax. CNNs have been enhanced with sophisticated architectures such as ResNet and DenseNet, which improve their feature extraction capabilities on IoT traffic data. Moreover, CNNs can use multi-channel inputs to analyze various characteristics of network data simultaneously, enhancing their ability to detect complex cyber-attacks. Their capacity to process large datasets makes CNNs highly effective in IoT security, leading to improved detection results. For example, a study in [89] presented a technique combining two CNN models (CNN-CNN) to detect attacks on IoT networks. Using raw network traffic data, the first CNN model identifies key features that assist in detecting IoT attacks. The second CNN uses these features to build a strong detection model that reliably identifies IoT attacks. The suggested approach attained a confusion matrix score of 98%. The ability of CNNs to concurrently learn relevant features and perform classification removes the need for manual feature extraction, producing an end-to-end model that, with optimization algorithms, offers exceptional results for IoT security-based IDS [90].
3.2.2. Recurrent Neural Network (RNN)
RNNs, a class of Artificial Neural Networks (ANNs), are primarily applied in speech recognition and natural language processing (NLP). RNNs are designed to recognize patterns in various data types, including text, genomes, handwriting, spoken language, and numerical time-series data. RNNs are used by systems such as Apple's Siri and Google's voice search for processing sequential data. RNNs are especially effective in IoT security because of their ability to analyze sequential data, making them essential for network IDS (NIDS). Long Short-Term Memory (LSTM) networks enhance RNNs by mitigating the vanishing gradient problem and enabling the detection of long-term dependencies, which are crucial for detecting IoT security vulnerabilities in time-series data. As a result, DL methodologies such as RNNs have become a central focus in NIDS research [91]. In [92], a proposed model integrated DL and metaheuristic techniques by using RNNs within a multi-modal framework to efficiently capture complex correlations in diverse network traffic data. The model used a wavelet-based feature extraction method to improve the discriminative power of the generated features, achieving a 98% accuracy score and an AUC of 99%.
3.2.3. Auto-Encoders (AEs)
AEs are a type of neural network in which the input and output layers have the same dimensionality. Since an AE replicates data from the input to the output in an unsupervised manner, it is also referred to as a replicator neural network. The AE network consists of two main components: the encoder function (h = f(x)) and the decoder function responsible for reconstructing the input (r = g(h)) [1].
The encoder takes the input and converts it into an abstract representation called a code, while the decoder uses this code to rebuild the original input. During AE training, the goal is to minimize reconstruction error. Recent advancements in variational autoencoders (VAEs) have improved their ability to learn data distributions and extract features, increasing their efficacy for unsupervised anomaly detection in IoT security. In IoT networks, AEs can proficiently detect various types of IoT attacks. A study [93] developed an architecture based on an asymmetric parallel autoencoder (APAE), with two encoders working simultaneously, each with three successive layers of convolutional filters. This lightweight architecture enhances the AE's ability to detect unknown attacks and improves detection rates. Another study [94] proposed the nonsymmetric autoencoder (NAE) model, where the encoder extracts complex hidden representations of network traffic and the decoder reconstructs the input data with high accuracy, achieving superior detection rates for abnormal attacks.
3.2.4. Restricted Boltzmann Machine (RBM)
RBMs are generative, stochastic neural networks that can model probability distributions over sets of inputs. They are used for feature selection and extraction in various applications, including dimensionality reduction, classification, and regression. RBMs consist of two layers, an input (visible) layer and a hidden layer, and serve as the foundational elements of Deep Belief Networks (DBNs). RBMs excel at pattern recognition tasks such as interpreting handwritten text and identifying radar targets in low signal-to-noise ratio conditions. Additionally, RBMs are used in recommendation systems, enhancing user suggestions through filtering algorithms [95]. Improvements in RBMs, such as layer-wise pre-training, allow these models to develop hierarchical features that improve their ability to detect intricate attack patterns in IoT networks. RBMs are crucial for identifying attacks in IoT environments [96]. A previous study [97] proposed an innovative approach for anomaly detection by projecting raw features through a restricted Boltzmann machine. This approach outperformed several modern methods when evaluated on a widely known anomaly detection dataset, demonstrating strong performance metrics.
3.2.5. Deep Belief Network (DBN)
A DBN is a type of generative neural network that uses an unsupervised learning model. DBNs consist of multiple stacked layers, typically built from RBMs. Their ability to predict complex patterns in IoT traffic for threat detection has been enhanced through unsupervised pre-training followed by supervised fine-tuning. DBNs have emerged as a critical technique for detecting malicious activities in IoT security [98]. While researchers have not yet thoroughly analysed every aspect of DBN-based intrusion detection models, further research is expected to present these techniques in greater detail, as DBNs are ideal for feature extraction and are particularly robust for classification tasks.
3.2.6. Generative Adversarial Network (GAN)
GANs are ML models that contain two neural networks competing against each other to improve their prediction accuracy. GANs typically operate in an unsupervised manner within a competitive zero-sum game framework. To use GANs effectively, the first step is to identify the desired outcome and collect an initial training dataset based on these parameters. GANs have advanced significantly through the use of conditional GANs and semi-supervised learning methods, enhancing their ability to create realistic attack scenarios. This strengthens model robustness and prepares systems to defend against unknown attacks. In IoT security, GANs can proficiently protect systems from unknown intrusions [99]. GANs are also capable of securing the IoT physical layer [100]. A previous study [101] introduced a technique for detecting human activity using generative adversarial micro-aggregation, which improved data privacy while generating realistic samples based on the estimated distribution of the original data. This method showed superior efficacy in securing IoT systems.
Despite the benefits of using DL to combat IoT attacks, some challenges remain. Table 7 illustrates the benefits and drawbacks of DL methods and their applicability in attack detection. Table 8 outlines various DL algorithms from previous studies that discuss researchers' efforts to address IoT security challenges referred to in [102-115]. Table 9 summarizes the key hardware and resource requirements for applying ML and DL in low-power IoT devices, which is crucial for optimizing model performance while maintaining energy efficiency and practical operation.
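
The encoder/decoder formulation from Section 3.2.3, h = f(x) and r = g(h), and its use for anomaly detection can be sketched as follows. This is not the APAE or NAE architecture from the cited studies: the encoder here is a fixed linear projection standing in for learned weights, and the sample values and the 1.5 threshold margin are assumptions made for illustration.

```python
import math

# Hypothetical "learned" direction; a trained AE would obtain its weights
# by minimizing reconstruction error on normal traffic.
U = (1 / math.sqrt(2), 1 / math.sqrt(2))

def encode(x):
    """h = f(x): compress a 2-D input into a 1-D code."""
    return x[0] * U[0] + x[1] * U[1]

def decode(h):
    """r = g(h): rebuild a 2-D approximation of the input from the code."""
    return (h * U[0], h * U[1])

def reconstruction_error(x):
    """Squared distance between the input and its reconstruction."""
    r = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, r))

# Made-up "normal" samples whose two features are roughly equal.
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (0.5, 0.4)]

# Assumed rule: flag anything that reconstructs 1.5x worse than the
# worst normal sample.
threshold = 1.5 * max(reconstruction_error(x) for x in normal)

def is_anomalous(x):
    return reconstruction_error(x) > threshold
```

Inputs that resemble the training data pass through the bottleneck almost unchanged, while inputs with a different structure, such as `(3.0, 0.2)`, reconstruct poorly and are flagged.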
Table 7 Advantages and Disadvantages of DL Techniques
Technique | Advantages | Disadvantages | Applications
CNN | Ideal for rapid and highly efficient feature extraction. Requires less preprocessing than other methods. Can use raw network security data to learn behavior automatically. | Needs high computational power. Highly challenging to use on resource-constrained IoT devices. | Malware attacks; anomaly attacks
RNN | Can automatically learn new information and predict sequences based on historical data. Suitable for IoT security because the IoT environment produces sequential data in some circumstances. High prediction capability. | Suffers from vanishing or exploding gradients, which makes learning long data sequences difficult. Training is slow and complex. | Malware attacks
AEs | Used for dimensionality reduction and feature extraction. | Requires high computational power. If the training dataset is not representative of the testing dataset, the outcomes may not be as expected. | Botnet attacks
RBM | The RBM feedback mechanism enables the extraction of vital features, which are then used to capture IoT traffic behavior. | Requires a lot of computational capacity. Features cannot be represented by a single RBM. | R2L, DoS, U2R, and Probe
DBN | Exceptionally accurate and reliable. Suitable for significant feature extraction because it is trained on unlabeled data. | Demands a large computational cost. | R2L, DoS, U2R
GAN | Suitable for zero-day attacks. | Training is challenging and produces erratic outcomes. | Mirai, Bashlite, Scanning, MiTM
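
Table 7 lists vanishing gradients as a drawback of RNNs. A minimal numeric sketch shows why long sequences are hard to learn, assuming a scalar RNN of the form h_t = tanh(w * h_{t-1} + x_t) with a hypothetical recurrent weight and constant input.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

# Hypothetical weight and inputs, chosen only to illustrate the effect.
short_seq = gradient_through_time(w=0.5, inputs=[0.1] * 5)
long_seq = gradient_through_time(w=0.5, inputs=[0.1] * 50)
```

Each time step multiplies the gradient by a factor no larger than |w|, so with |w| < 1 the influence of early inputs shrinks geometrically with sequence length; with a sufficiently large |w| the same product can grow instead, which is the exploding case. Gated architectures such as LSTM are designed to mitigate this.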
4. EXISTING SURVEYS ON ML AND DL TECHNIQUES
This section presents and discusses previous studies on ML and DL, comparing them with the survey we present. Since 2018, many studies on IoT security have been conducted, with a particular focus on ML and DL applications for IoT security. Our survey addresses the shortcomings in previous discussions of these two techniques, as well as the inadequate attention paid to their capabilities. For instance, the author in [116] provided a comprehensive review and analysis of diverse ML methodologies, highlighting issues with different ML approaches for detecting intrusive activities. The research in [117] analyzed the possibilities and challenges of using data in ML solutions for IoT privacy by exploring various data sources, analyzing them, and examining ML-based solutions currently in development that are designed to preserve IoT privacy. In [118], the threats to IoT security were reviewed, along with a systematic analysis of those threats from both the training and testing/inference perspectives. The author categorized current ML-based defensive techniques into four groups.
The research in [119] focused on studies related to intrusion detection (ID) for computer network security and ML techniques for IoT. In [120], the Cisco IoT reference model architecture was used to classify well-known security concerns, allowing the study to focus on IoT security threats and vulnerabilities. Additionally, an analysis of previous studies on DL-based IDS in IoT security was included. In [121], the IoT design was presented after an in-depth literature analysis of ML techniques and the essential role of IoT security concerning various attack vectors.
In [122], the IoT network security needs, attack vectors, and available security solutions were analyzed. The author also highlighted the weaknesses in existing security solutions that require ML and DL techniques and detailed the various ML and DL technologies currently available to tackle security concerns in IoT networks. The study in [123] evaluated current approaches for categorizing IoT security risks and challenges in IoT networks, with a focus on network intrusion detection systems (NIDS). A thorough analysis of NIDS using various IoT learning strategies was also provided.
In [124], the notion of malware and botnets causing DDoS attacks in IoT was outlined and contrasted, along with the different DDoS defense strategies. In [125], a detailed investigation of IoT malware detection and static analysis methods was presented, covering key techniques, along with the pros and cons of current static IoT malware detection frameworks.
In [126], attacks were classified into groups based on the most pertinent security threats, countermeasures, and real-world attacks across the generalized IoT/IIoT architecture. The study also discussed how blockchain can be applied to efficiently address these issues. The study in [127] provided fundamental information on security threats and safeguards in IoT networks, covering topics such as the IoT market, security architecture, and procedures for security managers and IoT developers. The author in [128] discussed primary security and forensic issues in the IoT domain and presented papers addressing these topics.
In [129], DNN topologies and the potential benefits of deep learning were discussed, along with a detailed analysis of IoT use cases powered by DL. In [130], a comprehensive overview of current IoT security solutions and developments was presented, focusing on IoT security threats. The survey in [131] provided a recent overview of various ML techniques for IoT applications, covering supervised and unsupervised models that support IoT frameworks and the importance of ML models in relation to IoT.
In [132], a classification system for IoT attacks was provided, along with an examination of IoT security weaknesses at different levels. The study also presented an analysis of recent security systems by evaluating the effectiveness of new solutions. The study in [133] reviewed IoT security protection and concluded that AI methods such as ML and DL can offer novel abilities to meet IoT security requirements. In [134], a brief description of ML and DL-based IDS was provided, discussing different types of attacks and anomalies and how these systems detect them.
In [135], a detailed account of cutting-edge approaches to IoT data challenges was provided, while in [136], a systematic literature review (SLR) examined the use of DL approaches for anomaly-based IDS in IoT environments. The study extracted data from sources like IEEE Xplore, Scopus, WoS, Elsevier, and MDPI. In [137], a summary of DL techniques in cybersecurity applications was provided, including explanations of GANs, RNNs, restricted Boltzmann machines, and deep autoencoders (AEs), followed by how these DL methods apply to various types of attacks such as network intrusions, malware, spam, insider threats, and more.
In [138], a review of the pros and cons of ML algorithms in IoT security was presented, with a focus on the application of DL and Federated Learning (FL) in IoT security. FL models enable systems to share information while protecting data privacy. In [139], the specifics of ML security attacks in cyber-physical systems were outlined, along with defense strategies, threat models, and a comparative analysis of ML model performance under diverse attack scenarios. The study in [140] reviewed privacy and security concerns related to DL algorithms, categorized various attack types, and examined protection strategies, including privacy-preserving techniques like Homomorphic Encryption (HE) and hash functions.
The study in [141] discussed the major security problems and challenges that IoT infrastructures face, providing a thorough examination of ML-based solutions for IoT security. Additionally, the limitations of common ML-based security techniques for IoT were discussed. In [142], a tutorial-style analysis of advanced DL architectures for cybersecurity applications was provided, along with an evaluation of recent contributions and challenges.
In [143], the latest findings on ML/DL-based scheduling strategies were examined, covering the trade-offs between accuracy and execution time, as well as the security and privacy of learning-based algorithms in real-time IoT systems. The study in [144] aimed to enhance IoT device security by reviewing ML systems and the latest advances in DL techniques, identifying future IoT device threats and protection concerns. The study also evaluated DL/ML strategies for IoT security, discussing their potential and limitations.
Lastly, in [145], a comprehensive overview of IoT security intelligence based on DL/ML technologies was presented, highlighting research topics and future directions. Prior studies have substantially enhanced our comprehension of IoT security, although it is impossible to cover every aspect in one study. We analyzed these studies along with additional studies [146-150], classified their contributions from 2018 to the present in chronological order by year of publication, and compared them with our survey, as summarized in Table 10.
4.1. Research Papers Methodology
In this survey, a collection of research articles was compiled from various sources, including Elsevier, IEEE, Springer, MDPI, ACM, Hindawi, and others, published between 2018 and 2024. These articles focus on ML and DL survey papers and models. Each study was analyzed based on the problem statement it attempted to tackle, the domain in which it was executed, the types of attacks it aimed to detect, the methods used to address the problem, and the outcomes obtained. A total of 200 papers were gathered for the literature review on ML and DL methods.
Figure 6 illustrates the number of papers published in the journals mentioned in this survey, showing an increase in publications from Elsevier and IEEE compared to other sources. Additionally, Figure 7 highlights the number of papers published between 2018 and 2024.
Table 10 Analyzing and Classifying the Previous Studies Between 2018-2024
Reference Year ML DL Dataset Domain Attacks/Threats Countermeasures Challenges/Issues
[116] 2018
[118] 2018
[128] 2018
[119] 2019
[122] 2019
[123] 2019
[124] 2019
[126] 2019
[1] 2020
[117] 2020
[120] 2020
[121] 2020
[125] 2020
[127] 2020
[129] 2020
[133] 2020
[134] 2021
[135] 2021
[136] 2021
[137] 2021
[140] 2022
[143] 2022
[144] 2022
[150] 2022
[138] 2023
[142] 2023
[145] 2023
[146] 2023
[147] 2024
[148] 2024
[149] 2024
Our Survey 2024
Figure 6 Number of Papers Published in the Journals
Figure 7 Papers Published Between 2018-2024
5. CLASSIFICATIONS OF IOT LAYERS ATTACKS
As mentioned in Section 2, the IoT architecture consists of three key tiers: the perception layer, network layer, and application layer [151], as demonstrated in Figure 4. The perception layer is made up of sensors and controllers that collect data. The network layer's primary role is to establish connections between networks, using protocols and various connections. Finally, the application layer responds to the user and software programs, allowing users to access and retrieve data.
The goal of this section is to understand the risks of attacks at each layer and provide an overview of the solutions offered by researchers, along with the benefits of each study. This section discusses the types of attacks at each layer and the solutions presented by researchers.
5.1. Perception Layer Attacks
This layer in IoT is accountable for gathering information via actuators, Zigbee, and RFID. It faces a variety of attacks aimed at damaging or destroying its devices. Attackers may penetrate and modify devices through social engineering, launching large-scale attacks such as device destruction, eavesdropping, or other attacks. Physical attacks, such as manipulating energy sources or disrupting communication mechanisms, may require the attacker to be in close proximity to the target. For example, physical attacks like jamming, eavesdropping, interference, and traffic analysis can disrupt the physical layer. Robust approaches, including ML/DL technologies, are required to detect and secure this layer. Several researchers have addressed physical layer attacks, particularly radio-frequency jamming attacks. One technique compared SVM and K-NN methods in multi-track and single-route scenarios. The Random Forest (RF) technique, combined with AdaBoost, achieved superior findings compared to other methods. Additionally, RF showed better accuracy and lower false alarms compared to other techniques. In [152], the author proposed P4NIS, a network invulnerable schema with three layers of protection to identify and prevent eavesdropping attempts. The findings showed that, compared to contemporary techniques, P4NIS reduced encryption costs by 69.85% to 81.24% and minimized false alarms. In [153], an ML model using SVM was presented to classify spoofing attacks on signals received by unmanned aerial vehicles (UAVs). K-fold examinations were conducted to improve the learning pattern, which was termed K-learning. The model, using GPS features, achieved high levels of accuracy, precision, recall, and F-score (99%, 98%, 99%, and 98%) when compared to earlier research. Additionally, Table 11 summarizes a few studies [154], [155], [156], [157], detailing the contribution of each study, the type of attack tackled, and the outcomes obtained.
Table 11 Attack Detection in Physical Layer
Reference | Contribution | Attacks | Algorithm | Results
[154] | Provided a wireless fingerprinting-based PHY-layer continuous authentication and spoofing detection method for a real WSN in which diverse nodes connect to a central sink node. | Spoofing | DT | ACC: 95.43
[155] | Used DL with LSTM to provide confidentiality and privacy in the physical layer. | DDoS | LSTM | ACC: 0.99; AUC: 0.99; R: 0.98; P: 0.95
[156] | Proposed a CDAE model that decreases feature dimensions, removes noise, and extracts key vectors. | Malicious | AE | ACC: 0.98; AUC: 0.99
[157] | Provided the advanced hybridized optimization technique AHGFFA to avoid attack issues using USML in the MANET-IoT sensor system. | Malicious | UML | DR: 0.98; EC: 5
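
The K-fold evaluation mentioned for [153] can be sketched as below, assuming that "K-fold examinations" refers to standard k-fold cross-validation. The function name and the contiguous, unshuffled split are choices made for this illustration and are not taken from the cited study.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Every sample lands in exactly one test fold; the first
    n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained on each `train` split and scored on the matching `test` split, and the k scores averaged; shuffling the samples before splitting is usually advisable when the data are ordered.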
5.2. Network Layer Attacks
This layer is accountable for transmitting data from the perception layer to the application layer for processing [158]. This layer faces several threats, including eavesdropping, man-in-the-middle (MITM) attacks, Sybil attacks, routing information threats, and DDoS. When compromised, IoT devices may become part of botnets, enabling hackers to hinder communication paths between source and destination. Hackers can also launch Sybil attacks by exploiting compromised or fake nodes, tampering with security keys and routing tables, which can affect higher levels of the IoT system. Because the network layer sits between the physical and application layers, it plays a vital role in IoT security. Numerous efforts have been made to secure this layer, with many studies achieving exceptional results in detecting IoT attacks at the network layer. Table 12 analyzes studies [159-164] related to IoT attacks at the network layer.
5.3. Application Layer Attacks
This layer handles several data transactions and is responsible for establishing a user interface between end users and endpoints. Securing the application layer poses significant challenges. Many of the vulnerabilities found here are based on sophisticated user inputs that are difficult to detect with IDS. Additionally, this layer is vulnerable to software-based attacks such as malware, viruses, and worms, and is publicly accessible and visible to everyone. One notable example of an application layer attack is SQL injection, which was responsible for significant data breaches in 2014. SQL injection ranks third in frequency of attacks after DDoS and malware. Other common vulnerabilities in this layer include security misconfiguration, which allows hackers to alter program details and access confidential information without being detected by network security measures. In Table 13, we present recent studies [165-170] that address IoT attack detection at the application layer. Additionally, Figure 8 illustrates the taxonomy of IoT attack layers.
In Section 2, we discussed the prominent attacks in the IoT environment, which are considered the most critical threats impacting general IoT security. In this section, we have outlined the attacks that occur in each tier, focusing on the primary challenges in each layer. This helps researchers identify issues specific to each layer and gain comprehensive knowledge of the challenges within each IoT layer. Table 14 provides detailed information, and Table 15 outlines the key principles of attacks on IoT layers [171-185].
Table 12 Attack Detection in Network Layer

| Reference | Contribution | Attacks | Algorithm | Results |
|---|---|---|---|---|
| [159] | Presented a DL model named DeepAK-IoT to detect cyber-attacks in IoT networks. | Botnet | DeepAK-based DL | ACC: 90.57; F1: 88.87; P: 89.59 |
| [160] | Used DL to present a new anomaly-based IDS method for IoT networks. In particular, a DNN model with filter-based FS that eliminates highly correlated features has been introduced. Additionally, the model is fine-tuned using a range of parameters and hyperparameters. | DoS | DNN, GAN-DNN | ACC: DNN: 84.4; GAN-DNN: 90.9 |
| [161] | The author provided a new technique using the RF classifier to counter the attacks. This method utilizes EL to combine many DRs in order to generate precise and efficient forecasts for the quick identification of hazards in IoT networks. | DDoS | RF | ACC: 99.53; P: 0.99; F1: 0.98; AUC: 0.99 |
| [162] | The author designs a model using ensemble approaches on the KDD Cup 99 dataset after surveying the literature on the most recent studies utilizing deep learning techniques. | Anomaly | AE, GAN | AE: ACC: 97.96, P: 90.68; GAN: ACC: 90.26, P: 91.27 |
| [163] | Provided an IDS defensive system that applies anomaly detection and ML to enhance the security of IoT networks against DoS attacks. They also used two feature selection algorithms, the GA and the Correlation-based Feature Selection (CFS) algorithm, and evaluated how well they performed. | DoS | DT, RF, SVM, KNN | ACC, P, R, F1 are 0.99 for all classifiers |
| [164] | This paper provided a novel intrusion detection method for IoT devices based on DL. To identify malicious traffic that could start an attack on connected IoT devices, this intelligent system employs a four-layer deep fully connected (FC) network architecture. Based on the experimental performance analysis, the suggested system demonstrated reliable performance for both simulated and real intrusions. | Blackhole, Sinkhole, Wormhole, DDoS | DNN | ACC: 93.74; P: 93.73; R: 93.82; F1: 93.47; DT: 93.21 |
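Several entries in Table 12 rely on filter-based feature selection that removes highly correlated features before training (for example [160]). The sketch below illustrates the general idea with a greedy Pearson-correlation filter. It is an assumption-laden illustration rather than the procedure of any cited study: the `drop_correlated` helper, the 0.9 threshold, and the toy flow features are hypothetical, and the CFS algorithm used in [163] scores feature subsets differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / math.sqrt(vx * vy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation
    with every already-kept feature is below the threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-flow features; byte_count is nearly proportional to pkt_count.
flows = {
    "pkt_count": [10, 20, 30, 40, 50],
    "byte_count": [1000, 2100, 2900, 4000, 5100],
    "duration": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(flows, threshold=0.9)
print(selected)
```

The filter is order dependent: the first feature of a correlated pair is kept and the later one is dropped, so in practice the column order or a relevance ranking decides which survives.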
Table 13 Attack Detection in Application Layer

| Reference | Contribution | Attacks | Algorithm | Results |
|---|---|---|---|---|
| [165] | This research, which focused on communication and environmental dynamics in industrial settings, proposed a novel method for detecting jamming in IoT. It focused on gathering QoS, and connection parameters during normal communication and | Jamming | Stack LSTM | ACC: 99.5; P: 99.4; R: 99.26 |
| Attack | Description |
|---|---|
| Spoofing Attack [172] | Hackers pose as authorized users or devices in order to distribute malware, steal data, and get around access control measures. |
| Reverse Engineering [173] | A person-to-person attack where the criminal makes direct contact with the target in an attempt to get them to furnish crucial information. |
| Physical damage [174] | Carried out when the hacker can physically approach the device. A malicious user can take control of a computer or communication system, harm property, and jeopardize lives. |
| RFID Cloning [175] | The process of duplicating the data from an RFID electronic tag or smart card to a cloned tag that will resemble the original tag and possibly replace it. |
| RF interface [176] | Targets devices that employ radio, Wi-Fi, Bluetooth, and Bluetooth Low Energy (BLE) as communication means. |
| Malicious code [177] | Malicious software, sometimes known as malware, that can rapidly or gradually damage client PCs, databases, networks, and even server clusters. |
| Injection attacks [178] | Malicious code injected into the network that retrieves all the data from the database for the hacker. |
| DNS Spoofing and Phishing [179] | Attackers may spoof DNS responses or launch phishing attacks against IoT applications to obtain private data, including login passwords or bank account information. |
| SQL injection [180] | SQL injection attacks exploit weaknesses in IoT apps that store and retrieve data from databases. Attackers can extract sensitive data, alter database contents, or run unauthorized instructions on the underlying database server by adding malicious SQL queries in the input fields or API parameters. |
| XSS [181] | Cross-site scripting attacks inject malicious scripts into websites visited by other users, targeting web-based IoT applications. Attackers can alter web interfaces, take illicit actions on behalf of authorized users, and steal session cookies by taking advantage of XSS vulnerabilities. |
| Software Tampering [182] | On IoT devices, hackers may tamper with the firmware or software to add backdoors, vulnerabilities, or malicious features. Firmware-altering attacks pose a serious risk to the security, reliability, and integrity of IoT devices by allowing data to be exfiltrated, causing malfunctions or unauthorized access. |
| Sybil attacks [183] | A group of nodes that broadcast fake data from a random network by pretending to be several peer identities in order to compromise an IoT ecosystem. |
| API abuse | Attackers misuse Application Programming Interfaces (APIs) made available by IoT applications to carry out illicit operations, obtain private information, or alter device settings. API abuse attacks can take advantage of poorly constructed APIs, weak access restrictions, or insufficient input validation. |
| Manipulation of data | In order to trick consumers, set off false alarms, or bring about disruptive events, attackers alter or corrupt data that is transferred between IoT devices and applications. Attacks that modify data might jeopardize the integrity and reliability of IoT systems, resulting in incorrect judgments or actions taken in response to misrepresented data. |
| Routing attacks [184] | Routing attacks aim to modify or interfere with device-to-device communication by targeting the routing protocols and techniques utilized in IoT networks. Attackers might, for instance, create routing loops, reroute traffic to hostile nodes, or insert erroneous routing information, all of which could cause network congestion or fragmentation. |
| Ransomware [185] | Ransomware attacks encrypt or block access to files, systems, or devices and demand that the target pay a ransom to unlock them. Ransomware can harm an organization's brand in addition to causing large financial losses and operational problems. |
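The SQL injection entry above describes attackers placing malicious SQL in input fields or API parameters. The following minimal sketch, using Python's standard `sqlite3` module, contrasts a query built by string concatenation with a parameterized one. The `readings` table, the device identifiers, and the two helper functions are hypothetical and serve only to illustrate the mechanism; this is not a complete defense for a real IoT backend.

```python
import sqlite3

# In-memory database standing in for an IoT application's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("thermostat-1", 21.5), ("camera-7", 0.0), ("lock-3", 1.0)],
)


def readings_unsafe(device_id):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = ("SELECT device_id, value FROM readings "
             f"WHERE device_id = '{device_id}'")
    return conn.execute(query).fetchall()


def readings_safe(device_id):
    # Parameterized: the driver binds the input as data, never as SQL.
    return conn.execute(
        "SELECT device_id, value FROM readings WHERE device_id = ?",
        (device_id,),
    ).fetchall()


payload = "x' OR '1'='1"  # classic always-true injection string
leaked = readings_unsafe(payload)  # the WHERE clause becomes always true
blocked = readings_safe(payload)   # no device has this literal identifier
print(len(leaked), len(blocked))
```

In the unsafe variant the payload rewrites the WHERE clause so that every row is returned, whereas the parameterized variant treats the same string as an ordinary, non-matching device identifier.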
6. INTERNET OF THINGS SECURITY DATASETS

In this section, we discuss the datasets commonly used to construct IoT security models. We focus on the typical and popular datasets so that researchers gain insight into the types of datasets they can use to develop models for identifying IoT attacks. Additionally, we discuss the pros and cons of each dataset, along with research papers that have utilized these datasets.

6.1. BoT-IoT Dataset

This is an extensive dataset for IoT botnet research, containing both malicious and benign traffic gathered from various IoT devices. It simulates real-world IoT network conditions by incorporating traffic data from multiple IoT devices. The dataset contains five distinct attack scenarios, each with several attack variations, and was created at UNSW Canberra's Cyber Range Lab. The source files are available in multiple formats, including CSV files, Argus files, and original pcap files. The dataset includes attacks such as DDoS, DoS, OS and service scanning, keylogging, and data exfiltration, with DDoS and DoS attacks further classified based on the protocol used [186]. The dataset serves as a reference for assessing the performance of ML models and IDS in identifying IoT botnet activity.

6.2. UNSW-NB15 Dataset

This is an extensively used network traffic dataset for assessing IDS. The UNSW-NB15 dataset was generated in the UNSW Canberra Cyber Range Lab using the IXIA PerfectStorm tool to build a blend of real-world modern-day activities and artificial modern-day attack behaviors. It comprises about two million records with 49 features that were obtained with the aid of the Argus and Bro-IDS tools and a few specially developed algorithms. The labeled UNSW-NB15 dataset includes network traffic information gathered in a controlled environment [187].

6.3. ToN-IoT Dataset

This dataset is one of the recent IoT and IIoT datasets, designed to assess the accuracy and effectiveness of various AI-based cybersecurity technologies. It includes data from IoT and IIoT sensor telemetry datasets, Windows 7 and 10 operating system datasets, and TLS and network traffic statistics from Ubuntu 14 and 18. The dataset was gathered from a large-scale, realistic network at the Australian Defense Force Academy (ADFA), School of Engineering and Information Technology (SEIT), UNSW Canberra, and the IoT Lab of UNSW Canberra Cyber [188].

6.4. IoT-23 Dataset

The dataset comprises network traffic data from 23 distinct IoT devices across different categories, addressing various IoT applications such as industrial control systems, wearable technologies, intelligent home devices, and healthcare equipment. The dataset includes traffic from devices like fitness trackers, IP cameras, smart doorbells, smart thermostats, and industrial sensors. The IoT-23 dataset aims to support IoT security research and development, particularly in traffic analysis, anomaly detection, and IDS. Researchers can employ this dataset to evaluate the effectiveness of security algorithms and processes in protecting IoT networks and devices [189].

6.5. MQTT-IoT-IDS2020 Dataset

In machine-to-machine (IoT) communication, one of the most widely used protocols is the Message Queuing Telemetry Transport (MQTT) protocol. MQTT-IoT-IDS2020 is the first dataset that simulates an MQTT-based network. The network consists of 12 sensors, a broker, a simulated camera, and an attacker. The dataset concentrates on IoT security, specifically on identifying security risks in IoT networks that use the MQTT protocol. IDS for IoT networks can be trained and assessed using the labeled data in the dataset, which includes both normal and attack traffic [190].

6.6. CICIDS 2017 Dataset

This dataset is a labeled network traffic dataset collected in a controlled environment. It was generated as a result of research conducted by the Canadian Institute for Cybersecurity (CIC). The dataset's primary objective is to promote cybersecurity research and development, especially in IDS.

It provides a standard benchmark for assessing the effectiveness of IDS methods and algorithms. The dataset captures network protocol traffic, such as TCP, UDP, and ICMP, along with traffic from various services and
applications, offering a broad range of network behaviors for analysis [191].

6.7. CTU-13 Dataset

The CTU-13 dataset, generated by the Czech Technical University (CTU) in Prague, is a popular benchmark dataset in cybersecurity, particularly for NIDS. It contains labeled network traffic data generated in a lab setting that simulates various types of cyberattacks [192].

6.8. NetFlow BoT-IoT Dataset

The BoT-IoT dataset was employed to build the NF-BoT-IoT v1 dataset, an IoT NetFlow-based dataset. The features were extracted from publicly available pcap data, and the flows were labeled with the appropriate attack types. There are 600,100 data flows in total, of which 13,859 (2.31%) are benign, and 586,241 (97.69%) are attack samples.

The dataset includes four distinct attack categories. The distribution of all flows in this dataset is demonstrated in the table below [193]. The dataset has two versions: version one (discussed here) and version two, which also uses features extracted from pcap data and labeled flows.

In version two, out of 37,763,497 total data flows, 37,628,460 (99.64%) are attack samples, and 135,037 (0.36%) are benign. The dataset contains four distinct attack categories.

6.9. NetFlow ToN-IoT Dataset

The NF-ToN-IoT v1 dataset was created by generating NetFlow records from the publicly accessible pcap files of the ToN-IoT dataset. This resulted in the NF-ToN-IoT NetFlow-based IoT network dataset. Of the total 1,379,274 data flows, 270,279 (19.6%) are benign samples, and 1,108,995 (80.4%) are attack samples. The NF-ToN-IoT v2 dataset was similarly produced from publicly available pcap files, resulting in 16,940,496 total data flows, of which 10,841,027 (63.99%) are attack samples, and 6,099,469 (36.01%) are benign. Both NetFlow datasets, NF-BoT-IoT v1 and v2, as well as NF-ToN-IoT v1 and v2, were created by Mohanad Sarhan [193].

6.10. N-BaIoT Dataset

The dataset addresses the scarcity of botnet datasets, particularly in the IoT domain. It contains authentic traffic data collected from nine commercial IoT devices confirmed to be infected with the BASHLITE and Mirai botnets [194]. Furthermore, there are several other IoT datasets that are less prevalent. For more details and further knowledge, refer to [195].

In this section, we provided an overview of key IoT security datasets, along with references to assist researchers in easily locating them. Each dataset has its advantages and disadvantages, which we outline in Table 16. Additionally, Table 17 presents some studies [196-205] that have utilized these IoT security datasets.
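The benign and attack flow counts quoted in Sections 6.8 and 6.9 show how strongly skewed the NetFlow datasets are. The sketch below recomputes the class shares from those counts and derives inverse-frequency class weights, one common way to compensate for imbalance during training. The helper names are hypothetical, and the weighting formula total / (number of classes × class count) is an assumption of this illustration, not a method prescribed by the dataset authors.

```python
# Flow counts as quoted in Sections 6.8 and 6.9.
flow_counts = {
    "NF-BoT-IoT v1": {"benign": 13_859, "attack": 586_241},
    "NF-BoT-IoT v2": {"benign": 135_037, "attack": 37_628_460},
    "NF-ToN-IoT v1": {"benign": 270_279, "attack": 1_108_995},
    "NF-ToN-IoT v2": {"benign": 6_099_469, "attack": 10_841_027},
}


def class_shares(counts):
    """Percentage of flows belonging to each class."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def balanced_weights(counts):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


for name, counts in flow_counts.items():
    shares = {k: round(v, 2) for k, v in class_shares(counts).items()}
    weights = {k: round(v, 2) for k, v in balanced_weights(counts).items()}
    print(name, shares, weights)
```

One consequence of these shares is that a detector labeling every flow as an attack would already reach 97.69% accuracy on NF-BoT-IoT v1, so accuracy alone says little on such data; precision, recall, and F1 on the benign class are more informative.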
Table 16 IoT Datasets

| Dataset | Attack type | Advantages | Disadvantages |
|---|---|---|---|
| BoT-IoT | DDoS, DoS, OS and Service Scan, Keylogging, and Data exfiltration attacks. | Real-world network traffic; includes a wide variety of IoT devices and attack scenarios; labeled data; newly generated features; accessible dataset. | Imbalanced dataset; accurately labeling network traffic data can be challenging; prone to overfitting; privacy issues. |
| UNSW-NB15 | Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. | Realistic dataset; offers CSV files and network traffic (PCAP); labeled dataset; diverse dataset; a wide array of features derived from network traffic. | Developed with a synthetic environment for producing attack activities; imbalanced dataset; lack of updates. |
| ToN-IoT | DoS, DDoS, and Ransomware. | Includes heterogeneous data sources; realistic traffic; covers various attacks. | Launched exclusively on IIoT network computer systems, IoT gateways, and web applications; restricted acceptance and validation in the field of cybersecurity research. |
| IoT-23 | Malware | An extensive dataset; labeled dataset; contains a variety of protocols, which helps researchers evaluate various IoT device and protocol interactions and vulnerabilities; beneficial for security research. | Imbalanced dataset; limited attack types; contains biases that can influence the outcomes; contains sensitive information because it is a real-world dataset. |
| MQTT-IoT-IDS2020 | SSH brute force, MQTT brute-force attack, aggressive scan, UDP scan | Real-world traffic data; includes an extensive amount of network traffic data; contains diverse types of attacks. | Dependence on a particular protocol; captures of static network traffic. |
| CICIDS 2017 | Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. | Real-world network traffic; labeled dataset; accessible dataset. | Imbalanced dataset; contains limited attacks; needs preprocessing for optimization, which adds computational cost. |
| CTU-13 | Botnet, Malware | Real-world dataset; scalable dataset; labeled dataset. | Limited attack types. |
| NF-BoT-IoT v1, v2 | Benign, DoS, DDoS, Theft, Reconnaissance | Real-world traffic data; applied to detect botnet attacks; used in IoT security research. | Data quality issues; analytical complexity; imbalanced dataset; contains noise. |
| NF-ToN-IoT v1, v2 | Benign, Backdoor, MiTM, Password, XSS, Scanning, DoS, DDoS, Injection, Ransomware. | Real-world dataset; contains data from different IoT devices; used in anomaly detection. | Contains biases; contains noise; focuses only on traffic data; imbalanced dataset. |
| N-BaIoT | Mirai, Bashlite | Real data collected from 9 IoT devices; used in anomaly detection. | Imbalanced and biased dataset; limited volume of network traffic. |
7. CHALLENGES, FUTURE TREND, AND DISCUSSION

ML and DL are essential components in ensuring the security of IoT systems; however, they face diverse challenges in IoT security. This section presents the challenges linked to ML and DL in relation to IoT security. Furthermore, it provides a discussion on the roles, future trend, and the limitations of ML and DL methods.
7.2.5. Privacy of Data

DL models require large amounts of data, which often include sensitive information from IoT devices. Ensuring data privacy while collecting enough data to train effective models is a key challenge.

Addressing DL challenges requires innovative algorithm development, optimization strategies, and system-level design tailored to IoT security applications. Collaboration between deep learning and IoT security researchers is necessary to create solutions that balance security, performance, and resource constraints.

7.3. Discussion

DL and ML mitigate some of these limitations by automatically extracting complex features from large, unlabeled IoT datasets, making them particularly effective at identifying advanced security threats. In IoT security, DL has been used to detect attacks and network anomalies by analyzing real-time data from smart home systems and other interconnected appliances.

However, despite the potential of ML and DL, challenges remain, including scalability, energy efficiency, and accuracy. Over-classification and misclassification can lead to significant errors in attack detection, resulting in false positives and negatives. Future trends aim to enhance model robustness through techniques like adversarial learning and self-learning systems that adapt to emerging threats in real time.

The development of energy-efficient algorithms and federated learning will improve privacy and reliability for resource-constrained IoT devices. Further research is required to address these constraints fully and improve the accuracy of attack detection in IoT security systems.

8. CONCLUSION

IoT is increasingly integrated into our everyday lives because of the growth of the internet and the vast number of devices linked to it. Because IoT networks are dynamic, securing them is complex, and traditional security solutions face several challenges due to the nature and characteristics of these networks. ML and DL have facilitated the development of several sophisticated analytical approaches that may be utilized to enhance IoT security. Moreover, ML techniques can address IoT security issues and challenges caused by the risk of attacks and by security holes left open. In this survey, the characteristics, architecture, protocols, and vulnerabilities of IoT systems are highlighted. We discuss IoT applications and present a table that summarizes the pros and cons of each application. Then, we discuss the potential IoT attacks in terms of passive and active attacks, together with the primary objective of each attack. Existing surveys related to IoT security have been presented. ML/DL methods have been discussed along with the strengths and weaknesses of each. Furthermore, we discuss the previous studies that apply them, analyzing and classifying the existing research from 2018 to date. After that, we present the taxonomy of IoT layer attacks and discuss each attack type in detail, providing recent studies that propose solutions using ML/DL methods to address these attacks. Additionally, we summarize the datasets related to IoT security, highlighting their advantages and disadvantages, as well as current research that has applied these datasets. We also discuss the challenges and the future trends related to ML/DL in the context of IoT security. The purpose of this survey is to provide a helpful guide for academic researchers, offering comprehensive knowledge of IoT, IoT security, DL/ML techniques, and common IoT attacks at various network layers. By outlining the challenges faced by ML and DL in this domain, we aim to equip researchers with a clear understanding, enabling them to select the most appropriate techniques for detecting and mitigating IoT attacks.

REFERENCES

[1] M. A. Al-Garadi et al., “A survey of machine and deep learning methods for internet of things (IOT) security,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 1646–1685, 2020. doi:10.1109/comst.2020.2988293.
[2] F. Hussain, R. Hussain, S. A. Hassan, and E. Hossain, “Machine learning in IOT security: Current solutions and future challenges,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 1686–1721, 2020. doi:10.1109/comst.2020.2986444.
[3] A. Thakkar and R. Lohiya, “A review on machine learning and Deep Learning Perspectives of ids for IOT: Recent updates, security issues, and challenges,” Archives of Computational Methods in Engineering, vol. 28, no. 4, pp. 3211–3243, Oct. 2020. doi:10.1007/s11831-020-09496-0.
[4] U. Farooq, N. Tariq, M. Asim, T. Baker, and A. Al-Shamma’a, “Machine learning and the internet of things security: Solutions and open challenges,” Journal of Parallel and Distributed Computing, vol. 162, pp. 89–104, Apr. 2022. doi:10.1016/j.jpdc.2022.01.015.
[5] V. Gugueoth, S. Safavat, and S. Shetty, “Security of internet of things (IOT) using Federated Learning and deep learning — recent advancements, issues and prospects,” ICT Express, vol. 9, no. 5, pp. 941–960, Oct. 2023. doi:10.1016/j.icte.2023.03.006.
[6] B. Patel, J. Vasa, and P. Shah, “IOT concepts, characteristics, enabling technologies, applications and protocol stack: Issues and Imperatives,” International Journal of Wireless and Mobile Computing, vol. 25, no. 4, pp. 397–406, 2023. doi:10.1504/ijwmc.2023.135404.
[7] X. Liang and Y. Kim, “A survey on security attacks and solutions in the IOT Network,” 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Jan. 2021. doi:10.1109/ccwc51732.2021.9376174.
[8] H. Mrabet, S. Belguith, A. Alhomoud, and A. Jemai, “A survey of IOT security based on a layered architecture of sensing and data analysis,” Sensors, vol. 20, no. 13, p. 3625, Jun. 2020. doi:10.3390/s20133625.
[9] N. Verma, S. Singh, and D. Prasad, “A review on existing IOT architecture and communication protocols used in Healthcare Monitoring System,” Journal of The Institution of Engineers (India): Series B, vol. 103, no. 1, pp. 245–257, Jun. 2021. doi:10.1007/s40031-021-00632-3.
Haifa Ali Saeed Ali, Vakula Rani J, “Machine Learning for Internet of Things (IoT) Security: A Comprehensive Survey”,
International Journal of Computer Networks and Applications (IJCNA), 11(5), PP: 617-659, 2024, DOI:
10.22247/ijcna/2024/40.