
Digital Communications and Networks 9 (2023) 1486–1515

Contents lists available at ScienceDirect

Digital Communications and Networks


journal homepage: www.keaipublishing.com/dcan

IoT data analytic algorithms on edge-cloud infrastructure: A review


Abel E. Edje (a, b, *), M.S. Abd Latiff (a), Weng Howe Chan (a, c)

(a) School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia
(b) Department of Computer Science, Delta State University, Abraka, P.M.B 01 Abraka, Delta State, Nigeria
(c) UTM Big Data Centre, Ibnu Sina Institute of Scientific and Industrial Research, Universiti Teknologi Malaysia, 81310, Skudai, Johor, Malaysia

ARTICLE INFO

Keywords: Internet of things; Cloud platform; Edge; Analytic algorithms; Processes; Network communication protocols

ABSTRACT

The adoption of Internet of Things (IoT) sensing devices is growing rapidly because they can provide real-time services. However, these devices have limited data storage and processing power, so they offload their massive data streams to edge devices and the cloud for adequate storage and processing. This in turn creates the challenges of data outliers, data redundancies, and cloud resource load balancing, which can affect the execution and outcome of data streams. This paper reviews existing analytics algorithms deployed on IoT-enabled edge cloud infrastructure that have been used to resolve the challenges of data outliers, data redundancies, and cloud resource load balancing. The review highlights the problems solved, the results, and the weaknesses of the existing algorithms, as well as the physical and virtual cloud storage servers used for resource load balancing. In addition, it discusses the adoption of network protocols that govern the interaction between the three layers of the IoT sensing device-enabled edge cloud architecture, and the prevailing challenges. A total of 72 algorithms covering the categories of classification, regression, clustering, deep learning, and optimization have been reviewed. The classification approach has been widely adopted to solve the problem of redundant data, while clustering and optimization approaches are more often used for outlier detection and cloud resource allocation.

1. Introduction

The Internet of Things (IoT)-enabled edge cloud is an emerging ubiquitous network infrastructure that provides various distributed services in every aspect of human life. Smart devices such as sensors, microcontrollers, mobile phones, local servers, and the cloud can interact with each other to perform tasks and share information. As the popularity and extensive use of the IoT-enabled edge cloud increase over the years, more sensor data will be generated, and various IoT-enabled edge cloud applications will be implemented to provide quality services to end-users regardless of their geographical location.

IoT sensor devices are typically used to capture events that are sent to other connected devices and systems over the Internet and other communication networks. IoT sensors generate dynamic, heterogeneous, inaccurate, and weakly semantic data streams over time. However, these massive data streams cannot be processed on the devices themselves because of their limited storage and computational resources. Therefore, the generated data streams are offloaded to edge device(s) or to a cloud data center for further processing and analysis. The cloud data center provides massive storage and processing power to handle large amounts of data. However, it faces issues of latency, distance, and bandwidth, all of which are critical when processing real-time data streams retrieved from IoT sensing devices. Consequently, edge devices were designed to address these challenges. They are made up of clusters of interconnected physical servers located close to the IoT sensor devices.

Cloud resource allocation for processing IoT sensory data streams tends to improve efficiency and data quality through various algorithms (e.g., supervised and unsupervised machine learning, optimization, and deep learning). This has further stimulated the rapid adoption of data-driven analytics and cloud resource allocation algorithms to solve the problems of data outliers, redundancies, and resource load balancing in IoT-enabled edge cloud infrastructure [1]. An anomaly or outlier is a data instance that is significantly different from the rest of the instances, as if it were retrieved from a different source. Redundancy, on the other hand, refers to duplicate or repeated sensed data or events captured over time. Such data is not considered useful: it can degrade an application's performance and consume massive resources (such as storage, memory, and compute). Load balancing ensures that the workload of IoT application requests (e.g., data analysis or

* Corresponding author. Department of Computer Science, Delta State University, Abraka, P.M.B 01 Abraka, Delta State, Nigeria.
E-mail addresses: [email protected] (A.E. Edje), [email protected] (M.S. Abd Latiff), [email protected] (W.H. Chan).

https://doi.org/10.1016/j.dcan.2023.10.002
Received 29 December 2019; Received in revised form 10 September 2023; Accepted 6 October 2023
Available online 13 October 2023
2352-8648/© 2023 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515

filtering processes) is evenly distributed across the available cloud resources, so that requests execute efficiently with minimum resource utilization and completion time.

However, to the best of our knowledge, most existing literature reviews in this field have yet to fully explore the use of data-driven analytic-enabled cloud resource allocation algorithms for executing sensory data streams on IoT-enabled edge cloud infrastructure. It is therefore crucial to investigate the existing data-driven analytic-enabled cloud resource allocation algorithms that are used to address data outliers, data redundancy, and cloud resource load balancing on IoT-enabled edge cloud infrastructure. The contributions of this work are as follows.

1) A detailed analysis of the algorithms that resolve outlier and redundancy issues in sensory data, highlighting the strengths and weaknesses of each algorithm in tabular form.
2) A detailed analysis of cloud resource allocation algorithms that address resource load balancing challenges, for optimal execution of outlier and redundant sensory data.
3) The identification and discussion of the various algorithms used to perform these respective functions, and a comparison of their level of usage in the IoT-edge cloud infrastructure.
4) A discussion of the various network communication protocols that govern the interaction between the tiers of the three-tiered IoT-enabled edge cloud architecture described in previous research.
5) A detailed discussion of current and potential challenges that pave the way for future research directions in this field.

The remainder of this paper is structured as follows. Section 2 introduces background information about IoT-enabled edge cloud computing and the characteristics of IoT sensing data. Section 3 describes the research methodology used to conduct the current survey. Section 4 discusses the existing literature surveys in this field. Section 5 analyzes the existing algorithms deployed for resolving data outliers, data redundancy, and load balancing-related issues in IoT-enabled edge cloud infrastructure. Section 6 discusses the processes adopted by existing algorithms and the network communication protocols that govern the interaction between the three layers of the IoT-enabled edge cloud infrastructure. Section 7 presents the current challenges that pave the way for future research directions. Finally, Section 8 presents a general discussion based on the results of the survey and ends with concluding remarks.

2. Background

IoT technology has come a long way in recent years. The concept of IoT was introduced in 1999 by Kevin Ashton of the Auto-ID Center and has since been widely adopted. IoT is a worldwide network of interconnected devices that are addressable through standard communication protocols, with the Internet as the convergence point [2,3]. Radio Frequency Identification (RFID) and Wireless Sensor Networks (WSNs) have been the most widely used IoT sensing technologies since their inception. RFID consists of a tag and a reader, used to identify and track an object anywhere and at any time. It is used in the courier and logistics transportation industry to track goods in transit. WSNs consist of multiple sensor nodes deployed for environmental monitoring. WSN nodes communicate cooperatively and forward aggregated data to the network sink node or control system for further processing [4]. Both sensing technologies can be integrated for better sensing and tracking of objects by collecting information such as object location, movement, and temperature.

Over the past decade, IoT sensor devices have advanced tremendously. Current IoT sensor devices pre-process, store, and transmit sensed data directly to the Internet without any human intervention. Unlike WSN nodes, IoT sensor devices do not communicate with each other or form an internetwork to transmit their sensed data to a connected sink node. This emerging IoT sensing device is called a smart sensor. A smart sensor carries electronics that can perform multiple logic functions, communicate in two directions, and make decisions. It can also store sensed information for future analysis or offload it directly to the Internet [5]. As a result, the limitations of WSNs, which include input offset and span variation, cross sensitivity, and nonlinearity, are automatically corrected by the smart sensor processor. IoT sensing devices generate massive amounts of dynamic and heterogeneous data. The rapid rate at which they produce unstructured and semi-structured data is also a common problem. IoT sensed data has four main characteristics, namely multi-source high heterogeneity, sensing data inaccuracy, weak semantics with low-level data, and enormous data dynamicity. Sensing data inaccuracy refers to errors in the information collected from IoT sensing devices, caused by limitations such as unreliable readings, which lead to data outliers. This makes it difficult to use the sensed data directly for its intended purpose. Therefore, appropriate multi-dimensional data processing techniques need to be adopted for accurate retrieval of sensed data.

Enormous data dynamicity arises from interconnected multi-sensors embedded in a large-scale environment. Communication between the various sensors typically produces a large volume of data in real time, resulting in duplicate (redundant) data. Weak semantics with low-level data is a property of the sensed data obtained from IoT sensing devices. It arises from the spatial-temporal correlation relationships within the sensed data. Therefore, useful information needs to be extracted from the massive data generated from an event-driven perspective. The sensed data acquired from distributed sensor nodes ranges in type from characters and integers to video and audio streams.

IoT sensor devices cannot provide the computational resources needed to store, process, filter, or analyze sensed data. This is due to the characteristics of sensed data and to the limited storage and computation resources of IoT sensor devices. However, the cloud platform has been used in recent years to address these limitations: its large pool of data storage resources and high computation power for complex tasks compensate for the limitations of IoT sensor devices. The idea of cloud computing was initiated in 1961, when John McCarthy envisioned the importance of time-shared computers for sharing hardware and software resources among multiple end-users with real-time multi-tasking and programming.

Madhavaiah and Irfan [6] defined cloud computing as a technology-based business model, delivered as a service over the Internet, in which end-users access software and hardware computing services virtually, on demand and in a self-service manner, irrespective of their geographical location. Cloud platforms offer three services, namely Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS is a web-based interface that gives end-users access to cloud software applications. PaaS gives developers access to various development tools for implementing software applications on the provider's platform. IaaS provides storage and computation processing power. These services can be accessed from Cloud Service Providers (CSPs) such as Google, IBM, Salesforce, Amazon, and Microsoft.

Currently, the Google cloud platform provides intelligent IoT services that enable end-users to connect their physical IoT sensing devices to the platform and to process, analyze, and store the sensed data. The platform consists of fully managed, scalable cloud services, an integrated software stack for on-premises computing, and machine learning approaches for IoT needs. Additionally, IBM launched its IBM IoT Connection Service in 2016 to formalize the use of IBM IoT for connected offerings on the cloud. The service ingests sensory data obtained from sensors and transforms it into meaningful insights. It also integrates the existing IoT functionalities for electronic solutions available on the IBM Bluemix cloud platform (an open standard for developing, managing, and running multiple applications) with additional data storage, security, and monitoring functions. Microsoft introduced IoT services on its Azure cloud platform, namely Azure IoT Hub and Azure IoT Central.


Azure IoT Hub is an open cloud platform that enables end-users to securely connect, monitor, and manage numerous devices to implement IoT applications. Azure IoT Central is an IoT SaaS solution that makes it straightforward for end-users to connect, monitor, and manage the physical features of their IoT sensing devices. The cloud may offer virtually unlimited storage and computational processing power to compensate for the limitations of IoT devices. Even so, the long-distance network communication between the two is a problem that needs a solution. In other words, the long-distance communication between both technologies, constrained by available bandwidth, may hinder their integration if it is not curtailed.

The long-distance communication between them leads to latency and delay, which can hinder timely responses in critical situations. For example, healthcare workers need to constantly monitor patients in critical condition. These patients are equipped with IoT sensing devices in their homes and are monitored through the Internet provided by the cloud application layer. Another challenge lies in privacy and security: owners of IoT sensing devices tend not to send their data to cloud data storage centers because the storage location is unknown. Edge/gateway computing has recently been introduced to address these challenges. This includes the long delays that distance sometimes introduces between clients' IoT sensing devices and the traditional cloud [7,8].

Edge computing consists of clusters of servers located close to the IoT sensing devices. These servers provide timely responses to service requests while reducing bandwidth consumption and latency. IoT sensing devices can offload their sensed data to the edge servers when the load exceeds their capabilities. The proximity between the edge and the IoT devices makes it possible to control the latency that would otherwise arise between the IoT devices and the traditional cloud. In addition, the edge servers store and immediately process the sensed data collected from IoT devices, sending only a fraction of the data to a cloud data center for long-term processing. This conserves bandwidth and thereby reduces network load. Table 1 shows the characteristics of the IoT sensing devices, the edge, and the conventional cloud data center, while Fig. 1 shows the three-layer physical architecture of the investigated IoT-edge cloud infrastructure.

Table 1
Comparative features of IoT sensing device(s), edge and cloud platform.
S/N | Feature | IoT sensing devices | Edge computing | Cloud computing
1 | Components | Physical devices | Clusters of servers | Virtual resources
2 | Storage capability | Minimum | Limited | Massive
3 | Data availability | Source | Process | Process
4 | Utilization of network communication bandwidth rate | High, due to continuous event sensing | Minimum, because sensed data is processed locally and stored in edge servers close to the IoT sensing devices | High, due to the long distance between the cloud and IoT devices
5 | Computational resource power | Limited | Limited | Unlimited
6 | Deployment | Distributed | Decentralized | Centralized
7 | Quality of service delivery in terms of timeliness | Continuous sensing | Faster, due to location proximity to IoT devices | Slower, due to the long distances between IoT devices and cloud data centers
8 | Level of safety in data transmission operations | Minimal risk of data attack while in transit | Minimal risk of data attack while in transit | Long-distance communication between IoT and cloud exposes data in transit to attacks
9 | Resource and service location proximity for task execution | Not applicable | Edge servers usually close to IoT sensing devices | Remote data centers usually far away from IoT sensing devices

Fig. 1. IoT-enabled edge cloud architectural design.

3. Research methodology

This survey follows the methodology of Kitchenham and Charters [9]. Literature published between 2011 and 2019 was collected from the academic research databases considered most relevant to the objectives of the current study: IEEE Xplore, Google Scholar, Springer, Scopus, and ScienceDirect. The search phrase (“internet of things data” OR “mining algorithm” OR “edge” and “storage resource provisioning” OR “IoT data” OR “cloud data center”) was used to retrieve articles relevant to the current study. However, the search query also returned numerous research articles that were not relevant to the study.

Relevant articles missed by the initial search were expected to appear in the reference lists of the retrieved results and were included in the analysis iteration. Only research articles published in English in journals and conference proceedings were considered. The initial search yielded a total of 502 articles. Each article underwent a series of quality assessment phases before it was finally selected. These phases consist of four steps, as follows:

• Evaluate the title and exclude the article if it does not concern algorithms used in the IoT-enabled edge-cloud platform (the scope of the current study).
• Read the abstract and discard the article if it is not relevant to the current study.
• Read and evaluate the introduction and conclusion, and reject the article if its contribution duplicates that of other relevant articles.
• Analytically assess the quality of the research contribution and disqualify low-quality articles.

Articles were accepted based on their degree of relevance to the current study. In addition, the writing quality, soundness, clarity, and credibility of each article's contributions were considered. A total of 85 articles, all highly relevant to the current research question, passed the quality assessment. These 85 articles then underwent an extraction process to retrieve the information required to accomplish the objectives of the study. The required information is listed below.


• The algorithms used for outlier and redundant data detection.
• The allocation of resources to execute IoT application requests, the problem resolved, and the outcome.
• The performance evaluation processes adopted by each algorithm.
• The strengths and weaknesses of each algorithm.
• The network communication protocols that govern the transmission of sensed data from IoT sensing devices to the edge and to the cloud storage server.
• The number of physical and virtual machines used to process application requests for IoT-sensed data (based on the detection of outlier and redundant data) on the edge-enabled cloud Infrastructure as a Service (IaaS) platform.

A total of 72 candidate articles emerged from the extraction process for use in the current study. Fig. 2 summarizes the bibliometric data of the selected articles, comprising 5 conference papers and 67 journal articles, for a total of 72 studies. It also shows that the number of studies increased over the years. This reflects the novelty of, and growing interest in, algorithms for IoT data filtering/analysis based on the detection of outliers and redundant sensor data. It also reflects the growing interest in resource allocation algorithms that provide optimal computation and storage resources for executing sensory data filtering/analytic application requests on IoT-enabled edge cloud computing infrastructure. The remaining 13 articles are considered suitable as related research works in this field and are discussed in the next section. Finally, the 72 articles are qualitatively analyzed to synthesize the findings.

4. Related work

This section briefly describes previous literature reviews in this research field that motivated the current study. The survey by Qiu et al. [10] is based on conventional and the latest machine learning algorithms for processing and managing IoT big data. It discusses relevant machine learning algorithms in recent research, such as representation learning, deep learning, distributed and parallel learning, active learning, and kernel learning. The challenges of, and possible solutions to, machine learning algorithms for processing IoT data are also analyzed. Subsequently, the survey highlights the relationship between machine learning techniques and the signal processing techniques used to process IoT big data, and outlines various open issues and research trends.

Farahzadi et al. [11] studied the middleware technologies used in the Cloud of Things (CoT) platform. First, the relevant features of middleware are discussed, followed by the presentation of various architectures and service domains. The survey also explores the types of middleware that are appropriate for CoT-based platforms and outlines future challenges and issues in the design of CoT middleware. Cui et al. [12] present an overview of the application of machine learning techniques in the IoT domain. They discuss current advances in applying machine learning techniques to IoT-related processes such as IoT device identification, security, IoT edge computing infrastructure, traffic profiling, and network management. Research challenges and open issues of machine learning for IoT are also extensively discussed.

Mahdavinejad et al. [13] present an overview of several machine learning algorithms that address the challenges of IoT sensor data. The survey focuses on the taxonomy of machine learning algorithms, describing how they have been used on IoT datasets to extract relevant information. It also discusses the prospects and challenges of the algorithms for IoT data analysis. It then applies a Support Vector Machine (SVM) to Aarhus smart city traffic data as a use case for a more detailed investigation. Cai et al. [14] presented recent achievements in the management, processing, and extraction of IoT big data using several existing algorithms. The algorithms are defined and described in terms of their significant features and capabilities, and the current challenges and opportunities associated with IoT big data are analyzed. Some typical examples and open issues in applying algorithms to data acquisition are also discussed. Shadroo et al. [15] conducted a thorough investigation of the use of mining algorithms in the management of IoT big data. Their study further identifies and discusses the architecture, framework, and applications of IoT big data. It also briefly discusses the algorithms used for processing IoT data in three categories: descriptive, predictive, and classification.

In [16], a novel Concentric Computing Model (CCM) is investigated for IoT big data analytics applications. The study discusses the sensing systems and the outer/inner gateway processors that make up CCM. In addition, it highlights current research related to the IoT model for big data analytic techniques. It also describes the challenges that must be addressed to deploy CCM in Internet of Things environments. Various future research directions are then presented, such as dispatching of significant data, real-time fusion of streaming data, and data integration. Sharma and Wang [1] investigated the enablers for live data analytics in wireless IoT networks and storage provisioning by edge-enabled cloud computing environments. They discuss a framework for systematic processing between the cloud and the edge device(s). The survey also highlights the networks and the information available in the cloud data center that help edge computing units meet the various performance requirements of wireless IoT networks. Key data analytics enablers, such as NoSQL databases and distributed file systems, for handling unstructured IoT big data in the edge cloud are also discussed, along with machine learning techniques for extracting relevant data. Related challenges and selected future research directions are also highlighted.
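The division of labour between edge and cloud discussed in [1] is also described in Section 2: edge servers store and immediately process sensed data, and only a fraction of it is sent to the cloud data center. The minimal Python sketch below illustrates that division under stated assumptions. It is not the framework of [1] or any algorithm reviewed in this paper. The `Reading` record, the `edge_process` function, and the `(sensor_id, timestamp)` deduplication key are hypothetical. Outliers are flagged with the modified z-score (median absolute deviation, constant 0.6745, cut-off 3.5), a common textbook rule chosen here only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Tuple


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading as received by an edge server."""
    sensor_id: str
    timestamp: int  # e.g. seconds since the start of the window
    value: float    # e.g. temperature in degrees Celsius


def edge_process(window: List[Reading],
                 threshold: float = 3.5) -> Tuple[dict, List[Reading]]:
    """Illustrative edge-side step (an assumption, not a reviewed algorithm).

    Drops duplicate readings, flags outliers with the modified z-score,
    and returns (summary, outliers). Only `summary` would be forwarded
    to the cloud data center; raw readings stay at the edge.
    """
    # Redundancy: keep only the first reading per (sensor_id, timestamp).
    seen = set()
    unique: List[Reading] = []
    for r in window:
        key = (r.sensor_id, r.timestamp)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    summary = {
        "received": len(window),
        "duplicates_dropped": len(window) - len(unique),
        "outliers": 0,
        "count": 0,
        "mean": None,
    }
    if not unique:
        return summary, []

    # Outliers: modified z-score 0.6745 * |x - median| / MAD > threshold.
    values = [r.value for r in unique]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Simplifying assumption: with zero spread, flag nothing
        # instead of dividing by zero.
        flags = [False] * len(values)
    else:
        flags = [0.6745 * abs(v - med) / mad > threshold for v in values]

    outliers = [r for r, bad in zip(unique, flags) if bad]
    clean = [v for v, bad in zip(values, flags) if not bad]

    summary["outliers"] = len(outliers)
    summary["count"] = len(clean)
    if clean:
        summary["mean"] = mean(clean)
    return summary, outliers


# Hypothetical window: one duplicated reading (t=2) and one spike (t=4).
window = [Reading("s1", t, v) for t, v in
          [(0, 20.0), (1, 20.5), (2, 21.0), (2, 21.0),
           (3, 20.8), (4, 95.0), (5, 20.2)]]
summary, outliers = edge_process(window)
print(summary)
print([r.timestamp for r in outliers])
```

In this example, seven raw readings are reduced to a five-field summary before anything crosses the long-distance link to the cloud. A real deployment would also need per-sensor windows, streaming updates, and a transport protocol, none of which are modeled here.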
Fig. 2. Survey of previous research on data analytics algorithms for IoT-based edge cloud.

Recent advances in massive data analytics for IoT systems, the potential requirements for managing big data, and the enablers of analytic techniques in IoT platforms are surveyed in [17]. Requirements such as IoT connectivity, storage capabilities, quality of service, real-time services, and real-time analytics are discussed in detail. The survey explains the role of data analytics in IoT applications such as smart health, smart grid, and smart transportation. It also presents various open challenges as future research directions. Ge et al. [18] investigated big data technologies in several IoT domains to improve knowledge sharing across them. They explained the similarities and differences between the big data technologies and analytics techniques (e.g., classification, filtering, compression, extraction, indexing, prediction, and storage) used in different IoT domains such as health, agriculture, and transportation to retrieve knowledge. They further suggested how some big data technologies deployed in a specific domain can be


re-used in different IoT domains. A conceptual framework was also formulated to identify the critical big data technologies across all the IoT domains reviewed.

The task offloading schemes proposed for cloud computing, fog, and the IoT are reviewed in Ref. [19]. That review describes the middleware technologies (e.g., cloudlet, mobile edge, micro datacenter, and nano datacenter) that facilitate offloading in edge-IoT infrastructure. It also presents research opportunities in offloading data streams in the fog and edge computing paradigm. Mohammadi et al. [20] presented a comprehensive overview of a class of groundbreaking machine learning algorithms that can perform analytics and learning in the IoT domain. They provide detailed background information on various Deep Learning (DL) algorithms and highlight specific research efforts that have used DL in the IoT domain. Implementation approaches of DL on fog and cloud centers for provisioning IoT applications are also discussed. Fei et al. [21] studied several machine learning algorithms and how they are deployed on fog and cloud architectures for optimal processing and timely retrieval of data. In addition, they highlight the time complexity of the machine learning techniques used for IoT data stream analysis. The challenges of deploying machine learning algorithms in the fog and cloud are also discussed, paving the way for future research directions.

Alam et al. [22] conducted a review of data fusion for IoT. It describes various mathematical techniques (e.g., probabilistic, artificial intelligence, and belief theory) used in IoT data analysis. It also details the prospects and challenges of each mathematical technique adopted in specific IoT environments (e.g., heterogeneous, distributed, object tracking, and nonlinear environments). In addition, future advances are discussed, including emerging areas (autonomous vehicles, futuristic applications, infotainment systems, and smart cities) that would benefit immensely from data fusion and IoT.

The related surveys by previous researchers, summarized in Table 2, motivated the current survey. We therefore investigate the various algorithms used for IoT data filtering/analytics based on outlier detection and redundant sensed data elimination. We also investigate algorithms for optimal load balancing of the cloud resources allocated to execute filtering/analytics-based IoT applications on edge-enabled cloud infrastructure. Previous reviews have yet to make substantial contributions to these research problems. The network communication protocols that govern interaction and data transmission within the IoT, edge, and cloud layers are also considered in this paper. These topics are discussed in detail in the following sections.

Table 2
Comparison of previous research surveys.
Author | Article title | Contributions
Qiu et al. [10] | A survey of machine learning for big data processing |
  • Review of conventional and advanced machine learning methods for solving big data problems
  • Logical analysis of the challenges and potential solutions for learning big data, based on the characteristics of big data
  • Open questions and research trends
Farahzadi et al. [11] | Middleware technologies for cloud of things: a survey |
  • Identification and explanation of IoT-Cloud middleware characteristics
  • Comparison of middleware architectures
  • Middleware service domains, e.g., information sharing, storage, and communication
  • Comparison of sample middleware, e.g., C-MOSDEN, ThingsWorx, and Carriots
  • Challenges and issues in the Cloud of Things
Cui et al. [12] | A survey on application of machine learning for Internet of Things |
  • Description of possible supervised and unsupervised machine learning for traffic profiling
  • Identification of IoT devices (mobile phones and general IoT devices) using machine learning
  • Review of machine learning approaches for IoT system security, covering device and network security
  • Summary of IoT applications (e.g., health and industries) developed using machine learning
  • The use of machine learning approaches for IoT network management and edge computing design
  • Challenges and open questions in the above reviewed areas
Mahdavinejad et al. [13] | Machine learning for Internet of things data analysis: a survey |
  • Describes the machine learning algorithms used to process data collected from IoT devices
  • Review of eight types of machine learning techniques used for IoT data analytics
  • Brief discussion of research trends and open questions
Cai et al. [14] | IoT-based big data storage systems in cloud computing: perspectives and challenges |
  • Analysis of a cloud-based IoT application utility framework based on this data acquisition and processing
  • Discussion of the challenges of IoT data acquisition and the methods used for data processing
  • Brief discussion on application module optimization based on

5. Analysis of algorithms on IoT-edge cloud

This section analyzes the various analytic algorithms used for outlier and redundancy detection/elimination, as well as the allocation of resources to fulfill application requests in the cloud. Here, an application denotes a task such as outlier detection or redundancy elimination. We start with outlier detection, followed by redundancy elimination and resource provisioning.

5.1. Outlier detection algorithms

An outlier is a piece of data that does not conform to the rest of the data or does not follow the expected trend [23]. Outlier detection is an essential part of data mining, where the goal is to identify outliers or unusual data in a given data set. Outlier detection has been extensively studied in machine
learning and statistics, and it is also known as anomaly detection, nov­ architecture optimization,
elty detection, and deviation detection [24]. In particular, outlier data storage optimization
detection in IoT-enabled edge cloud computing has been an aspect of and data operation
great importance, as it becomes even more of interest due to the het­ optimization
Shadroo et al. Systematic survey of big • Review of tools used for IoT
erogeneity and dynamism of IoT sensor data. However, it has not been [15] data and data mining in and big data processing
given the necessary attention and consideration, in the existing litera­ Internet of Things
ture. The detection of outliers in sensory datasets is used for the removal (continued on next page)
of error data, the detection of faulty IoT sensing device(s), and detection

1490
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515

Table 2 (continued ) Table 2 (continued )


Author Article Title Contributions Author Article Title Contributions

• Analysis of various into fog and cloud


techniques used for IoT architecture
device management/data Alam et al. [22] Data fusion and IoT for • Mathematical techniques
mining smart ubiquitous used for sensor data fusion
• Brief discussion on open environments: a survey • Review on special IoT
issues of IoT big data and environments such as
mining methods heterogeneous and
Rehman et al. Big data analytics in • Review of the applicability of distributed environments
[16] industrial IoT using a concentric computing model • Challenges of individual
concentric computing for big data analytics in IoT mathematical techniques
model • Brief summary of and IoT environments
communication and
performance goals that can
be achieved by adopting the of an event of interest [25]. These can only be achieved through the use
concentric computing model of analytic algorithms which are discussed in detail, highlighting the
in IoT
• Highlighting some potential
processes employed by the algorithms to solve the prevailing problems,
challenges and open issues performances, and weaknesses of the algorithms, edge devices, and
that may lead to future cloud IaaS resource(s) used to store and execute the algorithms, as
research directions indicated in Table 3. It also shows where the algorithms are deployed.
Ahmed et al. The role of big data • Review on the processing
For example, part of an algorithm may be implemented at the edge while
[17] analytics in Internet of and key requirements of big
Things data in IoT environment the other part is located in the cloud. On the other hand, the entire part
• Big data processing and of the algorithm can be stationed either in the edge or cloud.
analytics opportunities and An adaptive Compressive Sensing-based (CS) autoregressive recon­
the applicability of data struction algorithm is proposed for the sparsity of sensory data which
analytics in IoT applications
• Highlighting open research
varies in the temporal and spatial domain [26]. The autoregressive
challenges of big data method is responsible for reconstructing false data retrieved from faulty
processing in IoT that leads sensor nodes. This is realized by exploiting the varying local spatial
to future research directions similarity in the sensed data set with an estimated parameter. If the
Ge et al. [18] Big data for Internet of • Analyze the comparison of
sensed data exceeds the estimated parameter, it is classified as an
Things: a survey big data technologies in
different IoT domains anomaly or false data that needs to be reconstructed, otherwise the
• Recommend the type of big sensed data is classified as consistent data. The recovered data is then
data technology that can be evaluated to determine whether additional measurements are needed to
used in other IoT domains improve the reconstruction quality and whether the recovery process
• To shed more light on big
data for each IoT domain
meets the expected accuracy. Furthermore, a combinational method is
• Framework to assist introduced to predict and identify the sparsity, which is incorporated
practitioner and researchers into the CS to recover anomalous sensed data. Then, the recovered
to adopt big data abnormal data are classified into two groups namely, error and external
technologies that are
event by their identified patterns. The external event data is considered
commonly used in specific
IoT domain to reflect the actual activities in the environment and is preserved for
Aazam et al. [19] Offloading in fog Current technologies used for further processing. On the other hand, the error data represent the
computing for IoT: review, offloading in fog computing physically sensed data which are discarded and replaced with their
enabling technologies, and • Different requirements original normal readings.
research opportunities adopted by existing
middleware technologies for
However, the CS-based autoregressive reconstruction algorithm is
offloading tasks in fog not able to predict and identify an event that occurs a real-time event. It
computing has been solved with the support of a model-based Multilayer Percep­
• Challenges that still need to tron Classifier (MLPC) proposed by Stocker et al. [27]. The MLPC model
be addressed for optimal
is capable of obtaining knowledge that is represented in a semantic
performance of task
offloading database by abstracting sensed data from the physical sensor layer on a
Mohammadi Deep learning for IoT big • Leveraging deep learning in real-time basis. At the initial stage, a band-pass filter is applied to
et al. [20] data and streaming various IoT application pre-process the raw sensor data sample, after which the Multilayer
analytics: a survey domains Perceptron (MLP) neural network classifier is used to predict and classify
Current methods for applying
deep learning in a wide range
various abstractions and events. The result of the classification process is
of devices, from constrained to transferred to the semantic database. However, the MLPC model is prone
the fog and the cloud to a long non-automated learning process that requires domain experts
• Challenges and future to provide the model with sample data for the supervised learning
research directions for the
process. This issue has been addressed in the research work of Ganz et al.
integration of deep learning
and IoT applications [28], which introduces an approach that infers abstractions based on
Fei et al. [21] CPS data streams analytics • Machine learning methods pattern representations. The approach is called the Sensor Symbolic
based on machine learning used for IoT data processing Aggregation Approximation (SAX) algorithm, which is implemented to
for cloud and fog in cyber-physical system convert continuous sensor data into a compressed pattern representa­
computing: a survey applications
• Time complexity of
tion. Firstly, the sensed data is normalized to have a standard deviation
traditional machine learning of 1 and a mean of 0, to facilitate the comparison of data points from
strategies different sources and to limit the volume of the sensed data sample. The
• Requirements for integrating sampled data is divided into two equal-sized windows by calculating the
machine learning methods
mean value of each window, resulting in the data sample being reduced

1491
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515

to half its original size. The compressed data are then reconstructed, allowing abductive abstraction of the sensed data to discover events that occur over time. For example, a daily temperature cycle from cold to warm to cold represents a frequent (stable) temperature pattern, so newly observed states that are hidden from this pattern are classified as outliers.

Table 3
Comparison of outlier detection techniques.
Article title | Algorithm | Process | Problem resolved | Outcome | Weakness | Edge device | Cloud storage server | Data center: no. of physical machines (PMs) | Data center: no. of virtual machines (VMs)
Data gathering in sensor nodes through intelligent compressive sensing [26] | Adaptive compressive sensing-based autoregressive reconstruction algorithm | Classification | Abnormal sensing data | Improved accuracy and reduced mean squared error and latency | Unable to predict and identify events that occur on a frequent basis | Remote server | N/A | N/A | N/A
Making sense of sensor data using ontology: a discussion for residential building monitoring [27] | Knowledge-based multi neural network classifier | Classification | Predicting and identifying events that occur on a frequent basis | Accurate prediction of frequent anomaly sensing data in real time | Long, non-automated learning process that relies on domain experts to provide sample data | Remote server | N/A | N/A | N/A
Information abstraction for heterogeneous real world internet data [28] | Sensor symbolic aggregation approximation algorithm | Classification | Minimizing the huge volume of sensing data | Improved accuracy; minimized data volume and latency | Abstraction accuracy still needs further improvement | Remote server | Yes | N/S | N/S
Smart outlier detection of wireless sensor network [29] | Fuzzy-based spatial-temporal approach | Classification | Detection of error and event outliers in the local/global search space of the sampled data | Improved accuracy for error/event outliers with minimum false positive rate | Unable to self-check the prediction process using the mandatory perception data in an IoT platform | Remote server | N/A | N/A | N/A
Non-parametric sequence-based learning approach for outlier detection in IoT [30] | Non-parametric sequence learning algorithm | Classification | Self-check identification using optimal perception for error/event outlier detection | Enhanced classification accuracy in detecting error/event outliers with a lower false positive rate | Difficult to detect outliers in global space as the data set increases in size | Remote server | Yes | N/S | N/S
Cooperative sensor anomaly detection using global information [31] | Multivariate Gaussian-based principal component analysis | Classification | Inability to differentiate between erroneous and event data from inconsistent observations | Improved Receiver Operating Characteristic (ROC) curves and true and false positive rates for detecting erroneously sensed data | Its static transformation cannot realize optimal prediction of erroneous data in real time | Remote server | N/A | N/A | N/A
Recursive principal component analysis-based data outlier detection and sensor data aggregation in IoT systems [32] | Recursive principal component analysis | Clustering | Inability to make optimal predictions of erroneous data in the global space of massive sensed data sets | Improved aggregation with error growth and event detection accuracy | Computationally intensive because it adapts recursively to changes in sensory data readings | Arduino Uno microcontroller | Yes | N/S | N/S
A novel three-tier Internet of Things architecture with machine learning algorithm for early detection of heart diseases [33] | Logistic regression-based prediction algorithm | Regression | Ineffective classification of heart-related disease symptoms | Enhanced classification accuracy rate based on specificity and sensitivity | Inefficient aggregation of data | Mobile phones, personal server | Yes | N/S | N/S
A real IoT device deployment for e-health applications under lightweight communication protocols, activity classifier and edge data filtering [34] | Fuzzy-based human activity recognition classifier algorithm | Classification | Complexity of data overlapping in massive sensed data | Improved outlier/inlier detection accuracy with fewer computation resources | Unable to deal with missing data values | Mobile phones | Yes | N/S | N/S
On the effect of adaptive and non-adaptive analysis of time-series sensory data [35] | Dynamic symbolic aggregation approximation (DSAX) | Clustering | Complexity of aggregating massive sensory data retrieved from various sources | Achieved optimal data aggregation quality for error data prediction | Unable to give insight into the retrieved sensing data regarding drifts and consistent data | Local server | N/A | N/A | N/A
Adaptive clustering for dynamic IoT data streams [36] | Adaptive K-means clustering algorithm | Clustering | Deficiency in clustering streaming sensed data | Improved clustering accuracy based on the Silhouette coefficient | Does not consider the spatial dimension and correlation of the streaming data | Fog server | N/A | N/A | N/A
Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes [37] | Gaussian-based dynamic probabilistic algorithm | Clustering | Inefficiency in clustering dynamic sensing data to detect drifts in sensed data | Improved drift detection accuracy (98.7%) and sensitivity (96%), indicating that almost all detections are true positives | There are about as many instances detected as turning points for concept drift | Local server | N/A | N/A | N/A
An automatic health monitoring system for patients suffering from voice complications in smart cities [38] | Linear prediction spectrum algorithm | Regression | Ineffective detection of voice disorders in patients | Efficient detection of voice disorders from sustained vowels and running speech with improved accuracy | Not specified | Local server | N/A | N/A | N/A
Edge computing with cloud for voice disorder assessment and treatment [39] | Convolutional neural network algorithm | Deep learning | Inaccurate classification of voice disorder symptoms | Improved classification accuracy in voice (event) disorder detection | High bandwidth consumption during sensed data transmission from the edge to the cloud | Cloudlet servers | Yes | 1 | 3
Fog assisted-IoT enabled patient health monitoring in smart homes [40] | Bayesian belief network algorithm | Deep learning | Delay in the classification of acquired sensed data | Improved classification accuracy with less time spent in the classification process | Does not consider the spatio-temporal correlations among the sensed data set | Cloudlet servers, wireless routers | Yes | 8 | N/S
A new shelf life prediction method for farm products based on an agricultural IoT [41] | Back propagation learning algorithm | Deep learning | Detecting and eliminating erroneous outliers in big sensed data | Effective filtering of normal sensed data from abnormal data | Not specified | Local server | N/A | N/A | N/A
IoT big-data centered knowledge granule analytic and cluster framework for BI applications: a case base analysis [42] | Enhanced knowledge granule clustering algorithm | Clustering | Challenges of clustering complex knowledge granules for outlier detection | Improved precision and accuracy of outlier detection | Unable to minimize the inter-cluster distances of sensed data | Remote server | N/A | N/A | N/A
Fog intelligence for real-time IoT sensor data analytics [43] | Homoscedasticity measurement Levene's test feed-forward neural networks algorithm | Deep learning | Improper threshold selection leading to partial classification | Enhanced classification accuracy, sensitivity and precision | Duplicate sensed data and high computational intensity | Arduino Uno microcontroller, local server | N/A | N/A | N/A
Efficient and flexible algorithms for monitoring distance-based outliers over data streams [44] | Advanced micro-cluster-based continuous outlier detection algorithm | Clustering | Inefficient outlier detection on frequent data streams and computational complexity | Improved outlier detection with minimum computational resource usage | Does not address the uncertainty of data streams (instances assigned an existential probability) | Local server | Yes | N/S | N/S
Smartphone-based outlier detection: a complex event processing approach for driving behavior detection [45] | Complex event processing-based Z-score and box plot | Clustering | Computation resource constraints of IoT devices | Improved, accurate detection of outliers from online data streams with less use of computation and memory resources | Weak at identifying outliers in emergency scenarios due to lack of historical data | Fog server | Yes | N/S | N/S
Fog-empowered anomaly detection in IoT using hyperellipsoidal clustering [46] | Hyperellipsoidal clustering algorithm | Clustering | High latency and energy consumption | Reduced energy consumption and latency while improving anomaly prediction accuracy | Latency needs further improvement due to increased use of computation resources | Fog server | Yes | N/S | N/S
Entropy outlier detection using semi-supervised approach with few positive examples [47] | Entropy Outlier Detection Semi-supervised (EODSP) algorithm | Clustering | Insufficient labeled data for training and limited positive labeled samples | Improved outlier detection accuracy compared to other existing approaches | Cannot be deployed on real-time big sensing data due to its computational complexity | Laptop PC (1.6 GHz, 1 GB RAM) | N/A | N/A | N/A
IPCA for network anomaly detection [49] | Iterative Principal Component Analysis (IPCA) algorithm | Clustering | Variability of feature scales and the issue of multiple dimensions | Improved outlier detection by efficiently mitigating the limitations of PCA | Computational complexity of iteratively updating the distances of the neighborhood data set | Remote server | N/A | N/A | N/A
Real-time multiple event detection and classification using moving window PCA [50] | Moving Window Principal Component Analysis (MW-PCA) algorithm | Clustering | Time variance in sensing data frequency | Improved outlier prediction and classification accuracy | Unable to disaggregate multiple loss-of-load and generation events | Remote server | Yes | N/S | N/S
Research on real time feature extraction method for complex manufacturing big data [51] | Robust Incremental Principal Component Analysis (RIPCA) algorithm | Clustering | Disaggregating multiple loss-of-load and generation events | Improved real-time outlier detection by reducing the dimension of big IoT datasets and the use of computation resources | Unable to determine the causes of the abnormal patterns | Remote server | Yes | N/S | N/S
Footnote: N/A = Not Applicable, N/S = Not Specified.

Kamal [29] introduced a fuzzy algorithm that uses the concept of spatiotemporal similarity to detect outliers, although it cannot provide the self-check identification using perception data that is highly required in an IoT Cloud-IaaS platform. The algorithm classifies abnormal observations into error and event outliers. First, the first-order difference $|S_{i2} - S_{i1}|$ is computed over the data set generated by the sensor nodes, and the total difference is compared to a threshold value derived from the tolerance of the temperature sensor. If the total first-order difference does not exceed the threshold, the data point $S_{i2}$ is considered similar to the other data points; otherwise, the dissimilar data point is flagged as an outlier. Second, the distance between neighboring sensor nodes is calculated to discover the spatial similarity between them. The Euclidean distance is used to compute the similarity (correlation) between two points (x, y) that have identical transmission range and time proximity, and the spatial similarity threshold is obtained by computing the mean distance of all data points within the proximity time. If the Euclidean distance d(x, y) does not exceed this threshold, the data values at point x are identified as similar to those at point y; otherwise, an error outlier is detected as a faulty sensor reading.

In contrast, a Non-Parametric Sequence-based Learning (N-PSL) algorithm is proposed by Nesa et al. [30] to predict outliers and distinguish their error and event types. It uses data perception for the self-check detection of both error and event outliers. The N-PSL algorithm is based on gray relational analysis. In the initial stage, the sample data are normalized by calculating the average image of each sampled data sequence. Then, the difference between each instance of the sequence image of the sensed data is computed. The Influential Relative Grade (IRG) coefficients for each sequence (class) of sensed data are then calculated to retrieve the relative mass function of each class. Classes with lower values are predicted as outliers, while classes with higher values are inliers. Event outliers are detected by running the algorithm on the fused parameter (attribute) dataset, whereas error outliers are obtained by running the algorithm on each parameter individually.

A Multivariate Gaussian-based Principal Component Analysis (MG-PCA) is designed in Ref. [31] to predict erroneous sensing data among irregular observations, based on the characteristic patterns of sequence data of different dimensions. The multivariate Gaussian (MG) model is first applied to the retrieved sensed data set to determine the similarity among the data points. It identifies the time point at which an error occurred and retrieves the particular sensor node observed to be erroneous at that time. The PCA then uses the principal vectors to determine the differences between data patterns and to detect sensor error readings that violate the extracted inherent pattern. However, the MG-PCA approach cannot track variations in dynamic and heterogeneous sensing data because of its static transformation. This limitation is addressed by the Cluster-based Recursive Principal Component Analysis (CR-PCA) algorithm proposed in Ref. [32], which aggregates redundant sensed data while detecting outliers. The spatially correlated sensed data retrieved from the member sensors of a cluster head are aggregated by extracting the principal components, and possible data outliers are identified using an abnormal squared prediction error score, called the residual square. The algorithm recursively updates its parameters to adapt to the dynamics of the sensory data retrieved from the sensor devices.

A Logistic Regression-based Prediction (LRP) algorithm is developed to detect patients with heart diseases by classifying clinical sensory data collected from IoT wearable devices [33]. Sensed data collected from
wearable sensing devices are constantly monitored: if the data values exceed the reliable predicted value, they are considered abnormal; otherwise, they are considered normal. Santamaria et al. [34] proposed a fuzzy-based Human Activity Recognition (HAR) classifier algorithm to classify the sensed data into normal and abnormal patient activities. The algorithm initializes the classification process with constant values that specify the number of clusters. It then selects a weighting component (the fuzzifier), an initial membership matrix, and threshold values. The weighting component regulates the overlap between classes when a data point is assigned to its cluster, and the threshold value is used to evaluate convergence across the iterations of the classification process.

A Dynamic Symbolic Aggregation Approximation (DSAX) algorithm is proposed in Ref. [35] to segment time-series data streams with adaptive and non-adaptive window sizes, accommodating variation in real-time processing. It divides the time-series data set into equal segments and generates a string representation for each segment. First, the time-series data are normalized to a mean of zero and a standard deviation of one before being converted to a Piecewise Aggregate Approximation (PAA). Next, the data are divided into the desired number of windows, and the PAA calculates the mean of the data falling in each window, thereby reducing the data size. Then, the PAA coefficients (one per window) are discretized by mapping them to breakpoints determined by the alphabet size (e.g., c), chosen so that the breakpoints divide the distribution into equal-size areas, to retrieve the symbolic data representation. Puschmann et al. [36] developed an Adaptive K-means Clustering (AKC) algorithm for outlier detection. It evaluates the dynamic sensor data and updates the cluster centroids according to changes in the data stream at a given time. Clusters are formed based on similar features of the sensory data stream retrieved over time, and new clusters are formed when the data features change. For example, if the incoming data stream has the feature types "Temp, Temp, Temp, Hum, Hum, …, n", the Temp features are allocated to the initial cluster, and the appearance of "Hum" triggers the creation of a new cluster that contains the Hum feature data records.

Both the DSAX and AKC approaches achieve substantial assignment of sensory data instances to clusters, but they cannot provide knowledge about the data (i.e., whether it is consistent or inconsistent) or about how it is assigned to each cluster. These problems are addressed by the Gaussian-based Dynamic Probabilistic Clustering (GDPC) algorithm proposed in Ref. [37]. It estimates the model parameters and any drifts in the data points, and it provides the membership likelihood of each data point to each cluster using the Brier score. The Brier score is used to quantify how abnormal the subsequent probabilities of objects or data points are relative to those expected. A drift (change) is detected when the Brier score of the sensed data exceeds a predefined threshold; such drifts are treated as outliers. After a drift is detected, the Brier score changes its behavior and then stabilizes for incoming sensor data.

A Linear Prediction Spectrum algorithm is introduced in Ref. [38] for voice disorder detection based on the retrieved sensed data. It analyzes the

Two output neurons were used for voice disorder detection and eight neurons for voice disorder classification, and the network was then trained by fine-tuning its parameters to optimally distinguish disordered voices from normal ones.

A Bayesian Belief Network (BBN) algorithm is proposed in Ref. [40] for the classification of sensory data. It classifies sensory data retrieved from patients into two classes, namely abnormal and normal. Data in the abnormal class indicate a severe or critical health status, whereas data in the normal class indicate that the patient's health status is normal. The classification is performed with a naïve Bayesian procedure based on conditional probability. A predefined value is set as the normal value, so that sampled data falling within the range of this value are classified into the normal class; conversely, a sample is assigned to the abnormal class when its probability of belonging to that class exceeds that of the normal class. To improve the prediction process, an important set of attributes, namely the environment and the patient's history, was used. Data in the abnormal class are then transmitted to the cloud for further processing and analysis. Wu et al. [41] implemented a Back Propagation Learning (BP) algorithm for the classification of sensed data retrieved from sensing devices attached to agricultural crops. The sensed data are classified into abnormal and normal batches; the abnormal values or attributes are discarded, while the normal values are further processed on the cloud platform. The normal values are further divided into low, normal, and high values based on the predefined values −1 (low), 0 (normal), and 1 (high). The BP algorithm is then applied to accurately predict the crop yield. It multiplies the output error by the input data to obtain the gradient of each weight, and it moves the weight in the direction opposite to the gradient by subtracting a fraction of the gradient (scaled by the learning rate) from the weight.

An Enhanced Knowledge Granule Clustering algorithm based on a neuro-fuzzy analytic architecture is designed in Ref. [42] to extract complex knowledge granules from IoT sensory big data. First, the facts are arranged in an array based on a multiple-rule system to obtain the knowledge granules for clustering. Each knowledge granule must be associated with a fitness tag that holds its estimated value. Using the attributes of the knowledge granule, the initial mapping to a cluster is performed by the fitness value, followed by the next-level mapping to sub-clusters under that cluster. In simple terms, based on the fitness rule, two clusters are said to be similar if both have knowledge granules with homogeneous attributes. The knowledge granules are mapped to individual clusters based on their attributes, and the sub-clusters within a cluster are maintained according to the fitness of the explicitly identified knowledge granules. For example, within a cluster, a knowledge granule $X_1$ is mapped to the sub-cluster $(G < 0.5)$ if and only if $G(X_1) < 0.5$; otherwise, $X_1$ is mapped to the sub-cluster $(G \geq 0.5)$. The G values of clusters and sub-clusters are thus estimated by quantifying the outliers that are present; outliers present in the clusters and sub-clusters degrade the G values.

Furthermore, Raafat et al. [43] proposed a Homoscedasticity Measurement-based Levene's Test (HMLT)-based Feed-forward Neural Network (FFNN) algorithm for accurate classification of desired fea­
energy variation in the spectrum to distinguish between disordered and tures of sensed data in the cloud. Sensed data retrieved from sensing
normal voices. This is done by dividing the vocal track into various tubes devices is filtered. Then, the HTMLT is applied to extract dissimilarity
from the glottis to the lips. It then performs an estimated analysis on the features from the denoised signal, by observing the signal for sudden
source signal using inverse filtering that triggers the computation of the changes. Then, the extracted features are inputted into the FFNN to
spectrum. Furthermore, the estimated signal is utilized to determine the proceed with the classification process. The FFNN classifies the sensed
energy distribution in vowel and running speech for the detection of data into abnormal and normal data. This is updated by sending the data
voice disorder. Muhammad et al. [39] develop a Deep Convolution from its input layer to the hidden layers. The neurons in the hidden
Neural Network (CNN) algorithm for classifying the sensed data into two layers are responsible for computing an activation function over the sum
segments namely voice disorder and normal voice. It uses its input image of input features, which are multiplied by a set of weight parameters.
consisting of blue, green, and red colors to classify the voice sampled The, results are output as either normal or abnormal sensed data. Data is
data obtained from the IoT sensing devices. Therefore, the use of transfer abnormal when there is a sudden change in the sensed data due to an
learning and a fine-tuned approach is used to train the CNN for optimal external event.
detection of voice disorder and to speed up the classification process, An Advanced Micro-cluster-based Continuous Outlier Detection
due to the limited voice sampled data obtained from the IoT devices. (AMCOD) algorithm is proposed in Ref. [44] for frequent monitoring of

outliers in sensory data streams, improving efficiency and reducing storage resource utilization. An object 'A' is identified as an outlier if the distance of instance(s) 'B' is greater than that of 'A'; conversely, if the number of data instances or objects in the neighborhood of 'A' exceeds that of 'B', then 'A' is an inlier. Efficiency is improved by using micro-clusters to minimize the number of distance calculations, the memory size, and the number of data objects held for the time window. In addition, the arrival and departure of data objects are monitored to determine whether an object is an outlier or a safe inlier: if a given data object 'A' has more neighbors than 'B', then 'A' becomes a safe inlier rather than an outlier. Computational and memory resources are thus reduced by discarding the outliers.
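The following minimal Python sketch illustrates the distance-based test that AMCOD builds on. It keeps a count-based sliding window over scalar readings and flags a point as an outlier when fewer than k other points in the window lie within a radius. It also separates safe inliers, using a convention assumed here from micro-cluster-based detectors such as MCOD: an inlier is safe once at least k of its neighbors arrived after it. The micro-clusters that give AMCOD its efficiency are deliberately omitted. The names `classify_window`, `monitor`, `window_size`, `radius`, and `k` are illustrative and are not taken from Ref. [44].

```python
from collections import deque


def classify_window(window, radius, k):
    """Split a window (ordered oldest -> newest) into outliers, safe and unsafe inliers.

    Assumed definitions (distance-based, MCOD-style, not taken from Ref. [44]):
      * outlier: fewer than k other points lie within `radius`;
      * safe inlier: at least k of its neighbors arrived after it, so they stay
        in the window at least as long as the point itself and it can never
        turn into an outlier before it expires.
    """
    outliers, safe, unsafe = [], [], []
    for i, x in enumerate(window):
        neighbors = [j for j, y in enumerate(window) if j != i and abs(x - y) <= radius]
        if len(neighbors) < k:
            outliers.append(x)
        elif sum(1 for j in neighbors if j > i) >= k:
            safe.append(x)
        else:
            unsafe.append(x)
    return outliers, safe, unsafe


def monitor(stream, window_size, radius, k):
    """Slide a count-based window over `stream` and report the outliers at each step."""
    window = deque(maxlen=window_size)  # the oldest object departs automatically
    reports = []
    for x in stream:
        window.append(x)  # arrival of a new object
        outliers, _, _ = classify_window(list(window), radius, k)
        reports.append(outliers)
    return reports


readings = [10.0, 10.2, 9.9, 10.1, 50.0, 10.05]
print(monitor(readings, window_size=5, radius=0.5, k=2))
# [[10.0], [10.0, 10.2], [], [], [50.0], [50.0]]
```

This brute-force scan costs O(w²) distance computations per arrival for a window of w points. The micro-cluster bookkeeping described above exists to avoid most of those computations.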
Conversely, distance-based algorithms cannot run on devices with limited memory and computation resources. Vasconcelos et al. [45] addressed this problem by introducing a Complex Event Processing (CEP) approach that combines the Z-score and box plot methods to predict outliers. The sensor data collected from on-board vehicle sensors and embedded mobile sensors are sent to the CEP engine for preprocessing. The engine extracts features (e.g., speed, acceleration, deceleration, and mean deceleration) from the retrieved sensor data as evidence that best characterizes driver behavior. Patterns that deviate significantly from the evidence data are then identified by the CEP rules. In addition, the Z-score method assigns a score to each piece of evidence. The stream is split into sequential windows, each containing a set of evidence data, and the mean and standard deviation of the evidence in each window are computed. The Z-score distribution is then assessed to classify the driver's behavior. Moreover, the box plot method avoids the computational complexity of pairwise distances among all evidence data by processing each piece of evidence (dimension or feature) individually and then correlating the outliers. It uses a threshold value to separate the data instances that are inliers from those that are outliers.

A hyperellipsoidal clustering algorithm is introduced in Ref. [46] to detect anomalies in the multimodal distribution of sensing data retrieved from end nodes. It accommodates heterogeneous sensing data with cluster shapes ranging from linear to hyperspherical, and it includes an automated mechanism to select the number of clusters. Its computational overhead grows linearly with the number of data vectors processed. In the initial stage, a set of hyperellipsoidal clusters is obtained, and the Ellipsoidal Neighborhood Outlier Factor (ENOF) is used to identify ellipsoids that deviate from the densities of their neighborhoods. The ratio between the average neighborhood-range density of the neighbors and the ellipsoid's own neighborhood range determines the outlier score. A threshold is then calculated from the standard deviation of the ENOF scores and a parameter, and clusters whose ENOF score is higher than this threshold are considered anomalous. The ENOF outlier detection process is illustrated in Fig. 3, where the blue line represents the threshold value.

Fig. 3. An example of ENOF outlier detection.

An Entropy Outlier Detection Semi-supervised (EODSP) algorithm is introduced in Ref. [47] for detecting outliers in an unlabeled dataset. Entropy measures the information content and uncertainty of a random variable [48]. For a random variable y with probability distribution g(y) over the values y = {y1, …, yn}, the entropy is $E(y) = -\sum_{i=1}^{n} g(y_i)\log g(y_i)$. Given a dataset of h instances with f features, each multivariate vector Yi is treated as a random variable that is a member of the dataset y. The algorithm uses two strategies to predict outliers when only a limited number of positive data objects are available for training. First, reliable negative data objects, which are considered inliers, are extracted from the positive samples and the unlabeled data. Then, the distance between each point in the dataset and the positive objects is calculated, and points whose distance exceeds the threshold are predicted as outliers. In addition, Delimargas et al. [49] proposed an Iterative Principal Component Analysis (IPCA) algorithm to detect data traffic anomalies in the network. IPCA works as follows. First, the data matrix M is obtained from the data trace. The mean of each feature is then subtracted to form a new matrix NM. Each feature is then divided by its standard deviation to obtain a normalized dataset Dn. The eigenvalues and eigenvectors of Dn are obtained from its correlation matrix. The eigenvector with the largest eigenvalue is considered the normal subspace, while the remaining ones represent anomalies. The model is updated iteratively whenever a new stream of traffic packets is transmitted.

Rafferty et al. [50] proposed a Moving Window Principal Component Analysis (MW-PCA) algorithm that obtains threshold values for event prediction and adapts to the uncertain, time-varying behavior of power system frequency. It learns on an initial window containing a specified number of frequency samples. For each new normal data sample, the statistics of the data and of the PCA model are recalculated, updating their confidence limits, which are used to assess the next sample. If a new sample lies within both the data confidence limit and the PCA confidence limit, the system is considered to be operating normally, and the moving window is updated to include the new sample. If the sample exceeds either (or both) confidence bounds, it is automatically excluded, indicating an outlier. Although the aforementioned algorithms (EODSP, IPCA, and MW-PCA) can detect outliers in dynamic sensing data, they lack the robustness to predict outliers in complex, big, and dynamic sensing data. This was addressed by the Robust Incremental Principal Component Analysis (RIPCA) algorithm proposed by Kong et al. [51]. It uses a sliding window together with an anti-k-nearest-neighbor method to compute the principal components of the sampled data in the most recent window, identifying and discarding outliers. The anti-k-nearest neighbors are applied within the sliding window to update the current data and to predict real-time outliers. The anti-k-nearest neighbors of an instance a are the data instances in the dataset that have a among their k nearest neighbors. Instances with at least three anti-k-nearest neighbors are considered the query outliers in the current window.

5.2. Redundancy discovery

Data redundancy is the duplication or repetition of data, as shown in Fig. 4. It is a common problem in the IoT-enabled edge cloud domain. The sensed data generated by IoT sensing devices are massive and dynamic, and they contain redundancies because of the strong correlation between sensed data [52]. For example, certain data may appear multiple times in a dataset because the sensor(s) repeatedly captured an event within a certain
time period.

Fig. 4. Example of redundant data in dataset.

Most redundant sensory data are considered irrelevant because of their negative impact on network and application performance. To mention a few effects, they unnecessarily increase the volume of data held on IoT devices with limited storage, and they cause inconsistency and data corruption. We present the filtering algorithms used to solve the problems related to sensor data redundancy on the IoT-enabled edge cloud platform as follows.

A Support Vector Machine Recursive Feature Elimination-based Correlation Bias Reduction (SVMRFE+CBR) algorithm is developed in Ref. [53] to reduce the bias of SVM-RFE when a feature set contains multiple similar features. Li et al. [54] initially implemented SVM-RFE, which uses a criterion derived from the SVM coefficients to evaluate features. It recursively discards features with low criterion values, supported by two strategies, namely kernel and wrapper. The kernel strategy retains the dependencies among features, while the wrapper strategy does not use cross-validation on the training samples as its selection criterion. SVM-RFE is also known to process quickly when dealing with many candidate features, and it makes maximum use of the training samples with minimal overfitting.

However, SVM-RFE struggles to evaluate feature criteria when candidate features are highly correlated, and their importance is then underestimated. SVM-RFE was therefore integrated with the Correlation Bias Reduction (CBR) strategy to improve the elimination of duplicate sensed features. SVM-RFE+CBR solves these problems by placing a representative feature, the one with the highest criterion among a group of correlated features, back into the retained feature set. First, the list of features eliminated during the first iteration is denoted Fout, and the list of retained (relevant) features is denoted Fin. Two thresholds, Tc and Tg, are used to identify groups of highly correlated features in Fout. If more than Tg features have a correlation coefficient greater than Tc with the feature that has the highest criterion, they are identified as a group. If none of the group members are in Fin, the feature with the highest criterion is moved to Fin. This process is repeated for each feature in Fout until all the features have been processed. Szecowka et al. [55] proposed a Neural Network Sensitivity (NNS) approach for removing duplicate sensed data while maintaining overall accuracy; an improved function was obtained using the differential sequential coefficients of the neural network. However, NNS has some limitations: it accepts only a limited number of correlated features as inputs, and its results become uncertain (confused) when the input features overlap (are dependent). These problems have been solved by the Fast Correlation-based Filter (FCBF) algorithm implemented in Ref. [56]. It uses Symmetrical Uncertainty (SU) to obtain the optimal features among many candidates. SU values range from 0 to 1 and are used to evaluate both the relationship between a feature and the class and the similarity between different features. A value of 1 means that either variable completely predicts the value of the other, whereas a value of 0 means that the two variables are independent. In the initial stage, FCBF determines the association between each feature and the class (C-correlation). It then computes the pairwise similarity among the features (F-correlation). Feature redundancy is thus avoided during feature selection by keeping only features whose feature-class and feature-feature similarities satisfy the SU conditions, searching for the relevant features starting from those with the highest SU values.

A Fractional-order Embedding Multi-set Canonical Correlations (FEMCCs) algorithm is introduced in Ref. [57] to remove data drifts from consistently sampled data. In the initial stage, the covariance matrices are re-estimated using fractional orders to correct their non-zero eigenvalues and singular values. Fractional-order within-set and between-set scatter matrices are then defined to minimize the deviation, or drift, of the sample data matrices. The algorithm then extracts correlated features from multiple sets of feature vectors obtained from the same objects, and a fusion strategy combines them into a discriminative feature vector for classification. Haghighat et al. [58] proposed a Discriminant Correlation Analysis (DCA) approach that incorporates class associations into the correlation analysis of feature sets. It maximizes the pairwise correlations across the corresponding feature sets while eliminating feature correlations between classes and restricting correlations to within the classes in each feature set. The extracted features of interest from the multiple sets are then merged into a single fused feature vector.

However, FEMCCs and DCA have some challenges. The reduced feature sets generated by FEMCCs seem to neglect certain correlation information among the feature sets, which degrades classification performance. DCA, on the other hand, is less effective because redundancies are still detected in the fused features owing to the similarity requirement.

Both issues were resolved by the Intra-class and Extra-class Discriminative Correlation Analysis (IEDCA-IRE) technique proposed in Ref. [60]. It applies a kernelized strategy to the intra-class similarity (pairwise correlation) and to the similarity across different data features in the same class. This retains the relevant data in the fused feature, after which irrelevant or duplicate data are eliminated. In simple words, it retains enough dimensions of the data features in each feature set for class separation, together with the similarity features learned by the discriminative structure. First, it generates a between-class scatter matrix from the nearest neighbors in both the extra-class and intra-class sets. Then, the eigenvectors corresponding to the non-zero eigenvalues of the between-class matrix are identified. The between-class feature correlation is maximized by computing these eigenvectors with their corresponding eigenvalues, which transform the entire matrices. Finally, the kernelized intra-class correlation is used to concatenate the transformed features into a fused feature vector, as shown in Fig. 5(a and b). This eliminates the irrelevant and redundant features present in the fused feature vector.

Fig. 5a. Original dataset.

Fig. 5b. Fused features via intra-class/extra-class discriminative correlation.
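The symmetrical uncertainty test used by FCBF [56], described earlier in this subsection, can be sketched for discrete-valued features as follows. The sketch assumes the standard definition SU(X, Y) = 2·IG(X|Y) / (H(X) + H(Y)). It also assumes the usual FCBF redundancy rule: a feature q is dropped when an already selected, more class-relevant feature p satisfies SU(p, q) ≥ SU(q, class). The feature names, the toy data, and the relevance threshold `delta` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)); 1 = fully dependent, 0 = independent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treated as carrying no information
    joint = entropy(list(zip(x, y)))  # H(X, Y)
    info_gain = hx + hy - joint       # IG(X | Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """FCBF-style selection over a dict {feature name: list of discrete values}.

    1. C-correlation: keep features whose SU with the class exceeds `delta`,
       ranked from the highest SU downwards.
    2. F-correlation: drop feature q as redundant if an already selected
       feature p satisfies SU(p, q) >= SU(q, class).
    """
    su_class = {name: symmetrical_uncertainty(v, labels) for name, v in features.items()}
    ranked = sorted((n for n in features if su_class[n] > delta),
                    key=lambda n: su_class[n], reverse=True)
    selected = []
    for q in ranked:
        redundant = any(
            symmetrical_uncertainty(features[p], features[q]) >= su_class[q]
            for p in selected
        )
        if not redundant:
            selected.append(q)
    return selected


labels = [0, 0, 0, 1, 1, 1, 0, 1]
features = {
    "temp": [0, 0, 0, 1, 1, 1, 1, 1],
    "temp_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # duplicated sensor channel
    "hum": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(fcbf(features, labels))  # ['temp', 'hum']
```

The duplicated channel `temp_copy` has SU = 1 with `temp`, so it is removed in the F-correlation step. The weakly related `hum` channel survives because its SU with `temp` is lower than its SU with the class.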

A Jeffrey-divergence (JD) and Inter-frame Correlation of Color Channels on Boolean Series-based Ensemble Support Vector Classification algorithm is proposed in Ref. [59] to minimize the massive amount of sensed data retrieved from camera sensing devices. The captured video frames (sensed data) are compared based on their color and structure. If similarities are detected between two or more frames, their divergence is computed from the color histograms to identify the corresponding frames, and frames with a high similarity measure are discarded. A multi-fractal technique is then used to analyze the frames based on their texture structures at different scales and local densities, providing rich descriptors to categorize the frame structures. Finally, an SVM is trained on each category of frame structure so that the optimal informative frames (images) are extracted from the non-informative frames (images/data).

A Correlation Feature Selection-based heuristic algorithm is introduced to address the problem of duplicate sensed data on edge-based cloud IaaS [61]. It uses the predictive performance of the features and their inter-correlation to guide its search for an optimal feature subset of the sensed data. It considers how useful each feature is for predicting the class label, together with the level of inter-correlation among the features. In the initial stage, it computes matrices of feature-feature and feature-class correlations from the training dataset. Then, the feature subset space is searched using the best-first search technique to obtain the relevant features. Furthermore, a Scale Invariant Feature Transform (SIFT) algorithm is developed by Yuan et al. [62] to manage the influx of sensing data retrieved from multimedia sensor nodes. The retrieved data are first fused using the Laplace Pyramid Transform (LPT) method. Then, Gaussian kernels of different sizes (known to give a more accurate scale transform) are selected to perform the scale transform of the fused data and obtain accurate candidate feature points. Low-contrast points and unstable edge-response points are then discarded. Each feature point is assigned an orientation from the gradient information of the neighboring pixels to improve the accuracy of feature point matching. Li et al. [63] proposed a Center-symmetric Local Gabor Binary Pattern (CSLGBP) feature extraction algorithm to obtain the actual face image captured by camera sensor devices. The input face image is convolved with Gabor kernels to retrieve magnitude information at well-defined orientations and scales. The responses for the specified orientations at the same scale are accumulated to formulate a new
scale feature. The features of each scale are computed with the CS-LBP descriptor from the retrieved Gabor scale images to extract the relevant image.

A Linear Discriminant Analysis-based enhanced Support Vector algorithm is proposed in Ref. [64] to address uncertainty in sensed image signals or data retrieved from camera sensor devices. It computes various characteristic features of the datasets and classifies the features present in the preprocessed sensed image signal. It also detects the Q, R, and S waves in the preprocessed input signal to determine the various heartbeat types (e.g., Left Bundle Branch Block, Right Bundle Branch Block, Premature Ventricular Contraction, and Premature Atrial Contraction) and classifies them accordingly. A weighted kernel function computes the weights used to identify the Q, R, and S waves for optimal classification of the heartbeat types.

Similarly, the Incremental Fast Searching Clustering-based K-Medoids (ICFSKM) algorithm is introduced in Ref. [65] to discover the underlying patterns of dynamic sensing data. It integrates new data patterns into the previous ones through its combination operations. The cluster centers are continuously updated by k-medoids as new sensing data arrive. Put simply, it maintains a set of clusters of data with similar feature patterns. Upon the arrival of new sensing data, it either creates new clusters or assigns the data to existing clusters.

A Blocks of Eigenvalues Algorithm for Time Series Segmentation (BEATS) is proposed to remove duplicate sensed data from large datasets [66]. It divides the time series stream into blocks of 64 values, arranges each block into a square matrix, and transforms it into the frequency domain using the Discrete Cosine Transform (DCT). The result is then quantized to obtain a finite dataset, from which duplicate data are removed using eigenvalue computation, as shown in Fig. 6.

Fig. 6. Example of BEATS workflow [66].

Bu [67] developed an Efficient High-order Tensor Fuzzy C-means (EHOFCM) algorithm, based on the Canonical Polyadic Decomposition scheme, for clustering IoT streaming data. The traditional fuzzy c-means (FCM) technique allocates each object or data record to two or more groups by computing a membership matrix. However, IoT-sensed big data is characterized by heterogeneous features, which is a notable drawback for conventional FCM when clustering real-time IoT big data. EHOFCM addresses these problems as follows. Each data point or object is converted from the vector space to its tensor format by a bijection function, and the objects are then aggregated into clusters based on their similarity features. In addition, the attributes of each object or data record are greatly reduced using the canonical polyadic decomposition scheme. This achieves a high compression rate and significantly reduces the huge volume of raw sensing data. As a result, traditional fuzzy c-means can cluster huge volumes of sensed data on low-end devices such as controllers and mobile phones.

A Banag-Pseudo-cluster-based aggregation algorithm is developed in Ref. [68] to determine the exigency (criticality) of the data collected from multiple sensor nodes. Data are aggregated into groups at the edge (gateway) platform based on their exigency level, and the data with the highest exigency value are aggregated first. This is repeated systematically until all the sensed data are fused into their respective groups and sent to the cloud data center for further processing. Abawajy et al. [69] designed an algorithm combining Cobweb, Expectation Maximization, and K-means, also called the Rank Correlation Coefficient (RCC) algorithm, for clustering ECG sensed data. First, it uses a fuzzy-based data fusion technique to aggregate only the relevant values of the sensed data and discard the others. The relevant data are then grouped into different independent clusters. A consensus function then combines these clusters into a final consensus clustering that partitions all the elements of the dataset. Furthermore, Liu et al. [70] proposed a Two-step K-means Clustering (TKC) algorithm to cluster image sensed data into two categories, namely blurry and clear images. Blurry images are discarded, while clear images are further processed at the edge platform. There, the watershed segmentation function divides each clear image into foreground (containing the actual image data) and background (containing useless image data). The background is removed from each clear image, leaving the updated foreground image.

The Adaptive Moving Window Regression (AMWR) algorithm was developed by Akbar et al. [71] to determine the optimal training window size for streamed data using Lomb-Scargle time series analysis. For example, temperature data retrieved over 24 h tend to contain repeated patterns or values. If the training window size equals the optimal periodicity of the data, the model learns all the local patterns, resulting in more accurate prediction. In addition, the window size is adapted over the prediction horizon to maintain a certain level of prediction accuracy. It is increased when the accuracy of the model is high and decreased when the performance of the prediction model drops. The predicted block of data is then transmitted to the Complex Event Processing engine as an event tuple, where predefined rules are applied to it to detect or predict complex events.

An Elephant Herd Optimization-based Linear Kernel Support Vector (EHO-LKSV) algorithm is proposed in Ref. [72] to select the desired feature subset from a high-dimensional sensed dataset. It greedily searches the element space and determines a feasible feature subset that continuously improves on the given input data, while reducing the computation time of the entire process. The retrieved feature subsets are then classified into two labels using a linear kernel support vector technique, which is trained on the different datasets for optimal prediction accuracy. Wong et al. [73] proposed a Perceptually Important Points (PIP) algorithm for reducing IoT time series sensing big data. It divides the sensed data into segments by identifying a set of important points, either local minima or local maxima, in the sensed data. In the initial stage, the time series feature and the sensed data features are segmented into odd and even values. The similarity between features is then determined using the Jaccard similarity distance. Similar instances with the same retrieval time are eliminated across features, reducing the sensed data.
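The point-selection step of PIP can be sketched as follows. The sketch assumes the widely used vertical-distance formulation of perceptually important points: begin with the first and last samples, then repeatedly add the sample farthest from the straight line joining its two adjacent PIPs. It is not taken from Ref. [73], and it omits the odd/even segmentation and Jaccard-based elimination described above. The function names and the `n_points` parameter are illustrative.

```python
def perceptually_important_points(series, n_points):
    """Return the sorted indices of the n_points most perceptually important points.

    Vertical-distance PIP (an assumed, commonly used formulation): start with
    the first and last samples, then repeatedly add the sample that lies
    farthest (vertically) from the straight line joining its two adjacent PIPs.
    """
    n = len(series)
    if n_points < 2:
        raise ValueError("n_points must be at least 2 (the two end points)")
    if n_points >= n:
        return list(range(n))
    pips = [0, n - 1]
    while len(pips) < n_points:
        best_idx, best_dist = None, -1.0
        for left, right in zip(pips, pips[1:]):
            y_left, y_right = series[left], series[right]
            for i in range(left + 1, right):
                # Height of the chord between the adjacent PIPs at position i.
                chord = y_left + (y_right - y_left) * (i - left) / (right - left)
                dist = abs(series[i] - chord)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        pips.append(best_idx)
        pips.sort()
    return pips


def reduce_series(series, n_points):
    """Keep only the (index, value) pairs at the selected PIPs."""
    return [(i, series[i]) for i in perceptually_important_points(series, n_points)]


readings = [0, 1, 5, 1, 0, 0, 3, 0]
print(reduce_series(readings, 3))  # [(0, 0), (2, 5), (7, 0)]
```

Keeping only the returned points preserves the peaks and troughs that define the shape of the series and discards the samples in between. This is the data-reduction effect PIP is used for.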

A Hadoop Artificial Bee Colony (HABC) algorithm is developed by Ahmad et al. [74] to reduce the redundancy of sensed data. In the initial stage, the classified sensed data are placed into subsets according to their similarity characteristics, using the accuracy fitness values. The Modification Rate (MR) parameter is then used to extract features from neighboring subset data: a uniform random number between 0 and 1 is generated for each data item in each sensed data subset, and if this value is less than the MR, the feature is inserted into a new subset. If the new subset is better than the initial exploratory subset, it replaces it as the current subset. This process is repeated until the best feature subset is reached.

A deep learning Long Short-Term Memory (LSTM) algorithm is also proposed in Ref. [75] to predict the ground speed of landing aircraft, based on sensor data retrieved from the aircraft. A random forest algorithm is first used to select twenty features from the sensed data. The network consists of six layers: one input layer, four hidden layers, and one output layer. The four hidden layers contain 128, 64, 32, and 8 neurons, respectively, while the output layer consists of a single neuron, which produces the predicted ground speed.

Mohammadi et al. [76] proposed a Deep Reinforcement Learning (DRL) algorithm that aggregates sensed data with the same distance position, which are labeled and imputed into the same cluster. Sensed data are clustered based on their proximity level. The algorithm uses a variational auto-encoder to identify the optimal data representing the closest distance information for locating the target object.

Yan et al. [77] proposed an Integrated Deep Auto-Encoder algorithm for managing sensed data obtained from sensor devices. State data recorded within a period at each sub-process before a failure are retrieved from the DECG; these data are known as the historical information. The historical information is cleansed (e.g., by filling in missing data features) and divided into two categories, namely distant records and recent records, to achieve an optimal prediction. Distant records are those far from the current time moment, while recent records are those close to it. The distant records are thus used to model the damage trend, while the recent records are used to model the smoothing of the recent change. The two outputs are then fused, and linear regression is performed on the fused hidden records to predict the Remaining Useful Life (RUL) of production machines.

A deep learning based regression algorithm is proposed in Ref. [78]. It consists of eight layers grouped into three sections, namely the lower, intermediate, and higher layers. The lower and intermediate layers are implemented in the edge servers, while the higher layers are implemented in the cloud. The input sensed data (images of dogs and cats) from the camera sensor devices are transferred to the lower layer in the edge servers for processing. At the intermediate layer, a filter or feature detector extracts features to obtain the relevant data, which substantially reduces the size of the input data. The reduced relevant data are then transferred from the edge server to the cloud for further processing, where they are passed to the higher layers (consisting of neurons) and filtered (feature detector) to retrieve the optimal data.

A Hybrid Multilayer Perceptron Convolution Neural Network (MLP-CNN) algorithm is developed in Ref. [79] for the fusion and classification of sensed image data. It applies a fusion decision rule to the output sensed data based on the CNN confidence value, which is obtained as the difference between the maximum value of the CNN membership vector and its mean value. If the CNN confidence value is higher than a predefined upper threshold, the CNN classification is accepted, whereas a value below a second, lower threshold indicates that the CNN prediction is unreliable. When the CNN confidence lies between the two thresholds, the fusion output with the higher confidence value is regarded as the actual classification result.

Liu et al. [80] develop a Convolutional Neural Network (CNN) algorithm to retrieve the desired sensed data on the cloud platform. A model is fine-tuned on the sensed image dataset (images of various foods) to generate a fine-grained model that is used for classification, and this fine-grained model is trained with Caffe. At the initial stage, the model is loaded into memory, and the data (food images) are fed into the convolutional neural network as input. The CNN features are then extracted using max-pooling and Rectified Linear Unit (ReLU) layers, which reduce the data feature dimensions and speed up the convergence of the computing process.

Li et al. [81] proposed a Deep Convolutional Computation Model (DCCM) algorithm to learn hierarchical features of sensed data using the tensor method, extending the convolutional neural network from the vector space to the tensor space. Thus, the local features in the sensory data are optimally exploited and overfitting is avoided. A tensor convolutional layer is also introduced to reach the deeper layers. The initial layers are embedded on mobile devices, the intermediate layers are deployed in the cloudlet, and the deeper layers are embedded in the cloud server. The classification of the input sensed data (image) is computed in the initial layers residing on the mobile device, and the back-propagation technique is used to train the layers by evaluating all the layers until a classification result with the desired confidence is obtained. If the initial layers cannot classify the sample sensed data with sufficient confidence, the data are transferred to the intermediate layers in the cloudlet for classification. The deeper layers are only invoked when both the initial and intermediate layers are unable to classify the input data with the desired confidence. In addition, the CDCNN can decide whether to reject or accept classifications based on a threshold value passed as an argument at runtime. This improves the accuracy and speed of the entire classification process.

Table 4 identifies the problems solved, performance results, and weaknesses of the existing algorithms used for predicting data redundancy. It also indicates the processes adopted by the algorithms, the edge devices, and the cloud IaaS resource components reported in previous literature.

5.3. Cloud resource provisioning for user requests

Efficient resource allocation ensures satisfactory cloud service for end-user requests. In IoT-enabled edge cloud computing, resources are allocated as Physical Machines (PMs) and Virtual Machines (VMs) in the cloud IaaS platform, as shown in Fig. 7.

How virtual machines are consolidated onto servers to support the requested tasks determines the ability to minimize the resource allocation problem [83]. This research focuses on the problem of load balancing when migrating virtual machine(s) from the source server to the destination server to execute data filtering or analytical application requests. Load balancing refers to the pattern in which resources are distributed so that no machine (server or VM) is overloaded and resources are optimally utilized [84]. It also determines the migration of tasks to underutilized VMs and servers for effective resource sharing [85]. In this article, we analyze the existing algorithms used for resolving the related issues of load balancing while allocating resources to execute data filtering or analytic application requests on the cloud IaaS platform.

Jing et al. [86] proposed a Dynamic Priority and Load Balancing (DPLB) algorithm for VM resource load balancing while scheduling the execution of IoT application request tasks on IaaS. The dynamic priority function is composed of task value density and task computation urgency. In addition, the priority is gradually increased over time to ensure timely execution of each task in the queue. The scheduling function consists of Earliest Completion Time (ECT) and retrieving the load status information of each VM with the support of a publish/subscribe method. The tasks are ordered according to their priority level, and the tasks with the highest priority are scheduled first to the optimal VMs among heterogeneous VMs that meet the QoS requirements, with the support of the task migration manager. The Brier Score method is used to predict an overloaded VM, whereby if a VM


Table 4
Comparison of redundant data elimination techniques.

| Article title | Algorithm | Process | Problem resolved | Outcome | Weakness | Edge device | Cloud storage server | No. of Physical Machines (PMs) | No. of Virtual Machines (VMs) |
|---|---|---|---|---|---|---|---|---|---|
| SVM–T-RFE: A novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles [53] | Support Vector Machine (SVM) Recursive Feature Elimination | Clustering | Inefficient elimination of feature redundancy | Efficiently eliminated redundant data with minimum computation time | Candidate feature set consists of highly correlated features | PC | N/A | N/A | N/A |
| Feature selection and analysis on correlated gas sensor data with recursive feature elimination [54] | SVM Recursive Feature Elimination-based Correlation Bias Reduction (SVM-RFE+CBR) | Clustering | Candidate feature set consists of highly correlated features | Improved elimination of feature redundancy while retrieving actual sensed data | N/S | PC | Yes | N/S | N/S |
| On reliability of neural network sensitivity analysis applied for sensor array optimization [55] | Neural Network Sensitivity (NNS) | Deep learning | Inappropriate selection of desired features among various features | Effectively and efficiently retrieved the best features with improved accuracy | Unclear result due to limited number of input features for training | PC | N/A | N/A | N/A |
| Sensor array optimization for mobile electronic nose: wavelet transform and filter based feature selection approach [56] | Fast Correlation-based Filter (FCBF) algorithm | Classification | Unclear result due to limited number of input features and overlapping feature selectivity | Obtained the best combination of features while discarding redundant ones | Computation time complexity | Remote server | Yes | N/S | N/S |
| Fractional-order embedding multi-set canonical correlations with applications to multi-feature fusion and recognition [57] | Fractional-order Embedding Multiset Canonical Correlations (FEMCCs) | Classification | Deviation of relevant sensing data due to noise and limited training samples | Effective and robust elimination of noisy data | Does not consider vital correlation among different feature sets | Server | N/A | N/A | N/A |
| Discriminant correlation analysis: real-time feature level fusion for multimodal biometric recognition [58] | Discriminant Correlation Analysis (DCA) | Classification | Identification and elimination of redundant between-class feature similarities | Improved accuracy in detecting and eliminating redundant features | Feature redundancy still remains within the intra- and extra-class, in multiple classes or a single class | Laptop PC | N/A | N/A | N/A |
| Enhanced feature fusion through irrelevant redundancy elimination in intra-class and extra-class discriminative correlation analysis [59] | Intra-class and Extra-class Discriminative Correlation Analysis (IEDCA-IRE) | Classification | Neglect of some correlation information among feature sets due to over-fitting between data points | Improved accuracy of detection and elimination of feature redundancy | Computation time complexity | Remote server | Yes | N/S | N/S |
| Mobile-cloud assisted video summarization framework for efficient management of remote sensing data [60] | Jeffry-divergence Boolean Series-based Ensemble SVM | Classification | Duplicate sensed images | Improved accuracy of relevant sensed image retrieval while discarding irrelevant ones | Constrained by computation time complexity; cannot be applied to sequential sensed datasets | Mobile phones | N/A | N/A | N/A |
| IoT as a applications: cloud-based building management system for the internet of things [61] | Correlation Feature Selection-based heuristic algorithm | Classification | Computation time complexity of optimal feature selection | Minimized dimensionality of sensed data sets with less execution time | Cannot be used for time-series sensing data | Local server | Yes | 1 | N/S |
| Research on the fusion method of spatial data and multimedia information of multimedia sensor networks in cloud computing environment [62] | Scale Invariant Feature Transform algorithm | Classification | Extracting actual sensed data from massive sensed data sets | Minimized computation resource usage while enhancing the accuracy of extracting actual sensed data | Does not consider the spatial correlations among sensed data sets | Controller, Raspberry Pi | Yes | N/S | N/S |
| A cloud-based monitoring system via face recognition using Gabor and CS-LBP features [63] | Center-symmetric Local Binary Pattern algorithm | Classification | Poor facial images and the complexity of the Gabor filter | Reduced the Gabor filter complexity and improved rotational invariance | Consumes computation resources | Desktop computer | Yes | N/S | N/S |
| A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing [64] | Linear Discriminant Analysis-based Enhanced Support Vector algorithm | Classification | Errors during classification of sensed data for the retrieval of relevant ones | Improved sensitivity and specificity and reduced classification error | Highly computationally intensive | Mobile phones | Yes | N/S | N/S |
| An incremental CFS algorithm for clustering large data in industrial internet of things [65] | Incremental Fast Searching Clustering based K-Medoids | Clustering | Clustering dense peaks of dynamic sensory data | Improved clustering accuracy with minimum computation time, compared to other methods | Time-consuming when all the clusters are to be merged | N/S | Yes | 10 | N/S |
| BEATS: Block of Eigen-values Algorithm for Time Series Segmentation [66] | Block of Eigen-values algorithm | Clustering | Unexpected drift data points in big data sets | Efficient detection of drifts with improved classification and clustering accuracy | Unable to estimate the block size before data arrival; involves computation time complexity | Local server | Yes | N/S | N/S |
| An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT [67] | Efficient High-order Tensor Fuzzy C-means algorithm | Clustering | Inability of the fuzzy c-means algorithm to cluster big sensing data streams in low-end IoT devices such as controllers and mobile phones | Improved computation efficiency in terms of timeliness and a significant level of clustering accuracy compared to the conventional method | Clustering accuracy is still limited, as it mainly focuses on minimum usage of computation resources | Remote server | N/A | N/A | N/A |
| Social choice considerations in cloud-assisted WBAN architecture for post-disaster healthcare: data aggregation and channelization [68] | Banag Pseudo-cluster based aggregation | Clustering | Filtering sensed data from a data set | Improved reliability probability based on aggregation of sensor data in terms of their level of need | Weak aggregation of sensed data from heterogeneous (big data) sources | Broker server | Yes | N/S | N/S |
| Federated internet of things and cloud computing pervasive patient health monitoring system [69] | Cobweb Expectation Maximization-based K-Means | Clustering | High dimensionality of the sensed data set and its noisy nature | Improved the quality of sensed data by reducing its dimensionality based on an aggregation strategy | Computationally intensive; consumes a large amount of memory | Mobile phone | Yes | N/S | N/S |
| A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure [70] | Two-step K-means Clustering Algorithm | Clustering | Numerous blurry, background (useless) images that limit classification accuracy and delay transmission of the data to the cloud | Eliminated unusable sensor data, resulting in improved clustering accuracy | Unable to discover the actual correlation among the discovered patterns in the sensed dataset | Mobile phone | Yes | N/S | N/S |
| Predictive analytics for complex IoT data streams [71] | Adaptive Moving Window Regression algorithm | Regression | Complex event streaming data without leveraging historical data for prediction | Improved prediction accuracy in near real-time and minimized computation complexity | Uses a large amount of computation resources (memory space) | N/A | Yes | N/S | N/S |
| Effective features to classify big data using social internet of things [72] | Elephant Herd Optimization-based Linear Kernel Support Vector (EHO-LKSV) algorithm | Optimization/Classification | Delay in the computation processing of sensed data feature selection | Enhanced feature selection accuracy with minimum computation time and memory usage | N/S | Fog server | N/A | N/A | N/A |
| A novel data reduction technique with fault-tolerance for internet-of-things [73] | Perceptually Important Points (PIP) | Classification | Both local and global optima in sensed data reduction | Effective and efficient elimination of duplicate sensed data with the same time retrieval | Eliminates relevant sensed data alongside duplicate ones due to missing data | Primary and secondary server | Yes | N/S | N/S |
| Toward modeling and optimization of features selection in big data-based social internet of things [74] | Hadoop Artificial Bee Colony algorithm | Optimization | Computational complexity of extracting features from real-time IoT streaming data | Improved feature selection accuracy with respect to timeliness | Not specified | Multi-cluster Hadoop with i5 3.4 GHz and 8 GB RAM | Yes | N/S | N/S |
| A novel deep learning method for aircraft landing speed prediction based on cloud-based sensor data [75] | Deep Learning Long Short-Term Memory (LSTM) algorithm | Deep learning | Inaccurate classification of sensed data retrieved from aircraft to determine the safety of its landing speed | Improved classification accuracy to some extent in a timely manner | Weak selection of optimal parameters to separate relevant sensed data from irrelevant ones | N/A | Yes | N/S | N/S |
| Deep reinforcement learning in support of IoT and smart city services [76] | Deep Reinforcement Learning algorithm | Deep learning | Close estimation of target locations in an indoor environment | Improved classification accuracy and performance in locating target objects | Highly computationally complex | Fog server | N/A | N/A | N/A |
| Big data analytics for prediction of remaining useful life based on deep learning [77] | Integrated Deep Auto-Encoder algorithm | Deep learning | Ineffective retrieval of desired sensed data from massive data sets for prediction purposes | Effective extraction of desired sensed data, which enhances prediction accuracy | Computationally intensive | Fog server | Yes | N/A | N/A |
| Learning IoT in edge: deep learning for the internet of things with edge computing [78] | Deep learning algorithm | Deep learning | Difficulty obtaining optimal sensed data at reduced sizes | Improved accuracy of desired sensed data retrieval at minimum size | Highly computationally complex | Edge servers | Yes | N/S | N/S |
| A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification [79] | Hybrid Multilayer Perceptron Convolution Neural Network algorithm | Deep learning | Inaccurate classification of different fine spatial resolution remotely sensed images | Improved classification accuracy | Computation intensive, with large memory usage during processing | Local server | Yes | N/A | N/A |
| A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure [80] | Convolutional Neural Network algorithm | Deep learning | Inaccurate classification of sensed data and delayed transmission of the data to the cloud | Improved classification accuracy by eliminating redundant data | Classification accuracy still needs to be enhanced | Mobile phones | Yes | N/S | N/S |
| Deep convolutional computation model for feature learning on big data in the internet of things [81] | Deep Convolutional Computation model algorithm | Deep learning | Inefficient detection of the correlations between heterogeneous sensed data feature spaces | Improved classification accuracy | Highly computationally intensive | Local server | N/A | N/A | N/A |
| The cascading neural network: building the internet of smart things [82] | Cascading Deep Convolution Neural Network algorithm | Deep learning | Limited computational processing resources on embedded mobile devices | Reduced computation cost at reasonable classification accuracy in a timely manner | Classification accuracy still needs to be enhanced with an optimization algorithm | Raspberry Pi, mobile phone, cloudlet server | Yes | N/S | N/S |

workload exceeds the Brier Score it is considered overloaded, but if it falls below the Brier Score it is considered underloaded. The Task Migration Manager (TMM) then assigns or facilitates the migration of tasks to the underloaded VMs to balance the loads on the available VMs.

A quasi-real-time, optimization-based adaptive resource allocation algorithm (SERAC3) is introduced in Ref. [87] for selecting the appropriate configuration of virtual machines to process IoT sensory big data filtering application requests on the cloud IaaS upon arrival. It solves the prevailing problem of the CP-BO algorithm by extracting representative workloads for incoming sensing data, analyzing the data, and intelligently determining an optimal configuration (type of virtual machine, size of virtual machine, and number of virtual machines) for the clustering of each job in real time, although it does not consider load balancing across physical hosts (PHs) and VMs. The problem of load balancing is, however, solved in Ref. [88] by using a Virtual Machine and Selection algorithm for processing sensory data filtering or analytic application requests (jobs) on the cloud IaaS platform. It uses parameters such as CPU utilization, memory utilization, and job arrival rate to cluster servers into eight groups. Servers with optimal computation resources, based on these parameters, are selected from the resource pool to host VMs. As a result, the virtual machines on overloaded servers are moved to the optimal server to process new jobs as they arrive.

In [89], a Fuzzy Markov Normal (FMN) algorithm is proposed for selecting VMs to be transferred from congested servers (hosts), in order to avoid oversubscribed hosts and minimize energy consumption. It categorizes the attributes of VMs based on their current utilization level and the workload status of the host in which they reside, using fuzzy logic. The Markov Normal technique is then deployed to determine which category of VMs should be migrated from the overloaded host to a less loaded target host. However, FMN only performs migration of VMs based on host utilization, without considering the “memory utilization of VMs selection process which is the basic requirement to be established before VMs migration” [90]. Therefore, an approximation algorithm is proposed in Ref. [91] to solve the content-based memory problems of VM selection from source to destination, with a single overloaded host and a destination host when the


overloaded threshold is fixed. It uses a memory sharing-aware placement system to exploit the content similarity of the VMs. This is done by sharing the same pages or sub-pages to simultaneously dispatch a batch of VMs from the overloaded source host, and by determining the appropriate destination host, among various hosts, that can accommodate the migrated VMs.

Fig. 7. An example of servers and virtual machine(s) architecture.

Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) algorithms are proposed in Ref. [92] to optimize the selection of virtual machines, which leads to optimal utilization of computational resources on the cloud IaaS platform. The GA updates the optimal selection of VMs for executing the IoT application requests by assuming that a given number of VMs (chromosomes) form the possible capacity to be allocated for executing the jobs. It then calculates their fitness using CPU utilization, turnaround time, and waiting time to determine the optimal VMs with less execution time; otherwise, the genetic operations (selection, crossover, and mutation) are repeated to generate new VMs. The selection operation picks a matching pair of VMs, compares them, and randomly generates the best one. A two-point crossover between the two VMs then produces two different offspring, and changes in the VMs are maintained using flip-bit mutation, which updates one or more gene values in the VMs (chromosomes) from their initial state. The PSO algorithm assumes that numerous particles represent VMs over a given number of iterations, so that each VM in the cloud(s) is considered a possible solution that can be assigned to execute the incoming filtering application requests or jobs. It then computes the fitness of each VM and compares it with the p-best and g-best. If the current value is better than the p-best, the current location becomes the p-best location; if the current value is better than the g-best, the g-best is updated to the current index in the array. The optimal VM is thus assigned as the g-best, and the process is repeated for incoming job requests. Experiments have been conducted to determine the performance of both algorithms, and the results show that PSO performs better than GA.

Narman et al. [93] introduced a Dynamic Dedicated Server Scheduling (DDSS) algorithm, based on homogeneous and heterogeneous servers, for processing IoT application requests. It continuously updates the number of dedicated PMs with VMs based on request arrival rates and their priority levels. Dedicated PMs are dynamically assigned to application requests by considering four important parameters: task arrival rate, task priority levels, total service rate of servers in the system, and total service rate of servers capable of executing a single type of request. At the initial stage, IoT application request tasks are classified based on their arrival rate and priority levels. Then, PMs of different groups are assigned to different request classes, and VMs must be formed on dedicated PMs for request processing. The number of assigned PMs is frequently updated for each class of request tasks until all tasks are fully executed.

A sub-optimal resource-based Support Vector Regression-Genetic (SVR-GA) algorithm is developed for provisioning cloud resources to execute application requests [94]. The SVR predicts the resource specifications for jobs and creates the VMs to resolve the uncertainty of job arrival in real time. It also evaluates the number of resources with the support of two lookup tables that keep records of all related resource utilization rates for each VM, and determines whether the number of VMs should be increased or decreased based on the number of application request tasks. The genetic algorithm is then used to assign VMs to PHs for job execution, adjusting itself optimally to allocate VMs for new arrivals. Jeyarani et al. [95] developed an Adaptive Power-aware Virtual Machine Provisioner (APA-VMP) that efficiently allocates VMs to a group of servers by satisfying the specifications of an optimal number of workloads. At the initial stage, the workload of application service requests is estimated, after which the desired number of VMs is allocated to active servers that can perform the job. Also, Hieu et al. [96] proposed a Max-BRU algorithm to maximize resource utilization and balance resources across different dimensions, reducing the total number of servers in the active state. First, a group of servers is instantiated as empty, and the initial number of servers in the running state is set to zero. VMs are then assigned to servers until all VM requests are fully assigned. Then, the average resource usage and the resource balance of all the active servers are calculated. In this way, the most suitable server is selected from the group of existing active servers for a possible VM request.

The SVR-GA, APA-VMP, and Max-BRU algorithms are prone to some limitations: they cannot preserve unbalanced resources when a server reaches its maximum computation limit; disk and memory resources are wasted due to insufficient resources, resulting in high energy consumption; quality of service cannot be achieved due to real-time VM migrations; and none of them has yet been applied to IoT application request services. However, Mekala and Viswanathan [97] address most of these limitations by developing an energy-efficient resource ranking and utilization factor-based virtual machine selection (ERVS) algorithm. The algorithm solves the problem of energy utilization of server and virtual machine resources for job execution. It evaluates the resource utilization rate of the jobs and categorizes the jobs accordingly. The jobs are then assigned to the appropriate VMs that can execute each class of jobs (IoT sensed data analytics) by considering their resource utilization rate. This is realized by sorting out the highly loaded servers with the support of the Compressive Resource Ranking (CRB) scheme, which places more emphasis on resource utilization and energy consumption of servers. VMs are then assigned to execute the jobs (IoT sensory data filtering or analytic application requests) by considering a limited set of job types with deadlines and the resource requirements for executing each job.

Hybrid Combinatorial Ordering First-Fit Genetic (COFFGA) and Combinatorial Ordering Next-Fit Genetic (CONFGA) algorithms are developed in Ref. [98] to reduce the resource waste per server and the total number of servers in the active state. They determine the optimal VMs capable of executing the requested workloads, to be migrated to the desired active servers, while the First-Fit and Next-Fit heuristics make migration decisions that reduce the total resource waste in each active physical server and the number of non-idle physical servers. However, these hybrid algorithms are designed to solve local optima problems without considering global optima problems. Therefore, Mohiuddin and Almogren [99] solve the global optima problem by introducing a Workload Aware Virtual Machine Consolidation (WAVMCM) algorithm that switches idle physical servers into hibernation mode. The resources of the server are classified into four classes with different resource capacities to execute different VM


requests. At the initial stage, new VM requests for executing jobs are classified according to the amount of resource demand, after which they are assigned to the VM class capable of executing each job. VMs are then migrated from low-load servers to intermediate-load servers within the same class. The algorithm also determines which physical servers are in inactive mode and puts them into hibernation mode to minimize power consumption.

Abed and Younis [100] developed an Adaptive Firefly-enabled Weighted Round Robin (AFF-WRR) algorithm for dynamic and static load balancing on VMs to process IoT application requests. The WRR is responsible for estimating the weight of each VM based on three parameters, namely CPU, memory, and latency. VMs with higher weights are considered the most viable for executing large jobs, followed by the lower-weighted VMs. The Adaptive Firefly (AF) tracks the status of VMs and sorts them according to their weight, and VMs with optimal resources are selected to execute incoming jobs on a real-time basis. The status of VMs is monitored by the AF at millisecond intervals, while the WRR frequently rebalances the VMs based on the Firefly results.

Chen and Chen [101] addressed the issue of load balancing on VMs and servers in the cloud by developing a service-oriented Virtual Machine (VM) placement algorithm. It uses the genetic algorithm to optimize the configuration of different VMs in order to achieve minimum communication overhead and total power consumption. In the initial

use appropriate communication protocols to effectively communicate with other devices and networks on the IoT-based edge cloud infrastructure.

6.1. Processes adopted in existing research

In this subsection, we discuss and analyze the processes adopted by the existing algorithms (discussed in the previous section) to solve the problems (as highlighted in the tables of Section 5) in IoT-enabled edge-cloud computing.

The classification process is a supervised machine learning technique that assumes some prior knowledge to guide the partitioning operation, formulating a set of classifiers to represent the best distribution of patterns [103]. Furthermore, classification processes are designed to use both labeled and unlabeled data: the labeled data are mainly used to train the classifier (e.g., the prediction function), while the unlabeled data are classified by the trained classifier. The classification output is a finite set of predefined discrete classes or values; depending on the number of classes, classification problems belong to either the binary or the multi-class category [104]. Binary classification involves two labels, e.g., 0/1, good/bad, and white/black, while multi-class classification involves multiple labels. Consequently, the quality of the classification results is verified
stage, the population chromosome is generated, which represents the by determining the number of test patterns that are allocated to the
VMs. corresponding collections, which is called the accuracy rate.
It then assigns the required VMs that are capable of executing the The regression process is used to design the correlation between
jobs to the servers, ensuring that the VM load does not exceed the server input and output variables to achieve a predictive solution. The result of
limit. This is done through the fitness function where the communica­ regression processes is determine in the continuous domain. For
tion cost between the VMs is computed and summed up to obtain the example, in a diabetic monitoring application, a regression can predict
fitness value of one server. Therefore, the server with the highest fitness the symptoms of diabetes based on previous information. In general, the
is randomly selected from multiple servers to execute the job. Table 5 regression allows the prediction of the outcome of a specific event. It is
shows the solved problems, performances, and weaknesses of the algo­ widely used in the updating of IoT health and agriculture application
rithms used for Cloud IaaS resource allocation for the execution of domains.
sensory data filtering or analytic application requests on IoT-based edge The clustering process is an unsupervised learning process that ex­
cloud infrastructure. It also shows the processes used by the algorithms, tracts hidden patterns and structures from a given data set. Unlike
edge devices, and cloud data center resource components as depicted in classification which has some prior knowledge to strategize the parti­
previous research. tioning operation, clustering has no pre-knowledge of the strategy to be
Basu et al. [102] introduced a hybrid Genetic-Ant Colony Optimi­ used for the extraction process. It aggregates the data into groups, based
zation (GAACO) algorithm for scheduling the task requests of multi­ on their similar features and common structure as well as the data points
processor IoT applications on the Cloud IaaS. Each task is scheduled to a in different dissimilar clusters. Clustering is mainly used in recom­
single processor at a time in a heterogonous processor system. A task can mender systems and outlier detection. The verification and evaluation of
only be executed when its predecessors have finished execution. Simply clustering results is based on the amount or number of dimensions of the
put, once a task starts processing on a specific processor, the next task data set to which the clustering algorithm is applied. For example, the
request scheduled on the same processor must wait for the previous task sum of squared errors is mainly used for data clustering while the peak-
to finish executing. At the initial stage, the task and processor with the signal-to-noise ratio is mainly used for image clustering [105].
best fitness solution are determined among multiple processors and Deep learning is a machine learning technique that consists of deep
incoming task requests with the support of GAACO. After which the and complex architectures [106,107]. These architectures consist of
heuristic function is used to estimate the makespan (maximum execu­ many layers that convert input (e.g. images) into output data (e.g. an
tion time) taken for each task it traverses all the levels in the graph actual image) while learning progressively on higher-level features
structure. Therefore, a task with a larger makespan is scheduled first in [108]. Deep leering, also known as Deep Neural Networks (DNN), was
GAACO to avoid starvation processing resources. The capability of the considered complex to train data effectively and efficiently, it performs
processors is computed by the heuristic function, where the processors both classification and clustering processes during operation. It began to
with the highest probabilistic ratio of resources are selected to execute gain popularity in 2010 when it was discovered that training and
the task with the highest makespan. This process is repeated for several analysis of large, high-dimension IoT big data could be realized with
iterations until all tasks in the graph structure are fully executed. optimal results [109]. The stacked auto-encoders (SAEs) and DNN layers
sequentially in an unsupervised manner (pre-training), and fine-tuning
6. Processes and network protocols for IoT-edge cloud the stacked network with a supervised approach, could provide better
computing performance. However, they are known to be inflexible and require a
reasonable amount of work to generate acceptable results.
Processes are a set of instructions that are currently being executed. Optimization is the process of modifying some features of a system to
These sets of instructions that are processed logically to solve specific improve its performance or use limited resources more efficiently. For
problems which scientists call algorithms. In simple terms, processes are example, an algorithm can be optimized to speed up its process execu­
a set of instructions that are systematically applied by an algorithm to tion faster or to use minimum memory resources during process
solve a particular problem. On the other hand, network communication execution. Optimization techniques are mainly based on a bio-inspired
protocols govern the interaction between IoT sensing devices and edge- model whose algorithms are mainly used to solve optimization prob­
cloud platforms. Therefore, it is important for IoT low-power devices to lems. The optimization-based process is adopted by the algorithms in
previous research to solve optimization problems in allocating the resources required to execute IoT data filtering (outlier and redundancy elimination) and analytic applications; these algorithms have been extensively analyzed in this paper.

Table 5
Comparison of resource allocation techniques for executing IoT applications.

Article title | Algorithm | Process | Problem resolved | Outcome | Weakness | Edge device | Cloud storage server | No. of physical machines (PMs) | No. of virtual machines (VMs)
An open scheduling framework for QoS resource management in the internet of things [86] | Dynamic Priority and Load Balancing (DPLB) algorithm | Optimization | Inability to execute dependent tasks and violation of SLA | Reduced makespan and improved load balance on VMs | Local and global optimum issue | N/A | Yes | 12 | 124
SERAC3: Smart and economical resource allocation for big data clusters in community clouds [87] | Quasi-real-time Optimization based Adaptive SERAC3 resource allocation algorithm | Optimization | Exhaustive search cost for optimal resource selection | Improved the selection of optimal configurations with lower exhaustive search cost | High resource utilization due to inefficient load balancing | N/S | Yes | 16 | N/S
Resource-aware virtual machine migration in IoT cloud [88] | Resource-aware Virtual Machine and Selection algorithm | Optimization | Issue of unbalanced load due to unforeseen changes upon job arrival | Reduced the dispatch time for the provisioning of PHs and VMs in the cloud data center | Unable to consider the bandwidth communication between VMs | Raspberry Pi B+ | Yes | 3 | 12
Improvement of energy efficiency at cloud data center based on fuzzy Markov normal algorithm VM selection in dynamic VM consolidation [89] | Fuzzy Markov Normal Algorithm | Clustering | Inefficient selection of VMs for migration from overloaded hosts | Improved load balancing with optimal placement of VMs on target servers and minimal energy consumption | Unable to consider the VMs' memory contents before migration | N/S | Yes | 16 | 640
An optimization of virtual machine selection and placement by using memory content similarity for server consolidation in cloud [91] | Approximation Algorithm | Optimization | Latency delay of VMs dispatched from overloaded to destination server | Reduced the migrated VMs' memory data with minimum energy consumption | Energy consumption is still on the high side | N/A | Yes | 100 | 4000
A hybrid model of Internet of Things and cloud computing to manage big data in health services applications [92] | Particle Swarm Optimizer algorithm and Genetic Algorithm | Optimization | Global optima entrapment and task computation time complexity | Reduced computation time and optimal provisioning of storage | Weakness in local space entrapment | Router | Yes | 100 | 1000
Scheduling internet of things applications in cloud computing [93] | Dynamic Dedicated server scheduling algorithm | Optimization | Inefficient provisioning of servers for homogeneous and heterogeneous IoT data | Minimized computation delay and improved utilization of servers | Weakness in load balancing among servers | Not Specified | Yes | 8 | N/S
An adaptive resource management scheme in cloud computing [94] | Support Vector Regression-Genetic (SVR-GA) algorithm | Regression/Optimization | SLA variation for resource utilization | Improved resource utilization configurations with SLA between VMs and cloud service providers | Not considering computation cost | N/A | Yes | 6 | 100
An adaptive resource management scheme in cloud computing [95] | Adaptive Power-aware Virtual Machine Provisioner (APA-VMP) algorithm | Optimization | Unexpected overload and high energy consumption | Improved load balancing with less energy utilization | Still challenged with high energy consumption | N/A | Yes | 100 | N/S
A virtual machine placement algorithm for balanced resource utilization in cloud data centers [96] | Max-BRU algorithm | Optimization | Unbalanced load due to inefficient activation of desired servers | Improved and balanced use of resources of multiple types of servers deployed | Unable to estimate overloaded PMs upon arrival of new jobs | N/A | Yes | 150 | N/S
Energy-efficient virtual machine selection based on resource ranking and utilization factor approach in cloud computing for IoT [97] | Energy-efficient resource ranking and utilization factor-based virtual machine selection algorithm | Optimization | Unbalanced resource utilization and high energy consumption | Improved the utilization rate and minimized the number of live VM migrations with less energy consumption | Weakness in local search entrapment and computation time complexity | Laptop PC | Yes | 100 | 500
Multi-capacity combinatorial ordering GA in application to cloud resources allocation and efficient virtual machines consolidation [98] | Combinatorial Ordering First-Fit Genetic and Combinatorial Ordering Next Fit Genetic algorithms | Optimization | High number of running servers and resource waste per server in local search space | Minimized the total number of running servers with less resource waste | Unable to consider the issue of global optima while determining the best VMs among various ones |  | Yes | 128 | 340
Workload aware VM consolidation method in edge/cloud computing for IoT applications [99] | Workload Aware Virtual Machine Consolidation algorithm | Optimization | Inability of edge cloud data centers to process tasks in a power-saving mode, and the issue of global entrapment | Improved convergence rate with minimum active server usage and less energy consumption | Not considering the communication overhead between servers and VMs | Laptop PC | Yes | 500 | 1500
Developing load balancing for IoT-cloud computing based on Advanced Firefly and weighted round Robin algorithms [100] | Firefly and Weighted Round Robin algorithms | Optimization | Overloaded PMs due to unbalanced load on every resource | Improved resource utilization with minimum response time | Inefficient searching of candidate resources for job execution |  | Yes | 1000 | 5000
Service oriented cloud VM placement strategy for internet of things [101] | Service-oriented virtual machine placement algorithm | Optimization | High communication overhead between VMs under the same service | Minimized communication cost between VMs, energy usage, and the total PM utility | Unable to schedule the VMs for task execution, which disrupts load balancing in the PMs |  | Yes | 250 | N/S
An intelligent/cognitive model of task scheduling for IoT applications in cloud computing environment [102] | Hybrid Genetic-Ant Colony Optimization (GAACO) algorithm | Optimization | Scheduling task dependency | Efficient load balancing with reduced makespan | Not considering local search entrapment |  | Yes | 1000 | 2000

6.2. Network communication protocols deployed in IoT-Edge Cloud computing

Communication protocols such as Message Queuing Telemetry Transport (MQTT), Wireless Fidelity (Wi-Fi), Bluetooth, General Packet Radio
Service (GPRS), and Advanced Message Queuing Protocol (AMQP) were used in previous research; they are briefly discussed as follows:

Message Queuing Telemetry Transport (MQTT) was invented by IBM in 1999 as a publish/subscribe push protocol. It is specifically designed to transmit data under long network delays and low-bandwidth network conditions [110,111]. It mainly runs over TCP/IP, or over other network protocols that provide ordered, lossless, bidirectional connections. Consequently, MQTT is suitable for resource-constrained IoT sensing devices that use unreliable or limited-bandwidth channels [112]. It was submitted to OASIS for standardization in 2013. MQTT uses TCP/IP port 8883 for TLS-secured connections, and a single MQTT packet can carry up to 256 MB.

Bluetooth is a wireless communication protocol designed to provide short-range connectivity for small devices such as smartphones, laptops, and hand-held devices. Its first specification was released in 1999, and it was later standardized as IEEE 802.15.1. It operates in the 2.4 GHz frequency band at a low rate of 200 kb/s, and its main function is to allow audio and data streaming between devices. However, it consumes considerable power during data transmission between devices. This led to the introduction of Bluetooth Low Energy (BLE) in 2010 to address this high power consumption. BLE is designed to extend the application of Bluetooth to low-power devices such as wireless sensors and wireless controllers [113]. The IETF 6LoWPAN Working Group (WG) has recognized the importance of BLE for the Internet of Things and has worked on a specification for transmitting IPv6 packets over BLE [114,115]. BLE is most commonly used by IoT sensing devices to transmit data to other devices.

Fourth/Fifth Generation (4G/5G) LTE. Fourth Generation Long-Term Evolution (4G LTE) is a wireless network protocol designed and deployed for Internet Protocol (IP)-based services, such as the combination of multimedia capabilities and applications with high-speed mobile broadband [116]. It is considered to be ten times faster than 3G in terms of transmission speed and covers a wider range. As a result, its Evolved Packet Core (EPC) and IP-based network framework enable smoother delivery of voice and data packets than older cell tower models that use GSM and UMTS. However, 4G is fast reaching its limits because of the growing demand for wireless data transfer as mobile phone usage increases, and the demand for lower end-to-end latency, which the Internet physically constrains. The Fifth Generation (5G) mobile protocol has therefore been introduced to resolve these 4G issues. 5G is specifically designed to efficiently support massive machine-to-machine and critical communications. Thus, a large number of actors and sensors/meters deployed anywhere in the landscape will be able to transmit their sensed data to other devices with very low response time and high reliability [117]. 5G also has the potential to provide mobile broadband services such as high-speed multimedia streaming, videoconferencing, Internet browsing, Voice-over-IP (VoIP), and efficient downloading and uploading of large files.

Advanced Message Queuing Protocol (AMQP) is a protocol that originated in the financial sector. It has been standardized by OASIS as a ubiquitous, secure, reliable, and open Internet protocol for handling messages [118]. It is regarded as messaging middleware that can use different transport protocols. AMQP provides asynchronous publish/subscribe messaging, in addition to a store-and-forward feature that ensures reliability during and after network disruptions [119,120]. This means that AMQP has the potential to be used in hazardous or hostile environments, as long as its overhead is not too high.

Wireless Fidelity (Wi-Fi) is used to connect wireless devices such as laptops, smartphones, and PDAs. It is a brand of wireless communication technology held by the Wi-Fi Alliance to improve interoperability between wireless networking products based on the IEEE 802.11 standard [121]. It has a coverage range of 46 m (indoor) and 100 m (outdoor), with a channel bandwidth of 20–40 MHz, a downlink rate of 600 Mbps, and an uplink rate of 248 Mbps in the 2.4 GHz frequency band.

7. Potential challenges of IoT-enabled cloud computing infrastructure

While IoT-enabled cloud systems tend to solve many problems, a considerable number of challenges have yet to be addressed, because the algorithms in previous research have not yet provided the solutions needed to solve them. Moreover, some of these remaining challenges require consistent effort from IoT-Cloud researchers and development communities, governments, policy makers, and platform/hardware providers. Some of these challenges are discussed as follows:

Unstructured IoT sensing data. In real-world sensing events, the data generated by sensor devices are unstructured due to their dynamic and heterogeneous nature. While NoSQL and Ubuntu servers are designed to store unstructured data, they have yet to make a significant impact on real-world IoT sensory-enabled cloud infrastructure, as most researchers use structured data sources in their experiments. However, data lakes have proven able to handle large volumes of IoT sensor data. A data lake can store both unstructured and structured data without any predetermined idea of how the data will be used; it does not use query languages or schema mapping and can store any type of data without limitations. Data lakes nevertheless face two major issues. First, loss of agility may occur when a data lake is used to store a huge pool of data that urgently needs analysis and decision making, because the data have to go through several processes before any meaningful information can be extracted. Second, data interchange may occur in the future, since any data can be stored or inserted [122]. This problem can be avoided by attaching metadata that record the attributes or source of the stored data. Therefore, further investigation is needed into how algorithms can be used to manage unstructured sensory data, both in simulation environments and in real-world scenarios.

Protocol diversity and standardization. The IoT-enabled edge cloud platform lacks a universal protocol and standard, as different protocols are used to communicate and interact between devices built to different development standards. Although the platform has been designed to allow multiple protocols to work together, given their different requirements and intended uses, it may lack the capacity to support multiple protocols extensively. It is therefore worth exploring the development of an intelligent gateway as a possible solution that provides seamless interoperability and integration between different protocols, together with algorithms that intelligently select the optimal transmission channels for efficient data delivery. In addition, various organizations, such as 3GPP, IEEE, ETSI, and M2M, have made significant efforts to enforce standards for the development of IoT devices. They assume that interoperability will be provided by these standardization activities, but this may lead to greater uncertainty, as each provides specific, isolated solutions that only cover its own domain [123].

Integration of contextual information. IoT data must be integrated with other data sources, such as context information that complements the understanding of the environment [124], because IoT-sensed data cannot describe the environment on their own. Algorithms that exploit such context tend to speed up data filtering, analysis, and reasoning, because the context limits the search space of the reasoning engine. For example, a sensor camera with facial recognition capability can perform surveillance in different contexts, such as government buildings and residential areas [125]. The sensed image data collected from different contexts can then help the system determine the optimal action to take based on the retrieved face of an individual.

Overloading communication networks. With a large number of IoT-enabled edge cloud components, maintaining and configuring their underlying physical Machine-To-Machine (M2M) interactions and networks becomes more complex. The dynamicity and heterogeneity of IoT big sensing data rapidly overwhelm the communication networks of the IoT-enabled Edge Cloud platform. Therefore, the volume and speed of
the data must be taken into account to provide optimal Quality of Service (QoS). One way to address this issue is to provide for the storage and management of IoT sensor data across the tiers of the IoT-based edge cloud (ITC). This will compel application designers to deploy complete data contextualization algorithms and techniques to achieve optimal QoS across the ITC platform. The contextual techniques must consider the storage capabilities of the essential processing devices, such as sensing devices, microcontrollers, edge servers, and cloud data centers.

Security challenge in collaborative edge-cloud processing. Further research is needed on how to perform cloud-side computations to encrypt IoT sensor data without revealing secrets or private information to cloud service providers. In addition, how the edge can send sensing data to the cloud securely, ensuring that the data are not corrupted in the edge processing units and cannot be intercepted by unauthorized persons (intruders) while in transit to the cloud platform, needs to be addressed.

Real-time filtering/analytic data. Extracting useful and intelligent information in real time from the huge volume of sensed data collected from multiple IoT sensing devices has become a major challenge, owing to the unavailability of real-time stream mining approaches. One way to overcome this challenge is the use of edge devices, which has already been proposed. Nevertheless, other solutions (such as algorithmic techniques) are still in the early stages of implementation and need to be optimized to extract meaningful and intelligent data in real time; this needs to be addressed in the future.

8. General discussion and conclusion

As expected, the algorithms were able to resolve issues related to sensed data filtering based on outlier detection and redundancy elimination in a given data set. In addition, issues related to load balancing for resource allocation, such as migrating VMs from source to target server(s) to execute sensed data filtering or analytic application requests, were significantly resolved. Outliers were primarily detected by considering the data type, spatio-temporal properties, attribute correlations, user-specified thresholds, and outlier scores, and by identifying the type of outlier (error or event). There are two main types of data, namely linear and non-linear. Linear data are static and structured sequentially, in either list or frame format. Non-linear data are dynamic and are also known as time-series or streaming data. Spatio-temporal refers simply to the distance between sensing data items and their time of arrival from a particular source (sensor). In other words, sensing data within a specific close range are considered normal, while others are classified as outliers or anomalies. The similarity (correlation) between data items in a given dataset is also determined, so that those with the same values are clustered or aggregated into groups or subsets according to their similarity level. Outliers within the subsets are then identified based on thresholds or scores.

We also observed that the outliers detected by some of the existing algorithms are of two types, namely error and event. Error outliers are generated by defective sensors; they are often classified as irrelevant or unwanted data and are therefore eliminated from the dataset. Event outliers, on the other hand, are useful data, most often used to report or predict unforeseen circumstances. For example, the detection of a gas leak from a cylinder is an event outlier. In terms of redundancy, feature selection and pattern recognition have been used extensively. The features or attributes of a given data set are subjected to a similarity check to identify data with similar attributes or features. Similar features are then merged into a single feature, or one of the similar features is retained while the others are eliminated. Similarly, similar data patterns are classified or clustered together, while the irrelevant ones are identified and discarded.

Load balancing issues have mainly been solved by considering the number of incoming requests prior to arrival while searching for optimal or under-loaded VMs to migrate from source (overloaded) to target (under-loaded) server(s) to execute jobs. This avoids execution time complexity and overloading of the available resources (VMs and servers). There are various servers (physical machines) in the cloud IaaS platform dedicated to specific tasks, enabling effective management of computation and storage resource provisioning. According to Ref. [14], there are three main types of sensed data servers in the cloud IaaS, namely NoSQL, relational database (MySQL), and Hadoop servers. The NoSQL server is mainly designed to store and manage IoT sensed data because of their unstructured pattern. It offers features such as distributed storage, dynamic schema, and horizontal scalability. However, it is limited in its ability to maintain the consistency, isolation, atomicity, and durability of sensed data, and it only partially supports distributed queries. Hadoop servers, on the other hand, are distributed file repositories that store and efficiently manage massive unstructured data. They enable IoT sensed data to be generated in XML format. According to Shvachko et al. [103], combining NoSQL and Hadoop servers enables unified management of and access to sensed data. A relational database (MySQL) server stores massive structured data. However, different data are generated rapidly, and the relationships between these data are important for a multitenant data storage system [104]. Therefore, virtual relational data are merged with various conventional relational data in a single schema. Despite these features, cloud servers are still vulnerable to the massively heterogeneous and dynamic nature of IoT big data. One way to solve this problem is to use virtual machines for effective and reliable data processing and storage management on servers. Virtual machines are subsets of a server that can be used to perform highly intensive computational tasks. This enables a server to perform two or more tasks simultaneously, such as providing storage space for incoming sensed data while performing data filtering or analytic operations on the sensed data using algorithms (e.g., algorithms for both data outlier and redundancy detection) based on user application requests.

The observation in Fig. 8(a) shows that redundancy problems were mainly handled by the classification process, followed by deep learning and clustering, with limited use of optimization and even less use of regression. On the other hand, the clustering process was mainly used to detect outlier-related problems, followed by classification and deep learning, with limited use of regression and no use of optimization. The optimization process is the most widely deployed process for resource allocation in the Cloud IaaS to execute sensory data filtering and analytic application requests (jobs), followed by clustering, with limited use of regression and no use of classification. In addition, clustering is the most frequently used process among the existing algorithms studied in this research, as shown in Fig. 8(b), followed by classification, optimization, deep learning, and regression, respectively. This shows that machine learning algorithms are also gaining momentum in IoT big data filtering and analytics on IoT-enabled edge cloud computing.

Fig. 8a. Utilization frequency of processes.

Observations from the tabulated information indicated that a considerable number of the sensed data filtering algorithms used to solve

outlier- and redundancy-related problems were implemented at the edge and at the edge/cloud, respectively. The processes at the edge/cloud are mainly based on retrieving relevant sensed data while discarding irrelevant data. In addition, the algorithms executed only at the edge platform are mainly accessed immediately by end-user applications. Fig. 9 shows that most of the algorithms are implemented at the edge/cloud. This indicates that using the cloud to address the limitations of IoT sensing devices, and using edge devices to process IoT big data, is gaining momentum in this research area.

Fig. 8b. The level of utilizing processes (%).

In terms of the processes adopted by the existing algorithms, clustering leads the other processes in usage level. Compared with the other processes, it can extract useful information from large volumes of sensed data owing to its sensitivity to outliers and redundant (noisy) sensed data. Clustering is done by partitioning based on the distance between instances: each instance starts as its own cluster, and the instances closest to one another are merged until all instances are fused into a single cluster. Our observations also show that most clustering processes were implemented on static sensed data retrieved from various sensor devices. However, clustering algorithms such as Moving Window Principal Component Analysis [50] and Robust Incremental Principal Component Analysis [51] were implemented on dynamic or real-time sensing data. Clustering methods are also known to be relatively scalable, and some allow the number of clusters to be specified in advance, such as Recursive Principal Component Analysis [32], Adaptive K-means [35] and the Distance-based Algorithm [44]. On the other hand, hierarchical clustering methods such as Enhanced Knowledge Granule [42], Hyperellipsoidal [46], and Incremental Fast Searching Clustering-based K-Medoids [65] determine the number of clusters themselves as they operate on a given dataset.

Classification methods are mainly used on health-related sensory data to predict redundant data, as can be seen from the existing research. They are known for their efficiency in terms of time and space complexity and are suitable for real-time sensing data. Furthermore, the classification process is easy to develop and maintain on parallel hardware such as a cloud data center. However, it requires lengthy training and testing procedures on sensed data and offers poor interpretability. Deep learning mainly combines clustering and classification to operate on sensed data. Previous research shows that deep learning methods are most suitable for large volumes of sensory data and tend to achieve higher accuracy than other methods. However, they require a large amount of storage space and take longer to run than the other methods. The optimization process has mainly been used at the cloud data center to improve efficiency in terms of completion time (makespan), resource utilization and energy consumption, as observed from previous algorithms. Its main objective is to prioritize the available resources best able to execute the required task.

In conclusion, data filtering or analytic algorithms are the main tools used to extract knowledge from the massive data generated by various IoT sensing devices. Resource allocation algorithms, on the other hand, are used to provide optimal computation and storage resources for executing data filtering/analytic application requests on IoT-enabled cloud IaaS platforms. Therefore, to obtain the desired knowledge, effective and efficient filtering algorithms suited to the characteristics of IoT-sensed big data need to be deployed. In this paper, we identified and discussed related literature surveys on the IoT-based edge cloud domain, which motivated the current research. We provided extensive background information about IoT devices, the characteristics of sensing data and the factors motivating the integration of IoT with edge/cloud, together with a detailed description of the research methodology adopted to update the current research. Filtering/analytic algorithms from previous research were analyzed with respect to outlier detection and redundant data discovery and elimination. The provisioning of optimal resources (PHs and VMs) for the execution of IoT application requests, taking load balancing issues into account, was also presented. The problems solved, the successes and the weaknesses of the algorithms were highlighted in tabular form. In addition, the processes employed by the algorithms were discussed, as were the network communication protocols used to transmit sensor data in the IoT-enabled edge cloud domain. Subsequently, the prevailing challenges that are yet to be resolved in IoT-enabled edge cloud infrastructure were presented to help characterize research directions in this area. The significance of this research is to provide new insight into the discovery of event and error outliers using machine and deep learning techniques, a topic that has long been neglected by the computing research community. The existing algorithms have been applied in the healthcare sector to detect prevailing diseases and symptoms in patients, and to minimize cybercrime and internet fraud. They have also been applied in manufacturing, for example to detect faulty equipment in automobile production plants, and to detect domestic and industrial gas leaks. Researchers in this area may capitalize on the weaknesses of the existing algorithms to improve their performance in future research. For example, IoT and cloud components could be managed to minimize energy usage and carbon dioxide emissions. Resource allocation techniques could be improved to minimize hazardous material use and resource waste during tasks assigned in the cloud. Outlier detection techniques could also be applied to detect unauthorized access to data repositories and to resource assignment, protecting the sensitive information in cloud users' request tasks. Optimizing the existing techniques for retrieving useful and intelligent data in real time will also be considered in future research. The authors are currently implementing outlier detection techniques for detecting cancer in the human brain.
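The outlier-handling pattern summarized in this section can be illustrated with a short Python sketch. That pattern has three steps: readings are grouped, outliers are flagged against a threshold or score, and error outliers are separated from event outliers. The sketch is not one of the reviewed algorithms, and several choices in it are illustrative assumptions:

- Readings are grouped per sensor, standing in for the similarity-based subsets.
- The score is a modified z-score based on the median absolute deviation (MAD), with the commonly used 3.5 cut-off.
- An outlier reported by at least two sensors at the same time step is treated as an event, while an isolated one is treated as an error.
- The function names `mad_outliers` and `classify_outliers` are hypothetical.

```python
from statistics import median


def mad_outliers(series, cutoff=3.5):
    """Return the indices whose modified z-score exceeds `cutoff`.

    The score uses the median and the median absolute deviation (MAD),
    so a single extreme reading barely shifts the baseline.
    """
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        # More than half the readings equal the median; this sketch skips such series.
        return set()
    return {i for i, x in enumerate(series) if 0.6745 * abs(x - med) / mad > cutoff}


def classify_outliers(sensors, min_agree=2):
    """Label each flagged (sensor, time step) as an 'event' or an 'error'.

    `sensors` maps a sensor id to readings aligned by time step. Assumed rule:
    an outlier seen by at least `min_agree` sensors at the same step is an
    event (e.g. a real gas leak); an isolated one is an error (faulty sensor).
    """
    flagged = {sid: mad_outliers(values) for sid, values in sensors.items()}
    labels = {}
    for sid, steps in flagged.items():
        for t in sorted(steps):
            agreeing = sum(1 for other in flagged.values() if t in other)
            labels[(sid, t)] = "event" if agreeing >= min_agree else "error"
    return labels


readings = {
    "s1": [20, 21, 20, 22, 21, 80, 20, 21],
    "s2": [19, 20, 21, 20, 20, 78, 21, 20],
    "s3": [19, 21, 95, 20, 22, 21, 19, 20],
}
print(classify_outliers(readings))
# {('s1', 5): 'event', ('s2', 5): 'event', ('s3', 2): 'error'}
```

In this toy run, sensors s1 and s2 spike together at step 5, so both readings are kept as events. The lone spike on s3 at step 2 is labelled an error and would be eliminated from the dataset. A real deployment would replace the per-sensor grouping with the clustering or aggregation step of the chosen algorithm.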
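The redundancy step discussed in this section compares features for similarity and retains only one of a group of similar features. It can be sketched in Python as follows. The sketch rests on several assumptions:

- The similarity check is the absolute Pearson correlation with a 0.95 cut-off.
- Features are kept greedily, first one wins.
- The names `pearson` and `drop_redundant_features` are illustrative.
- Only the "retain one, eliminate the rest" option is covered, not feature merging.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def drop_redundant_features(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept.

    `features` maps a feature name to its column of values; the returned list
    holds the names of the retained (non-redundant) features in input order.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "temp_c": [20, 22, 24, 26, 28],
    "temp_f": [68.0, 71.6, 75.2, 78.8, 82.4],  # same signal in another unit
    "humidity": [55, 40, 62, 48, 51],
}
print(drop_redundant_features(columns))  # ['temp_c', 'humidity']
```

The Fahrenheit column is a linear copy of the Celsius one, so it is eliminated as redundant, while the uncorrelated humidity column is retained. Because the scan is greedy, the feature listed first in each correlated group is the one that survives.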
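The load-balancing idea summarized in this section takes expected incoming requests into account and moves VMs from overloaded source hosts to under-loaded targets. A minimal threshold heuristic shows the idea. This is an illustrative sketch, not any of the surveyed schedulers, and all of the following are assumptions:

- A single shared `capacity` per host.
- An 80% upper utilization threshold.
- An `incoming` map giving the predicted request load per host.
- VMs are migrated smallest first, each to the least-loaded eligible target.
- The function name `rebalance` is hypothetical.

```python
def rebalance(hosts, capacity, upper=0.8, incoming=None):
    """Migrate VMs off hosts whose predicted utilization exceeds `upper`.

    hosts:    host name -> list of VM loads (same units as `capacity`);
              modified in place.
    incoming: host name -> load expected from requests that have not arrived yet.
    Returns the migrations as (vm_load, source, target) tuples.
    """
    incoming = incoming or {}

    def util(h, extra=0):
        # Predicted utilization: current VM load plus expected incoming requests.
        return (sum(hosts[h]) + incoming.get(h, 0) + extra) / capacity

    migrations = []
    for src in sorted(hosts, key=util, reverse=True):
        # Try the smallest VMs first so each migration moves as little load as possible.
        for vm in sorted(hosts[src]):
            if util(src) <= upper:
                break
            targets = [h for h in hosts if h != src and util(h, vm) <= upper]
            if not targets:
                continue  # no host can absorb this VM without becoming overloaded
            dst = min(targets, key=util)
            hosts[src].remove(vm)
            hosts[dst].append(vm)
            migrations.append((vm, src, dst))
    return migrations


cluster = {"h1": [30, 25, 20, 15], "h2": [10, 20], "h3": [40, 30]}
print(rebalance(cluster, capacity=100, incoming={"h3": 20}))
# [(15, 'h1', 'h2'), (30, 'h3', 'h2')]
```

Host h3 is only at 70% on its current VMs, but the 20 units of expected requests push it to 90%, so it is rebalanced before those requests arrive. This mirrors the idea of considering incoming requests prior to arrival.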
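The distance-based merging described in this section corresponds to agglomerative hierarchical clustering. Every instance starts as its own cluster, and the closest clusters are merged until all instances are fused. A minimal single-linkage version in Python is sketched below. The following are assumptions made for illustration:

- Single linkage.
- Euclidean distance via `math.dist` (Python 3.8+).
- An optional `n_clusters` stopping point.
- The function name `agglomerative`.

This naive form recomputes pairwise distances on every merge, so it is only practical for small datasets.

```python
from itertools import combinations
from math import dist


def agglomerative(points, n_clusters=1):
    """Single-linkage agglomerative clustering of points (tuples of numbers).

    Each point starts as its own cluster; the two closest clusters are merged
    repeatedly until `n_clusters` remain (1 means everything is fused).
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > max(n_clusters, 1):
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return [sorted(c) for c in clusters]


points_2d = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(agglomerative(points_2d, n_clusters=2))  # [[0, 1, 2, 3], [4]]
```

Stopping at a chosen `n_clusters` corresponds to methods where the number of clusters is fixed in advance. Letting the merging run to a single cluster and cutting the resulting hierarchy later corresponds to methods that determine the number of clusters themselves.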

Fig. 9. Analysis of comparative usage of algorithms on Edge and Edge/Cloud.

Declaration of competing interest

The authors of the research paper titled “Survey on the Utilization of Algorithms for IoT Data Analytics on Edge-based Cloud Infrastructure” hereby declare that there are no competing interests or conflicts of interest among them.

References

[1] S.K. Sharma, X. Wang, Live data analytics with collaborative edge and cloud processing in wireless IoT networks, IEEE Access 5 (2017) 4621–4635.
[2] L. Atzori, A. Iera, G. Morabito, The internet of things: a survey, Comput. Network. 54 (15) (2010) 2787–2805.
[3] A. Botta, W. De Donato, V. Persico, A. Pescapé, Integration of cloud computing and internet of things: a survey, Future Generat. Comput. Syst. 56 (2016) 684–700.
[4] K.M. Modieginyane, B.B. Letswamotse, R. Malekian, A.M. Abu-Mahfouz, Software defined wireless sensor networks application opportunities for efficient network management: a survey, Comput. Electr. Eng. 66 (2018) 274–287.
[5] T. Islam, S.C. Mukhopadhyay, N.K. Suryadevara, Smart sensors and internet of things: a postgraduate paper, IEEE Sensor. J. 17 (3) (2017) 577–584.
[6] C. Madhavaiah, I. Bashir, Defining cloud computing in business perspective: a review of research, METAMORPHOSIS 1 (2) (2012) 50–65.
[7] F. Bonomi, R. Milito, J. Zhu, S. Addepalli, Fog computing and its role in the internet of things, in: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, 2012, pp. 1–8.
[8] E.G. Petrakis, S. Sotiriadis, T. Soultanopoulos, P.T. Renta, R. Buyya, N. Bessis, Internet of Things as a Service (iTaaS): challenges and solutions for management of sensor data on the cloud and the fog, Internet of Things Journal 3 (2018) 156–174.
[9] B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in Software Engineering, EBSE Technical Report EBSE-2007-01, Durham, United Kingdom, 2007, pp. 1–53.
[10] J. Qiu, Q. Wu, G. Ding, Y. Xu, S. Feng, A survey of machine learning for big data processing, EURASIP J. Appl. Signal Process. 67 (2016) 2–16.
[11] A. Farahzadi, P. Shams, J. Rezazadeh, R. Farahbakhsh, Middleware technologies for cloud of things: a survey, Digital Communications and Networks 4 (3) (2018) 176–188.
[12] L. Cui, S. Yang, F. Chen, Z. Ming, N. Lu, J. Qin, A survey on application of machine learning for internet of things, International Journal of Machine Learning and Cybernetics 9 (8) (2018) 1399–1417.
[13] M.S. Mahdavinejad, M. Rezvan, M. Barekatain, P. Adibi, P. Barnaghi, A.P. Sheth, Machine learning for internet of things data analysis: a survey, Digital Communications and Networks 4 (3) (2018) 161–175.
[14] H. Cai, B. Xu, L. Jiang, A.V. Vasilakos, IoT-based big data storage systems in cloud computing: perspectives and challenges, IEEE Internet Things J. 4 (1) (2017) 75–87.
[15] S. Shadroo, A.M. Rahmani, Systematic survey of big data and data mining in internet of things, Comput. Network. 139 (2018) 19–47.
[16] M.H. Rehman, E. Ahmed, I. Yaqoob, I.A.T. Hashem, M. Imran, S. Ahmad, Big data analytics in industrial IoT using a Concentric computing model, IEEE Commun. Mag. 56 (2) (2018) 37–43.
[17] E. Ahmed, I. Yaqoob, I.A.T. Hashem, I. Khan, A.I.A. Ahmed, M. Imran, A.V. Vasilakos, The role of big data analytics in internet of things, Comput. Network. 129 (2017) 459–471.
[18] M. Ge, H. Bangui, B. Buhnova, Big data for internet of things: a survey, Future Generat. Comput. Syst. 87 (2018) 601–614.
[19] M. Aazam, S. Zeadally, K.A. Harras, Offloading in fog computing for IoT: review, enabling technologies, and research opportunities, Future Generat. Comput. Syst. 87 (2018) 278–289.
[20] M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT big data and streaming analytics: a survey, IEEE Communications Surveys & Tutorials 20 (4) (2018) 2923–2960.
[21] X. Fei, N. Shah, N. Verba, K.-M. Chao, V. Sanchez-Anguix, J. Lewandowski, Z. Usman, CPS data streams analytics based on machine learning for Cloud and Fog Computing: a survey, Future Generat. Comput. Syst. 90 (2019) 435–450.
[22] F. Alam, R. Mehmood, I. Katib, N.N. Albogami, A. Albeshri, Data fusion and IoT for smart ubiquitous environments: a survey, IEEE Access 5 (2017) 9533–9554.
[23] A. Ukil, S. Bandyopadhyay, C. Puri, A. Pal, IoT healthcare analytics: the importance of anomaly detection, in: Proceedings of the 2016 IEEE International Conference on Advanced Information Networking and Applications (AINA), 2016, pp. 994–997.
[24] Mohiuddin Ahmed, A. Anwar, A.N. Mahmood, Z. Shah, M.J. Maher, An investigation of performance analysis of anomaly detection techniques for big data in SCADA systems, EAI Endorsed Trans. Indust. Netw. and Intelligent. Syst. 2 (3) (2015) 1–16.
[25] N. Shahid, I.H. Naqvi, S.B. Qaisar, Characteristics and classification of outlier detection techniques for wireless sensor networks in harsh environments: a survey, Artif. Intell. Rev. 43 (2) (2015) 193–228.
[26] Jin Wang, S. Tang, B. Yin, X.-Y. Li, Data gathering in wireless sensor networks through intelligent compressive sensing, Proc. - IEEE INFOCOM (2012) 603–611.
[27] M. Stocker, M. Rönkkö, M. Kolehmainen, Making sense of sensor data using ontology: a discussion for residential building monitoring, in: Proceedings of IFIP International Conference on Artificial Intelligence Applications and Innovations, 2012, pp. 14–20.
[28] F. Ganz, P. Barnaghi, F. Carrez, Information abstraction for heterogeneous real world internet data, IEEE Sensor. J. 13 (10) (2013) 3793–3805.
[29] S. Kamal, R.A. Ramadan, E.-R. Fawzy, Smart outlier detection of wireless sensor network, Electronics and Energetics 29 (3) (2015) 383–393.
[30] N. Nesa, T. Ghosh, I. Banerjee, Non-parametric sequence-based learning approach for outlier detection in IoT, Future Generat. Comput. Syst. 82 (2018) 412–421.
[31] Rui Zhang, P. Ji, D. Mylaraswamy, M. Srivastava, S. Zahedi, Cooperative sensor anomaly detection using global information, Tsinghua Sci. Technol. 18 (3) (2018) 209–219.
[32] Tianqi Yu, X. Wang, A. Shami, Recursive principal component analysis-based data outlier detection and sensor data aggregation in IoT systems, IEEE Internet Things J. 4 (6) (2017) 2207–2216.
[33] P.M. Kumar, U.D. Gandhi, A novel three-tier Internet of Things architecture with machine learning algorithm for early detection of heart diseases, Comput. Electr. Eng. 65 (2018) 222–235.
[34] A.F. Santamaria, F. De Rango, A. Serianni, P. Raimondo, A real IoT device deployment for eHealth applications under lightweight communication protocols, activity classifier and edge data filtering, Comput. Commun. 128 (2018) 60–73.
[35] Ş. Kolozali, D. Puschmann, M. Bermudez-Edo, P. Barnaghi, On the effect of adaptive and nonadaptive analysis of time-series sensory data, IEEE Internet Things J. 3 (6) (2016) 1084–1098.
[36] Daniel Puschmann, P. Barnaghi, R. Tafazolli, Adaptive clustering for dynamic IoT data streams, IEEE Internet Things J. 4 (1) (2017) 64–74.
[37] J. Diaz-Rozo, C. Bielza, P. Larrañaga, Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes, IEEE Internet Things J. 5 (5) (2018) 3533–3547.
[38] Z. Ali, G. Muhammad, M.F. Alhamid, An automatic health monitoring system for patients suffering from voice complications in smart cities, IEEE Access 5 (2017) 3900–3908.
[39] G. Muhammad, M.F. Alhamid, M. Alsulaiman, B. Gupta, Edge computing with cloud for voice disorder assessment and treatment, IEEE Commun. Mag. 56 (4) (2018) 60–65.
[40] P. Verma, S.K. Sood, Fog assisted-IoT enabled patient health monitoring in smart homes, IEEE Internet Things J. 5 (3) (2018) 1789–1796.
[41] M.Z. Wu, Y.T. Wang, Z.C. Liao, A New Shelf Life Prediction Method for Farm Products Based on an Agricultural IoT, vol. 846, Trans Tech Publication, 2014, pp. 1830–1835.
[42] H.-T. Chang, N. Mishra, C.-C. Lin, IoT big-data centred knowledge granule analytic and cluster framework for BI applications: a case base analysis, PLoS One 10 (11) (2015) 1014–1980.
[43] H.M. Raafat, M.S. Hossain, E. Essa, S. Elmougy, A.S. Tolba, G. Muhammad, A. Ghoneim, Fog intelligence for real-time IoT sensor data analytics, IEEE Access 5 (2017) 24062–24069.
[44] M. Kontaki, A. Gounaris, A.N. Papadopoulos, K. Tsichlas, Y. Manolopoulos, Efficient and flexible algorithms for monitoring distance-based outliers over data streams, Inf. Syst. 55 (2016) 37–53.
[45] I. Vasconcelos, R.O. Vasconcelos, B. Olivieri, M. Roriz, M. Endler, M.C. Junior, Smartphone based outlier detection: a complex event processing approach for driving behavior detection, Journal of Internet Services and Applications 8 (1) (2017) 1–13.
[46] L. Lyu, J. Jin, S. Rajasegarar, X. He, M. Palaniswami, Fog-empowered anomaly detection in IoT using hyperellipsoidal clustering, IEEE Internet Things J. 4 (5) (2017) 1174–1184.
[47] A. Daneshpazhouh, A. Sami, Entropy-based outlier detection using semi-supervised approach with few positive examples, Pattern Recogn. Lett. 49 (2014) 77–84.
[48] Xiaoling Wang, Z. Xu, C. Sha, M. Ester, A. Zhou, Semi-supervised learning from only positive and unlabeled data using entropy, in: Proceedings of International Conference on Web-Age Information Management, 2010, pp. 24–31.
[49] A. Delimargas, E. Skevakis, H. Halabian, I. Lambadaris, N. Seddigh, B. Nandy, R. Makkar, Proceedings of MILCOM 2015-IEEE Military Communications Conference, 2015, pp. 36–44.
[50] M. Rafferty, X. Liu, D.M. Laverty, S. McLoone, Real-time multiple event detection and classification using moving window PCA, IEEE Trans. Smart Grid 7 (5) (2016) 2537–2548.
[51] X. Kong, J. Chang, M. Niu, X. Huang, J. Wang, S.I. Chang, Research on real time feature extraction method for complex manufacturing big data, Int. J. Adv. Des. Manuf. Technol. 99 (5–8) (2018) 1101–1108.
[52] S. Cheng, Z. Cai, J. Li, H. Gao, Extracting kernel dataset from big sensory data in wireless sensor networks, IEEE Trans. Knowl. Data Eng. 29 (4) (2016) 813–827.
[53] K. Yan, D. Zhang, Feature selection and analysis on correlated gas sensor data with recursive feature elimination, Sensor. Actuator. B Chem. 212 (2015) 353–363.
[54] X. Li, S. Peng, J. Chen, B. Lü, H. Zhang, M. Lai, SVM–T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles, Biochem. Biophys. Res. Commun. 419 (2) (2012) 148–153.
[55] P. Szecowka, A. Szczurek, B. Licznerski, On reliability of neural network sensitivity analysis applied for sensor array optimization, Sensor. Actuator. B Chem. 157 (1) (2011) 298–303.
[56] D.R. Wijaya, R. Sarno, E. Zulaika, Sensor array optimization for mobile electronic nose: wavelet transforms and filters based feature selection approach, International Review on Computers and Software 11 (8) (2016) 659–671.
[57] Y.-H. Yuan, Q.-S. Sun, Fractional-order embedding multiset canonical correlations with applications to multi-feature fusion and recognition, Neurocomputing 122 (2013) 229–238.
[58] M. Haghighat, M. Abdel-Mottaleb, W. Alhalabi, Discriminant correlation analysis: real-time feature level fusion for multimodal biometric recognition, IEEE Trans. Inf. Forensics Secur. 11 (9) (2016) 1984–1996.
[59] Z. Wu, K. Mao, G.-W. Ng, Enhanced feature fusion through irrelevant redundancy elimination in intra-class and extra-class discriminative correlation analysis, Neurocomputing 335 (2019) 105–118.
[60] I. Mehmood, M. Sajjad, S. Baik, Mobile-cloud assisted video summarization framework for efficient management of remote sensing data generated by wireless capsule sensors, Sensors 14 (9) (2014) 17112–17145.
[61] J. Yu, M. Kim, H.-C. Bang, S.-H. Bae, S.-J. Kim, IoT as applications: cloud-based building management systems for the internet of things, Multimed. Tool. Appl. 75 (22) (2016) 14583–14596.
[62] M. Yuan, H. Sheng, Research on the fusion method of spatial data and multimedia information of multimedia sensor networks in cloud computing environment, Multimed. Tool. Appl. 76 (16) (2017) 17037–17054.
[63] C. Li, W. Wei, J. Li, W. Song, A cloud-based monitoring system via face recognition using Gabor and CS-LBP features, J. Supercomput. 73 (4) (2017) 1532–1546.
[64] R. Varatharajan, G. Manogaran, M. Priyan, A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing, Multimed. Tool. Appl. 77 (8) (2018) 10195–10215.
[65] Q. Zhang, C. Zhu, L.T. Yang, Z. Chen, L. Zhao, P. Li, An incremental CFS algorithm for clustering large data in industrial internet of things, IEEE Trans. Ind. Inf. 13 (3) (2007) 1193–1201.
[66] A. Gonzalez-Vidal, P. Barnaghi, A.F. Skarmeta, BEATS: blocks of eigenvalues algorithm for time series segmentation, IEEE Trans. Knowl. Data Eng. 30 (11) (2018) 2051–2064.
[67] F. Bu, An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT, Future Generat. Comput. Syst. 88 (2018) 675–682.
[68] S. Misra, S. Chatterjee, Social choice considerations in cloud-assisted WBAN architecture for post-disaster healthcare: data aggregation and channelization, Inf. Sci. 284 (2014) 95–117.
[69] J.H. Abawajy, M.M. Hassan, Federated internet of things and cloud computing pervasive patient health monitoring system, IEEE Commun. Mag. 55 (1) (2017) 48–53.
[70] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, P. Hou, A new deep learning based food recognition system for dietary assessment on an edge computing service infrastructure, IEEE Transactions on Services Computing 11 (2) (2018) 249–261.
[71] A. Akbar, A. Khan, F. Carrez, K. Moessner, Predictive analytics for complex IoT data streams, IEEE Internet Things J. 4 (5) (2017) 1571–1582.
[72] S. Lakshmanaprabu, K. Shankar, A. Khanna, D. Gupta, J.J. Rodrigues, P.R. Pinheiro, V.H.C. De Albuquerque, Effective features to classify big data using social internet of things, IEEE Access 6 (2018) 24196–24204.
[73] Wong Siaw Ling, Ooi Boon Yaik, Liew Soung Yue, A novel data reduction technique with fault-tolerance for internet-of-things, Association for Computing Machinery (ACM) 2 (1) (2017) 214–221.
[74] A. Ahmad, M. Khan, A. Paul, S. Din, M.M. Rathore, G. Jeon, G.S. Choi, Toward modeling and optimization of features selection in big data based social internet of things, Future Generat. Comput. Syst. 82 (2018) 715–726.
[75] C. Tong, X. Yin, S. Wang, Z. Zheng, A novel deep learning method for aircraft landing speed prediction based on cloud-based sensor data, Future Generat. Comput. Syst. 88 (2018) 552–558.
[76] M. Mohammadi, A. Al-Fuqaha, M. Guizani, J.-S. Oh, Semisupervised deep reinforcement learning in support of IoT and smart city services, IEEE Internet Things J. 5 (2) (2018) 624–635.
[77] H. Yan, J. Wan, C. Zhang, S. Tang, Q. Hua, Z. Wang, Industrial big data analytics for prediction of remaining useful life based on deep learning, IEEE Access 6 (2018) 17190–17197.
[78] H. Li, K. Ota, M. Dong, Learning IoT in edge: deep learning for the internet of things with edge computing, IEEE Network 32 (1) (2018) 96–101.
[79] C. Zhang, X. Pan, H. Li, A. Gardiner, I. Sargent, J. Hare, P.M. Atkinson, A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification, ISPRS J. Photogrammetry Remote Sens. 140 (2018) 133–144.
[80] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, P. Hou, A new deep learning based food recognition system for dietary assessment on an edge computing service infrastructure, IEEE Transactions on Services Computing 11 (2) (2018) 249–261.
[81] P. Li, Z. Chen, L.T. Yang, Q. Zhang, M.J. Deen, Deep convolutional computation model for feature learning on big data in Internet of Things, IEEE Trans. Ind. Inf. 14 (2) (2018) 790–798.
[82] S. Leroux, S. Bohez, E. De Coninck, T. Verbelen, B. Vankeirsbilck, P. Simoens, B. Dhoedt, The cascading neural network: building the Internet of Smart Things, Knowl. Inf. Syst. 52 (3) (2017) 791–814.
[83] J. Zhang, H. Huang, X. Wang, Resource provision algorithms in cloud computing: a survey, J. Netw. Comput. Appl. 64 (2016) 23–42.
[84] A. Singh, D. Juneja, M. Malhotra, Autonomous agent based load balancing algorithm in cloud computing, Procedia Comput. Sci. 45 (2015) 832–841.
[85] D.C. Devi, V.R. Uthariaraj, Load balancing in cloud computing environment using improved weighted round robin algorithm for nonpreemptive dependent tasks, Sci. World J. 3 (9) (2016) 111–121.
[86] W. Jing, Q. Miao, G. Chen, An open scheduling framework for QoS resource management in the internet of things, KSII Transactions on Internet and Information Systems 12 (9) (2018) 4103–4121.
[87] J. Li, Z. Lu, W. Zhang, J. Wu, H. Qiang, B. Li, P.C. Hung, SERAC3: smart and economical resource allocation for big data clusters in community clouds, Future Generat. Comput. Syst. 85 (2018) 210–221.
[88] G.J.L. Paulraj, S.A.J. Francis, J.D. Peter, I.J. Jebadurai, Resource-aware virtual machine migration in IoT cloud, Future Generat. Comput. Syst. 85 (2018) 173–183.
[89] G. Shidik, A. Azhari, K. Mustofa, Improvement of energy efficiency at cloud data center based on fuzzy Markov normal algorithm VM selection in dynamic VM consolidation, International Review on Computers and Software (IRECOS) 11 (6) (2016) 511–520.
[90] A. Beloglazov, R. Buyya, Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers, Concurrency Comput. Pract. Ex. 24 (13) (2011) 1–24.
[91] H. Li, W. Li, H. Wang, J. Wang, An optimization of virtual machine selection and placement by using memory content similarity for server consolidation in cloud, Future Generat. Comput. Syst. 84 (2018) 98–107.
[92] M. Elhoseny, A. Abdelaziz, A.S. Salama, A.M. Riad, K. Muhammad, A.K. Sangaiah, A hybrid model of internet of things and cloud computing to manage big data in health services applications, Future Generat. Comput. Syst. 86 (2018) 1383–1394.
[93] H.S. Narman, M.S. Hossain, M. Atiquzzaman, H. Shen, Scheduling internet of things applications in cloud computing, Annals of Telecommunications 72 (1–2) (2017) 79–93.
[94] C.-J. Huang, C.-T. Guan, H.-M. Chen, Y.-W. Wang, S.-C. Chang, C.-Y. Li, C.-H. Weng, An adaptive resource management scheme in cloud computing, Eng. Appl. Artif. Intell. 26 (1) (2013) 382–389.
[95] R. Jeyarani, N. Nagaveni, R.V. Ram, Design and implementation of adaptive power-aware virtual machine provisioner (APA-VMP) using swarm intelligence, Future Generat. Comput. Syst. 28 (5) (2012) 811–821.
[96] N.T. Hieu, M. Di Francesco, A.Y. Jääski, A virtual machine placement algorithm for balanced resource utilization in cloud data centers, in: Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing, 2014.
[97] M.S. Mekala, P. Viswanathan, Energy-efficient virtual machine selection based on resource ranking and utilization factor approach in cloud computing for IoT, Comput. Electr. Eng. 73 (2019) 227–244.
[98] H. Hallawi, J. Mehnen, H. He, Multi-Capacity Combinatorial Ordering GA in Application to Cloud resources allocation and efficient virtual machines consolidation, Future Generat. Comput. Syst. 69 (2017) 1–10.
[99] I. Mohiuddin, A. Almogren, Workload aware VM consolidation method in edge/cloud computing for IoT applications, J. Parallel Distr. Comput. 123 (2019) 204–214.
[100] M.M. Abed, M.F. Younis, Developing load balancing for IoT-cloud computing based on advanced firefly and weighted round robin algorithms, Baghdad Science Journal 16 (1) (2019) 130–139.
[101] Y.-H. Chen, C.-Y. Chen, Service oriented cloud VM placement strategy for Internet of Things, IEEE Access 5 (2017) 25396–25407.
[102] S. Basu, M. Karuppiah, K. Selvakumar, K.-C. Li, S. Islam, H. K, M.M. Hassan, Md Bhuiyan, A. Z, An intelligent/cognitive model of task scheduling for IoT applications in cloud computing environment, Future Generat. Comput. Syst. 88 (2018) 254–261.
[103] K. Shvachko, H. Kuang, S. Radia, R. Chansler, The Hadoop distributed file system, in: Proceedings of 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pp. 21–27.
[104] H. Yaish, M. Goyal, G. Feuerlicht, Multi-tenant elastic extension tables data management, Procedia Comput. Sci. 29 (2014) 2168–2181.
[105] C.-W. Tsai, C.-F. Lai, M.-C. Chiang, L.T. Yang, Data mining for internet of things: a survey, IEEE Communications Surveys & Tutorials 16 (1) (2014) 77–97.
[106] F. Samie, L. Bauer, J. Henkel, From cloud down to things: an overview of machine learning in internet of things, IEEE Internet Things J. 12 (2019).
[107] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Network. 61 (2015) 85–117.
[108] A. Ali, J. Qadir, R. ur Rasool, A. Sathiaseelan, A. Zwitter, J. Crowcroft, Big data for development: applications and techniques, Big Data Analytics 1 (1) (2016) 2–24.
[109] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, C.I. Sánchez, A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60–88.
[110] M.H. Rehman, V. Chang, A. Batool, T.Y. Wah, Big data reduction framework for value creation in sustainable enterprises, Int. J. Inf. Manag. 36 (6) (2016) 917–928.
[111] S. Kraijak, P. Tuwanut, A survey on internet of things architecture, protocols, possible applications, security, privacy, real-world implementation and future trends, in: Proceedings of the IEEE 16th International Conference on Communication Technology (ICCT), 2015, pp. 23–30.
[112] D. Soni, A. Makwana, A survey on MQTT: a protocol of internet of things (IoT), in: Proceedings of the International Conference on Telecommunication, Power Analysis and Computing Techniques (ICTPACT), 2017, pp. 1–6.
[113] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, M. Ayyash, Internet of things: a survey on enabling technologies, protocols, and applications, IEEE Communications Surveys & Tutorials 17 (4) (2015) 2347–2376.
[114] C. Gomez, J. Oller, J. Paradells, Overview and evaluation of bluetooth low energy: an emerging low-power wireless technology, Sensors 12 (9) (2012) 11734–11753.
[115] J.W. Hui, D.E. Culler, Extending IP to low-power, wireless personal area networks, IEEE Internet Computing (4) (2008) 37–45.
[116] J. Nieminen, B. Patil, T. Savolainen, M. Isomaki, Z. Shelby, C. Gomez, Transmission of IPv6 Packets over Bluetooth Low Energy, draft-ietf-6lowpan-btle-12, 2012. https://datatracker.ietf.org/doc/html/draftietf-6lowpan-btle-06. (Accessed 17 June 2019).
[117] E. Dahlman, S. Parkvall, J. Skold, 4G: LTE/LTE-advanced for Mobile Broadband, second ed., Academic Press, United Kingdom and United States of America, 2013.
[118] F. Schaich, B. Sayrac, M. Schubert, H. Lin, K. Pedersen, M. Shaat, A. Georgakopoulos, FANTASTIC-5G: 5G-PPP Project on 5G air interface below 6 GHz, in: Proceedings of the European Conference on Networks and Communications, 2015, pp. 1–7.
[119] OASIS, Advanced Message Queuing Protocol (AMQP) Claims-Based Security Version 1.0, 2017. https://www.oasis-open.org/committees/.../amqp-cbs-v1.0-wd04.doc. (Accessed 12 May 2019).
[120] F.T. Johnsen, T.H. Bloebaum, M. Avlesen, S. Spjelkavik, B. Vik, Evaluation of transport protocols for web services, in: Proceedings of 2013 Military Communications and Information Systems Conference, 2013, pp. 54–62.
[121] V. Karagiannis, P. Chatzimisios, F. Vazquez-Gallego, J. Alonso-Zarate, A survey on application layer protocols for the internet of things, Transaction on IoT and Cloud Computing 3 (1) (2015) 11–17.
[122] S. Song, B. Issac, Analysis of wifi and wimax and wireless network coexistence, Journal of Computer Networks and Communications (IJCNC) 6 (6) (2014) 63–78.
[123] R. Hai, S. Geisler, C. Quix, Constance: an intelligent data lake system, in: Proceedings of the International Conference on Management of Data, 2016, pp. 11–18.
[124] A. Meddeb, Internet of things standards: who stands out from the crowd? IEEE Commun. Mag. 54 (7) (2016) 40–47.
[125] C. Perera, A. Zaslavsky, P. Christen, D. Georgakopoulos, Context aware computing for the internet of things: a survey, IEEE Communications Surveys & Tutorials 16 (1) (2014) 414–454.