Reference Paper - Page 85
ARTICLE INFO

Keywords: Internet of things; Cloud platform; Edge; Analytic algorithms; Processes; Network communication protocols

ABSTRACT

The adoption of Internet of Things (IoT) sensing devices is growing rapidly due to their ability to provide real-time services. However, these devices are constrained by limited data storage and processing power, so they offload their massive data streams to edge devices and the cloud for adequate storage and processing. This further leads to the challenges of data outliers, data redundancies, and cloud resource load balancing that affect the execution and outcome of data streams. This paper presents a review of existing analytics algorithms deployed on IoT-enabled edge cloud infrastructure that resolve the challenges of data outliers, data redundancies, and cloud resource load balancing. The review highlights the problems solved, the results, the weaknesses of the existing algorithms, and the physical and virtual cloud storage servers for resource load balancing. In addition, it discusses the adoption of network protocols that govern the interaction between the three-layer architecture of IoT sensing devices enabled edge cloud and its prevailing challenges. A total of 72 algorithms covering the categories of classification, regression, clustering, deep learning, and optimization have been reviewed. The classification approach has been widely adopted to solve the problem of redundant data, while clustering and optimization approaches are more used for outlier detection and cloud resource allocation.
* Corresponding author. Department of Computer Science, Delta State University, Abraka, P.M.B 01 Abraka, Delta State, Nigeria.
E-mail addresses: [email protected] (A.E. Edje), [email protected] (M.S. Abd Latiff), [email protected] (W.H. Chan).
https://doi.org/10.1016/j.dcan.2023.10.002
Received 29 December 2019; Received in revised form 10 September 2023; Accepted 6 October 2023
Available online 13 October 2023
2352-8648/© 2023 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co. Ltd.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515
filtering processes) is evenly distributed across available cloud resources to achieve efficient execution, with minimum resource utilization, and completion time.

Conversely, to the best of our knowledge, most of the existing literature reviews in this field are yet to fully explore the use of data-driven analytic-enabled cloud resource allocation algorithms for the execution of sensory data streams on IoT-enabled edge cloud infrastructure. Therefore, it is crucial to investigate the existing data-driven analytic-enabled cloud resource allocation algorithms that are used to address the challenges of data outliers, data redundancy, and cloud resource load balancing on IoT-enabled edge cloud infrastructure. The contribution of this work is as follows.

1) A detailed analysis of the algorithms to resolve outlier and redundancy issues in sensory data, highlighting the strengths and weaknesses of each algorithm in tabular form.
2) A detailed analysis of cloud resource allocation algorithms to address resource load balancing challenges, for optimal execution of outlier and redundant sensory data.
3) The identification and discussion of the various algorithms to perform their respective functions, and a comparison of their level of usage in the IoT-edge cloud infrastructure.
4) We also highlight and discuss the various network communication protocols that govern the interaction between the three-tiered IoT-enabled edge cloud architecture described in previous research.
5) Detailed current and potential challenges that pave the way for future research directions in this field are discussed in this paper.

The remainder of this paper is structured as follows. Section 2 introduces the background information about IoT-enabled edge cloud computing and the characteristics of IoT sensing data. Section 3 discusses the research methodology used to update the current research survey under study. Section 4 discusses the existing literature surveys in this field. Section 5 presents the analysis of existing algorithms deployed for resolving data outliers, data redundancy, and load balancing-related issues in IoT-enabled edge cloud infrastructure. Section 6 discusses the processes adopted by existing algorithms and network communication protocols that govern the interaction between the three-layer architecture of the IoT-enabled edge cloud infrastructure. Section 7 presents the current challenges that pave the way for future research directions. Finally, Section 8 presents a general discussion based on the results of the research survey and ends with concluding remarks.

2. Background

IoT technology has come a long way in recent years. The concept of IoT was introduced by Kevin Ashton of the Auto-ID Center in 1999. IoT is a worldwide network of interconnected devices addressable with standard communication protocols, with the Internet as the convergence point [2,3]. Radio Frequency Identification (RFID) and Wireless Sensor Networks (WSNs) have been the most widely used IoT sensing devices since their inception. RFID is composed of a tag and a reader used to identify and track an object anywhere and anytime. It is used in the courier and logistic transportation industry to track goods in transit. WSNs consist of multiple sensor nodes deployed for environmental monitoring. WSNs communicate cooperatively and forward aggregated data to the network sink node or control system for further processing [4]. Both sensing devices can be integrated for better sensing and tracking of objects by collecting information such as object locations, movement, and temperature.

Over the past decade, IoT sensor devices have experienced tremendous advances in development. Currently, IoT sensor devices pre-process, store, and transmit sensed data directly to the Internet without any human intervention. Unlike WSNs, IoT sensor devices do not communicate with each other and are not inter-networked to transmit their sensed data to a connected sink node. This emerging IoT sensing device is called a smart sensor, which is equipped with electronics that can perform multiple logic functions, communicate in both directions, make decisions, store sensed information for future analysis, or offload it directly to the Internet [5]. Therefore, the limitations of WSNs, which include input offset and span variation, cross sensitivity, and nonlinearity, are automatically corrected by the smart sensor processor. IoT sensing devices generate massive data that is dynamic and heterogeneous. In addition, the rapid rate at which unstructured and semi-structured data is being generated is a common problem. There are four main characteristics of IoT sensed data, namely multi-source high heterogeneity, sensing data inaccuracy, weak low-level semantics, and enormous data dynamicity. Sensing data inaccuracy refers to errors in the information collected from IoT sensing devices, due to several limitations such as unreliable readings, which lead to data outliers. This makes it complex to use the sensed data directly for its purpose. Therefore, appropriate multi-dimensional data processing techniques need to be adopted for accurate retrieval of sensed data.

Enormous data dynamics arise from interconnected multi-sensors embedded in a large-scale environment. Communication between the various sensors always results in a large volume of data generated in real time, resulting in duplicate (redundant) data. Weak low-level semantics is attributed to the sensed data obtained from IoT sensing devices. This is due to the spatial-temporal correlation relationships of the sensed data. Therefore, the extraction of useful information from the massive data generated needs to be performed from an event-driven perspective. The sensed data acquired from distributed sensor nodes varies from characters and integers to video and audio streams.

IoT sensor devices cannot provide the computational resources needed to store, process, filter, or analyze sensed data. This is due to the characteristics of sensed data, and the limited storage and computation resources of IoT sensor devices. However, the cloud platform has been used in recent years to address these limitations. Its large pool of data storage resources and high computation power on complex tasks compensates for the limitations of the IoT sensor devices. The idea of cloud computing was initiated in 1951 when John McCarthy envisioned the importance of time-shared computers, to share hardware and software resources among multiple end-users with real-time multi-tasking and programming.

Madhavaiah and Irfan [6] defined cloud computing as a technology-based business model, delivered as a service over the Internet, where software and hardware computing services are accessed virtually by end-users, on demand and on a self-service basis, irrespective of their geographical location. There are three services offered by cloud platforms, namely Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS is a web-based interface that allows end-users to access cloud software applications; PaaS enables developers to have access to various development tools for the implementation of software applications on its platform. On the other hand, IaaS provides storage and computation processing power. These services can be accessed from Cloud Service Providers (CSPs) such as Google, IBM, Salesforce, Amazon, and Microsoft.

Currently, the Google cloud platform provides intelligent IoT services that enable end-users to connect their physical IoT sensing devices to the platform and process, analyze and store the sensed data. The platform consists of fully managed and scalable cloud services, an integrated software stack for on-premises computing, and machine learning approaches for all IoT needs. Additionally, IBM launched its IBM IoT Connection Service in 2016 to formalize the use of IBM IoT for connected offerings on the cloud, which ingests and transforms sensory data obtained from sensors into meaningful insights. It also integrates the existing functionalities of IoT for electronic solutions available on its IBM Bluemix (an open standard for developing, managing, and running multiple applications) cloud platform with additional data storage, security, and monitoring functions. Microsoft introduced IoT services on its Azure cloud platform, namely Azure IoT Hub and Azure IoT Central.
IoT Hub is an open cloud platform that enables end-users to securely connect, monitor, and manage numerous devices to implement IoT applications. Azure IoT Central is an IoT SaaS solution that makes it straightforward for end-users to connect, monitor, and manage the physical features of the IoT sensing devices. Even though the cloud may offer virtually unlimited resource storage and computational processing power to compensate for the limitations of IoT devices, the long-distance network communication between them is a problem that needs a solution. In other words, the long-distance communication between both technologies, constrained by bandwidth availability, may hinder the prospect of their integration if not curtailed.

The long-distance communication between them leads to latency and delay, which can hinder timely responses in critical situations. For example, healthcare workers need to constantly monitor patients in critical condition by equipping them with IoT sensing devices in their respective homes through the Internet provided by the cloud application layer. Another challenge is in the area of privacy and security. Owners of IoT sensing devices tend not to send their data to the cloud data storage center because of the unknown storage location. Recently, edge/gateway computing has been introduced to address these challenges. This distance is also responsible for the long delays that sometimes exist between the clients' IoT sensing devices and the traditional cloud [7,8].

Edge computing consists of clusters of servers that are located close to the IoT sensing devices for timely response to service requests while conserving bandwidth and reducing latency. In addition, IoT sensing devices can offload their sensed data to the edge servers when the load exceeds their capabilities. The proximity between the edge and the IoT devices provides an opportunity to control the latency between the IoT devices and the traditional cloud. In addition, the sensed data collected from IoT devices is stored and immediately processed by the edge servers, with only a fraction of the data being sent to a cloud data center for long-term processing. This results in reduced network load by conserving bandwidth. Table 1 shows the characteristics of the IoT sensing devices, edge, and the conventional cloud data center, while Fig. 1 shows the three-layer physical architecture of the investigated IoT-edge cloud infrastructure.

Table 1
Comparative features of IoT sensing device(s), edge and cloud platform.

| S/N | Features | IoT sensing devices | Edge computing | Cloud computing |
| --- | --- | --- | --- | --- |
| 1 | Components | Physical devices | Clusters of servers | Virtual resources |
| 2 | Storage capability | Minimum | Limited | Massive |
| 3 | Data availability | Source | Process | Process |
| 4 | Utilization of network communication bandwidth rate | High, due to continuous event sensing | Minimum, due to the fact that sensed data is processed locally and stored in edge servers close to the IoT sensing devices | High bandwidth consumption due to the long distance between the cloud and IoT devices |
| 5 | Computational resource power | Limited | Limited | Unlimited |
| 6 | Deployment | Distributed | Decentralized | Centralized |
| 7 | Quality of service delivery in terms of timeliness | Continuous sensing | Faster, due to location proximity to IoT devices | Slower, due to the long distances between IoT devices and cloud data centers |
| 8 | Level of safety in data transmission operations | Minimal risk of data attack while in transit | Minimal risk of data attack while in transit | Long-distance communication between IoT and cloud pre-empts attacks on data while in transit |
| 9 | Resource and service location proximity for task execution | Not applicable | Edge servers usually close to IoT sensing devices | Remote data centers are usually far away from IoT sensing devices |

Fig. 1. IoT-enabled edge cloud architectural design.

3. Research methodology

The research survey under study was conducted with the support of the methodology utilized by Kitchenham and Charters [9]. The literature contributions, covering the years 2011–2019, were obtained from the academic research databases considered the most relevant to achieve the objectives of the current study. These databases include IEEE Xplore, Google Scholar, Springer, Scopus, and ScienceDirect. The search phrase (“internet of things data” OR “mining algorithm” OR “edge” AND “storage resource provisioning” OR “IoT data” OR “cloud data center”) was used to retrieve articles relevant to the current study. However, the results of the search query returned numerous research articles that were not relevant to the study.

The relevant articles not retrieved after the initial search were expected to be present in the reference lists of these results and were included in the analysis iteration. Only research articles published in English and contained in journals and conference proceedings were considered. The initial result yielded a total of 502 retrieved articles. Each article underwent a series of quality assessment phases until it was finally selected. These phases are composed of four steps, which are highlighted as follows:

• Evaluate the title and exclude the article if it does not conform to algorithms used in the IoT-enabled edge-cloud platform (current study).
• Read the abstract and discard the article if it is not relevant to the current study.
• Read and evaluate the introduction and conclusion, and reject the article if the contribution is the same as other relevant articles.
• Analytically assess the research contribution quality and disqualify articles with low quality.

Articles were accepted based on their degree of relevance to the current study. In addition, the writing quality, soundness, clarity, and credibility of the contributions made by each of the articles were considered.

A total of 85 articles passed the quality assessment and are highly relevant to the current research question. These 85 articles were further subjected to the process of extraction to retrieve the desired information required to accomplish the objectives of the research study. The required information is highlighted below.
• The algorithms used for outlier and redundant data detection.
• Allocation of resources to execute IoT application requests, problem resolution, and outcome.
• Performance evaluation processes adopted by each algorithm.
• Strengths and weaknesses of each algorithm.
• The network communication protocols that govern the transmission of sensed data from IoT sensing devices to the edge and to the cloud storage server.
• The number of physical and virtual machines used to process application requests for IoT-sensed data (based on detection of outlier and redundant data) on the edge-enabled cloud Infrastructure as a Service (IaaS) platform.

A total of 72 desired candidate articles emerged from the extraction process to be used in the current study. Fig. 2 presents a summary of the bibliometric data, which includes 5 conference papers and 67 journal articles, for a total of 72 selected studies. It also shows that the number of studies increased over the years, which indicates the novelty of, and increasing interest in, using algorithms for IoT data filtering/analysis based on the detection of outliers and redundant sensor data. In addition, the resource allocation algorithm is used to provide optimal computation and storage resources for the execution of sensory data filtering/analytic application requests on IoT-enabled edge cloud computing infrastructure. The remaining 13 articles are considered suitable for use as related research works in this field, which are discussed in the next section of this paper. Finally, the 72 articles are qualitatively analyzed to synthesize the findings.

Fig. 2. Survey of previous researches on data analytics algorithms for IoT-based edge cloud.

4. Related work

This section presents a brief description of previous literature reviews in this research field which motivated the current research study. The research survey conducted by Qiu et al. [10] is based on conventional and the latest machine learning algorithms for processing and managing IoT big data. It discusses relevant machine learning algorithms in recent research such as representation learning, deep learning, distributed and parallel learning, active learning, and kernel learning. The challenges and possible solutions to machine learning algorithms for the processing of IoT data are also analyzed. Subsequently, the relationship between machine learning techniques and signal processing techniques used in the processing of IoT big data is highlighted and various open issues and research trends are outlined. Farahzadi et al. [11] studied the middleware technologies that are utilized in the Cloud of Things (CoT) platform. Firstly, the relevant features of middleware are discussed, followed by the presentation of various architecture and service domains. It also explores the types of middleware that are appropriate for CoT-based platforms and outlines future challenges and issues in the design of CoT middleware. Cui et al. [12] present an overview of the application of machine learning techniques in the IoT domain. It discusses the current advances in applying machine learning techniques to IoT-related processes such as IoT device identification, security, IoT edge computing infrastructure, traffic profiling, and network management. Also, research challenges and open issues of machine learning for IoT were extensively discussed.

An overview of several machine learning algorithms that tend to solve the challenges of IoT sensor data is presented in the research work of Mahdavinejad et al. [13]. It focused on the taxonomy of machine learning algorithms, describing how they have been used on IoT datasets to retrieve some relevant level of information. It also discussed the prospects and challenges of the algorithms for IoT data analysis, paving the way for the application of a Support Vector Machine (SVM) to Aarhus smart city traffic data as a use case for a more detailed investigation. Cai et al. [14] presented the recent achievements in the management, processing, and extraction of IoT big data by utilizing several existing algorithms. The algorithms are defined and described in terms of their significant features and capabilities, and the current challenges and opportunities associated with IoT big data are analyzed. Also, some typical examples and open issues in the application of algorithms for data acquisition are discussed. A thorough investigation of the use of mining algorithms in the management of IoT big data is presented by Shadroo et al. [15]. It further identifies and discusses the architecture, framework, and applications of IoT big data. It also briefly discusses the algorithms used for the processing of IoT data in three categories, which include descriptive, predictive, and classification.

In [16], a novel Concentric Computing Model (CCM) is investigated for use in IoT big data analytics applications. It discusses the sensing systems and outer/inner gateway processors that make up CCM. In addition, it highlights current research work related to the IoT model for big data analytic techniques. It also describes the current challenges that need to be addressed for the deployment of CCM in Internet of Things environments. Various future research directions are presented, such as dispatching of significant data, real-time fusion of streaming data, and data integration. Sharma and Wang [1] investigated the enablers for live data analytics in wireless IoT networks and storage provisioning by edge-enabled cloud computing environments. The framework for systematic processing between the cloud and the edge device(s) is discussed. It also highlights the networks and the available information in the cloud data center to support the edge computing units to meet various performance requirements of the wireless IoT networks. The key enablers in data analytics, such as NoSQL databases and distributed file systems, to handle the unstructured IoT big data in the edge cloud are also discussed. In addition, machine learning techniques are used to extract relevant data. Related challenges and selected future research directions for researchers are also highlighted.

Recent advances in massive data analytics for IoT systems and the potential requirements for managing big data, as well as the enablers of analytic techniques in IoT platforms, are reviewed in [17]. Requirements such as IoT connectivity, storage capabilities, quality of service, real-time services, and real-time analytics are discussed in detail. It explains the role of data analytics in IoT applications such as smart health, smart grid, and smart transportation, and presents various open challenges as future research directions. Ge et al. [18] investigated big data technologies in several IoT domains to improve knowledge sharing across the IoT domains. It explained the similarities and differences between big data technologies and the analytics techniques (e.g., classification, filtering, compression, extraction, indexing, prediction, and storage) used in different IoT domains such as health, agriculture, and transportation to retrieve knowledge information. It further suggested how some big data technology deployed in a specific domain, can be
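To make the outlier-detection entries compared in Table 3 more concrete, the following is a minimal sketch of two simple statistical checks, a z-score test and a box-plot (interquartile range) fence, of the kind named for the complex event processing approach in [45]. It is not the implementation from [45] or from any other surveyed article; the function names, the thresholds (2.5 standard deviations and 1.5 × IQR), and the sample temperature readings are illustrative assumptions.

```python
import statistics


def zscore_outliers(readings, threshold=2.5):
    """Return readings whose absolute z-score exceeds `threshold`.

    The threshold of 2.5 is an assumed value chosen for this small sample;
    with only a handful of readings a single extreme value cannot reach
    a z-score of 3.
    """
    mean = statistics.fmean(readings)
    stdev = statistics.pstdev(readings)  # population standard deviation
    if stdev == 0:
        return []  # all readings identical, nothing to flag
    return [x for x in readings if abs(x - mean) / stdev > threshold]


def boxplot_outliers(readings, k=1.5):
    """Return readings outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(readings, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in readings if x < low or x > high]


# Hypothetical temperature readings from one sensor; 58.0 is an injected fault.
readings = [21.0, 21.4, 20.9, 21.2, 21.1, 20.8, 21.3, 58.0]

print(zscore_outliers(readings))   # [58.0]
print(boxplot_outliers(readings))  # [58.0]
```

Both checks need only a short window of recent readings, which is why this family of tests suits resource-constrained edge devices; the trade-off, as the weakness column for [45] notes, is a dependence on historical data.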
Table 3
Comparison of outlier detection techniques.

| Article Title | Algorithm | Process | Problem Resolved | Outcome | Weakness | Edge Device | Cloud Data Center: Cloud Storage Server | Cloud Data Center: No. of Physical Machines (PMs) | Cloud Data Center: No. of Virtual Machines (VMs) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Data gathering in sensor nodes through intelligent compressive sensing [26] | Adaptive compressive sensing-based autoregressive reconstruction algorithm | Classification | Abnormal sensing data | Improved accuracy and reduced Mean Squared Error and latency | Unable to predict and identify events that occur on a frequent basis | Remote Server | N/A | N/A | N/A |
| Making sense of sensor data using ontology: a discussion for residential building monitoring [27] | Knowledge-based multi neural network classifier | Classification | Predict and identify events that occur on a frequent basis | Accurate prediction of frequent anomaly sensing data on a real-time basis | Long non-automated learning process that relies on domain experts for the provisioning of sample data | Remote Server | N/A | N/A | N/A |
| Information abstraction for heterogeneous real world internet data [28] | Sensor symbolic aggregation approximation algorithm | Classification | Issue of minimizing the huge volume of sensing data | Improved accuracy, minimized data volume and latency | Abstraction accuracy still needs further improvement | Remote Server | Yes | N/S | N/S |
| Smart outlier detection of wireless sensor network [29] | Fuzzy-based spatial-temporal approach | Classification | Detection of error and event outliers in local/global search space of the sampled data | Improved accuracy for error/event outliers with minimum false positive rate | Unable to self-check the prediction process using mandatory perception data in an IoT platform | Remote Server | N/A | N/A | N/A |
| Non-parametric sequence-based learning approach for outlier detection in IoT [30] | Non-parametric sequence learning algorithm | Classification | Problem of self-check identification using optimal perception for error/event outlier detection | Enhanced classification accuracy with detection of error/event outliers with less false positive rate | Difficult to detect outliers in global space as data set increases in size | Remote Server | Yes | N/S | N/S |
| Cooperative sensor anomaly detection using global information [31] | Multivariate Gaussian-based principal component analysis | Classification | The inability to differentiate between erroneous and event data from inconsistent observations | Improved the Receiver Operating Characteristic (ROC) curves, true and false positive rates for detecting erroneously sensed data | Its static transformation is unable to realize optimal prediction of erroneous data on a real-time basis | Remote Server | N/A | N/A | N/A |
| Recursive principal component analysis-based data outlier detection and sensor data aggregation in IoT systems [32] | Recursive principal component analysis | Clustering | Inability to make optimal prediction of erroneous data in global space of massive sensed data sets | Improved aggregation with error growth and event detection accuracy | It is computationally intensive because it tends to adapt recursively to the changes in sensory data readings | Arduino Uno Microcontroller | Yes | N/S | N/S |
| A novel three-tier Internet of Things architecture with machine learning algorithm for early detection of heart diseases [33] | Logistic regression-based prediction algorithm | Regression | Ineffective classification of heart related disease symptoms | Enhanced classification accuracy rate based on specificity and sensitivity | Inefficient aggregation of data | Mobile Phones, Personal Server | Yes | N/S | N/S |

(continued on next page)
Table 3 (continued )
Article Title Algorithm Process Problem Outcome Weakness Edge Cloud Data Center
Resolve Device
Cloud No. of No. of
Storage Physical Virtual
Server Machine Machine
(PMs) (Vms)
An automatic Linear prediction Regression Ineffective Efficient Not Specified Local Server N/A N/A N/A
health spectrum detection of detection of
monitoring algorithm voice disorder voice disorders
system for of patients with sustained
patients vowels and
suffering from running speech
voice based on
complications improved
in smart cities accuracy
[38]
Edge computing Convolution neural Deep Inaccurate Improved the High bandwidth Cloudlet Yes 1 3
with Cloud for network algorithm learning classification classification consumption Servers
voice disorder of voice accuracy in the during sense data
assessment and disorder detection of transmission from
treatment [39] symptoms voice (event) edge to the cloud
disorder
detection
Fog assisted-IoT Bayesian belief Deep Delay in the Improved Not considering Cloudlet Yes 8 N/S
enabled patient network algorithm learning classification accuracy of the spatio- Servers
health of sensed data classifying temporal Wireless
monitoring in acquisition dataset with less correlations Routers
smart homes time during among sensed
[40] classification data set
process
A new shelf life prediction method for farm products based on an agricultural IoT [41] | Back propagation learning algorithm | Deep learning | Issue of detecting and elimination of erroneous outliers in big sensed data | Effective filtering of normal sensed data from the abnormal ones | Not specified | Local Server | N/A | N/A | N/A
IoT big-data centered knowledge granule analytic and cluster framework for BI applications: a Case base analysis [42] | Enhanced knowledge granule clustering algorithm | Clustering | Challenges of clustering complex knowledge granules for outlier detection | Improved the high precision and accuracy of outlier detection | Unable to minimize the inter-cluster distances of sensed data | Remote Server | N/A | N/A | N/A
Fog intelligence for real-time IoT sensor data analytics [43] | Homoscedasticity measurement Leven’s test feed-forward neural networks algorithm | Deep learning | Improper selection of threshold leading to partial classification | Enhanced classification accuracy, sensitivity and precision | Duplicate sensed data and high computationally intensive | Arduino Uno Microcontroller; Local Server | N/A | N/A | N/A
Efficient and flexible algorithms for monitoring distance-based outliers over data streams [44] | Advance micro-cluster-based continuous outlier detection algorithm | Clustering | Inefficient outlier detection on frequent data stream and computation complexity | Improved outlier detection with minimum computational resource usage | Unable to addressed uncertainty of data streams, instances assigned existential probability | Local Server | Yes | N/S | N/S
Smartphone-based outlier detection: a complex event processing approach for driving behavior detection [45] | Complex event processing–based Z-score and box plot | Clustering | Computation resource complexity of constraints IoT devices | Improved accurate detection outliers from online data streaming with less usage of computation and memory resources | Weakness in identifying outliers for emergency scenarios due to lack of historical data | Fog Server | Yes | N/S | N/S
Fog-empowered anomaly detection in IoT using hyperellipsoidal clustering [46] | Hyperellipsoidal clustering algorithm | Clustering | The Issue of high latency and energy consumption | Reduced energy consumption and latency while improving anomaly prediction accuracy | A need for further improvement on latency due to increase usage of computation resource | Fog Server | Yes | N/S | N/S
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515
Table 3 (continued)
Article Title | Algorithm | Process | Problem Resolved | Outcome | Weakness | Edge Device | Cloud Data Center: Cloud Storage Server | Cloud Data Center: No. of Physical Machines (PMs) | Cloud Data Center: No. of Virtual Machines (VMs)
Entropy outlier detection using semi-supervised approach with few positive examples [47] | Entropy Outlier Detection Semi-supervised (EODSP) algorithm | Clustering | Insufficient labeled data for training and limited positive labeled samples | Improved outlier detection accuracy compared to other existing approaches | Cannot be deployed in real time big sensing data due to its computational complexity | Laptop PC (1.6 GHz and 1 GB RAM) | N/A | N/A | N/A
IPCA for network anomaly detection [49] | Iterative Principal Component Analysis (IPCA) algorithm | Clustering | Variability of feature scales and the issue of multiple number of dimension data set | Improved outlier detection efficiently mitigating the limitations of PCA | Computation complexity in iteratively updating distances of neighborhood | Remote Server | N/A | N/A | N/A
Real-time multiple event detection and classification using moving window PCA [50] | Moving Window Principal Component Analysis (MW-PCA) algorithm | Clustering | Issue of time variance in sensing data frequency | Improves the prediction and classification of outlier accuracy | Unable to disaggregate multiple loss of load and generation of events | Remote Server | Yes | N/S | N/S
Research on real time feature extraction method for complex manufacturing big data [51] | Robust Incremental Principle Component Analysis (RIPCA) algorithm | Clustering | Disaggregate multiple loss of load and generating of events | Improved outlier detection in real-time by reducing the dimension of big IoT dataset and usage of computation resource | Unable to determine the causes for the abnormal patterns | Remote Server | Yes | N/S | N/S
to half its original size. As a result, the compressed data is reconstructed, allowing the adductive abstraction of the sensed data to discover events that occur over time. For example, the changes in temperature over a day from cold to warm to cold represent a frequent or stable temperature pattern. Therefore, newly observed states hidden from the pattern are classified as outliers.

Kamal [29] introduced a fuzzy algorithm that utilizes the spatiotemporal similarity concept to detect outliers. However, it could not provide the self-check identification using perception data, which is highly required in an IoT Cloud-IaaS platform. It classifies the abnormal observations into error and event outliers. First, the first-order difference |Si2–Si1| is computed on a data set generated by sensor nodes. Then, the total difference is compared to the threshold value that is determined by the tolerance of the temperature sensor. Thus, if the total first-order difference does not exceed the threshold, the Si2 data point is considered similar to other data points. Otherwise, an outlier is obtained when dissimilarity is observed on a data point. Second, the distance between neighboring sensor nodes is calculated to discover the spatial similarity between them. The Euclidean distance method is used to compute the similarity or correlation measure between two points (x, y) that have identical transmission range and time proximity. Then, the spatial similarity threshold is obtained by computing the mean distance of all data points in the proximity time. If the Euclidean distance d(x, y) does not exceed the indicated threshold value, the data values at point X are identified as similar to the data values at point Y. Otherwise, an error outlier is detected as a faulty sensor reading.

Conversely, a Non-Parametric Sequence-based Learning (N-PSL) algorithm is proposed by Nesa et al. [30], predicting the outliers based on error event types. It considers the use of data perception for self-check detection of both error and event outliers. The N-PSL algorithm is based on a gray relational analysis. In the initial stage, the sample data is normalized by calculating the average image of each sampled data. Then, the difference between each instance of the sequence image of the sensed data is computed. Also, the Influential Relative Grade (IRG) coefficients for each sequence (class) of sensed data are calculated to retrieve the relative mass function in each respective class. Therefore, outliers are predicted as the classes with lower values while the inliers are classes with higher values. Furthermore, event outliers are detected by running the algorithm on the fused parameter (attribute) dataset, while the error type of outliers is obtained by running the algorithm on each parameter.

A Multivariate Gaussian-based Principal Component Analysis (MG-PCA) is designed in the research to predict erroneous sensing data among irregular observations, based on the characteristic pattern of different dimensional sequence data [31]. The MG is first applied to the retrieved sensed data set to determine the similarity among the data points. It identifies the time point when the error occurred and further retrieves the particular sensor node that is observed to be erroneous at a particular time. Consequently, the PCA utilizes the principal vectors to determine the differences between data patterns for detecting the sensor error readings that violate the inherent pattern extracted. However, the MG-PCA approach is limited by the inability to track variations in dynamic and heterogeneous sensing data due to its static transformation. This has been addressed by the Clustered-based Recursive Principal Component Analysis (CR-PCA) algorithm proposed in Ref. [32]. It initially aggregates the redundant sensed data while detecting the outliers. The spatially correlated sensed data retrieved from the cluster head sensor members are aggregated by extracting the principal components and identifying the possible data outliers with the support of an abnormal squared prediction error score, called the residual square. It recursively updates its parameters to adapt to the dynamics of the sensory data retrieved from the sensor devices.

A Logistic Regression-based prediction (LRP) algorithm is developed to detect patients with heart diseases by classifying clinical sensory data collected from IoT wearable devices [33]. Sensed data collected from
wearable sensing devices are constantly monitored. If the data values exceed the reliable predicted value, they are considered abnormal; otherwise, they are considered normal. Consequently, Santamaria et al. [34] proposed a fuzzy-based Human Activity Recognition (HAR) classifier algorithm to classify the sensed data into normal and abnormal activities of patients. The algorithm updates the classification process by initiating some constant values that are used to specify the number of clusters. It then selects a weighted component (fuzzifier) and an initial membership matrix with some threshold values. The weighted components regulate the overlapping of the classes while assigning a data point to its cluster member. Furthermore, the threshold value is used to evaluate the convergence in the iterations of the classification process.

A Dynamic Symbolic Aggregation Approximation (SAX) is proposed for the adaptive and non-adaptive window size, in the segmentation of time sequence data streams with variation in real-time processing [35]. It divides the time sequence data set into equivalent segments and generates a string representation for each segment. First, the time sequence data is normalized to achieve a mean (average) of zero and a standard deviation of one, before being converted to a Piecewise Aggregation Approximation (PAA). Next, the data is divided into the desired number of windows and the mean of the data falling in each window is calculated by the PAA so that the size can be reduced. Then, a discretization process is performed on the PAA coefficients (each window size) by mapping the PAA coefficients to breakpoints which are generated by the alphabet size (e.g. c), to determine the area of equal-size for retrieving the symbolic data representation. Puschmann et al. [36] developed an Adaptive K-means Clustering (AKC) for outlier detection. This is done by evaluating the dynamic sensor data and updating the cluster centroids according to the changes in the data stream at a given time. Clusters are formulated based on the similar features of the sensory data stream retrieved over time. New cluster(s) are formed based on changes in data features. For example, if an incoming data stream has the feature types "Temp, Temp, Temp, Hum, and Hum ….n", the Temp features will be allocated to the initial cluster. The appearance of "Hum" will trigger the creation of a new cluster which will contain the Hum feature data records.

Both the SAX and AKC approaches provide substantial assignment of sensory data instances to clusters but are unable to provide knowledge information (i.e. inconsistency or consistent manner) about the data and how it is assigned to each cluster. These problems have been addressed by a Gaussian-based Dynamic Probabilistic Clustering (GDPC) algorithm, proposed in Ref. [37]. It estimates the model parameters and any drifts in the data points. It further provides the membership likelihood of each data point to each cluster by utilizing the Brier score. The Brier score is used to determine the abnormality of subsequent probabilities from those objects or data points that are expected. Drifts or changes are detected when the parameter of sensed data value is above the predefined threshold value of the Brier score. Such drifts are known as outliers. After drifts are detected, the Brier score changes its behavior and stabilizes for incoming sensor data.

A Linear Prediction Spectrum algorithm is introduced in Ref. [38] for voice disorder detection, based on sensed data retrieved. It analyzes the energy variation in the spectrum to distinguish between disordered and normal voices. This is done by dividing the vocal tract into various tubes from the glottis to the lips. It then performs an estimated analysis on the source signal using inverse filtering that triggers the computation of the spectrum. Furthermore, the estimated signal is utilized to determine the energy distribution in vowel and running speech for the detection of voice disorder. Muhammad et al. [39] developed a Deep Convolution Neural Network (CNN) algorithm for classifying the sensed data into two segments, namely voice disorder and normal voice. It uses its input image consisting of blue, green, and red colors to classify the voice sampled data obtained from the IoT sensing devices. Transfer learning and a fine-tuned approach are used to train the CNN for optimal detection of voice disorder and to speed up the classification process, due to the limited voice sampled data obtained from the IoT devices. Two output neurons were used to represent voice disorder detection, and eight neurons were used for the voice disorder classification before being trained by fine-tuning the parameters for optimal detection of voice disorder from normal ones.

A Bayesian Belief Network (BBN) algorithm is proposed in Ref. [40] for the classification of sensory data. It classifies sensory data retrieved from patients into two classes, namely abnormal and normal. The retrieved data in the abnormal class indicates the severe or critical health status of the patients. On the other hand, the sampled data in the normal event class indicates the normality of the patient's health status. A naïve Bayesian classification procedure known as conditional probability is used to achieve the classification process. Thus, a predefined value is set as the normal value, which indicates that all sampled data with a probability within the range of the predefined normal value will be classified as a normal class. Also, an abnormal class is obtained when the probability of having the sampled data value exceeds that of the normal event class. To improve the prediction process, an important set of attributes, namely the environment and the patient's history, were used. Thus, the abnormal class is transmitted to the cloud for further processing and analysis. Wu et al. [41] implemented a Back Propagation Learning (BP) algorithm for the classification of sensed data retrieved from sensing devices attached to agricultural crops. The sensed data are classified into abnormal and normal batches. The abnormal values or attributes are discarded while the normal values are further processed on the cloud platform. The normal values are further divided into low, normal, and high values based on predefined values consisting of −1 (low), 0 (normal), and 1 (high). The BP algorithm is then applied to accurately predict the crop yield. It multiplies the output and input data to obtain the gradient of the weight and moves the weight in the opposite direction of the gradient by subtracting a ratio of the gradient from the weight.

An Enhanced Knowledge Granule Clustering algorithm that is based on neuro-fuzzy analytic architecture is designed in Ref. [42]. It is used to extract complex knowledge granules from IoT sensory big data. First, the facts are arranged in an array based on the multiple rule system to obtain the knowledge granules for clustering. Each knowledge granule must be associated with a fitness tag, where the estimated value is present. This is done through the attributes of the knowledge granule where the initial mapping for a cluster is performed by the fitness value, followed by the next level mapping for sub-clusters under the previous cluster. In simple words, based on the fitness rule, two clusters are said to be similar if both have knowledge granules with homogenous attributes. The knowledge granules are mapped to individual clusters based on the attributes. Thus, the sub-cluster within a cluster is maintained for the fitness of the explicitly identified knowledge granules. For example, in a cluster, let X1 be a knowledge granule such that X1 is mapped to sub-cluster (G < 0.5) if and only if G(X1) < 0.5; otherwise, X1 is mapped to sub-cluster (G >= 0.5). Thus, the G values of clusters and sub-clusters are strongly estimated by quantifying the outliers that are present. In addition, outliers that are present in the clusters and sub-clusters degrade the G values.

Furthermore, Raafat et al. [43] proposed a Homoscedasticity Measurement-based Levene's Test (HMLT)-based Feed-forward Neural Networks (FFNN) algorithm for accurate classification of desired features of sensed data in the cloud. Sensed data retrieved from sensing devices is filtered. Then, the HMLT is applied to extract dissimilarity features from the denoised signal by observing the signal for sudden changes. Then, the extracted features are inputted into the FFNN to proceed with the classification process. The FFNN classifies the sensed data into abnormal and normal data. This is done by sending the data from its input layer to the hidden layers. The neurons in the hidden layers are responsible for computing an activation function over the sum of input features, which are multiplied by a set of weight parameters. The results are output as either normal or abnormal sensed data. Data is abnormal when there is a sudden change in the sensed data due to an external event.

An Advanced Micro-cluster-based Continuous Outlier Detection (AMCOD) algorithm is proposed in Ref. [44] for frequent monitoring of
in Ref. [60]. It uses its Kernelized strategy to search the intra-class similarity (pairwise correlation) and the similarity across different data features in the same class to retain the relevant data in the fused data feature. Then, the irrelevant or duplicate data is eliminated. In simple words, it retains adequate dimensions of data features for class separation in each set of features and learned similarity features obtained by the discriminative structure. First, it generates a between-class scatter matrix via the neighbors in proximity to the extra-class and intra-class. Then, it identifies the non-zero vectors of the corresponding non-zero values in the between-class scatter matrix. Furthermore, the maximization of feature correlation between classes is obtained by computing the non-zero vectors with their corresponding values that transform the entire matrices. Therefore, the Kernelized intra-class correlation is used to concatenate the transformed features into a fused feature vector, as shown in Fig. 5(a and b). This results in the elimination of irrelevant redundant features present in the fused feature vector.

A Jeffry-divergence (JD) and Inter-frame Correlation of Color Channels on Boolean Series–based Ensemble-based Support Vector Classification Algorithm is proposed in Ref. [59] to minimize the massive amount of sensed data retrieved from camera sensing devices.

The obtained video frames (sensed data) are compared based on their color and structures. If similarities are detected between two or more frames, their divergence is computed using the color histogram to obtain the actual corresponding frame. Frames with a high similarity measure are discarded. Then, a multi-fractal technique is used to discover the frames, based on different texture structures at different scales with local densities, to provide rich descriptors to categorize the structures of the frames. Then, an SVM is used to train each category of the frame structure so that optimal informative frames (images) are extracted from the non-informative frames (image/data).

A Correlation Feature Selection-based Heuristic algorithm is introduced to address the problem of duplicate sensed data on edge-based cloud IaaS [61]. It uses the feature predictive performance and inter-correlation to guide its search for an optimal feature subset of sensed data. It also considers the benefit of each feature of sensed data for predicting the class label, based on the level of inter-correlation among them. At the initial stage, it computes a matrix of feature-feature and feature-class correlations from the training data set. Then, an optimal search is performed to determine the feature subset space, by using the best first search technique to obtain the relevant features. Furthermore, a Scale Invariant Feature Transform (SIFT) algorithm is developed by Yuan et al. [62] to manage the influx of sensing data retrieved from multimedia sensor nodes. The retrieved data are first fused by using the Laplace Pyramid Transform (LPT) method. Then, different sizes of Gaussian kernels (known to have more accurate scale transform) are selected to perform the scale transform of the fused data, to obtain the accurate candidate feature points. Therefore, the edge response points of low contrast and instability of the sensed data are discarded. Each feature point is allocated a direction by the gradient information of neighboring pixels to improve the accuracy of the feature point matching. Li et al. [63] proposed a Center-symmetric Local Gabor Binary Pattern (CSLGBP) feature extraction algorithm to obtain the actual face image captured by camera sensor devices. The input face image is convolved with the Gabor kernels to retrieve the magnitude information of well-defined specific orientations and scales. The specified orientations at the same scale are accumulated to formulate a new
scale feature. The features of each scale are computed using the CS-LBP descriptor from the retrieved Gabor scale images to extract and obtain the relevant image.

A Linear Discriminant Analysis-based enhanced Support Vector algorithm is proposed in Ref. [64] to address the uncertainty with sensed image signal or data retrieved from camera sensor devices. It computes various characteristic features of the data sets and classifies the features present in the pre-processed sensed image signal. It also detects the Q wave, R wave, and S wave in the pre-processed input image signal to determine the various heartbeat levels (e.g., Left Bundle Branch Block, Right Bundle Branch Block, Premature Ventricular Contraction, and Premature Atrial Contractions) and classify them accordingly. The weighted kernel function computes the weight which is used to identify the R, Q, and S waves for optimal classification of the heartbeat levels.

Consequently, the Incremental Fast Searching Clustering-based K-Medoids (ICFSKM) algorithm is introduced in Ref. [65] to discover the underlying patterns of the dynamic sensing data, by integrating the initial data patterns into the previous ones by using its combination operations. The cluster centers are continuously updated by the K-medoids upon the arrival of new sensing data. Put simply, it maintains a set of clustered data with similar feature patterns, so it either creates new sets of clusters or assigns the data to the previous cluster upon new sensing data arrival.

A Blocks of Eigenvalues Algorithm for Time Series Segmentation (BEATS) is proposed to remove the duplicate sensed data from large datasets [66]. It divides the streams of time series data into 64 blocks, clusters the streams in square matrices, and transforms them into the frequency domain with the support of the Discrete Cosine Transform (DCT) technique. The result is then quantized to obtain a finite data set. Then, the duplicate data is removed from the finite data set with the support of eigenvalue computation, as shown in Fig. 6.

Consequently, Bu [67] developed an Efficient High-order Tensor Fuzzy C-means (EHOFCM) algorithm, based on the Canonical Polyadic Decomposition scheme, for the clustering of IoT streaming data. The traditional fuzzy c-means (FCM) technique allocates each object or data record to two or more groups by computing a membership matrix. However, IoT-sensed big data is characterized by heterogeneous features, which is a notable drawback to the conventional FCM for the clustering of real-time IoT big data. The EHOFCM could solve the problems as follows. Each data point or object is converted from the vector space to its tensor format by a bijection function. Then they are aggregated into clustered groups based on their similarity features. In addition, the attributes of each object or data record are greatly reduced using the canonical polyadic decomposition scheme, to obtain an optimal compression rate, as it reduces the huge volume of raw sensing data to a significant extent. This enables the traditional fuzzy c-means to cluster the huge sensed data with low-end devices such as controllers and mobile phones.

A Banag-Pseudo-cluster-based aggregation algorithm is developed in Ref. [68] to determine the exigency or criticality of various data collected from multiple sensor nodes. Data is aggregated into groups based on the level of their exigency at the edge (gateway) platform. Therefore, the data with the highest exigency value is aggregated first before the others. This is done repeatedly and systematically until all the sensed data are fused into their respective groups and sent to the cloud data center for further processing. Abawajy et al. [69] designed a Cobweb Expectation Maximization and K-means, which is also called the Rank Correlation Coefficient (RCC) algorithm, for the clustering of ECG sensed data. First, it uses the fuzzy-based data fusion technique to aggregate only the relevant values of the sensed data and discard the others. Thus, the relevant data sets are grouped into different independent clusters. Then, a consensus function is used to combine the clusters to generate the final consensus cluster by partitioning all the elements or values of the dataset. Furthermore, Liu et al. [70] proposed a Two-step K-means Clustering (TKC) algorithm to cluster the image sensed data into two categories, namely Blurry and Clear Images. The Blurry images are discarded while the Clear Images are further processed at the edge platform. Clear image sensed data are segmented into two categories, namely foreground (which contains the actual image data) and background (which contains useless image data), by utilizing the watershed segmentation function at the edge. This is done by using the Clear image and removing the background image, resulting in the updating of the foreground image.

The Adaptive Moving Window Regression (AMWR) algorithm was developed by Akbar et al. [71] to determine the optimal training window size of streamed data by using a Lomb-Scargle time series analysis. For example, the temperature data retrieved over 24 h tend to contain repeated patterns or values. If the training window size of data used is equivalent to the optimal periodicity of the data, it will learn all the local patterns, resulting in more accurate prediction. In addition, the window
sizes of data are predicted using the prediction horizon to ensure a
certain level of prediction accuracy. This allows the window size pre
diction to be increased when the accuracy of the model is high and
decreased when the performance of the prediction model decreases.
Then, the output of the predicted block of data is transmitted to the
Complex Event Processing engine in the form of an event tuple. Predefined
rules are then applied to the predicted block of data to detect or
predict the complex event.
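The adaptive behavior described for AMWR can be illustrated with a minimal Python sketch. It is not the implementation from Ref. [71]: the function names, the least-squares line fit, the mean absolute error measure, and the one-step grow/shrink rule are all assumptions made for illustration, and the periodicity analysis that chooses the training window size is not reproduced (`window_size` is simply passed in).

```python
from statistics import mean


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_bar - slope * x_bar


def predict_block(window, horizon):
    """Extrapolate the fitted line `horizon` steps past the end of the window."""
    slope, intercept = fit_line(window)
    n = len(window)
    return [intercept + slope * (n + k) for k in range(horizon)]


def adapt_horizon(horizon, error, tolerance, min_h=1, max_h=12):
    """Assumed rule: grow the horizon while the error is tolerable, shrink it otherwise."""
    if error <= tolerance:
        return min(horizon + 1, max_h)
    return max(horizon - 1, min_h)


def amwr_stream(series, window_size, tolerance, horizon=1):
    """Yield (start_index, predicted_block, horizon_used) while sliding over the series."""
    t = window_size
    while t + horizon <= len(series):
        block = predict_block(series[t - window_size:t], horizon)
        actual = series[t:t + horizon]
        error = mean(abs(p - a) for p, a in zip(block, actual))
        yield t, block, horizon
        t += horizon
        horizon = adapt_horizon(horizon, error, tolerance)
```

Each yielded block plays the role of the predicted block that would be forwarded as an event tuple to a complex event processing engine; that hand-off is outside the scope of this sketch.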
An Elephant Herd Optimization-based Linear Kernel Support Vector (EHO-LKSV) algorithm is proposed in Ref. [72] to select the desired feature subset from a dimensional sensed data set. It greedily searches the element space and determines a feasible feature subset to continuously improve the given input data, which speeds up the computation time of the entire process. Furthermore, the retrieved feature subsets are classified into two different labels using a linear kernel support vector technique to train the different data sets for optimal prediction and accuracy results. Consequently, Wong et al. [73] proposed a novel Perceptually Important Points (PIP) algorithm for the reduction of IoT time series sensing big data. It divides the sensed data into segments by identifying a set of important points, either a set of local minima or local maxima, out of the sensed data pools. At the initial stage, the time series feature alongside the sensed data features is segmented into odd and even values, after which the similarity between features is determined by using the Jaccard similarity distance method. Similar instances with the same time retrieval value are eliminated across features, resulting in the reduction of the sensed data.

Fig. 6. Example of BEATS workflow [66].
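The Jaccard-based elimination step described for the PIP algorithm can be sketched in Python as follows. This is a hedged illustration rather than the method of Ref. [73]: representing each feature as a list of `(timestamp, value)` pairs, the `threshold` parameter, and the function names are assumptions introduced here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_shared_instances(features, threshold=0.5):
    """Remove readings that an earlier, sufficiently similar feature already holds.

    `features` maps a feature name to a list of (timestamp, value) pairs.
    The threshold is an assumed tuning parameter, not a value from the source.
    """
    kept = {}
    for name, readings in features.items():
        remaining = list(readings)
        for earlier in kept.values():
            if jaccard(remaining, earlier) >= threshold:
                seen = set(earlier)
                # Drop instances with the same timestamp and value as the earlier feature.
                remaining = [r for r in remaining if r not in seen]
        kept[name] = remaining
    return kept
```

With two temperature features that share most of their timestamped readings, only the readings unique to the second feature survive, while a dissimilar feature is left untouched.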
Hadoop Artificial Bee Colony (HABC) algorithm is developed in Convolutional Neural Network (CNN) algorithm to retrieve the desired
Ahmad et al. [74], for redundancy of sensed data. In the initial stage, the sensed data on the cloud platform. It fine-tunes the sensed image dataset
classified sensed data are placed into a subset according to their simi (image of various foods) to generate a fine-grained model that is used for
larity characteristics by using the accuracy fitness values. In addition, the classification. Then, the fine-grained model is trained by Caffe. At
the parameter of Medication Rate (MR) is used to extract features from the initial stage, the model is loaded into the memory, as the data (food
neighboring subset data. Therefore, a random and uniform number image) is fed into the convolutional neural network as the input. Thus,
(from 0 to 1) is generated for each data in each sensed data subset. If it is the CNN features can be extracted by using the max-pooling and
observed that the value is less than the MR, then the feature is inserted Rectified Linear-Unit (ReLU) layers, to reduce the data feature di
into a new subset. Otherwise, if the new subset happens to be better than mensions and speed up the convergence of the computing process.
the initial exploratory subset, it is considered as the last new subset. Li et al. [81] proposed a Deep Convolutional Computation model
Thus, this process is repeated until the best feature subset is reached. A (DCCM) algorithm to learn hierarchical features of sensed data by uti
Deep Learning Long-short Term Memory (LSTM) algorithm is also pro lizing the tensor method, to extend the convolutional neural network
posed in Ref. [75], to predict the ground speed of aircraft landing, based from the vector space to the tensor space. Thus, the local features in the
on sensor data retrieved from the aircraft. It consists of six layers that are sensory data are optimally exploited and overfitting is avoided. Also, a
segmented into input, hidden, and output layers. A random forest al tensor convolutional layer is introduced to reach the deeper layers. The
gorithm is first used to classify the sensed data into twenty features. The initial layers are embedded on mobile devices, the intermediate layers
input consists of one layer, the hidden consists of four layers and the output has only one layer. Consequently, the four hidden layers consist of 128, 64, 32, and 8 neurons while the output layer consists of one neuron, which is used to obtain the predictive value of ground speed.
Mohammadi et al. [76] proposed a Deep Reinforcement Learning (DRL) algorithm to aggregate sensed data with the same distance position, labeled and imputed in the same cluster. Sensed data are clustered based on their proximity level. It uses the variational auto-encoder function to identify the optimal data representing the closest distance information for locating the target object. Also, Yan et al. [77] proposed an Integrated Deep Auto-Encoder algorithm for the management of sensed data obtained from sensor devices. Data such as the state data recorded within a period at each sub-process before the failure, known as the historical information, are retrieved from the DECG. The historical information is cleansed (e.g., filling missing data features) and divided into two categories, namely, distant records and recent records, to achieve an optimal prediction. The distant records symbolize the records that are far away from the current time moment, while the recent records indicate records that are close to the current time moment. Thus, the distant records are used to simulate the damaging trend, while the recent records are used to simulate the smoothing process of the recent change. Then, the two outputs are fused and linear regression is performed to convert hidden or discrete records to predict the Remaining Useful Life (RUL) of production machines.
A deep learning-based regression algorithm is proposed in Ref. [78]. It consists of eight layers, which are further grouped into three sections, namely the lower layer, intermediate layers, and higher layers. The lower and intermediate layers are implemented in the edge servers while the higher layers are implemented in the cloud. The input sensed data (image of dog and cat) from the camera sensor devices are transferred to the lower layer in the edge servers for processing. The data are processed at the intermediate layer, where a filter or feature detector is utilized to extract features to obtain the relevant data. This reduces the size of the input data to a significant size known as the relevant data. In addition, the reduced relevant data is transferred from the edge server to the cloud for further processing. The reduced data is passed to the higher layers (consisting of neurons) residing in the cloud server, where it is filtered (feature detector) to retrieve optimal data.
A Hybrid Multilayer Perceptron Convolution Neural Network (MLP-CNN) algorithm is developed in Ref. [79] for the fusion and classification of sensed image data. Generally, it uses its fusion decision rule to fuse the output sensed data based on the CNN confidence value. The CNN confidence value is obtained by subtracting the maximum value of a vector from its mean value, resulting in the optimal membership classification. However, if the CNN confidence value is higher than an initial predefined threshold, it indicates that the CNN confidence is lower than another threshold. Thus, if the confidence of the CNN depends on the initial and the other threshold, then the fusion output selection with the higher confidence value is regarded as the actual classification result. Consequently, Liu et al. [80], develop a
are presented in cloudlet and the deeper layers are embedded in the cloud server. The classification of the input sensed data (image) is computed in the initial layers residing on the mobile device. Thus, the back-propagation technique is used to train the layers by evaluating all the layers until a desired confident classification result is obtained. Therefore, if it cannot classify the sample sensed data with sufficient confidence, it is then transferred to the intermediate layers in the cloudlet for the classification process. The deeper layers are only invoked when both the initial and intermediate layers are unable to classify the input data set to meet the desired confidence candidate. In addition, the CDCNN can decide whether to reject or accept classifications based on the threshold value passed as an argument at runtime. This improves the accuracy and speed of the entire classification process. Table 4 identifies the problems solved, performance results, and weaknesses of the existing algorithms used for predicting data redundancy. It also indicates the processes adopted by the algorithms, edge devices, and cloud IaaS resource components as indicated in previous literature.

5.3. Cloud resource provisioning for user requests

Efficient resource allocation ensures satisfactory cloud service for end-user requests. In IoT-enabled edge cloud computing, resources are allocated as Physical Machines and Virtual Machines in the cloud IaaS platform, as shown in Fig. 7.
How virtual machines are integrated into servers to support the requested task determines the ability to minimize the resource allocation problem [83]. This research focuses on the problem of load balancing when migrating virtual machine(s) from the source server to the destination server for executing data filtering or analytical application requests. Load balancing refers to the pattern in which resources are distributed to avoid overloading any machine (servers and VMs) so that resources are optimally utilized [84]. Also, it determines the migration of tasks to underutilized VMs and servers for effective resource sharing [85]. In this article, we analyze the existing algorithms used for resolving the related issues of load balancing while allocating resources to execute data filtering or analytic application requests on the cloud IaaS platform.
Jing et al. [86] proposed a Dynamic Priority and Load Balancing (DPLB) algorithm for load balancing of VM resources when scheduling the execution of IoT application request tasks on IaaS. The dynamic priority function is composed of task value density and task computation urgency. In addition, the priority is subsequently increased over a period of time to ensure timely execution of each task on the queue. The scheduling function consists of Earliest Completion Time (ECT) and retrieving the load status information of each VM with the support of the publish/subscribe method. The tasks are ordered according to their priority level, and the tasks with the highest priority are scheduled first to the optimal VMs among heterogeneous VMs that meet the QoS requirements with the support of the task migration manager. The Brier Score method is used to predict an overloaded VM, whereby if a VM
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515
Table 4
Comparison of redundant data elimination techniques.
Article Title Algorithm Process Problem Outcome Weakness Edge Cloud Data Center
Resolved Device
Cloud No. of No. of
Storage Physical Virtual
Server Machine Machine
(PMs) (VMs)
SVM–T-RFE: A Support Vector Clustering Inefficient Efficiently Candidate feature PC N/A N/A N/A
novel gene Machine (SVM) elimination of eliminated set consists of
selection Recursive feature redundant data highly correlated
algorithm for Function redundancy with minimum features
identifying Elimination computation time
metastasis-
related genes in
colorectal
cancer using
gene expression
profiles [53]
Feature selection SVM Recursive Clustering Candidate feature Improved N/S PC Yes N/S N/S
and analysis on Function set consists of elimination of
correlated gas Elimination- highly correlated feature
sensor data based features redundancy while
with recursive Correlation Bias retrieving actual
feature Reduction sensed data
elimination (SVM-
[54] RFE+CBR)
On reliability of Neural Network Deep learning Inappropriate Effectively and Unclear result due PC N/A N/A N/A
neural network Sensitivity selection of efficiently to limited number
sensitivity (NNS) desired features retrieved the best of input features for
analysis applied among various features with training features
for sensor array features improved
optimization accuracy
[55]
Sensor array Fast Correlation- Classification Unclear result Obtained best Computation time Remote Yes N/S N/S
optimization based Filter due to limited combination of complexity Server
for mobile (FCBF) number of input features while
electronic nose: algorithm features and discarding
wavelet overlapping of redundant ones
transform and features
filter based selectivity
feature
selection
approach [56]
Fractional-order Fractional-order Classification Deviation of Effectiveness and Not considering Server N/A N/A N/A
embedding Embedding relevant sensing robustness in vital correlation
multi-set Multiset data due to noise eliminating noisy among different
canonical Canonical and limited data feature sets
correlations Correlations training samples
with (FEMCCs)
applications to
multi-feature
fusion and
recognition
[57]
Discriminant Discriminant Classification The identification Improved Still pose with Laptop PC N/A N/A N/A
correlation Correlation and elimination accuracy for feature redundancy
analysis: real- Analysis (DCA) of redundant detecting and within the intra
time feature feature between- elimination of and extra class in
level fusion for class feature redundant multiple classes or
multimodal similarities features a single class
biometric
recognition
[58]
Enhanced feature Intra-class and Classification The neglecting of Improved Computation time Remote Yes N/S N/S
fusion through Extra-class some correlation accuracy of complexity Server
irrelevant Discriminative information detection and
redundancy Correlation among various elimination of
elimination in Analysis feature sets due to feature
intra-class and (IEDCA-IRE) over-fitting redundancy
extra-class between data
discriminative points
correlation
analysis [59]
Mobile-cloud Jeffry- Classification Issue of duplicate Improved Constrained with Mobile N/A N/A N/A
assisted video divergence sensed images accuracy of computation time Phones
summarization Boolean Series- relevant sensed complexity and
framework for image retrieval cannot be applied
Table 4 (continued)
Article Title Algorithm Process Problem Outcome Weakness Edge Cloud Data Center
Resolved Device
Cloud No. of No. of
Storage Physical Virtual
Server Machine Machine
(PMs) (VMs)
data
aggregation
and
channelization
[68]
Federated Cobweb Clustering High Improved the Computationally Mobile Yes N/S N/S
internet of Expectation dimensionality of quality of sensed intensive and Phone
things and Maximization- sensed data set data by reducing consumes a large
cloud based K-Means and its noisy its dimensionality amount of
computing nature based on computer memory
pervasive aggregation
patient health strategy
monitoring
system [69]
A new deep Two-step K- Clustering Numerous blurry, Eliminated Unable to discover Mobile Yes N/S N/S
learning-based means background unusable sensor the actual Phone
food Clustering images (useless) data resulting in correlation among
recognition Algorithm data that limits improved the discovered
system for classification clustering patterns in the
dietary accuracy and accuracy sensed dataset
assessment on delayed
an edge transmission of
computing the data to the
service cloud
infrastructure
[70]
Predictive Adaptive Regression Challenge of Improved Utilizes huge N/A Yes N/S N/S
analytics for Moving Window complex event prediction amount of
complex IoT Regression streaming data accuracy in near computation
data streams algorithm without real-time and resource (memory
[71] leveraging minimized the space)
historical data for computation
prediction complexity
Effective features Elephant Herd Optimization/ Delay in the Enhanced feature N/S Fog Server N/A N/A N/A
to classify big Optimization- Classification computation selection
data using based Linear processing of accuracy with
social internet Kernel Support sensed data minimum
of things [72] Vector (EHO- feature selection computation time
LKSV) algorithm and memory
usage
A novel data Perceptually Classification Problem of both Effective and Eliminate relevant Primary Yes N/S N/S
reduction Important Points local and global efficient sensed data and
technique with (PIP) optima in sensed elimination of alongside with Secondary
fault-tolerance data reduction. duplicate sensed duplicates ones due Server
for internet-of- data with same to missing data
things [73] time retrieval
Toward modeling Hadoop Optimization Computational Improved feature Not Specified Multi- Yes N/S N/S
and Artificial Bee complexity selection cluster
optimization of Colony involves accuracy with Hadoop
features algorithm extracting of response to with i5 3.4
selection in big features in real- timeliness GHz and 8
data-based time IoT GB RAM
social internet streaming data
of things [74]
A novel deep Deep Learning Deep learning Inaccurate Improved Weakness in the N/A Yes N/S N/S
learning Long-short term classification of classification selection of optimal
method for memory (LSTM) sensed data accuracy to some parameters to
aircraft landing algorithm retrieved from extent in a timely determine relevant
speed aircraft to manner sensed data from
prediction determine the irrelevant ones.
based on cloud- safety of its
based sensor landing speed
data [75]
Deep Deep Deep learning Problem of close Improved Highly Fog server N/A N/A N/A
reinforcement Reinforcement estimation of the classification computationally
learning in Learning target locations in accuracy and complex
support of IoT algorithm an indoor performance of
and smart city environment locating target
services [76] objects
Big data analytics Integrated Deep Deep learning Ineffective Effective Computationally Fog server Yes N/A N/A
for prediction Auto-Encoder retrieval of extraction of intensive
of remaining algorithm desired sensed desired sensed
useful life based data from data which
on deep massive data set enhances
learning [77] for prediction prediction
purpose accuracy
Learning IoT in Deep learning Deep learning Complications in Improved Highly Edge Yes N/S N/S
edge: deep algorithm obtaining optimal accuracy of computationally Servers
learning for the sensed data at desired sensed complex
internet of reduced sizes. data retrieval at
things with minimum size
edge computing
[78]
A hybrid MLP- Hybrid Deep learning Problem of Improved Computation Local Yes N/A N/A
CNN classifier Multilayer inaccurate classification intensive and huge Server
for very fine Perceptron classification of accuracy memory space
resolution Convolution different fine usage during
remotely Neural Network spatial resolution processing
sensed image algorithm remotely sensed
classification images
[79]
A new deep Convolutional Deep learning Inaccurate Improved Classification Mobile Yes N/S N/S
learning-based Neural Network classification of classification accuracy still needs Phones
food algorithm sensed data and accuracy by to be enhanced
recognition delayed eliminating
system for transmission of redundant data
dietary the data to the
assessment on cloud
an edge
computing
service
infrastructure
[80]
Deep Deep Deep learning Inefficient Improved Highly Local Server N/A N/A N/A
convolutional Convolutional detection of the classification computationally
computation Computation correlations accuracy intensive
model for model algorithm between
feature learning heterogonous
on big data in heterogeneous sensed data
the internet of feature space
things [81]
The cascading Cascading Deep Deep learning Limited Reduced Classification Raspberry Yes N/S N/S
neural network: Convolution computational computation cost accuracy still needs Pi
building the Neural Network processing at reasonable to be enhanced Mobile
internet of algorithm resources on classification with an phone
smart things embedded mobile accuracy in a optimization Cloudlet
[82] devices timely manner algorithm server
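The cascading classification summarized for [82] in the last row of Table 4 (initial layers on the mobile device, intermediate layers in the cloudlet, deeper layers in the cloud, with a runtime threshold that accepts or rejects a result) can be illustrated with a minimal Python sketch. This is not the CDCNN implementation: the three stage functions are hypothetical stand-ins that return fixed class probabilities, and the names `cascade_classify` and `STAGES` are assumptions introduced only for this illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A stage maps an input sample to a list of class probabilities.
Stage = Callable[[Sequence[float]], List[float]]


def cascade_classify(
    sample: Sequence[float],
    stages: List[Tuple[str, Stage]],
    threshold: float,
) -> Tuple[Optional[str], Optional[int]]:
    """Try each stage in order and stop at the first confident result.

    Returns (stage_name, class_index) for the first stage whose highest
    class probability reaches the threshold, or (None, None) when every
    stage is below the threshold and the classification is rejected.
    """
    for name, stage in stages:
        probs = stage(sample)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return name, best  # confident enough: no deeper stage is invoked
    return None, None


# Hypothetical stand-ins for the three groups of layers (assumed outputs).
def mobile_layers(sample: Sequence[float]) -> List[float]:
    return [0.55, 0.45]


def cloudlet_layers(sample: Sequence[float]) -> List[float]:
    return [0.30, 0.70]


def cloud_layers(sample: Sequence[float]) -> List[float]:
    return [0.05, 0.95]


STAGES = [
    ("mobile", mobile_layers),
    ("cloudlet", cloudlet_layers),
    ("cloud", cloud_layers),
]

if __name__ == "__main__":
    for t in (0.5, 0.6, 0.9, 0.99):
        print(t, cascade_classify([0.0], STAGES, t))
```

Raising the threshold pushes more samples to the cloudlet and cloud stages, which trades latency and offloading cost for confidence; a sample that no stage classifies confidently is rejected.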
workload exceeds the Brier Score it is considered overloaded, but if it falls below the Brier Score it is considered underloaded. The Task Migration Manager (TMM) then assigns or facilitates the migration of tasks to the underloaded VMs to balance the loads on the available VMs.
A Quasi-real-time Optimization-based Adaptive SERAC3 resource allocation algorithm is introduced in Ref. [87] for selecting an appropriate configuration of virtual machines to process IoT sensory big data filtering application requests on the cloud IaaS upon arrival. It solved the prevailing problem of the CP-BO algorithm by extracting representative workloads for incoming sensing data, analyzing the data, and intelligently determining an optimal configuration (type of virtual machines, size of the virtual machine, and number of virtual machines) for the clustering of each job in real time, but without considering load balancing in PHs and VMs. However, the problem of load balancing is solved in Ref. [88] by using a Virtual Machine and Selection algorithm for the processing of sensory data filtering or analytic application requests (jobs) in the cloud IaaS platform. It uses parameters such as CPU utilization, memory utilization, and job arrival rate to cluster servers into eight groups. Servers with optimal computation resources, based on these parameters, are selected from the resource pool to host VMs. As a result, the virtual machines from overloaded servers are moved to the optimal server to process new jobs as they arrive.
In [89], a Fuzzy Markov Normal (FMN) algorithm is proposed for selecting VMs to be transferred from congested servers (hosts) to avoid oversubscribed hosts and minimize energy consumption. It categorizes the attributes of VMs based on their current utilization level and the workload status of the host in which they reside with the support of a fuzzy logic method. The Markov Normal technique is then deployed to determine which category of VMs should be migrated from the overloaded host to the less loaded target host. However, FMN only performs migration of VMs based on host utilization without considering the “memory utilization of VMs selection process which is the basic requirement to be established before VMs migration” [90]. Therefore, an approximation algorithm is proposed in Ref. [91] to solve the content-based memory problems of VM selection from source to destination, with a single overloaded host and a destination host when the
requests. At the initial stage, new VM requests for executing jobs are classified according to the amount of resource demand, after which they are assigned to the VM class that is capable of executing each job. Thus, VMs are migrated from low-load servers to intermediate-load servers within the same class. It also determines which physical servers are in inactive mode and puts them into hibernation mode to minimize power consumption.
Abed and Younis [100] developed an Adaptive Firefly-enabled Weighted Round Robin (AFF-WRR) algorithm for dynamic and static load balancing on VMs to process IoT application requests. The WRR is responsible for estimating the weights of each VM based on three parameters, namely CPU, memory, and latency. VMs with higher weights are considered the most viable for executing large jobs, followed by the least weighted VMs. The Adaptive Firefly (AF) tracks the status of VMs and sorts them according to their weighted level. VMs with optimal resources are selected to execute incoming jobs on a real-time basis. The status of VMs is regularly monitored in milliseconds by the AF, while WRR frequently rebalances the status of VMs based on the Firefly results.
Chen and Chen [101] addressed the issue of load balancing on VMs and servers in the cloud by developing a service-oriented Virtual Machine (VM) placement algorithm. It uses the genetic algorithm to optimize the configuration of different VMs in order to achieve minimum communication overhead and total power consumption. In the initial stage, the population chromosome is generated, which represents the VMs. It then assigns the required VMs that are capable of executing the jobs to the servers, ensuring that the VM load does not exceed the server limit. This is done through the fitness function, where the communication cost between the VMs is computed and summed up to obtain the fitness value of one server. Therefore, the server with the highest fitness is randomly selected from multiple servers to execute the job. Table 5 shows the solved problems, performances, and weaknesses of the algorithms used for Cloud IaaS resource allocation for the execution of sensory data filtering or analytic application requests on IoT-based edge cloud infrastructure. It also shows the processes used by the algorithms, edge devices, and cloud data center resource components as depicted in previous research.
Basu et al. [102] introduced a hybrid Genetic-Ant Colony Optimization (GAACO) algorithm for scheduling the task requests of multiprocessor IoT applications on the Cloud IaaS. Each task is scheduled to a single processor at a time in a heterogeneous processor system. A task can only be executed when its predecessors have finished execution. Simply put, once a task starts processing on a specific processor, the next task request scheduled on the same processor must wait for the previous task to finish executing. At the initial stage, the task and processor with the best fitness solution are determined among multiple processors and incoming task requests with the support of GAACO. After that, the heuristic function is used to estimate the makespan (maximum execution time) of each task as it traverses all the levels in the graph structure. Therefore, a task with a larger makespan is scheduled first in GAACO to avoid starvation of processing resources. The capability of the processors is computed by the heuristic function, where the processors with the highest probabilistic ratio of resources are selected to execute the task with the highest makespan. This process is repeated for several iterations until all tasks in the graph structure are fully executed.

6. Processes and network protocols for IoT-edge cloud computing

Processes are a set of instructions that are currently being executed. These sets of instructions are processed logically to solve specific problems, which scientists call algorithms. In simple terms, processes are a set of instructions that are systematically applied by an algorithm to solve a particular problem. On the other hand, network communication protocols govern the interaction between IoT sensing devices and edge-cloud platforms. Therefore, it is important for IoT low-power devices to use appropriate communication protocols to effectively communicate with other devices and networks on the IoT-based edge cloud infrastructure.

6.1. Processes adopted in existing research

In this subsection, we discuss and analyze the processes adopted by the existing algorithms (discussed in the previous section) to solve the problems (as highlighted in the tables of section 5) on IoT-enabled edge-cloud computing.
The classification process is a supervised machine learning technique that assumes some prior knowledge to guide the partitioning operation, formulating a set of classifiers for the representation of the best distribution of patterns [103]. Furthermore, classification processes are designed to use both labeled and unlabeled data during the classification process. The set of labeled data is mainly used to train the classifier, such as the prediction function, while the unlabeled data is classified by the classifier. The classification output is a finite set of predefined discrete classes or values; depending on the number of classes, classification problems belong to either the binary or the multi-class category [104]. The binary category consists of two labels, e.g. 0/1, good/bad, and white/black, while the multi-class category consists of multiple labels. Consequently, the quality of the classification results is verified by determining the number of test patterns that are allocated to the corresponding collections, which is called the accuracy rate.
The regression process is used to design the correlation between input and output variables to achieve a predictive solution. The result of regression processes is determined in the continuous domain. For example, in a diabetic monitoring application, a regression can predict the symptoms of diabetes based on previous information. In general, regression allows the prediction of the outcome of a specific event. It is widely used in the updating of IoT health and agriculture application domains.
The clustering process is an unsupervised learning process that extracts hidden patterns and structures from a given data set. Unlike classification, which has some prior knowledge to strategize the partitioning operation, clustering has no pre-knowledge of the strategy to be used for the extraction process. It aggregates the data into groups based on their similar features and common structure, placing dissimilar data points in different clusters. Clustering is mainly used in recommender systems and outlier detection. The verification and evaluation of clustering results is based on the amount or number of dimensions of the data set to which the clustering algorithm is applied. For example, the sum of squared errors is mainly used for data clustering while the peak-signal-to-noise ratio is mainly used for image clustering [105].
Deep learning is a machine learning technique that consists of deep and complex architectures [106,107]. These architectures consist of many layers that convert input (e.g. images) into output data (e.g. an actual image) while learning progressively on higher-level features [108]. Deep learning, also known as Deep Neural Networks (DNN), was considered complex to train effectively and efficiently; it performs both classification and clustering processes during operation. It began to gain popularity in 2010 when it was discovered that training and analysis of large, high-dimension IoT big data could be realized with optimal results [109]. Training the stacked auto-encoders (SAEs) and DNN layers sequentially in an unsupervised manner (pre-training), and fine-tuning the stacked network with a supervised approach, could provide better performance. However, they are known to be inflexible and require a reasonable amount of work to generate acceptable results.
Optimization is the process of modifying some features of a system to improve its performance or use limited resources more efficiently. For example, an algorithm can be optimized to speed up its process execution or to use minimum memory resources during process execution. Optimization techniques are mainly based on a bio-inspired model whose algorithms are mainly used to solve optimization problems. The optimization-based process is adopted by the algorithms in
Table 5
Comparison of resource allocation techniques for executing IoT applications.
Article Title Algorithm Process Problem Resolved Outcome Weakness Edge Cloud Data Center
Device
Cloud No. of No. of
Storage Physical Virtual
Server Machine Machine
(PMs) (VMs)
An open Dynamic Priority Optimization Inability to Reduced Local and global N/A Yes 12 124
scheduling and Load execute dependent makespan and optimum issue
framework for Balancing (DPLB) tasks and violation improved load
QoS resource algorithm of SLA balance on VMs
management in
the internet of
things [86]
SERAC3: Smart Quasi-real-time Optimization Exhaustive search Improved the High resource N/S Yes 16 N/S
and economical Optimization cost for optimal selection of utilization due to
resource based Adaptive resource selection. optimal inefficient load
allocation for SERAC3 resource configurations balancing
big data clusters allocation with lower
in community algorithm exhaustive search
clouds [87] cost
Resource-aware Resource-aware Optimization Issue of Reduced the Unable to Raspberry Yes 3 12
virtual machine Virtual Machine unbalanced load dispatch time for consider the Pi B+
migration in IoT and Selection due to unforeseen the provisioning bandwidth
cloud [88] algorithm changes upon job of PHs and VMs in communication
arrival the cloud data between VMs
center
Improvement of Fuzzy Markov Clustering Inefficient Improved Load Unable to N/S Yes 16 640
energy Normal selection of VMs balancing with consider the VMs
efficiency at Algorithm migration from optimal placement memory contents
cloud data overloaded host of VMs on target before migration
center based on servers and
fuzzy Markov minimal energy
normal consumption
algorithm VM
selection in
dynamic VM
consolidation
[89]
An optimization of Approximation Optimization Latency delay of Reduced the Energy N/A Yes 100 4000
virtual machine Algorithm VMs dispatched migrated VMs consumption is
selection and from overloaded memory data with still on the high
placement by to destination minimum energy side
using memory server consumption
content
similarity for
server
consolidation in
cloud [91]
A hybrid model of Particle Swarm Optimization Global optima Reduced Weakness in local Router Yes 100 1000
Internet of Optimizer entrapment and computation time space entrapment
Things and algorithm tasks computation and optimal
cloud Genetic time complexity provisioning of
computing to Algorithm storage
manage big data
in health
services
applications
[92]
Scheduling Dynamic Optimization Inefficient Minimized Weakness in load Not Yes 8 N/S
internet of Dedicated server provisioning of computation balancing among Specified
things scheduling servers for delay and servers
applications in algorithm homogenous and improved
cloud heterogeneous IoT utilization of
computing [93] data servers
An adaptive Support Vector Regression/ SLA variation for Improved Not considering N/A Yes 6 100
resource Regression- Optimization resource resource computation cost
management Genetic (SVR-GA) utilization utilization
scheme in cloud algorithm configurations
computing [94] with SLA between
VMs and cloud
service Providers
An adaptive Adaptive Power- Optimization Unexpected Improved load Still challenged N/A Yes 100 N/S
Resource aware Virtual overload and high balancing with with high energy
management Machine energy less energy consumption
scheme in cloud Provisioner consumption utilization
computing [95]
(APA-VMP)
algorithm
A virtual machine Max-BRU Optimization Unbalanced load Improved and Unable to N/A Yes 150 N/S
placement algorithm due to inefficient balanced use of estimate
algorithm for activation of resources of overloaded PMs
balanced desired servers multiple types of upon arrival of
resource servers deployed new jobs.
utilization in
cloud data
centers [96]
Energy-efficient Energy-efficient Optimization Unbalanced Improved the Weakness in local Laptop Pc Yes 100 500
virtual machine resource ranking resource utilization rate search
selection based and utilization utilization and and minimize the entrapment and
on resource factor-based high energy number of live VM computation time
ranking and virtual machine consumption migrations with complexity
utilization factor selection less energy
approach in algorithm consumption
cloud
computing for
IoT [97]
Multi-capacity combinatorial ordering GA in application to Cloud resources allocation and efficient virtual machines consolidation [98] | Combinatorial Ordering First-Fit Genetic and Combinatorial Ordering Next Fit Genetic algorithms | Optimization | High number of running servers and resource waste per server in local search space | Minimized the total number of running servers with less resource waste | Unable to consider the issue of global optima while determining the best VMs among various ones | | Yes | 128 | 340
Workload aware VM consolidation method in edge/cloud computing for IoT applications [99] | Workload Aware Virtual Machine Consolidation algorithm | Optimization | Inability for edge cloud data centers to process tasks in a power-saving mode and the issue of global entrapment | Improved convergence rate with minimum active server usage and less energy consumption | Not considering the communication overhead between servers and VMs | Laptop Pc | Yes | 500 | 1500
Developing load balancing for IoT-cloud computing based on Advanced Firefly and weighted round Robin algorithms [100] | Firefly and Weighted Round Robin algorithms | Optimization | Overloaded PMs due to unbalanced load on every resource | Improved resource utilization with minimum response time | Inefficient searching of candidate resources for job execution | | Yes | 1000 | 5000
Service oriented cloud VM placement strategy for internet of things [101] | Service-oriented virtual machine placement algorithm | Optimization | Challenges of high communication overhead between VMs under the same service | Minimized communication cost between VMs, energy usage and the total PM utility | Unable to schedule the VMs for task execution which disrupt load balancing in the PMs | | Yes | 250 | N/S
An intelligent/cognitive model of task scheduling for IoT applications In cloud computing environment [102] | Hybrid Genetic-Ant Colony Optimization (GAACO) algorithm | Optimization | Scheduling task dependency | Efficient load balancing with reduced makespan | Not considering local search entrapment | | Yes | 1000 | 2000
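Several rows of the table above rest on bin-packing style placement, for example the First-Fit ordering named in the row for [98]. As a minimal sketch only, and not the genetic algorithm evaluated in [98], the following Python function shows plain first-fit decreasing placement of VMs onto servers of equal capacity, with the aim of keeping the number of running servers low. The function name, the single scalar demand per VM, and the capacity value are illustrative assumptions and do not come from the reviewed papers.

```python
def first_fit_decreasing(vm_demands, server_capacity):
    """Place VMs on running servers using the first-fit decreasing heuristic.

    vm_demands: resource demand of each VM (one hypothetical scalar unit).
    server_capacity: capacity of every server (assumed identical).
    Returns a list of servers, each a list of the VM demands it hosts.
    """
    if any(d > server_capacity for d in vm_demands):
        raise ValueError("a VM demand exceeds the capacity of a single server")

    free = []        # remaining capacity of each running server
    placement = []   # VM demands hosted by each running server

    # Largest VMs first, so that small VMs can fill the remaining gaps.
    for demand in sorted(vm_demands, reverse=True):
        for i, remaining in enumerate(free):
            if demand <= remaining:
                # First running server with enough room takes the VM.
                free[i] -= demand
                placement[i].append(demand)
                break
        else:
            # No running server fits: switch on a new one.
            free.append(server_capacity - demand)
            placement.append([demand])
    return placement


servers = first_fit_decreasing([4, 8, 1, 4, 2, 1], server_capacity=10)
print(len(servers), servers)
```

First-fit decreasing is a greedy heuristic, so it can stop at a placement that is not globally optimal. This is the kind of weakness that the metaheuristics in the table try to reduce.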
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515
previous research, to solve optimization problems for the allocation of resources required for the execution of IoT data filtering (outliers and redundancy elimination) on analytic applications have been extensively analyzed in this paper.
6.2. Network communication protocols deployed in IoT-Edge Cloud computing
Communication protocols such as Message Queuing Telemetry Transport (MQTT), Wireless Fidelity (Wi-Fi), Bluetooth, General Packet Radio Service (GPRS), and Advanced Message Queuing Protocol (AMQP) were used in previous research and are briefly discussed as follows:
Message Queuing Telemetry Transport (MQTT) was invented by IBM in 1999 as a standardized publish/subscribe push protocol. It is specifically designed to facilitate the transmission of data under long network delays and low-bandwidth network conditions [110,111]. It runs mainly over TCP/IP, or over other network protocols that provide lossless, bidirectional connections. Consequently, MQTT is suitable for resource-constrained IoT sensing devices that use unreliable or limited-bandwidth channels [112]. It was standardized by OASIS in 2013, with a channel bandwidth of 5–20 MHz, a downlink rate of 256 MB, and an uplink rate of 127 MB over TCP/IP port 8883.
Bluetooth is a wireless communication protocol designed to provide short-range connectivity for small devices such as smartphones, laptops, and hand-held devices. Its first specification was released in 1999 and later adopted as IEEE 802.15.1, and it operates in the 2.4 GHz frequency band at a low rate of 200 kb/s. Its main function is to allow audio and data streaming between devices. However, it consumes considerable power during data transmission between devices. This led to the introduction of Bluetooth Low Energy (BLE) in 2010 to address this high power consumption. BLE is designed to extend the application of Bluetooth to low-power devices such as wireless sensors and wireless controllers [113]. Currently, the IETF 6LoWPAN Working Group (WG) has already recognized the importance of BLE for the Internet of Things and is beginning to develop a specification for the transmission of IPv6 packets over BLE [114,115]. BLE is most commonly used by IoT sensing devices to transmit data to other devices.
Fourth/Fifth Generation (4G/5G) LTE. Fourth Generation Long-Term Evolution (4G LTE) is a wireless network protocol designed and deployed for Internet Protocol (IP)-based services, such as the combination of multimedia capabilities and applications with high-speed mobile broadband [116]. It is considered to be ten times faster than 3G in terms of transmission speed and covers a wider range. As a result, its Evolved Packet Core (EPC) and IP-based network framework enable the smooth delivery of voice and data packets compared with the older models of cell towers using GSM and UMTS. However, it is fast reaching its limits due to the increasing demand for wireless data transfer as mobile phone usage grows, and the need to reduce latency in end-to-end connections, which is bounded by the physical constraints of the Internet. Therefore, the Fifth Generation (5G) mobile protocol has been introduced to solve the aforementioned issues of 4G. 5G is specifically designed to efficiently support massive machine-to-machine and critical communications. Thus, a large number of actuators and sensors/meters deployed anywhere in the landscape will be able to transmit their sensed data to other devices with a very low response time and high reliability [117]. It also has the potential to provide mobile broadband services such as high-speed multimedia streaming, video conferencing, Internet browsing, Voice-over-IP (VoIP), and efficient downloading and uploading of large files.
Advanced Message Queuing Protocol (AMQP) is a protocol that originated in the financial sector. It has been standardized by OASIS as a ubiquitous, secure, reliable, and open Internet protocol for handling messages [118]. It is regarded as messaging middleware that uses different transport protocols. AMQP provides asynchronous publish/subscribe communication with messaging, in addition to a store-and-forward feature that ensures reliability during and after network disruptions [119,120]. This means that AMQP has the potential to be used in hazardous or hostile environments, as long as the overhead is not very high.
Wireless Fidelity (Wi-Fi) is used to connect wireless devices such as laptops, smartphones, and PDAs. It is a brand of wireless communication technology owned by the Wi-Fi Alliance to improve interoperability between wireless networking products based on the IEEE 802.11 standard [121]. It has a coverage range of 46 m (indoor) and 100 m (outdoor), with a channel bandwidth of 20–40 MHz, a downlink rate of 600 Mbps, and an uplink rate of 248 Mbps in the 2.4 GHz frequency band.
7. Potential challenges of IoT-enabled cloud computing infrastructure
While IoT-enabled cloud systems tend to solve many problems, a reasonable number of challenges have yet to be addressed. This is because the potential solutions needed to solve these challenges have not been unravelled by the algorithms in previous research. Also, some of these remaining challenges require consistent effort from IoT-Cloud researchers and development communities, governments, policy makers, and platform/hardware providers. Some of these challenges are discussed as follows:
Unstructured IoT sensing data. In real-world sensing events, the sensed data generated by sensor devices is unstructured due to its dynamic and heterogeneous nature. While NoSQL and Ubuntu servers are designed to store unstructured data, they have yet to make a significant impact on real-world IoT sensory-enabled cloud infrastructure, as most researchers experiment with structured data sources. However, data lakes have emerged and have proven able to handle large volumes of IoT sensor data. A data lake can store both unstructured and structured data without any predetermined idea of how the data will be used. It also does not use query languages or schema mapping and can store any type of data without limitations. The data lake faces two major issues. First, loss of agility may occur when it is used to store a huge pool of data that urgently needs analysis and decision making, because the data have to go through several processes before any meaningful data can be extracted from the data sample. Secondly, data interchange may happen in the future, since any data can be stored or inserted [122]. This problem can be avoided by attaching metadata to the stored data and recording the attributes or source of the data. Therefore, it is necessary to further investigate how algorithms can be used to manage these unstructured sensory data, both in the simulation environment and in a real-world scenario.
Protocol diversity and Standardization. The IoT-enabled edge cloud platform is challenged by the lack of a universal protocol and standard, as different protocols are used to communicate and interact between devices of different development standards. While the platform has been designed to enable multiple protocols to work together due to different requirements and their intended uses, it may lack the potential to support multiple protocols extensively. Therefore, it is worth further exploring the development of an intelligent gateway as a possible solution that can provide seamless interoperability and integration between different protocols, together with algorithms that can intelligently select the optimal transmission channels for efficient data delivery. On the other hand, various organizations, such as 3GPP, IEEE, ETSI, and M2M, have made significant efforts to enforce standards for the development of IoT devices. They assume that interoperability will be provided by the aforementioned standardization activities, but these may lead to higher uncertainty, as they all provide specific and isolated solutions that can only cover their own domains [123].
Integration of contextual information. IoT data must be integrated with other data sources, such as context information that complements the understanding of the environment [124]. This is because IoT-sensed data cannot convey an understanding of the environment on its own. The emergence of algorithms tends to speed up data filtering, analysis, and efficient reasoning due to the limited search space for the reasoning engine. For example, a sensor camera with facial recognition capability can perform surveillance in different contexts, such as in government buildings and residential areas [125]. Therefore, the sensed image data collected from different contexts can assist the system in determining the optimal action to be taken based on the retrieved face of an individual.
Overloading communication networks. With a large number of IoT-enabled edge cloud components, maintenance and configuration of their underlying physical Machine-To-Machine (M2M) interactions and networks become more complex. The dynamicity and heterogeneity of IoT big sensing data rapidly overwhelm the communication networks of the IoT-enabled Edge Cloud platform. Therefore, the volume and speed of
the data must be taken into account in order to provide optimal Quality of Service (QoS). One way to address this issue is to provide for the storage and management of IoT sensor data across tiers of the IoT-based edge cloud (ITC). This will compel application designers to deploy complete data contextualization algorithms and techniques to obtain optimal QoS delivery across the ITC platform. The contextual techniques must consider the storage capabilities of the essential processing devices, such as the sensing devices, microcontrollers, edge servers, and cloud data centers.
Security challenge in collaborative edge-cloud processing. Further research is needed on how to perform cloud-side computations to encrypt IoT sensor data without revealing secrets or private information to cloud service providers. In addition, how the edge can send the sensing data to the cloud in a secure manner, ensuring that the sensing data is not corrupted in the edge processing units and cannot be intercepted by unauthorized persons (intruders) while in transit to the cloud platform, needs to be addressed.
Real-time Filtering/Analytic data. Achieving useful and intelligent information in real time from a huge volume of sensed data collected from multiple IoT sensing devices has become a major challenge. This is due to the unavailability of real-time stream mining approaches. One way to overcome this challenge is the use of edge devices, which has already been proposed. Nevertheless, other solutions (such as algorithmic techniques) are in the early stages of implementation and need to be optimized to extract meaningful and intelligent data on a real-time basis; this needs to be addressed in the future.
8. General discussion and conclusion
As expected, the algorithms were able to resolve issues related to sensed data filtering based on outlier detection and redundancy elimination in a given data set. In addition, issues related to load balancing for resource allocation, such as migrating VMs from source to target server(s) to perform the execution of sensed data filtering or analytic application requests, were significantly resolved. Outliers were primarily detected by considering the data type, spatio-temporal relations, attribute correlations, user-specified thresholds, outlier scores, and the type of outlier (error or event). There are two main types of data, namely linear and non-linear. The linear data type is known as static and is structured sequentially in either a list or a frame format. Non-linear data is dynamic and is also known as time series or streaming data. Spatio-temporal simply means the distances between sensing data and the time of arrival from a particular source (sensor). In other words, sensing data within a specific close range are considered normal data, while others are classified as outliers or anomalies. The similarity (correlation) between data in a given dataset is also determined, as those with the same values are either clustered or aggregated into several groups or subsets according to their similarity level. Outliers within the subsets are then identified based on threshold(s) or score.
We also observed that outliers are of two types as detected by some of the existing algorithms, namely error and event. Error outliers are generated by defective sensors; they are often classified as irrelevant or unwanted data and are therefore eliminated from the dataset. Event outliers, on the other hand, are useful data, most often used to report or predict unforeseen circumstances. For example, the detection of a gas leak from a cylinder is called an event outlier. In terms of redundancy, feature selection and pattern recognition have been strongly relied upon. Features or attributes of a given data set are subjected to a similarity check to identify data with similar attributes or features. Thus, similar features are selected to be merged into a single data feature, or one of the similar data features is retained while the others are eliminated. Similarly, similar data patterns are classified or clustered together while the irrelevant ones are identified and discarded.
Load balancing issues have mainly been solved by considering the number of incoming requests prior to arrival while searching for optimal or under-loaded VMs to migrate from source (overloaded) to target (under-loaded) server(s) to execute jobs. This avoids execution time complexity and overloading of available resources (VMs and servers).
There are various servers (physical machines) in the cloud IaaS platform dedicated to specific tasks, leading to the effective management of computation and storage resource provisioning. According to Ref. [14], there are three main types of sensed data servers in the cloud IaaS, namely NoSQL, relational database (MySQL), and Hadoop servers. The NoSQL server is mainly designed to store and manage IoT sensed data due to its unstructured pattern. It has features such as distributed storage, dynamic schema, and horizontal scalability. However, it is limited in its ability to maintain the atomicity, consistency, isolation, and durability of sensed data. In addition, it only partially supports distributed queries. On the other hand, Hadoop servers are unique distributed file repositories that store and efficiently manage massive unstructured data. They enable IoT sensed data to be generated in XML format. According to Shvachko et al. [103], the combination of both NoSQL and Hadoop servers enables unified management of and access to sensed data. The relational database (MySQL) server stores massive structured data. However, different data are generated rapidly, and the relationship between these data is important for a multitenant data storage system [104]. Therefore, virtual relational data is merged with various conventional relational data in a single schema. Despite the potential features of the cloud servers, they are still prone to the massively heterogeneous and dynamic nature of IoT big data. One way to solve this problem is to use virtual machines for effective and reliable data processing and storage management on servers. Virtual machines are subsets of a server that can be used to perform highly intensive computational tasks. This enables a server to perform two or more tasks simultaneously, such as providing storage space for incoming sensed data while performing data filtering or analytic operations on the sensed data using algorithms (e.g. algorithms used for both data outlier and redundancy detection) based on user application requests.
Fig. 8(a) shows that redundancy problems were mainly handled by the classification process, followed by deep learning and clustering, with limited use of optimization and even less use of regression. On the other hand, the clustering process was mainly used to detect outlier-related problems, followed by classification and deep learning, with limited use of regression and no use of optimization. The optimization process is the most widely deployed process for resource allocation in the Cloud IaaS, to execute sensory data filtering and analytic application requests (jobs), followed by clustering, with limited use of regression and no use of classification. In addition, clustering outperforms the other processes in terms of its usage by the existing algorithms studied in this research, as shown in Fig. 8(b), followed by classification, optimization, deep learning, and regression, respectively. This shows that the utilization of machine learning algorithms is also gaining momentum in IoT big data filtering and analytics on IoT-enabled edge cloud computing.
Fig. 8a. Utilization frequency of processes.
Observations from the tabulated information indicated that a reasonable number of the sensed data filtering algorithms used to solve
outlier and redundancy related problems were implemented at the edge and edge/cloud respectively. The processes at the edge/cloud are mainly based on retrieving relevant sensed data while discarding the irrelevant ones. In addition, the algorithms executed only at the edge platform are mainly accessed immediately by end-user applications. Fig. 9 shows that most of the algorithms are implemented at the edge/cloud. This shows that the use of the cloud to address the limitations of IoT sensing devices, and of edge devices to process IoT big data, is gaining momentum in this research area.
Fig. 8b. The level of utilization of processes (%).
In terms of the processes adopted by the existing algorithms, the clustering process outperforms the other processes in terms of usage level. It can extract useful information from large sensed data better than the others, due to its sensitivity to outliers and redundant (noisy) sensed data. Clustering is done by partitioning based on the distance between instances: each instance is initially identified as a cluster, and the instances that are closest to one another are merged until all instances are fused into a single cluster. Observation also shows that most of the clustering processes were implemented on static sensed data retrieved from various sensor devices. However, clustering algorithms such as Moving Window Principal Component Analysis [50] and Robust Incremental Principal Component Analysis [51] were implemented on dynamic or real-time sensing data. Clustering methods are also known to be relatively scalable and allow the number of clusters to be specified in advance, as in Recursive Principal Component Analysis [32], Adaptive K-means [35], and the Distance-based Algorithm [44]. On the other hand, hierarchical clustering, such as Enhanced Knowledge Granule [42], Hyperellipsoidal [46], and Incremental Fast Searching Clustering-based K-Medoids [65], determines the number of clusters itself as it operates on any given dataset.
Classification methods are mainly used in health-related sensory data collection to predict redundant data, as can be seen from the existing research. They are known for their efficiency in terms of time and space complexity and are suitable for real-time sensing data. Furthermore, the classification process is easy to develop and maintain on parallel hardware such as the cloud data center. However, it requires lengthy training and testing procedures on sensed data and has poor interpretability. Deep learning mainly combines the services of clustering and classification to perform its operations on sensed data. Deep learning methods are most suitable for large sensory data, as observed from previous research, and tend to achieve higher accuracy than other methods. However, they require a large amount of storage space and are more time-consuming to run than the other methods. The optimization process has mainly been used at the cloud data center to improve efficiency in terms of computation completion time (makespan), minimum resource utilization, and energy consumption, as observed from previous algorithms. Its main objective is to prioritize available resources with the optimal ability to execute the required task.
In conclusion, data filtering or analytic algorithms are the main tools used to extract knowledge from the massive data generated by various IoT sensing devices. Resource allocation algorithms, on the other hand, are used to provide optimal computation and storage resources for executing data filtering/analytic application requests in the IoT-enabled cloud IaaS platform. Therefore, to achieve the desired knowledge information, appropriate filtering algorithms that are effective and efficient need to be deployed, given the characteristic nature of IoT-sensed big data. In this paper, we identify and discuss some related literature surveys on the IoT-based edge cloud domain, which motivated the current research. We provide extensive background information about IoT devices, sensing data characteristics, and the factors motivating the integration of IoT and edge/cloud, together with a detailed description of the research methodology adopted. Filtering/analytic algorithms from previous research were analyzed based on issues related to outlier detection and redundant data discovery and elimination. The provisioning of optimal resources (PHs and VMs) for the execution of IoT application requests, taking into account load balancing issues, is also presented. The problems solved, the successes, and the weaknesses of the algorithms are highlighted in tabular form. In addition, the processes employed by the algorithms were discussed, as well as the network communication protocols used for the transmission of sensor data in the IoT-enabled edge cloud domain. Subsequently, the prevailing challenges that are yet to be resolved in the IoT-enabled edge cloud infrastructure are presented to help characterize the research directions in this area. The significance of this research is to provide new insight into the discovery of event and error outliers with the use of machine and deep learning techniques, which has long been overlooked by computing research communities. The existing algorithms were applied in the healthcare sector to detect prevailing diseases and symptoms in patients, to minimize cybercrime and internet fraud, in manufacturing companies such as automobile production plants to detect faulty equipment, and to detect domestic and industrial gas leaks. Researchers in this area may capitalize on the weaknesses of the existing algorithms to improve their performance in future research. Examples include managing IoT and cloud components to minimize energy usage and carbon dioxide emissions; improving the performance of resource allocation techniques to minimize hazardous material use and resource waste during assigned tasks in the cloud; and applying outlier detection techniques to detect unauthorized access to data repositories and assigned resources, to protect sensitive information in cloud users’ request tasks. Subsequently, optimizing the existing techniques for the retrieval of useful and intelligent data in real time will be considered in future research. The authors are currently implementing outlier techniques for detecting cancer in the human brain.
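The threshold- and score-based outlier identification summarized in this section, and the distinction between error and event outliers, can be illustrated with a small Python sketch. It is an illustration under stated assumptions rather than any of the reviewed algorithms. The score is the modified z-score (median and median absolute deviation), the 3.5 cut-off is a common rule of thumb, and the rule that an outlier corroborated by a neighbouring sensor at the same time step counts as an event is a hypothetical heuristic introduced here. The sensor readings are made-up values.

```python
from statistics import median


def outlier_scores(readings):
    """Modified z-score of each reading, based on the median and the MAD."""
    centre = median(readings)
    mad = median(abs(x - centre) for x in readings)
    if mad == 0:
        # At least half of the readings equal the median, so the MAD
        # gives no usable scale; report every score as zero.
        return [0.0] * len(readings)
    return [0.6745 * abs(x - centre) / mad for x in readings]


def flag_outliers(readings, threshold=3.5):
    """Time steps whose score exceeds the user-specified threshold."""
    return [i for i, s in enumerate(outlier_scores(readings)) if s > threshold]


def label_outlier(step, neighbour_flags):
    """Hypothetical rule: 'event' if a neighbouring sensor is also flagged
    at the same time step, otherwise 'error' (likely a faulty sensor)."""
    return "event" if any(step in flags for flags in neighbour_flags) else "error"


# Made-up gas concentration readings from three co-located sensors.
sensor_a = [20.1, 20.3, 19.9, 20.0, 55.0, 20.2]
sensor_b = [20.0, 20.2, 20.1, 19.8, 54.0, 20.1]
sensor_c = [20.0, 20.1, 0.0, 20.2, 20.1, 19.9]

flags = {
    "a": flag_outliers(sensor_a),
    "b": flag_outliers(sensor_b),
    "c": flag_outliers(sensor_c),
}
print(flags)
print(label_outlier(4, [flags["b"], flags["c"]]))  # spike seen by a and b
print(label_outlier(2, [flags["a"], flags["b"]]))  # drop seen by c only
```

In this toy data the spike at step 4 is seen by two sensors and is kept as an event, whereas the isolated zero reading of the third sensor is treated as an error and would be discarded.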
are hereby declared that there are no competing or conflicting interests among them.
References
[1] S.K. Sharma, X. Wang, Live data analytics with collaborative edge and cloud processing in wireless IoT networks, IEEE Access 5 (2017) 4621–4635.
[2] L. Atzori, A. Iera, G. Morabito, The internet of things: a survey, Comput. Network. 54 (15) (2010) 2787–2805.
[3] A. Botta, W. De Donato, V. Persico, A. Pescapé, Integration of cloud computing and internet of things: a survey, Future Generat. Comput. Syst. 56 (2016) 684–700.
[4] K.M. Modieginyane, B.B. Letswamotse, R. Malekian, A.M. Abu-Mahfouz, Software defined wireless sensor networks application opportunities for efficient network management: a survey, Comput. Electr. Eng. 66 (2018) 274–287.
[5] T. Islam, S.C. Mukhopadhyay, N.K. Suryadevara, Smart sensors and internet of things: a postgraduate paper, IEEE Sensor. J. 17 (3) (2017) 577–584.
[6] C. Madhavaiah, I. Bashir, Defining cloud computing in business perspective: a review of research, METAMORPHOSIS 1 (2) (2012) 50–65.
[7] F. Bonomi, R. Milito, J. Zhu, S. Addepalli, Fog computing and its role in the internet of things, in: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, 2012, pp. 1–8.
[8] E.G. Petrakis, S. Sotiriadis, T. Soultanopoulos, P.T. Renta, R. Buyya, N. Bessis, Internet of Things as a Service (iTaaS): challenges and solutions for management of sensor data on the cloud and the fog, Internet of Things Journal 3 (2018) 156–174.
[9] B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in Software Engineering, EBSE Technical Report EBSE-2007-01, Durham, United Kingdom, 2007, pp. 1–53.
[10] J. Qiu, Q. Wu, G. Ding, Y. Xu, S. Feng, A survey of machine learning for big data processing, EURASIP J. Appl. Signal Process. 67 (2016) 2–16.
[11] A. Farahzadi, P. Shams, J. Rezazadeh, R. Farahbakhsh, Middleware technologies for cloud of things: a survey, Digital Communications and Networks 4 (3) (2018) 176–188.
[12] L. Cui, S. Yang, F. Chen, Z. Ming, N. Lu, J. Qin, A survey on application of machine learning for internet of things, International Journal of Machine Learning and Cybernetics 9 (8) (2018) 1399–1417.
[13] M.S. Mahdavinejad, M. Rezvan, M. Barekatain, P. Adibi, P. Barnaghi, A.P. Sheth, Machine learning for internet of things data analysis: a survey, Digital Communications and Networks 4 (3) (2018) 161–175.
[14] H. Cai, B. Xu, L. Jiang, A.V. Vasilakos, IoT-based big data storage systems in cloud computing: perspectives and challenges, IEEE Internet Things J. 4 (1) (2017) 75–87.
[15] S. Shadroo, A.M. Rahmani, Systematic survey of big data and data mining in internet of things, Comput. Network. 139 (2018) 19–47.
[16] M.H. Rehman, E. Ahmed, I. Yaqoob, I.A.T. Hashem, M. Imran, S. Ahmad, Big data analytics in industrial IoT using a Concentric computing model, IEEE Commun. Mag. 56 (2) (2018) 37–43.
[17] E. Ahmed, I. Yaqoob, I.A.T. Hashem, I. Khan, A.I.A. Ahmed, M. Imran, A.V. Vasilakos, The role of big data analytics in internet of things, Comput. Network. 129 (2017) 459–471.
[18] M. Ge, H. Bangui, B. Buhnova, Big data for internet of things: a survey, Future Generat. Comput. Syst. 87 (2018) 601–614.
[19] M. Aazam, S. Zeadally, K.A. Harras, Offloading in fog computing for IoT: review, enabling technologies, and research opportunities, Future Generat. Comput. Syst. 87 (2018) 278–289.
[20] M. Mohammadi, A. Al-Fuqaha, S. Sorour, M. Guizani, Deep learning for IoT big data and streaming analytics: a survey, IEEE Communications Surveys and Tutorials 20 (4) (2018) 2923–2960.
[21] X. Fei, N. Shah, N. Verba, K.-M. Chao, V. Sanchez-Anguix, J. Lewandowski, Z. Usman, CPS data streams analytics based on machine learning for Cloud and Fog Computing: a survey, Future Generat. Comput. Syst. 90 (2019) 435–450.
[22] F. Alam, R. Mehmood, I. Katib, N.N. Albogami, A. Albeshri, Data fusion and IoT for smart ubiquitous environments: a survey, IEEE Access 5 (2017) 9533–9554.
[23] A. Ukil, S. Bandyopadhyay, C. Puri, A. Pal, IoT healthcare analytics: the importance of anomaly detection, in: Proceedings of the 2016 IEEE International Conference on Advanced Information Networking and Applications (AINA), 2016, pp. 994–997.
[24] M. Ahmed, A. Anwar, A.N. Mahmood, Z. Shah, M.J. Maher, An investigation of performance analysis of anomaly detection techniques for big data in SCADA systems, EAI Endorsed Trans. Indust. Netw. and Intelligent. Syst. 2 (3) (2015) 1–16.
[25] N. Shahid, I.H. Naqvi, S.B. Qaisar, Characteristics and classification of outlier detection techniques for wireless sensor networks in harsh environments: a survey, Artif. Intell. Rev. 43 (2) (2015) 193–228.
[26] J. Wang, S. Tang, B. Yin, X.-Y. Li, Data gathering in wireless sensor networks through intelligent compressive sensing, Proc. - IEEE INFOCOM (2012) 603–611.
[30] N. Nesa, T. Ghosh, I. Banerjee, Non-parametric sequence-based learning approach for outlier detection in IoT, Future Generat. Comput. Syst. 82 (2018) 412–421.
[31] R. Zhang, P. Ji, D. Mylaraswamy, M. Srivastava, S. Zahedi, Cooperative sensor anomaly detection using global information, Tsinghua Sci. Technol. 18 (3) (2018) 209–219.
[32] T. Yu, X. Wang, A. Shami, Recursive principal component analysis-based data outlier detection and sensor data aggregation in IoT systems, IEEE Internet Things J. 4 (6) (2017) 2207–2216.
[33] P.M. Kumar, U.D. Gandhi, A novel three-tier Internet of Things architecture with machine learning algorithm for early detection of heart diseases, Comput. Electr. Eng. 65 (2018) 222–235.
[34] A.F. Santamaria, F. De Rango, A. Serianni, P. Raimondo, A real IoT device deployment for eHealth applications under lightweight communication protocols, activity classifier and edge data filtering, Comput. Commun. 128 (2018) 60–73.
[35] Ş. Kolozali, D. Puschmann, M. Bermudez-Edo, P. Barnaghi, On the effect of adaptive and nonadaptive analysis of time-series sensory data, IEEE Internet Things J. 3 (6) (2016) 1084–1098.
[36] D. Puschmann, P. Barnaghi, R. Tafazolli, Adaptive clustering for dynamic IoT data streams, IEEE Internet Things J. 4 (1) (2017) 64–74.
[37] J. Diaz-Rozo, C. Bielza, P. Larrañaga, Clustering of data streams with dynamic Gaussian mixture models: an IoT application in industrial processes, IEEE Internet Things J. 5 (5) (2018) 3533–3547.
[38] Z. Ali, G. Muhammad, M.F. Alhamid, An automatic health monitoring system for patients suffering from voice complications in smart cities, IEEE Access 5 (2017) 3900–3908.
[39] G. Muhammad, M.F. Alhamid, M. Alsulaiman, B. Gupta, Edge computing with cloud for voice disorder assessment and treatment, IEEE Commun. Mag. 56 (4) (2018) 60–65.
[40] P. Verma, S.K. Sood, Fog assisted-IoT enabled patient health monitoring in smart homes, IEEE Internet Things J. 5 (3) (2018) 1789–1796.
[41] M.Z. Wu, Y.T. Wang, Z.C. Liao, A New Shelf Life Prediction Method for Farm Products Based on an Agricultural IoT, vol. 846, Trans Tech Publication, 2014, pp. 1830–1835.
[42] H.-T. Chang, N. Mishra, C.-C. Lin, IoT big-data centred knowledge granule analytic and cluster framework for BI applications: a case base analysis, PLoS One 10 (11) (2015) 1014–1980.
[43] H.M. Raafat, M.S. Hossain, E. Essa, S. Elmougy, A.S. Tolba, G. Muhammad, A. Ghoneim, Fog intelligence for real-time IoT sensor data analytics, IEEE Access 5 (2017) 24062–24069.
[44] M. Kontaki, A. Gounaris, A.N. Papadopoulos, K. Tsichlas, Y. Manolopoulos, Efficient and flexible algorithms for monitoring distance-based outliers over data streams, Inf. Syst. 55 (2016) 37–53.
[45] I. Vasconcelos, R.O. Vasconcelos, B. Olivieri, M. Roriz, M. Endler, M.C. Junior, Smartphone based outlier detection: a complex event processing approach for driving behavior detection, Journal of Internet Services and Applications 8 (1) (2017) 1–13.
[46] L. Lyu, J. Jin, S. Rajasegarar, X. He, M. Palaniswami, Fog-empowered anomaly detection in IoT using hyperellipsoidal clustering, IEEE Internet Things J. 4 (5) (2017) 1174–1184.
[47] A. Daneshpazhouh, A. Sami, Entropy-based outlier detection using semi-supervised approach with few positive examples, Pattern Recogn. Lett. 49 (2014) 77–84.
[48] X. Wang, Z. Xu, C. Sha, M. Ester, A. Zhou, Semi-supervised learning from only positive and unlabeled data using entropy, in: Proceedings of International Conference on Web-Age Information Management, 2010, pp. 24–31.
[49] A. Delimargas, E. Skevakis, H. Halabian, I. Lambadaris, N. Seddigh, B. Nandy, R. Makkar, Proceedings of MILCOM 2015-IEEE Military Communications Conference, 2015, pp. 36–44.
[50] M. Rafferty, X. Liu, D.M. Laverty, S. McLoone, Real-time multiple event detection and classification using moving window PCA, IEEE Trans. Smart Grid 7 (5) (2016) 2537–2548.
[51] X. Kong, J. Chang, M. Niu, X. Huang, J. Wang, S.I. Chang, Research on real time feature extraction method for complex manufacturing big data, Int. J. Adv. Des. Manuf. Technol. 99 (5–8) (2018) 1101–1108.
[52] S. Cheng, Z. Cai, J. Li, H. Gao, Extracting kernel dataset from big sensory data in wireless sensor networks, IEEE Trans. Knowl. Data Eng. 29 (4) (2016) 813–827.
[53] K. Yan, D. Zhang, Feature selection and analysis on correlated gas sensor data with recursive feature elimination, Sensor. Actuator. B Chem. 212 (2015) 353–363.
[54] X. Li, S. Peng, J. Chen, B. Lü, H. Zhang, M. Lai, SVM–T-RFE: a novel gene selection algorithm for identifying metastasis-related genes in colorectal cancer using gene expression profiles, Biochem. Biophys. Res. Commun. 419 (2) (2012) 148–153.
[55] P. Szecowka, A. Szczurek, B. Licznerski, On reliability of neural network sensitivity analysis applied for sensor array optimization, Sensor. Actuator. B Chem. 157 (1) (2011) 298–303.
[56] D.R. Wijaya, R. Sarno, E. Zulaika, Sensor array optimization for mobile electronic nose: wavelet transforms and filters based feature selection approach, International Review on Computers and Software 11 (8) (2016) 659–671.
[57] Y.-H. Yuan, Q.-S. Sun, Fractional-order embedding multiset canonical
[27] M. Stocker, M. Rönkkö, M. Kolehmainen, Making sense of sensor data using
correlations with applications to multi-feature fusion and recognition,
ontology: a discussion for residential building monitoring, in: Proceedings of IFIP
Neurocomputing 122 (2013) (2013) 229–238.
International Conference on Artificial Intelligence Applications and Innovations,
[58] M. Haghighat, M. Abdel-Mottaleb, W. Alhalabi, Discriminant correlation analysis:
2012, pp. 14–20.
real-time feature level fusion for multimodal biometric recognition, IEEE Trans.
[28] F. Ganz, P. Barnaghi, F. Carrez, Information abstraction for heterogeneous real
Inf. Forensics Secur. 11 (9) (2016) 1984–1996.
world internet data, IEEE Sensor. J. 13 (10) (2013) 3793–3805.
[29] S. Kamal, R.A. Ramadan, E.-R. Fawzy, Smart outlier detection of wireless sensor
network, Electronics and Energetics 29 (3) (2015) 383–393.
A.E. Edje et al. Digital Communications and Networks 9 (2023) 1486–1515
[59] Z. Wu, K. Mao, G.-W. Ng, Enhanced feature fusion through irrelevant redundancy [88] G.J.L. Paulraj, S.A.J. Francis, J.D. Peter, I.J. Jebadurai, Resource-aware virtual
elimination in intra-class and extra-class discriminative correlation analysis, machine migration in IoT cloud, Future Generat. Comput. Syst. 85 (2018)
Neurocomputing 335 (2019) 105–118. 173–183.
[60] I. Mehmood, M. Sajjad, S. Baik, Mobile-cloud assisted video summarization [89] G. Shidik, A. Azhari, K. Mustofa, Improvement of energy efficiency at cloud data
framework for efficient management of remote sensing data generated by center based on fuzzy markov normal algorithm vm selection in dynamic vm
wireless capsule sensors, Sensors 14 (9) (2014) 17112–17145. consolidation, International Review on Computers and Software (IRECOS). 11 (6)
[61] J. Yu, M. Kim, H.-C. Bang, S.-H. Bae, S.-J. Kim, IoT as applications: cloud-based (2016) 511–520.
building management systems for the internet of things, Multimed. Tool. Appl. 75 [90] A. Beloglazovy, R. Buyya, Optimal online deterministic algorithms and adaptive
(22) (2016) 14583–14596. heuristics for energy and performance efficient dynamic consolidation of virtual
[62] M. Yuan, H. Sheng, Research on the fusion method of spatial data and multimedia machines in cloud data centers, Concurrency Comput. Pract. Ex. 24 (13) (2011)
information of multimedia sensor networks in cloud computing environment, 1–24.
Multimed. Tool. Appl. 76 (16) (2017) 17037–17054. [91] H. Li, W. Li, H. Wang, J. Wang, An optimization of virtual machine selection and
[63] C. Li, W. Wei, J. Li, W. Song, A cloud-based monitoring system via face placement by using memory content similarity for server consolidation in cloud,
recognition using Gabor and CS-LBP features, J. Supercomput. 73 (4) (2017) Future Generat. Comput. Syst. 84 (2018) 98–107.
1532–1546. [92] M. Elhoseny, A. Abdelaziz, A.S. Salama, A.M. Riad, K. Muhammad, A.K. Sangaiah,
[64] R. Varatharajan, G. Manogaran, M. Priyan, A big data classification approach A hybrid model of internet of things and cloud computing to manage big data in
using LDA with an enhanced SVM method for ECG signals in cloud computing, health services applications, Future Generat. Comput. Syst. 86 (2018)
Multimed. Tool. Appl. 77 (8) (2018) 10195–10215. 1383–1394.
[65] Q. Zhang, C. Zhu, L.T. Yang, Z. Chen, L. Zhao, P. Li, An incremental CFS algorithm [93] H.S. Narman, M.S. Hossain, M. Atiquzzaman, H. Shen, Scheduling internet of
for clustering large data in industrial internet of things, IEEE Trans. Ind. Inf. 13 things applications in cloud computing, Annals of Telecommunications 72 (1–2)
(3) (2007) 1193–1201. (2017) 79–93.
[66] Aurora Gonzalez-Vidal, Payam Brnaghi, Antonio F. Skarmeta, BEATS: blocks of [94] C.-J. Huang, C.-T. Guan, H.-M. Chen, Y.-W. Wang, S.-C. Chang, C.-Y. Li, C.-
eigenvalues algorithm for time series segmentation, IEEE Trans. Knowl. Data Eng. H. Weng, An adaptive resource management scheme in cloud computing, Eng.
30 (11) (2018) 2051–2064. Appl. Artif. Intell. 26 (1) (2013) 382–389.
[67] F. Bu, An efficient fuzzy c-means approach based on canonical polyadic [95] R. Jeyarani, N. Nagaveni, R.V. Ram, Design and implementation of adaptive
decomposition for clustering big data in IoT, Future Generat. Comput. Syst. 88 power-aware virtual machine provisioner (APA-VMP) using swarm intelligence,
(2018) 675–682. Future Generat. Comput. Syst. 28 (5) (2012) 811–821.
[68] S. Misra, S. Chatterjee, Social choice considerations in cloud-assisted WBAN [96] N.T. Hieu, M. Di Francesco, A.Y. Jääski, A virtual machine placement algorithm
architecture for post-disaster healthcare: data aggregation and channelization, for balanced resource utilization in cloud data centers, in: Paper Presented at the
Inf. Sci. 284 (2014) 95–117. 2014 IEEE 7th International Conference on Cloud Computing, 2014.
[69] J.H. Abawajy, M.M. Hassan, Federated internet of things and cloud computing [97] M.S. Mekala, P. Viswanathan, Energy-efficient virtual machine selection based on
pervasive patient health monitoring system, IEEE Commun. Mag. 55 (1) (2017) resource ranking and utilization factor approach in cloud computing for IoT,
48–53. Comput. Electr. Eng. 73 (2019) 227–244.
[70] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, P. Hou, A new deep [98] H. Hallawi, J. Mehnen, H. He, Multi-Capacity Combinatorial Ordering GA in
learning based food recognition system for dietary assessment on an edge Application to Cloud resources allocation and efficient virtual machines
computing service infrastructure, IEEE transactions on services computing 11 (2) consolidation, Future Generat. Comput. Syst. 69 (2017) 1–10.
(2018) 249–261. [99] I. Mohiuddin, A. Almogren, Workload aware VM consolidation method in edge/
[71] Akbar, A., Khan, A., Carrez, F., Moessner, K., Predictive analytics for complex IoT cloud computing for IoT applications, J. Parallel Distr. Comput. 123 (2019)
data streams, IEEE Internet Things J.. 4(5) (02017) 1571-1582. 204–214.
[72] S. Lakshmanaprabu, K. Shankar, A. Khanna, D. Gupta, J.J. Rodrigues, P. [100] M.M. Abed, M.F. Younis, Developing load balancing for IoT-cloud computing
R. Pinheiro, V.H.C. De Albuquerque, Effective features to classify big data using based on advanced firefly and weighted round robin algorithms, Baghdad Science
social internet of things, IEEE Access 6 (2018) 24196–24204. Journal 16 (1) (2019) 130–139.
[73] Wong Siaw Ling, Ooi Boon Yaik, Liew Soung Yue, A novel data reduction [101] Y.-H. Chen, C.-Y. Chen, Service oriented cloud VM placement strategy for Internet
technique with fault-tolerance for internet-of-things, Association for Computing of Things, IEEE Access 5 (2017) 25396–25407.
Machinery (ACM) 2 (1) (2017) 214–221. [102] S. Basu, M. Karuppiah, K. Selvakumar, K.-C. Li, S. Islam, H. K, M.M. Hassan,
[74] A. Ahmad, M. Khan, A. Paul, S. Din, M.M. Rathore, G. Jeon, G.S. Choi, Toward Md Bhuiyan, A. Z, An intelligent/cognitive model of task scheduling for IoT
modeling and optimization of features selection in big data based social internet applications in cloud computing environment, Future Generat. Comput. Syst. 88
of things, Future Generat. Comput. Syst. 82 (2018) 715–726. (2018) (2018) 254–261.
[75] C. Tong, X. Yin, S. Wang, Z. Zheng, A novel deep learning method for aircraft [103] K. Shvachko, H. Kuang, S. Radia, R. Chansler, The hadoop distributed file system,
landing speed prediction based on cloud-based sensor data, Future Generat. in: Proceedings of 2010 IEEE 26th Symposium on Mass Storage Systems and
Comput. Syst. 88 (2018) 552–558. Technologies, MSST), 2010, pp. 21–27.
[76] M. Mohammadi, A. Al-Fuqaha, M. Guizani, J.-S. Oh, Semisupervised deep [104] H. Yaish, M. Goyal, G. Feuerlicht, Multi-tenant elastic extension tables data
reinforcement learning in support of IoT and smart city services, IEEE Internet management, Procedia Comput. Sci. 29 (2014) (2014) 2168–2181.
Things J. 5 (2) (2018) 624–635. [105] C.-W. Tsai, C.-F. Lai, M.-C. Chiang, L.T. Yang, Data mining for internet of things: a
[77] H. Yan, J. Wan, C. Zhang, S. Tang, Q. Hua, Z. Wang, Industrial big data analytics survey, IEEE communications surveys & tutorials 16 (1) (2014) 77–97.
for prediction of remaining useful life based on deep learning, IEEE Access 6 [106] F. Samie, L. Bauer, J. Henkel, From cloud down to things: an overview of machine
(2018) 17190–17197. learning in internet of things, IEEE Internet Things J. 12 (2019).
[78] H. Li, K. Ota, M. Dong, Learning IoT in edge: deep learning for the internet of [107] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Network.
things with edge computing, IEEE Network 32 (1) (2018) 96–101. 61 (2015) 85–117.
[79] C. Zhang, X. Pan, H. Li, A. Gardiner, I. Sargent, J. Hare, P.M. Atkinson, A hybrid [108] A. Ali, J. Qadir, R. ur Rasool, A. Sathiaseelan, A. Zwitter, J. Crowcroft, Big data
MLP-CNN classifier for very fine resolution remotely sensed image classification, for development: applications and techniques, Big Data Analytics 1 (1) (2016)
ISPRS J. Photogrammetry Remote Sens. 140 (2018) 133–144. 2–24.
[80] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, M. Yunsheng, P. Hou, A new deep [109] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, C.
learning based food recognition system for dietary assessment on an edge I. Sánchez, A survey on deep learning in medical image analysis, Med. Image
computing service infrastructure, IEEE transactions on services computing 11 (2) Anal. 42 (2017) 60–88.
(2018) 249–261. [110] M.H. Rehman, V. Chang, A. Batool, T.Y. Wah, Big data reduction framework for
[81] P. Li, Z. Chen, L.T. Yang, Q. Zhang, M.J. Deen, Deep convolutional computation value creation in sustainable enterprises, Int. J. Inf. Manag. 36 (6) (2016)
model for feature learning on big data in Internet of Things, IEEE Trans. Ind. Inf. 917–928.
14 (2) (2018), 790798. [111] S. Kraijak, P. Tuwanut, A survey on internet of things architecture, protocols,
[82] S. Leroux, S. Bohez, E. De Coninck, T. Verbelen, B. Vankeirsbilck, P. Simoens, possible applications, security, privacy, real-world implementation and future
B. Dhoedt, The cascading neural network: building the Internet of Smart Things, trends, in: Proceedings IEEE 16th International Conference on Communication
Knowl. Inf. Syst. 52 (3) (2017) 791–814. Technology, ICCT), 2015, pp. 23–30.
[83] J. Zhang, H. Huang, X. Wang, Resource provision algorithms in cloud computing: [112] D. Soni, A. Makwana, A survey on MQTT: a protocol of internet of things (IoT), in:
a survey, J. Netw. Comput. Appl. 64 (2016) 23–42. Proceedings of the International Conference on Telecommunication, Power
[84] A. Singh, D. Juneja, M. Malhotra, Autonomous agent based load balancing Analysis and Computing Techniques, ICTPACT), 2017, pp. 1–6.
algorithm in cloud computing, Procedia Comput. Sci. 45 (2015) 832–841. [113] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, M. Ayyash, Internet of
[85] D.C. Devi, V.R. Uthariaraj, Load balancing in cloud computing environment using things: a survey on enabling technologies, protocols, and applications, IEEE
improved weighted round robin algorithm for nonpreemptive dependent tasks, communications surveys and tutorials 17 (4) (2015) 2347–2376.
Sci. World J. 3 (9) (2016) 111–121. [114] C. Gomez, J. Oller, J. Paradells, Overview and evaluation of bluetooth low
[86] W. Jing, Q. Miao, G. Chen, An open scheduling framework for QoS resource energy: an emerging low-power wireless technology, Sensors 12 (9) (2012)
managemnt in the internet of things, KSII Transactions on Internet and 11734–11753.
Information Systems 12 (9) (2018) 4103–4121. [115] J.W. Hui, D.E. Culler, Extending IP to low-power, wireless personal area
[87] J. Li, Z. Lu, W. Zhang, J. Wu, H. Qiang, B. Li, P.C. Hung, SERAC3: smart and networks, IEEE Internet Computing (4) (2008) 37–45.
economical resource allocation for big data clusters in community clouds, Future [116] J. Nieminen, B. Patil, T. Savolainen, M. Isomaki, Z. Shelby, C. Gomez,
Generat. Comput. Syst. 85 (2018) 210–221. Transmission of IPV6 Packets over Bluetooth Low Energy Draft-Ietf-6lowpan-Btle-