
MACHINE LEARNING FOR 6G WIRELESS NETWORKS
Carrying Forward Enhanced Bandwidth, Massive Access, and Ultrareliable/Low-Latency Service
Jun Du, Chunxiao Jiang, Jian Wang, Yong Ren, and Mérouane Debbah

Digital Object Identifier 10.1109/MVT.2020.3019650
Date of current version: 25 September 2020
122 | 1556-6072/20©2020IEEE | IEEE VEHICULAR TECHNOLOGY MAGAZINE | DECEMBER 2020
Authorized licensed use limited to: Kongu Engineering College. Downloaded on February 23,2022 at 04:40:34 UTC from IEEE Xplore. Restrictions apply.

To satisfy the expected plethora of demanding services, the future generation of wireless networks (6G) has been mandated as a revolutionary paradigm to carry forward the capacities of enhanced broadband, massive access, and ultrareliable and low-latency service in 5G wireless networks to a more powerful and intelligent level. Recently, the structure of 6G networks has tended to be extremely heterogeneous, densely deployed, and dynamic. Combined with tight quality of service (QoS) requirements, such a complex architecture will make legacy network operation routines untenable. In response, artificial intelligence (AI), especially machine learning (ML), is emerging as a fundamental solution to realize fully intelligent network orchestration and management. By learning from uncertain and dynamic environments, AI-/ML-enabled channel estimation and spectrum management will open up opportunities for bringing the excellent performance of ultrabroadband techniques, such as terahertz communications, into full play. Additionally, challenges brought by ultramassive access with respect to energy and security can be mitigated by applying AI-/ML-based approaches. Moreover, intelligent mobility management and resource allocation will guarantee the ultrareliability and low latency of services. Concerning these issues, this article introduces and surveys some state-of-the-art techniques based on AI/ML and their applications in 6G to support ultrabroadband, ultramassive access, and ultrareliable and low-latency services.

Motivation and Challenges
Recently, the 5G wireless network was developed to support enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultrareliable and low-latency communications (uRLLC) [1], according to the report of the International Telecommunication Union. Benefiting from such high performance, 5G has opened new doors of opportunity toward emerging applications, e.g., augmented reality (AR), virtual reality (VR), tactile reality, mixed reality, and so on. However, new media, such as holographic communications, will require much higher transmission speeds than AR and VR, up to terabits per second. Thus, 5G is far from being able to support the faster, more reliable, and larger-scale communication requirements of these services. In response, the investigation of the next generation of wireless networks (6G) has been triggered; 6G promises more powerful capacities than 5G in terms of ultrabroadband, super-massive access, ultrareliability, and low latency, as listed in Table 1 [1].

Table 1 A comparison of key performance indexes between 4G, 5G, and 6G.
Key performance index | 4G | 5G | 6G
Peak data rate | 1 Gb/s | 20 Gb/s | ≥1 Tb/s
User-experienced data rate | 10 Mb/s | 100 Mb/s | 1 Gb/s
Spectrum efficiency | 1× | 3× | 15–30×
Mobility | 350 km/h | 500 km/h | ≥1,000 km/h
Latency | 10 ms | 1 ms | ≤100 μs
Connection density (devices/km²) | 10⁵ | 10⁶ | 10⁷
Network energy efficiency | 1× | 100× | 100–10,000×
Area traffic capacity | 0.1 Mb/s/m² | 10 Mb/s/m² | ≥1 Gb/s/m²

To provide ubiquitous and various services, 6G networks tend to be more comprehensive and multidimensional by integrating current terrestrial networks with space-/air-based information networks and marine information networks; heterogeneous network resources, as well as different types of users and data, will then also be integrated, as depicted in Figure 1. With such an architecture, 6G networks are conceived to be cell free, which means that users will move from one network to another seamlessly and automatically to pursue the most suitable and qualified communications without manual management and configuration. By contrast, current 5G networking technologies still mainly focus on a macro- and small-cell-based heterogeneous architecture, which will be broken by the cell-free operation of 6G, and their performance will deteriorate when applied to the brand-new architectures of 6G. In addition, managing and controlling 6G networks to realize the promising capacities of ultrabroadband, ultramassive access, ultrareliability, and low latency poses great challenges, because networks are becoming increasingly ultradense, heterogeneous, and dynamic. Specifically, different kinds of satellite Internet, consisting of a large number of satellites, have been proposed and implemented in recent years. For instance, the SpaceX project Starlink initially planned to build a constellation of 12,000 satellites in low-Earth orbit, which has recently been expanded to 42,000. In addition, mobile network operators are accelerating the dense deployment of small-cell base stations to reduce service latency by avoiding backhaul transmission. Moreover, future large-scale Internet of Things (IoT) systems in 6G will also bring challenges of spectrum management and massive or super access control. Furthermore, the integration of highly dynamic satellites, unmanned aerial vehicles (UAVs), and the Internet of Vehicles (IoV) will result in more frequent handovers, more uncertain user requirements, and more unpredictable wireless communication environments than in any previous generation of networks, which makes it difficult to guarantee the ultrareliability and low latency of services.

Therefore, 6G networks are developing into more multidimensional, heterogeneous, large-scale, and highly dynamic systems. All of these characteristics make it urgent to explore new techniques that are adaptive, flexible, and intelligent to bring a revolutionary leap in communications with ultrabroadband, ultramassive access support, ultrareliability, and low latency. In addition, the enormous amounts of widely heterogeneous data generated by 6G networks will require advanced mathematical tools to extract meaningful information and then make decisions pertaining to the proper functioning of 6G, including resource management and access control, which traditional network optimization techniques can hardly achieve. In recent years, AI has been emerging as a fundamental paradigm to orchestrate communication and information systems from bottom to top. For the foreseeable future, AI-enabled networks will open up new opportunities for smart and intelligent 6G networking.

As a major branch of AI, ML can establish an intelligent system that operates in complicated environments. ML has developed into many branches, such as classical ML, including supervised and unsupervised learning, deep learning (DL), and reinforcement learning (RL). DL aims to learn representations of data and can be used within supervised learning, unsupervised learning, and RL; therefore, some surveys of ML do not list DL separately. As illustrated in Figure 1, AI and ML techniques are expected to help 6G networks make more optimized and adaptive data-driven decisions, alleviate communication challenges, and meet requirements from emerging services. In this article, we focus on applying AI and ML to networking and resource management optimization, aiming to bring about significant innovation in communications with ultrabroadband, ultramassive access, ultrareliability, and low latency.

[Figure 1: a 6G architecture spanning space, air, land, and ocean networks (satellites, high-altitude platforms, UAVs, airborne Internet, IoV, smart cities, body area networks, maritime broadband, and underwater acoustic/optical communications) over user, fog/edge, and cloud layers, with SDN, network function virtualization, and terahertz communications as typical management techniques, and targets of ≥1 Tbit/s (ultrabroadband), 10⁷ devices/km² (ultramassive access), ≥1,000 km/h (ultrareliable mobility support), and ≤100 µs (ultralow-latency computing and communication). Lower panels list intelligent network management and optimization tasks at the application, network, and physical layers, and the supporting AI/ML techniques: supervised learning (neural networks, decision trees, naive Bayes, K-nearest neighbors, logistic regression), unsupervised/semisupervised learning (K-means, ISOMAP, Gaussian mixture models, expectation maximization, locally linear embedding), RL (Q-learning, policy gradients, SARSA, deep Q network), and DL (convolutional, recurrent, and recursive neural networks).]
Figure 1 An illustration of AI/ML applications in 6G to support ultrabroadband, ultramassive access, and ultrareliability/low latency.

Intelligent Ultrabroadband Transmission in 6G
In the bandwidth-hungry age, 5G networks have exploited the sub-GHz and 1–6-GHz spectrum bands as efficiently as possible and have introduced the 24–100-GHz bands. However, the current spectrum bands are still hardly enough to meet the increasing demands. For instance, some emerging applications, such as holography, may require a data rate of up to terabits per second [1], which is almost three orders of magnitude higher than typical 5G communications. In response, terahertz communications, utilizing bands in the range of 0.1–10 THz as well as the 140-, 220-, and 340-GHz frequencies, are expected to support data rates of up to terabits per second [2]. To achieve such capacity-approaching performance, accurate information about time-varying channels is especially important for optimizing terahertz bandwidth allocation and improving spectrum efficiency. In this section, we introduce some state-of-the-art AI/ML applications in terahertz channel estimation and spectrum management.

AI-/ML-Enabled Terahertz Channel Modeling and Estimation
At terahertz frequency bands, channels suffer from high atmospheric absorption caused by water vapor in the air, which increases losses significantly. In addition to atmospheric attenuation, free-space pathloss is physically unavoidable. Furthermore, terahertz channels are observed to be nonstationary, especially in dynamic scenarios where both users and objects might be moving. Therefore, traditional channel models based on stationary or quasi-stationary assumptions no longer apply to terahertz channels.

ML algorithms are capable of analyzing communication data and predicting likely signal loss in a given or unknown environment. Therefore, many different types of AI or ML algorithms can be applied to the physical layer (PHL) of 6G networks to deal with the difficulties just described for terahertz channel modeling and estimation. For instance, to improve estimation accuracy in dynamic scenarios, current studies have introduced an RL-based Bayesian filter for angle-of-arrival (AoA) estimation in terahertz channels. Specifically, the Bayesian filter estimates the current AoA from both the current measurement and previous estimates. In this procedure, the prior transition probabilities between system states are important to the estimation performance of the Bayesian filter. RL can then be applied to optimize the state transition probabilities from the feedback of previous estimates and, hence, improve the performance of the Bayesian filter. Some other feasible algorithms and applications in channel modeling and estimation are summarized as follows.
■ Supervised learning: Supervised learning can be applied to pathloss/shadowing prediction, localization, interference management, channel estimation, and so on. Feasible algorithms and models include radial basis function neural networks, feed-forward neural networks, K-nearest neighbor (KNN), multilayer perceptrons, relevance vector machines, and support vector machines (SVMs).
■ Unsupervised learning: Channel modeling and estimation problems, such as optimal modulation, interference mitigation, duplexing configuration, node clustering, and multipath tracking, can be solved by applying unsupervised learning algorithms, which include K-means, clustering algorithms, fuzzy C-means, and so on.
■ DL: DL can be implemented for channel feature extraction, channel state information (CSI) estimation, signal detection, and sparse signal recovery. Typical DL algorithms, such as convolutional neural networks, recurrent neural networks (RNNs), deep neural networks (DNNs), deep belief networks, and deep Boltzmann machines, are expected to be good candidates.
■ RL: RL can be applied to channel tracking, channel selection, modulation mode selection, radio identification, and so on. Feasible algorithms and models include fuzzy RL, Q-learning, WoLF-PHC (Win-or-Learn-Fast Policy Hill-Climbing), the Markov decision process (MDP), and the partially observable MDP.

Deep RL-Based Terahertz Spectrum Management
At present, there is no restriction on terahertz spectrum use, yet parts of the spectrum are already occupied by other applications, such as satellite services, spectroscopy, and meteorology [3]. Recently, the Federal Communications Commission has been investigating the use of terahertz spectrum for mobile services and applications. Therefore, spectrum-sharing methods are necessary for the coexistence of future terahertz communications and the existing applications just listed. In addition, as discussed in the previous section, 6G networks tend to be multidimensional, ultradense, and heterogeneous. Because the propagation medium and channel characteristics in integrated 6G networks differ significantly from those of 5G terrestrial networks, optimizing the spectrum management of terahertz communications in 6G requires more effort.

RL has the potential to realize smart or intelligent spectrum management that deals with these problems, especially when large amounts of data can be leveraged for training and prediction. The training and prediction results can be used to decide whether a spectrum band is occupied and to take actions, such as accessing or releasing the band. In addition, through interaction with the wireless environment, users can iteratively optimize their strategies to maximize reward functions, which can be established considering spectrum efficiency, network capacity, consumed energy, interference, and so on. However, conventional RL cannot learn an effective action–value function when random noise or measurement errors corrupt the state observations, because the number of states in the presence of random noise is infinite in practice. To address this problem of random state measurements, deep RL (DRL) can be considered a suitable tool to optimize spectrum management decisions in 6G networks, involving dynamic spectrum access, transmission power control, spectrum allocation, and so on.

Distributed Spectrum Access
Distributed algorithms for dynamic spectrum access should be designed to adapt effectively and efficiently to general, complex, real-world settings, while mitigating the expensive computation that results from the large state space and the partial observability of the system. To achieve this goal, a long short-term memory (LSTM) layer that maintains an internal state and aggregates observations can be used, so that the true state can be estimated from past partial observations [4]. In addition, a dueling deep Q network (DQN) can be applied to improve the Q-value estimates of bad states. After training, users need only to update their DQN weights by communicating with the central unit and then map their local observations to spectrum access actions based on the trained DQN. Such a spectrum access framework is implemented according to the procedure presented in Figure 2.

Distributed Dynamic Power Control
Traditional power control techniques typically search for near-optimal power allocation strategies by solving challenging optimization problems. Such techniques can hardly adapt to large-scale networks because of their high computational complexity and their need for precise instantaneous CSI. Model-free RL has been introduced to deal with these problems in large-scale and heterogeneous 6G networks and can also achieve near-optimal power allocation, promising maximum sum-rate and scheduling fairness in real time. In this framework, each transmitter collects the QoS and CSI information of its neighbors, which is analyzed with deep Q-learning to estimate or extract random variations and delay in the CSI [5]. Figure 2 illustrates this DRL-based power control mechanism.

Dynamic Spectrum Allocation
A distributed spectrum allocation framework can also be formulated based on multiagent DRL, in which each agent is a device occupying the spectrum resource. In such a framework, the multiagent environment is modeled as a partially observable Markov game. To deal with the instability of the environment, a neighbor-agent actor-critic (NAAC) model, which is trained on information from neighbor nodes in a centralized manner but implemented in a decentralized manner, can be introduced to leverage the relationship among devices sharing the spectrum resource and improve system performance, such as sum-data rate and spectrum efficiency. In this framework, historical information is used to train the RL model but not to make spectrum allocation decisions [6]. This NAAC-based framework for spectrum allocation is also shown in Figure 2.

AI-/ML-Based Energy and Security Management for Super IoT
5G cellular networks introduced a new usage scenario oriented to support massive IoT, namely, mMTC. Toward 6G, the concept of a “super IoT” has been proposed recently, which can be elaborated with symbiotic radio and satellite-assisted IoT communications to support an astonishing number of connected IoT devices and extended coverage, respectively. Consequently, more efficient energy management mechanisms are expected so that large-scale IoT systems can operate stably for long periods of time. In addition, privacy and security will face more severe challenges, especially for IoT systems collecting individual or sensitive information. This section introduces AI-/ML-enabled energy and security management in super IoT systems.

Efficient Energy Management for Large-Scale Energy-Harvesting Networks
In traditional IoT systems, low-power IoT devices are typically limited by the energy stored in their batteries. Such energy shortages and limitations will bring great challenges to energy management and optimization when the scale of IoT systems grows sharply. In response, energy-harvesting technologies have been regarded as a promising approach to prolong the lifespan of super IoT systems by enabling IoT devices to harvest energy from potential energy sources, e.g., solar and wind energy. However, energy harvesting is a random process because the intensity of energy sources fluctuates, which means that the amount of energy stored in the battery of each IoT device cannot be known precisely in advance. In addition, the controllable energy is constrained by the currently stored energy, which is in turn capped by the battery capacity. As a result, the total controllable energy is time varying and uncertain, which makes the energy management optimization problem difficult to solve, since its energy constraints are always changing [7]. Feasible approaches for energy management in energy-harvesting-enabled IoT systems can be divided into offline management and online management, and the latter can be realized through centralized or distributed methods. Some typical mechanisms designed in recent studies are summarized in Table 2. Here, we analyze the advantages and disadvantages of these approaches.
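The battery constraint described above can be made concrete with a short simulation. The sketch assumes a discrete-time model that the text does not spell out: in slot t a device holding B_t spends E_t ≤ B_t, harvests a random amount H_t, and keeps B_{t+1} = min(B_t − E_t + H_t, B_max), so anything above the capacity B_max is lost. The fixed-fraction spending rule, the exponential arrivals in the usage part, and the names `Slot` and `battery_trajectory` are hypothetical; in the schemes of Table 2 the spending decision would come from an optimizer or a learned policy instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    stored: float       # energy in the battery at the start of the slot (B_t)
    spent: float        # energy spent on transmission (E_t <= B_t)
    harvested: float    # random energy arrival during the slot (H_t)
    wasted: float       # harvested energy lost because the battery was full
    stored_next: float  # B_{t+1} = min(B_t - E_t + H_t, B_max)


def battery_trajectory(b_max: float, b0: float, harvested: List[float],
                       spend_fraction: float) -> List[Slot]:
    """Track a battery under a fixed spending rule and a given arrival sequence.

    The rule "spend a fixed fraction of what is stored" is a hypothetical
    placeholder for a learned power-control policy.
    """
    if not 0.0 <= b0 <= b_max:
        raise ValueError("initial energy must lie in [0, b_max]")
    if not 0.0 <= spend_fraction <= 1.0:
        raise ValueError("spend_fraction must lie in [0, 1]")
    b = b0
    slots = []
    for h in harvested:
        spent = spend_fraction * b        # controllable energy cannot exceed stored energy
        unclipped = b - spent + h
        b_next = min(unclipped, b_max)    # battery capacity caps what can be kept
        slots.append(Slot(b, spent, h, max(0.0, unclipped - b_max), b_next))
        b = b_next
    return slots


if __name__ == "__main__":
    import random

    rng = random.Random(7)
    # Hypothetical exponential arrivals with a mean of 2 energy units per slot.
    arrivals = [rng.expovariate(0.5) for _ in range(5)]
    for slot in battery_trajectory(b_max=10.0, b0=5.0, harvested=arrivals, spend_fraction=0.5):
        print(f"B={slot.stored:.2f} E={slot.spent:.2f} H={slot.harvested:.2f} "
              f"wasted={slot.wasted:.2f} B'={slot.stored_next:.2f}")
```

Because H_t is random, B_{t+1} is only known after the slot ends, which is why offline schemes need the arrivals in advance, whereas online schemes decide from current and past observations only.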

[Figure 2: Top left, distributed spectrum access: each agent i observes its local state s_t^i (the channels selected at t − 1, the capacity of each channel, and the received ACK signal), feeds it through an LSTM and a dueling DQN that outputs a state value V(s_t^i) and action advantages A(s_t^i, a_t^i), and takes the action maximizing Q(s_t^i, a_t^i); training is centralized and execution is distributed. Top right, power control: transitions are stored in an experience-replay memory, minibatches train θ^train through an optimizer, the target network θ^target is updated once per T_u time slots, and exchanges over the backhaul incur a delay of T_d slots. Bottom, spectrum allocation: actor–critic pairs for agents 1 to N select spectrum resource blocks, upload historical states, actions, and rewards, and download the weights of the actor network.]
Figure 2 The DRL-based spectrum access, power control, and spectrum allocation in 6G networks. ACK: acknowledgment.
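The spectrum access panel of Figure 2 splits the dueling DQN output into a state value V(s) and per-action advantages A(s, a) before recombining them into Q-values. The sketch below assumes the mean-subtracted recombination Q(s, a) = V(s) + A(s, a) − (1/|A|) Σ_a′ A(s, a′) from the original dueling network architecture; whether [4] uses exactly this form is an assumption. The linear head, the random weights standing in for trained LSTM features, and reading action i as "transmit on channel i" are hypothetical.

```python
import numpy as np


def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage makes the split identifiable: the mean of
    the returned Q-values over actions equals V(s).
    """
    advantages = np.asarray(advantages, dtype=float)
    return float(value) + advantages - advantages.mean()


def dueling_head(features, w_value, b_value, w_adv, b_adv):
    """Hypothetical linear dueling head applied to a feature vector of shape (d,).

    In the framework of Figure 2 the features would come from the LSTM layer.
    """
    v = features @ w_value + b_value  # scalar state value V(s)
    a = features @ w_adv + b_adv      # one advantage per channel, shape (n_actions,)
    return dueling_q(v, a)


def select_channel(q_values, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_channels = 8, 4
    features = rng.standard_normal(d)  # stand-in for the LSTM output
    q = dueling_head(features, rng.standard_normal(d), 0.0,
                     rng.standard_normal((d, n_channels)), np.zeros(n_channels))
    print("Q-values:", np.round(q, 3), "-> channel", select_channel(q, 0.1, rng))
```

Adding the same constant to every advantage would leave Q unchanged, so the mean subtraction pins down V and A separately; V(s) is then learned even in bad states where the channel choice changes Q very little.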

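The power control panel of Figure 2 trains a DQN (θ^train) on minibatches drawn from an experience-replay memory and computes its learning targets with a separate target network (θ^target) that is refreshed once per T_u time slots. The sketch isolates those two mechanisms. A linear Q-function over a small state vector stands in for the DQN, and the replay capacity, discount factor γ = 0.9, learning rate, T_u = 10, and the toy reward are illustrative assumptions, not values from [5]; the backhaul delay T_d shown in the figure is not modeled.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-capacity experience-replay memory of (s, a, r, s') transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.buffer.append((np.asarray(state, dtype=float), int(action),
                            float(reward), np.asarray(next_state, dtype=float)))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)  # uniform, without replacement

    def __len__(self):
        return len(self.buffer)


def q_values(theta, state):
    """Hypothetical linear Q-function: row a of theta scores power level a."""
    return theta @ state


def td_targets(batch, theta_target, gamma):
    """y = r + gamma * max_a' Q(s', a'; theta_target), using the frozen target weights."""
    return np.array([r + gamma * np.max(q_values(theta_target, s_next))
                     for _, _, r, s_next in batch])


def sgd_step(theta_train, batch, theta_target, gamma, lr):
    """One gradient step on the batch mean of 0.5 * (Q(s, a) - y)^2 for the actions taken."""
    targets = td_targets(batch, theta_target, gamma)
    grad = np.zeros_like(theta_train)
    for (s, a, _, _), y in zip(batch, targets):
        grad[a] += (q_values(theta_train, s)[a] - y) * s
    return theta_train - lr * grad / len(batch)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_levels, dim, t_u, batch_size = 3, 2, 10, 8
    theta_train = np.zeros((n_levels, dim))
    theta_target = theta_train.copy()
    memory = ReplayMemory(capacity=100)
    for t in range(1, 201):
        s, s_next = rng.random(dim), rng.random(dim)
        a = int(rng.integers(n_levels))  # purely exploratory action
        r = s[0] * (a + 1) - 0.3 * a     # toy reward, not a physical model
        memory.push(s, a, r, s_next)
        if len(memory) >= batch_size:
            theta_train = sgd_step(theta_train, memory.sample(batch_size),
                                   theta_target, gamma=0.9, lr=0.05)
        if t % t_u == 0:
            theta_target = theta_train.copy()  # target refreshed once per T_u slots
    print(np.round(theta_train, 3))
```

Freezing θ^target between refreshes keeps the regression targets from moving at every step, and sampling from the replay memory breaks the correlation between consecutive slots.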
Offline Management
Recently, many offline energy management approaches have been designed by optimizing power allocation, access control, and so on. However, offline control is essentially based on the assumption that perfect information about energy and channel status can be observed before operation, which is hardly achievable in practice.

Centralized Online AI-/ML-Based Management
In contrast to offline management, online AI-/ML-based energy management approaches require only the present and previous energy arrivals and channel status when implemented to improve the communication performance of energy-harvesting-enabled super IoT systems. Within the online AI/ML framework, power allocation and access control problems can be formulated as stochastic control problems whose discretized energy and channel states are then modeled as MDPs. However, this online AI/ML framework still requires perfect information about energy arrivals and channel status, which is hard to observe in practice. RL- and Lyapunov optimization-based methods emerged as a result, most of which search for approximate solutions in a centralized fashion. Nevertheless, this centralized approach is not applicable when the number of IoT devices is large, because of the inevitably significant feedback overhead. In addition, MDPs always suffer from the “curse of dimensionality,” which results in heavy computational loads and makes MDPs intractable.

Distributed Online AI-/ML-Based Management
A fully distributed online energy management scheme requires neither prior information about energy arrivals and channel status nor any information exchange among IoT devices. Such distributed online energy management is not easy to realize, because convergence cannot always be guaranteed in a nonstationary environment. Moreover, many distributed online energy management schemes were proposed based on the assumption that the global system state is available to each device. To overcome these problems, a mean-field, multiagent DRL-based framework was proposed in [8] to learn the optimal power control that maximizes the throughput of energy-harvesting-enabled super IoT systems. In [8], the throughput maximization problem was modeled as a mean-field game having a unique stationary solution, which ensures the convergence of the problem. In addition, each IoT device applies DRL individually to find the optimal power control without any prior information about energy arrivals and channel status. This distributed approach can achieve throughput close to that of centralized policies and can be implemented in large-scale IoT systems in practice.

Table 2 The typical energy management mechanisms in energy-harvesting-enabled large-scale IoT systems.
AI/ML Technique | First Introduced (See “Tables 2 and 3 References”) | Category | Optimization Objective | Applications
Water-filling | A. Arafa 2018 [S1] | Offline | Throughput maximization | Energy consumption optimization
Integer linear programming | H. Ayatollahi 2017 [S2] | Offline | Throughput maximization | Communication scheme selection
Directional water-filling | O. Ozel 2011 [S3] | Offline | Throughput maximization, delay minimization | Power control
DNN, MDP | M. K. Sharma 2019 [S4] | Centralized online | Time-averaged throughput maximization | Power control
RL, DQN | M. Chu 2019 [S5] | Centralized online | Uplink sum-rate maximization | Multiaccess control
Lyapunov optimization | H. Yu 2019 [S6] | Centralized online | Throughput maximization | Power control
RL | F. A. Aoudia 2018 [S7] | Centralized online | QoS maximization | Energy harvesting and
RL | A. Ortiz 2018 [S8] | Centralized online | Throughput maximization | Power control
RL, MDP | K. Wu 2019 [S9] | Centralized online | Data importance value maximization | Communication link control
Bayesian RL | Y. Xiao 2015 [S10] | Centralized online | Long-term expected reward maximization | Power control, data transmission control
DNN, mean-field game (MFG) | M. K. Sharma 2019 [S11] | Distributed online | Time-averaged throughput maximization | Power control
MDP, MFG | D. Wang 2018 [S12] | Distributed online | Communication delay minimization | Power control
Stochastic game | V. Hakami 2017 [S13] | Distributed online | Communication delay minimization | Power control
Multiagent RL, Markov game | A. Ortiz 2017 [S14] | Distributed online | Sum-rate maximization | Power control

Privacy and Security Guarantee for Super IoT
The extremely large numbers of IoT devices and volumes of data bring great challenges to privacy preservation and security. To protect super IoT systems from various kinds of threats and attacks, authentication, access control, and attack detection are of paramount importance; however, traditional privacy and security technologies are hardly applicable to super IoT because of the heterogeneity of resources, the scale of networks, the limited energy and storage of devices, and so on. By providing embedded intelligence in IoT devices and systems, AI-/ML-based security technologies can be leveraged to deal with these security problems. Next, we discuss some existing AI-/ML-based solutions and feasible research directions for addressing authentication, access control, and attack detection in super IoT systems. Some recent typical studies are summarized in Table 3.

Table 3 Typical AI-/ML-based security mechanisms in large-scale IoT systems.
AI/ML Techniques | Typical Research (See “Tables 2 and 3 References”) | Security Problems | Attacks | Performance Optimization
Neural network | J. M. McGinthy 2019 [S15] | Authentication | Spoofing | Classification accuracy and delay
DRL | A. Ferdowsi 2019 [S16] | Authentication | Man-in-the-middle and data injection | Extraction error rate, detection delay
SVM, LSTM, DL, RNN | J. Chauhan 2018 [S17] | Access control | Spoofing | Classification accuracy, feature extraction time, inference time
DNN, SVM | C. Shi 2017 [S18] | Authentication | Spoofing | False alarm rate
Q-learning, Dyna-Q | L. Xiao 2016 [S19] | Authentication | Spoofing | False alarm rate, average error rate, detection accuracy
Nash Q-learning | Y. Li 2017 [S20] | Access control | DoS attack | Root mean error
DRL | Y. Wang 2019 [S21] | Malware detection | Malware attack | Detection accuracy
RL | H. S. Anderson 2018 [S22] | Malware evasion | Malware attack | Success rate of evasion
Q-learning, Dyna-Q | L. Xiao 2017 [S23] | Malware detection | Malware attack | Detection accuracy and delay
RF, KNN, Bayesian net | F. A. Narudin 2016 [S24] | Malware detection, access control | Malware attack, intrusion | True positive rate, false positive rate, detection precision
SVM, DQN | M. P. Arthur 2019 [S25] | Attack detection | Jamming, spoofing, intrusion | Detection accuracy
DRL | N. Abuzainab 2019 [S26] | Attack detection, secure routing and transmission | Jamming, eavesdropping | Detection accuracy, system throughput
DL | A. A. Diro 2018 [S27] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision and delay
Semisupervised fuzzy C-means | S. Rathore 2018 [S28] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision, positive predictive value, sensitivity
Decision tree | E. Viegas 2018 [S29] | Anomaly intrusion detection | Intrusion | Detection accuracy
KNN, ANN, RF, decision tree | R. Doshi 2018 [S30] | Attack detection and mitigation | DDoS | Detection accuracy
DQN | G. Han 2017 [S31] | Secure channel selection | Jamming | SINR
R2L: remote to local; U2R: user to root; SINR: signal-to-interference-plus-noise ratio.

AI-/ML-Based Authentication and Access Control
Authentication and access control can help IoT devices distinguish identity-based attacks and prevent unauthorized devices from accessing authorized systems [9]. To improve authentication accuracy, different AI-/ML-based approaches can perform well under different scenarios and assumptions. In the following, we investigate some feasible AI-/ML-based solutions to authentication and access control problems in super IoT systems.

Tables 2 and 3 References

[S1] A. Arafa and S. Ulukus, “Mobile energy harvesting nodes: Offline and online optimal policies,” IEEE Trans. Green Commun. Netw., vol. 2, no. 1, pp. 143–153, Mar. 2018. doi: 10.1109/TGCN.2017.2777668.
[S2] H. Ayatollahi, C. Tapparello, and W. Heinzelman, “Reinforcement learning in MIMO wireless networks with energy harvesting,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 21–25, 2017, pp. 1–6. doi: 10.1109/ICC.2017.7997229.
[S3] O. Ozel, K. Tutuncuoglu, J. Yang, S. Ulukus, and A. Yener, “Transmission with energy harvesting nodes in fading wireless channels: Optimal policies,” IEEE J. Sel. Areas Commun., vol. 29, no. 8, pp. 1732–1743, Sept. 2011. doi: 10.1109/JSAC.2011.110921.
[S4] M. K. Sharma, A. Zappone, M. Debbah, and M. Assaad, “Deep learning based online power control for large energy harvesting networks,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Brighton, U.K., May 12–17, 2019, pp. 8429–8433. doi: 10.1109/ICASSP.2019.8683468.
[S5] M. Chu, H. Li, X. Liao, and S. Cui, “Reinforcement learning-based multiaccess control and battery prediction with energy harvesting in IoT systems,” IEEE Internet Things J., vol. 6, no. 2, pp. 2009–2020, Apr. 2019. doi: 10.1109/JIOT.2018.2872440.
[S6] H. Yu, Z. Zhou, C. Pan, X. Zhao, and S. Mumtaz, “Online resource allocation for energy harvesting based large-scale multiple antenna systems,” in Proc. IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, Dec. 9–13, 2019, pp. 1–6. doi: 10.1109/GCWkshps45667.2019.9024449.
[S7] F. Ait Aoudia, M. Gautier, and O. Berder, “RLMan: An energy manager based on reinforcement learning for energy harvesting wireless sensor networks,” IEEE Trans. Green Commun. Netw., vol. 2, no. 2, pp. 408–417, June 2018. doi: 10.1109/TGCN.2018.2801725.
[S8] A. Ortiz, T. Weber, and A. Klein, “A two-layer reinforcement learning solution for energy harvesting data dissemination scenarios,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Calgary, Canada, Apr. 15–20, 2018, pp. 6648–6652. doi: 10.1109/ICASSP.2018.8462056.
[S9] K. Wu, H. Jiang, and C. Tellambura, “Sensing, probing, and transmitting policy for energy harvesting cognitive radio with two-stage after-state reinforcement learning,” IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1616–1630, Feb. 2019. doi: 10.1109/TVT.2018.2888826.
[S10] Y. Xiao, D. Niyato, Z. Han, and L. A. DaSilva, “Dynamic energy trading for energy harvesting communication networks: A stochastic energy trading game,” IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2718–2734, Dec. 2015. doi: 10.1109/JSAC.2015.2481204.
[S11] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras, “Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019. doi: 10.1109/TCCN.2019.2949589.
[S12] D. Wang, W. Wang, Z. Zhang, and A. Huang, “Delay-optimal random access for large-scale energy harvesting networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, May 20–24, 2018, pp. 1–6. doi: 10.1109/ICC.2018.8422272.
[S13] V. Hakami and M. Dehghan, “Distributed power control for delay optimization in energy harvesting cooperative relay networks,” IEEE Trans. Veh. Technol., vol. 66, no. 6, pp. 4742–4755, June 2017. doi: 10.1109/TVT.2016.2610444.
[S14] A. Ortiz, H. Al-Shatri, X. Li, T. Weber, and A. Klein, “Reinforcement learning for energy harvesting decode-and-forward two-hop communications,” IEEE Trans. Green Commun. Netw., vol. 1, no. 3, pp. 309–319, Sept. 2017. doi: 10.1109/TGCN.2017.2703855.
[S15] J. M. McGinthy, L. J. Wong, and A. J. Michaels, “Groundwork for neural network-based specific emitter identification authentication for IoT,” IEEE Internet Things J., vol. 6, no. 4, pp. 6429–6440, Aug. 2019. doi: 10.1109/JIOT.2019.2908759.
[S16] A. Ferdowsi and W. Saad, “Deep learning for signal authentication and security in massive Internet-of-Things systems,” IEEE Trans. Commun., vol. 67, no. 2, pp. 1371–1387, Feb. 2019. doi: 10.1109/TCOMM.2018.2878025.
[S17] J. Chauhan, S. Seneviratne, Y. Hu, A. Misra, A. Seneviratne, and Y. Lee, “Breathing-based authentication on resource-constrained IoT devices using recurrent neural networks,” Computer, vol. 51, no. 5, pp. 60–67, May 2018. doi: 10.1109/MC.2018.2381119.
[S18] C. Shi, J. Liu, H. Liu, and Y. Chen, “Smart user authentication through actuation of daily activities leveraging WiFi-enabled IoT,” in Proc. ACM Int. Symp. Mobile Ad Hoc Netw. Comput., Chennai, India, July 2017, pp. 1–10. doi: 10.1145/3084041.3084061.
[S19] L. Xiao, Y. Li, G. Han, G. Liu, and W. Zhuang, “PHY-layer spoofing detection with reinforcement learning in wireless networks,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 10,037–10,047, Dec. 2016. doi: 10.1109/TVT.2016.2524258.
[S20] Y. Li, D. E. Quevedo, S. Dey, and L. Shi, “SINR-based DoS attack on remote state estimation: A game-theoretic approach,” IEEE Trans. Contr. Netw. Syst., vol. 4, no. 3, pp. 632–642, Sept. 2017. doi: 10.1109/TCNS.2016.2549640.
[S21] Y. Wang, J. W. Stokes, and M. Marinescu, “Neural malware control with deep reinforcement learning,” in Proc. IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov. 12–14, 2019, pp. 1–8. doi: 10.1109/MILCOM47813.2019.9020862.
[S22] H. S. Anderson, A. Kharkar, B. Filar, D. Evans, and P. Roth, “Learning to evade static PE machine learning malware models via reinforcement learning,” Jan. 30, 2018, arXiv:1801.08917.
[S23] L. Xiao, Y. Li, X. Huang, and X. Du, “Cloud-based malware detection game for mobile devices with offloading,” IEEE Trans. Mobile Comput., vol. 16, no. 10, pp. 2742–2750, Oct. 2017. doi: 10.1109/TMC.2017.2687918.
[S24] F. A. Narudin, A. Feizollah, N. B. Anuar, and A. Gani, “Evaluation of machine learning classifiers for mobile malware detection,” Soft Comput., vol. 20, no. 1, pp. 343–357, Jan. 2016. doi: 10.1007/s00500-014-1511-6.
[S25] M. P. Arthur, “Detecting signal spoofing and jamming attacks in UAV networks using a lightweight IDS,” in Proc. Int. Conf. Comput., Inform. Telecommun. Syst. (CITS), Beijing, China, Aug. 28–31, 2019, pp. 1–5. doi: 10.1109/CITS.2019.8862148.
[S26] N. Abuzainab et al., “QoS and jamming-aware wireless networking using deep reinforcement learning,” in Proc. IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov. 12–14, 2019, pp. 610–615. doi: 10.1109/MILCOM47813.2019.9020985.
[S27] A. A. Diro and N. Chilamkurti, “Distributed attack detection scheme using deep learning approach for Internet of Things,” Future Gener. Comput. Syst., vol. 82, pp. 761–768, May 2018. doi: 10.1016/j.future.2017.08.043.
[S28] S. Rathore and J. H. Park, “Semi-supervised learning based distributed attack detection framework for IoT,” Appl. Soft Comput., vol. 72, pp. 79–89, May 2018. doi: 10.1016/j.asoc.2018.05.049.
[S29] E. Viegas, A. Santin, V. Abreu, and L. S. Oliveira, “Enabling anomaly-based intrusion detection through model generalization,” in Proc. IEEE Symp. Comput. Commun. (ISCC), Natal, Brazil, June 25–28, 2018, pp. 934–939. doi: 10.1109/ISCC.2018.8538524.
[S30] R. Doshi, N. Apthorpe, and N. Feamster, “Machine learning DDoS detection for consumer Internet of Things devices,” in Proc. IEEE Secur. Privacy Workshop (SPW), San Francisco, CA, May 24, 2018, pp. 29–35. doi: 10.1109/SPW.2018.00013.
[S31] G. Han, L. Xiao, and H. V. Poor, “Two-dimensional anti-jamming communication based on deep reinforcement learning,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), New Orleans, LA, Mar. 5–9, 2017, pp. 2087–2091. doi: 10.1109/ICASSP.2017.7952524.

■■ RL: Q-learning-based approaches can be applied to PHL authentication, which is realized by comparing the PHL feature of a message with that of the claimed transmitter. In this procedure, the authentication accuracy depends on the test threshold used in the comparisons. Because of the uncertain channel environment and the unpredictable spoofing model, each IoT device needs to estimate the false alarm and misdetection rates of spoofing detection, whose future states depend only on the current state and action rather than on earlier ones. Therefore, the
problem of threshold selection can be modeled as an MDP with finite states.
■■ Supervised learning: Unlike the threshold decision in the Q-learning-based approaches just described, supervised learning can exploit the CSI to learn how the channel changes, so that PHL authentication can be formulated as a binary classification problem, which is threshold free. Typical supervised learning algorithms, such as decision trees, SVM, KNN, and ensemble learning, can then be applied to classify messages as legitimate or illegitimate according to the CSI.
■■ Unsupervised learning: Unsupervised learning, such as nonparametric Bayesian methods, can be introduced in proximity-based authentication and access control to identify the IoT devices in the proximity without leaking the location and other privacy-sensitive information of IoT devices.
■■ DL: From the CSI of Wi-Fi or other radio signals generated by IoT devices, human physiological and behavioral characteristics can be learned by applying a multilayer DNN [10]. Authentication and access control schemes can then be designed on the basis of such activity recognition and identification.

AI-/ML-Based Attack Analysis and Detection
Similar to its applications in authentication and access control, AI/ML can also be applied to analyze and detect different kinds of attacks, such as spoofing, jamming, denial of service (DoS) or distributed DoS (DDoS) attacks, eavesdropping, malware attacks, and so on [9]. For instance, supervised learning methods, including SVM, KNN, random forest (RF), and DNN, can detect these attacks by building classification and regression models. In addition, unsupervised learning can investigate unlabeled data and divide them into different groups; e.g., multivariate correlation analysis can help to detect DoS and DDoS attacks. In some recent studies, RL algorithms have been applied to help IoT devices decide which security protocols to select against attacks. Feasible algorithms include Q-learning, DQN, Dyna-Q, and so on.

AI-/ML-Enabled Ultrareliable/Low-Latency Applications
Satellite, UAV, and IoV communications will be integrated in 6G networks, for which the high dynamics of channel, environment, and traffic, as well as increasingly delay-sensitive applications, require more reliable and lower-latency transmission technologies to guarantee communication connectivity and timeliness. In addition, the accompanying frequent resource allocation, network reconfiguration, and service customization depend heavily on reliable, low-latency, and flexible network management. To satisfy such needs, mobility management and offloading techniques are expected to support ultrareliable and low-latency communications, and they are also confronted with challenges brought by high dynamics, multidimensionality, and significant heterogeneity. In this section, we discuss some AI-/ML-based solutions aimed at improving the reliability and timeliness of communications in 6G.

Intelligent Mobility and Handover Management in 6G
High-speed mobility of network elements in 6G, including satellites, UAVs, vehicles, and so on, will result in frequent handovers, making connections and communications unstable and unreliable. Moreover, the service requirements of low latency and high transmission rate will make efficient mobility and handover management even more challenging. Therefore, to support ultrareliable and low-latency applications in 6G, DRL, DL, and RL are expected to be powerful tools for endowing mobility management with intelligence and adaptivity [11].
■■ DRL: In a UAV-enabled 6G network, UAVs can act as DRL agents. They observe environment states, such as their movement velocity, current position, and link quality, and then choose the mobility and handover actions that maximize their rewards, which can be defined in terms of link stability, channel quality, transmission latency and capacity, and so on. By interacting with the dynamic environment, UAVs automatically and robustly learn mobility and handover strategies that minimize transmission latency and handover failure probability, thereby achieving highly reliable wireless connections in the system.
■■ DL: Mobility and handover management of UAVs requires precise state estimation. However, the inaccuracies of onboard measurements, such as unpredictable drifts, biases, and heavy noise caused by the strong vibration of UAVs' rotors, make accurate state estimates difficult to obtain. A DL-based framework built on ANNs, RNNs, and so on may help to improve the accuracy of state estimation. Specifically, a DNN can be trained to identify the associated measurement noise models and then filter the noise out of the final estimate. To further reduce computational complexity, the dropout technique can also be adopted when training this DNN. In addition, DL can be applied to predict the trajectories of UAVs. By learning the movement behaviors of UAVs from measurement information, the positional relationships among UAVs can be analyzed, and mobility and handover mechanisms with high success rates can be designed on this basis. Furthermore, LSTM can serve as a powerful tool for designing efficient mobility and handover management schemes [12]. By training on the previous and current mobility
contexts of UAVs, the sequence of future time-dependent mobility states and trajectories of UAVs can be obtained and used to optimize handover parameters.
■■ RL: Cooperative Q-learning-based optimization of mobility-sensitive handover parameters can learn the parameters appropriate for specific velocity conditions in UAV-enabled 6G networks and can therefore adapt to realistic environments in which UAVs have time-varying velocities. To avoid frequent handovers and reduce handover and connectivity failures, dynamic fuzzy Q-learning can be used to optimize the handover parameters and thus guarantee the reliability and efficiency of UAV-enabled connections and communications.

Intelligent Communication and Computing Resource Allocation
Driven by the exponentially growing demands of multimedia data traffic and computation-heavy applications, 6G networks are expected to achieve high QoS with ultrareliability and low latency. In response, resource allocation has been considered an important factor that can directly improve 6G performance by configuring heterogeneous resources effectively and efficiently. In 6G, the allocated resources can be divided into communication resources, such as channels and bandwidth, and computing resources, such as memory and processing power. In recent years, various traffic offloading, caching, and cloud/fog/edge computing mechanisms, designed to allocate communication, storage, and computing resources, respectively, in heterogeneous networks, have become promising solutions for handling the increasing data and computational requirements with low-latency, on-demand services. In addition, different AI/ML techniques, such as RL, DRL, double DRL, and so on, have been introduced into these resource allocation techniques to handle the sophisticated decision-making optimization that results from the multidimensionality, random uncertainty, and dynamics of 6G. By applying AI/ML tools, valuable information can be extracted by training on observed data, and functions for prediction, optimization, and decision making in traffic offloading, caching, and cloud/fog/edge computing can then be learned to support ultrareliable and low-latency services [13].
However, most current RL- or DRL-based resource allocation approaches are modeled in a discrete action space, which restricts offloading decisions to a limited set of actions [14]. Such a modeling assumption is unrealistic in practice, where the action space of offloading decisions is often a continuous–discrete hybrid. Specifically, in a task offloading-enabled 6G network, the decision of which node should be selected to handle traffic/computation offloading or caching constitutes a discrete action space. On the other hand, the amount of resources provided by the selected node for offloading is usually a continuous value. Such resource allocation problems with continuous–discrete hybrid decision spaces tend to be extremely complex, especially when time-varying tasks, energy harvesting, and security issues are also considered.
To provide low-latency computing services, we have carried out some preliminary work on the hybrid decision of computation offloading in 6G networks based on DRL. As illustrated in Figure 3, energy-harvesting-enabled devices can offload their computational tasks to edge computing servers. The server selection problem is modeled in a discrete action space, whereas the decision spaces of offloading ratio and local computation capacity are continuous. In the DRL framework, at each step, after observing the system states (such as the task load, battery level, and harvested energy of each device, the channel status, and the computation capacity of each device and server), the possible computation offloading decisions, including server selection, offloading ratio, and local computation capacity allocation, are contained in the sets of possible actions. Each device then selects the best actions from these sets to maximize its reward, which is determined by latency, energy cost, reliability, and so on. The detailed modeling and implementation of this mechanism are provided in [15]. To validate the efficiency and superiority of our proposed hybrid action–critic-based computation offloading approach, we compare its average reward and execution time with those of deep Q-learning-based offloading, server execution, and device execution. The latter two mechanisms execute all computational tasks remotely at the selected server and locally at the device, respectively. Simulation results in Figure 3(b) and (c) indicate that, across different task arrival rates and maximum allowed harvested energy, the proposed approach achieves the highest reward and the smallest latency among the four schemes.

Conclusions
To satisfy emerging services and applications, AI-/ML-enabled 6G networks have been considered fundamental enablers to carry forward the capacities of eMBB, mMTC, and uRLLC in 5G to a more powerful and intelligent level. In this article, we focused on some solutions for applying AI and ML tools to 6G networking and resource management optimization. We illustrated intelligent terahertz techniques, such as AI-/ML-enabled terahertz channel estimation and spectrum management, which are considered revolutionary for achieving ultrabroadband transmission. In addition, we introduced AI/ML applications in energy management,
especially for large-scale energy-harvesting networks. Moreover, AI-/ML-based security enhancement mechanisms, including authentication, access control, and attack detection, were discussed for super IoT systems. Such intelligent energy and security management will help to achieve efficient and reliable ultramassive access. Furthermore, we introduced some efficient mobility and handover management approaches based on DRL, DL, and Q-learning to realize ultrareliable and stable transmission links under the high dynamics of 6G. Finally, intelligent resource allocation technologies, including traffic, storage, and computing offloading mechanisms, were identified to meet the requirements of ultrareliability and low latency in 6G services. As investigated in this article, AI-/ML-enabled techniques may allow future 6G networks to learn from uncertain and dynamic environments, adapt to unpredictable changes in an intelligent and automated fashion, and thereby achieve significantly improved performance in terms of ultrabroadband, ultramassive access, ultrareliability, and low latency.
There are still many challenges to realizing comprehensive and mature applications of AI/ML techniques in 6G. In particular, for current computing devices with limited power, memory, storage, and processing capacities, how to adapt AI-/ML-based algorithms and mechanisms, which bring high complexity and heavy computation, for practical implementation is worthy of further investigation. In addition, varied and emerging application scenarios and new AI/ML techniques may also bring challenges to the implementation of intelligent technologies in 6G.

[Figure 3(a): hybrid decision-controlled DRL framework with interaction with the environment, distributed execution at each device m, and centralized training. States: task load L, battery level b, harvested energy e, important factor Z, channel gain g_i, and computing capacity f_i. The actor μ(s; φ^μ) outputs a continuous action u_i = (α_i, f_i) for each server; the critic Q(s, i, u; φ^Q) outputs Q_1, …, Q_N; the discrete action is i = arg max_{j∈N} Q_j. The device offloads (1 − α)l data to server i and processes αl data locally with CPU frequency f. Figure 3(b): average rewards versus requested task load ζ. Figure 3(c): average consumed time (s) versus maximum harvested energy e_max (10^−4 J). Schemes: Hybrid-AC, DQLO (5 states), DQLO (10 states), server execution, device execution, exhaustive search, upper bound.]

Figure 3 The framework and simulation results of a hybrid decision-controlled DRL-based dynamic computation offloading scheme in large-scale IoT systems. (a) An illustration of a hybrid decision-controlled DRL-based dynamic computation offloading scheme. (b) Performance versus different task arrival rate. (c) Performance versus different maximum harvested energy. Hybrid-AC: hybrid action–critic; DQLO: deep Q-learning-based offloading.
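The hybrid decision in Figure 3(a) can be sketched as follows: for every candidate server j, the actor proposes continuous parameters u_j = (α_j, f_j), where, following the figure, α_j is the fraction of the task processed locally and f_j the local CPU frequency; the critic scores each pair as Q_j; and the device selects i = arg max_j Q_j together with u_i. In this minimal Python sketch, untrained linear maps with random weights stand in for the actor and critic networks, training is omitted entirely, and the state layout, sizes, and the names `ToyActor`, `ToyCritic`, and `select_hybrid_action` are hypothetical. The detailed method is the one described in [15], not this code.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # continuous parameters (alpha_i, f_i) proposed for one server


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class ToyActor:
    """Stand-in for the actor mu(s; phi_mu): one (alpha, f) pair per candidate server."""

    def __init__(self, n_servers: int, state_dim: int, f_max: float, rng: random.Random):
        self.f_max = f_max
        # Untrained weights: one vector for alpha and one for f, per server.
        self.w = [([rng.uniform(-1, 1) for _ in range(state_dim)],
                   [rng.uniform(-1, 1) for _ in range(state_dim)])
                  for _ in range(n_servers)]

    def __call__(self, s: Sequence[float]) -> List[Params]:
        # Sigmoid squashing keeps alpha in (0, 1) and f in (0, f_max).
        return [(sigmoid(dot(wa, s)), self.f_max * sigmoid(dot(wf, s)))
                for wa, wf in self.w]


class ToyCritic:
    """Stand-in for the critic Q(s, i, u; phi_Q): scores server i with parameters u."""

    def __init__(self, n_servers: int, state_dim: int, rng: random.Random):
        self.w = [[rng.uniform(-1, 1) for _ in range(state_dim + 2)]
                  for _ in range(n_servers)]

    def __call__(self, s: Sequence[float], i: int, u: Params) -> float:
        return dot(self.w[i], list(s) + list(u))


def select_hybrid_action(s, actor, critic):
    """Return the chosen server index, its (alpha, f), and all Q values."""
    params = actor(s)                                      # continuous part for every server
    q = [critic(s, j, u) for j, u in enumerate(params)]    # Q_1, ..., Q_N
    i = max(range(len(q)), key=q.__getitem__)             # discrete part: argmax_j Q_j
    return i, params[i], q


# Hypothetical normalized state: task load, battery level, harvested energy,
# importance factor, then channel gains to N = 3 servers.
rng = random.Random(0)
state = [0.8, 0.5, 0.2, 1.0, 0.3, 0.6, 0.9]
actor = ToyActor(n_servers=3, state_dim=len(state), f_max=1.0, rng=rng)
critic = ToyCritic(n_servers=3, state_dim=len(state), rng=rng)
i, (alpha, f), q = select_hybrid_action(state, actor, critic)
print(f"server {i}: process {alpha:.2f} of the task locally at f = {f:.2f}; Q = {q}")
```

Because the actor proposes parameters for every server before the argmax, the discrete server choice is made with each server's continuous parameters already attached, so α and f never have to be quantized into a finite action set.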

Acknowledgments
This research was supported by the National Natural Science Foundation of China under project 61971257, the China Postdoctoral Science Foundation under special grant 2019T120091 and grant 2018M640130, and the project “The Verification Platform of Multi-tier Coverage Communication Network for Oceans (LZC0020).” The corresponding authors of this article are Chunxiao Jiang and Yong Ren.

Author Information
Jun Du ([email protected]) currently holds a postdoctoral position with the Department of Electrical Engineering, Tsinghua University, China. Her research interests are mainly in resource allocation and system security of heterogeneous networks and space-based information networks. She is the recipient of the Best Paper Award from the IEEE International Conference on Communications 2019 and the Best Paper Award from the International Conference on Wireless Communications and Mobile Computing in 2020. She is a Member of IEEE.

Chunxiao Jiang ([email protected].cn) is an associate professor at the School of Information Science and Technology, Tsinghua University, China. His research interests include the application of game theory, optimization, and statistical theories to communication, networking, and resource allocation problems, in particular, space networks and heterogeneous networks. He is a Senior Member of IEEE.

Jian Wang ([email protected].cn) joined the faculty of Tsinghua University, China, in 2006, where he is currently an associate professor with the Department of Electronic Engineering. His research interests include the application of statistical theories, optimization, and machine learning to communication, networking, navigation, and resource allocation problems, in particular, heterogeneous networks and intelligent collaborative systems. He is a Senior Member of IEEE.

Yong Ren ([email protected]) is a professor with the Department of Electronics Engineering and director of the Complexity Engineered Systems Lab at Tsinghua University, China. His current research interests include complex systems theory and its applications to the optimization and information sharing of the Internet, the IoT and ubiquitous networks, cognitive networks, and cyber-physical systems. He is a Senior Member of IEEE.

Mérouane Debbah (merouane[email protected]) is vice president of the Huawei France Research Center. He is jointly director of the Mathematical and Algorithmic Sciences Lab as well as the Lagrange Mathematical and Computing Research Center. He has managed eight European Union projects and more than 24 national and international projects. His research interests lie in fundamental mathematics, algorithms, statistics, information, and communication sciences research. He is a Fellow of IEEE.

References
[1] Z. Zhang et al., “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 28–41, Sept. 2019. doi: 10.1109/MVT.2019.2921208.
[2] I. F. Akyildiz, J. M. Jornet, and C. Han, “Teranets: Ultra-broadband communication networks in the terahertz band,” IEEE Wireless Commun., vol. 21, no. 4, pp. 130–135, Aug. 2014. doi: 10.1109/MWC.2014.6882305.
[3] R. Singh and D. Sicker, “Beyond 5G: THz spectrum futures and implications for wireless communication,” in Proc. 30th European Conf. Int. Telecommunication Society (ITS), Helsinki, Finland, June 16–19, 2019. [Online]. Available: https://www.econstor.eu/bitstream/10419/205213/1/Singh-Sicker.pdf
[4] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 310–323, Jan. 2019. doi: 10.1109/TWC.2018.2879433.
[5] Y. S. Nasir and D. Guo, “Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2239–2250, Oct. 2019. doi: 10.1109/JSAC.2019.2933973.
[6] Z. Li and C. Guo, “Multi-agent deep reinforcement learning based spectrum allocation for D2D underlay communications,” IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1828–1840, Dec. 2019. doi: 10.1109/TVT.2019.2961405.
[7] Y. Al-Eryani and E. Hossain, “The D-OMA method for massive multiple access in 6G: Performance, security, and challenges,” IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 92–99, Sept. 2019. doi: 10.1109/MVT.2019.2919279.
[8] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras, “Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019. doi: 10.1109/TCCN.2019.2949589.
[9] L. Xiao, X. Wan, X. Lu, Y. Zhang, and D. Wu, “IoT security techniques based on machine learning: How do IoT devices use AI to enhance security?” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 41–49, Sept. 2018. doi: 10.1109/MSP.2018.2825478.
[10] A. Ferdowsi and W. Saad, “Deep learning for signal authentication and security in massive Internet-of-Things systems,” IEEE Trans. Commun., vol. 67, no. 2, pp. 1371–1387, Feb. 2019. doi: 10.1109/TCOMM.2018.2878025.
[11] A. Stamou, N. Dimitriou, K. Kontovasilis, and S. Papavassiliou, “Autonomic handover management for heterogeneous networks in a future internet context: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3274–3297, Fourthquarter 2019. doi: 10.1109/COMST.2019.2916188.
[12] H. Ye, L. Liang, G. Y. Li, J. Kim, L. Lu, and M. Wu, “Machine learning for vehicular networks: Recent advances and application examples,” IEEE Veh. Technol. Mag., vol. 13, no. 2, pp. 94–101, June 2018. doi: 10.1109/MVT.2018.2811185.
[13] M. Min, L. Xiao, Y. Chen, P. Cheng, D. Wu, and W. Zhuang, “Learning-based computation offloading for IoT devices with energy harvesting,” IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1930–1941, Feb. 2019. doi: 10.1109/TVT.2018.2890685.
[14] Y. He, F. R. Yu, N. Zhao, V. C. Leung, and H. Yin, “Software-defined networks with mobile edge computing and caching for smart cities: A big data deep reinforcement learning approach,” IEEE Commun. Mag., vol. 55, no. 12, pp. 31–37, Dec. 2017. doi: 10.1109/MCOM.2017.1700246.
[15] J. Zhang, J. Du, Y. Shen, and J. Wang, “Dynamic computation offloading with energy harvesting devices: A hybrid decision based deep reinforcement learning approach,” IEEE Internet Things J., early access, June 2020. doi: 10.1109/JIOT.2020.3000527.
