Dynamical Resource Allocation in Edge For Trustable Internet-of-Things Systems: A Reinforcement Learning Method
Abstract: Edge computing (EC) is emerging as a key paradigm to handle the increasing number of Internet-of-Things (IoT) devices connected to the edge of the network. By using the services deployed on a service provisioning system made up of nearby edge servers, these IoT devices are enabled to fulfill complex tasks effectively. Nevertheless, it also brings challenges in trustworthiness management. The volatile environment makes it difficult to comply with the service-level agreement (SLA), which is an important index of the trustworthiness declared by these IoT services. In this article, by measuring the trustworthiness gain by how well the SLA is complied with, we first encode the state of the service provisioning system and the resource allocation scheme, and model the adjustment of allocated resources for services as a Markov decision process (MDP). Based on these, we obtain a trained resource allocation policy with the help of the reinforcement learning (RL) method. The trained policy is trained to maximize the services' trustworthiness gain by generating appropriate resource allocation schemes dynamically according to the system states. By conducting a series of experiments on the YouTube request dataset, we show that the edge service provisioning system using our approach performs at least 21.72% better than the baselines.

Index Terms: Edge computing, Internet-of-Things (IoT), trust management, resource allocation.

Manuscript received January 29, 2020; accepted February 15, 2020. Date of publication February 18, 2020; date of current version May 26, 2020. This research was supported in part by the National Key Research and Development Program of China under Grant 2017YFB1400600, in part by the National Science Foundation of China under Grant 61772461 and Grant 61825205, in part by the Natural Science Foundation of Zhejiang Province under Grant LR18F020003, and in part by the Key Research Innovation Project of Hangzhou under Grant 20182011A06. Paper no. TII-20-0416. (Corresponding authors: Peng Zhao; Honghao Gao.)
Shuiguang Deng is with the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China, and also with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China.
Zhengzhe Xiang and Jianwei Yin are with the College of Computer Science, Zhejiang University, Hangzhou 310027, China.
Peng Zhao is with the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China.
Javid Taheri is with the Department of Computer Science, Karlstad University, 65188 Karlstad, Sweden.
Honghao Gao is with the Computing Center, Shanghai University, Shanghai 200444, China.
Albert Y. Zomaya is with the School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia.
Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TII.2020.2974875

I. INTRODUCTION
We are now embracing an era of Internet-of-Things (IoT). The number of IoT devices has proliferated rapidly with the popularization of mobile phones, wearable devices, and various kinds of sensors: there were 8.6 billion IoT connections established at the end of 2018, and the number is predicted to increase to 22.3 billion according to Ericsson.¹ To face the challenge of trustworthy connection and low latency in the future, researchers have turned their attention to a novel computing paradigm called edge computing (EC) [1]. In contrast to cloud computing, EC decentralizes tasks to the edge of the network. In the EC paradigm, plenty of edge servers are established close to IoT devices to deal with the requests from these devices before they are routed to the core network [2]. Therefore, the computation and transmission between devices and the cloud server can be partly migrated to edge servers or the cloud server, as in existing platforms.² This enables IoT devices to fulfill complex analysis tasks with lower latency, higher performance, and less energy consumption [3] by taking advantage of the services deployed on edges. What is more, we can even establish a standalone cluster where the edge servers work cooperatively [4] to get full control and improve the offline scheduling capability at the edge side, as in existing systems.³ Besides these, with the help of edge servers in proximity, applications are enabled to learn from the mobile users' real-time context information [5] to improve their quality of experience [6].
However, simply establishing such a service provisioning system is not enough [7]. Another significant problem, trust management, must also be considered in this scenario [8]. Besides typical issues like key escrow and application distribution [9], we should also pay attention to compliance with business agreements. A service-level agreement (SLA) is a commitment between a service provider and a client; it can contain numerous service-performance metrics with corresponding service-level objectives like turnaround time (TAT)

¹ [Online]. Available: https://www.ericsson.com/49d1d9/assets/local/mobility-report/documents/2019/ericsson-mobility-report-june-2019.pdf
² [Online]. Available: https://docs.microsoft.com/en-us/azure/iot-edge/about-iot-edge
³ [Online]. Available: https://docs.kubeedge.io/en/latest/modules/edgesite.html
1551-3203 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Glasgow. Downloaded on June 03,2020 at 06:08:57 UTC from IEEE Xplore. Restrictions apply.
6104 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 16, NO. 9, SEPTEMBER 2020
DENG et al.: DYNAMICAL RESOURCE ALLOCATION IN EDGE FOR TRUSTABLE IoT SYSTEMS: A RL METHOD 6105
2) Routing step: Second, the access server $h_j$ selects an appropriate server $h_k$ (called the executor server $H_e$) to handle this request and sends the request to it. The time cost is

$$T_R = \frac{D_i^{in}}{B_{j,k}} \tag{2}$$

where $B_{j,k}$ describes the topology and bandwidth between $h_j$ and $h_k$. The elements of matrix $B$ are nonnegative: $B_{j,k}$ is set to $+\infty$ if $j = k$, and to 0 if $h_k$ is not accessible from $h_j$. An appropriate routing policy is needed here to make decisions. Therefore, we use $p_{i,j,k}$ to denote the probability that server $h_j$ dispatches requests for service $s_i$ to server $h_k$. In practice, developers often use the round-robin principle to balance the workload. With this principle, requests are routed to different hosts in order, namely $p^t_{*,*,k} = \frac{1}{n+1}$. However, it implicitly assumes that the machines are homogeneous, and does not consider the system context. Considering the different processing capacities of machines, we give the host $h_k$ a larger probability if it has better processing capacity for service $s_i$, like the weighted round-robin approach, where the weight is the processing capacity. Namely, we have $p_{i,j,k} = \mu_{i,k} / \sum_{q=0}^{n} \mu_{i,q}$.

3) Execution step: Then, once the executor server $h_k$ has received the request, it uses the corresponding service to fulfill the task. Supposing the processing capacity (e.g., the ability to handle instructions measured in MIPS) for service $s_i$ on host $h_k$ is $\mu_{i,k}$, and the workload (e.g., the number of instructions to run the program) of $s_i$ is $w_i$, the time cost is

$$T_E = \frac{w_i}{\mu_{i,k}} \tag{3}$$

where we assume $\mu_{i,k}$ is proportional to $l_{i,k}$, the resource (e.g., CPU or memory) allocated to $s_i$ on $h_k$, as in [22]. (A more sophisticated model is shown in [23]; it indicates that the linear assumption holds except under extreme conditions.) When $\mu_k$ is the maximum processing capacity of $h_k$ and the resource limitation of $h_k$ is $L_k$, we have $\mu_{i,k} = \frac{l_{i,k}}{L_k}\mu_k$ according to (13) in [22].

4) Backhaul step: Finally, the results first go to the access server and then go back to the IoT device to finish the whole life cycle. The time cost is

$$T_B = \frac{D_i^{out}}{B_{k,j}} + \frac{D_i^{out}}{v_u^{j}} \tag{4}$$

where $D_i^{out}$ is the average output data size of $s_i$. In this way, when $i$, $j$, $k$ are determined, so is the SRLC $\phi$. Then the TAT for $\phi$ can be represented by

$$T_\phi = T_A + T_R + T_E + T_B. \tag{5}$$

With requests generated at any time from IoT devices, the service provisioning system repeats the above operations.

B. Dynamic Service Resource Allocation
The resource allocation scheme $P$ can be represented with a matrix where the element $P_{j,i}$ is the resource quota (in %) for $s_i$ on $h_j$. Without loss of generality, here we mainly focus on computation resources like CPU and memory, because this kind of resource is scarcer and more expensive than the storage resource; follow-up work can easily extend our model with other functions that consider more resources. Therefore, the allocated resource for service $s_i$ on $h_j$ can be calculated by $l_{i,j} = P_{j,i} L_j$. As mentioned in Section I, a significant task in the trustable service provisioning system is to ensure the SLA of services. But as the requests are time-varying, a fixed service resource allocation scheme cannot work well all the time. Therefore, it is effective for the system to replan the resource allocation schemes at runtime. In our work, the allocation replanning for the resource on hosts is denoted with the matrix $A^t \in \mathbb{R}^{(n+1)\times(n+1)}$, which describes how the service allocation scheme is replanned at time period $t$. Then, the service resource allocation scheme of the next time period can be produced by

$$P^{t+1} = A^t \times P^t \tag{6}$$

where $P^{t+1} \in \mathbb{R}^{(n+1)\times(m+1)}$. What is more, as $P_{j,i}$ is represented as a percentage, we have $\sum_{i=0}^{m} P_{j,i} = 1$ and $0 \le P_{j,i} \le 1$. This is a constraint that can be used in training of the RL-based approach [10]: for the neural network that generates $P$, a softmax layer is added to enforce this constraint.

IV. ALGORITHM DESIGN
In this trustable IoT service provisioning system, we mainly care about how well the SLA is complied with, especially its TAT. Thus, before developing the dynamic service resource allocation for distributed edges algorithm (DeraDE), we should clarify the expected TAT of services and explore its relation to the system trustworthiness.
By denoting the request number (or frequency) of service $s_i$ observed on $h_j$ with $\omega^t_{i,j}$, and the probability that $h_j$ dispatches $s_i$ to $h_k$ with $p^t_{i,j,k}$, the probability of getting request path $\phi = (i, j, k)$ is

$$\begin{aligned}
\Pr\{\phi = (i,j,k)\} &= \Pr\{s = s_i, H_a = h_j, H_e = h_k\} \\
&= \Pr\{H_e = h_k \mid s = s_i, H_a = h_j\} \times \Pr\{s = s_i \mid H_a = h_j\}\Pr\{H_a = h_j\} \\
&= p^t_{i,j,k} \cdot \frac{\omega^t_{i,j}}{\sum_{i=0}^{m}\omega^t_{i,j}} \cdot \frac{\sum_{i=0}^{m}\omega^t_{i,j}}{\sum_{i=0}^{m}\sum_{j=0}^{n}\omega^t_{i,j}}.
\end{aligned} \tag{7}$$

As introduced in Section III-A, we can get $p^t_{i,j,k} = \mu_k P^t_{k,i} / \sum_{q=0}^{n} \mu_q P^t_{q,i}$ with the definition of $P_{j,i}$. Denoting the total size of $s_i$'s input and output in time period $t$ with $D_i$ ($D_i = D_i^{in} + D_i^{out}$), the expected TAT $E[T]$ of this time period can then be
However, as there is more than one time period in service provision, we should consider them together. As mentioned in Section III-B, the resource allocation scheme of the next time period is determined by that of the previous time period, so we can use an MDP to describe this process.
An MDP is a stochastic control process described with a 4-tuple $(X, Y, P, R)$, where $X$ is the state space, $Y$ is the action space, $P_{y^t}(x^t, x^{t+1})$ is the probability that action $y^t$ in state $x^t$ will lead to $x^{t+1}$, and $R^t$ is the immediate reward for applying action $y^t$ when the state is $x^t$. Besides these, we can use the policy $\pi$ to describe the distribution of actions for different states

$$\pi(y \mid x) = \Pr\{\mathbf{y} = y \mid \mathbf{x} = x\}. \tag{9}$$

In this way, the MDP provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision-maker. In our model, we denote $x^t = P^t$ as the state of the provisioning system at the beginning of time period $t$. With observation $x^t$, the provisioning system strategically decides an action $y^t = A^t$ following a resource replanning policy $\pi$. What is more, because $E^t[T]$ can be obtained at the end of this time period, the trustworthiness gain for applying action $y^t$ in state $x^t$ can be denoted with

$$R^t = T - E^t[T] \tag{10}$$

where $T$ is a threshold, so that a smaller $E[T]$ is rewarded with more trustworthiness gain than a larger one. Therefore, given the time period $t$, if the current state is $x^t = x$, we can generate a sequence like $seq = [x^t, y^t, x^{t+1}, y^{t+1}, \ldots]$ according to $\pi$. The accumulative trustworthiness gain for $seq$ can be represented with

$$G^t = R^{t+1} + \gamma R^{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k R^{t+k+1} \tag{11}$$

where $0 \le \gamma \le 1$ is called the discount factor and shows how much the subsequent trustworthiness gain contributes to $G^t$. As these sequences may be diverse, we can use the expectation of $G^t$ to evaluate the value of state $x^t = x$ under a policy $\pi$.
Therefore, we can evaluate the global trustworthiness gain for applying action $y^t = y$ at state $x^t = x$ with an action-value function

$$Q_\pi(x, y) = E_\pi\left[G^t \mid x^t = x, y^t = y\right]. \tag{12}$$

Thus, given a behavior policy $\beta$ where $\beta(y \mid x)$ describes the probability for applying action $y$ in state $x$ and $\rho_\beta$ the probability

$$\pi^* = \arg\max_{\pi} J_\beta(\pi). \tag{14}$$

V. RL-BASED APPROACH
In our model, as the resource allocation scheme $P$ and replanning action $A$ can be vectorized to vectors in $\mathbb{R}^{nm+n+m+1}$ and $\mathbb{R}^{n^2+2n+1}$, respectively, by concatenating their rows, we prefer to represent the policy $\pi$ with a deterministic function $f_\Pi : \mathbb{R}^{nm+n+m+1} \to \mathbb{R}^{n^2+2n+1}$ which generates actions for any given states. To meet this demand, we use a neural network $\Pi$ with parameter $\theta_\Pi$ to approximate this function

$$y^t = \Pi(x^t; \theta_\Pi). \tag{15}$$

On the other hand, there is another item $Q_\pi(x, y)$ involved in (13). Similarly, we hope that it can also be represented with a function $f_Q : \mathbb{R}^{nm+n+m+1} \times \mathbb{R}^{n^2+2n+1} \to \mathbb{R}$. So we use another neural network $Q$, whose parameters are denoted with $\theta_Q$, to approximate $f_Q$

$$Q_\pi(x, y) = Q(x, y; \theta_Q). \tag{16}$$

Now we just need to clarify the structures of $\Pi$ and $Q$. To train them with adequate exploration, here we use an actor–critic structure [24] and off-policy training. Off-policy here means that the behavior policy $\beta$ is different in the actor module and the critic module. In our work, we add a noise $N^t$ from a random process when determining actions with $\Pi$ in the actor module to construct the behavior policy $\beta$

$$y^t_\beta = \Pi(x^t; \theta_\Pi) + N^t. \tag{17}$$

Thus, given state $x^t$ at the beginning of time period $t$, we can take the action $y^t_\beta$ generated by $\beta$ for (6). After replanning, the reward can be represented by $R^t$ and the current state of the provisioning system is changed to $x^{t+1}$. Repeating this process, tuples like $(x^t, y^t_\beta, R^t, x^{t+1})$ are stored in a replay memory buffer $M$ for future training. Taking advantage of the following Bellman equation for $Q_\pi(x^t, y^t)$ derived from (12), we can use

$$\hat{Q}_\pi\left(x^t, \Pi(x^t; \theta_\Pi)\right) = E\left[R^t + \gamma Q_\pi\left(x^{t+1}, \Pi(x^{t+1}; \theta_\Pi)\right)\right] \tag{18}$$

as the target of $Q_\pi(x^t, \Pi(x^t; \theta_\Pi))$. Then we can create a supervised learning task to train the network $Q$ on data batches (batch size $= N$) sampled from $M$

$$L_Q = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{Q}^i_\pi\left(x^t, \Pi(x^t; \theta_\Pi)\right) - Q^i_\pi\left(x^t, \Pi(x^t; \theta_\Pi)\right)\right)^2 \tag{19}$$
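
To make the quantities in (10), (11), (18), and (19) concrete, the following is a minimal pure-Python sketch rather than the authors' training code. The callables `q` and `pi` are hypothetical stand-ins for the networks $Q$ and $\Pi$, and the infinite sum in (11) is truncated to a finite reward list. Following (19) as printed, the critic is evaluated at $\Pi(x)$, so the stored action in each tuple is not used here.

```python
def trust_gain(threshold, expected_tat):
    """Eq. (10): R^t = T - E^t[T]; a smaller expected TAT earns a larger gain."""
    return threshold - expected_tat


def discounted_gain(rewards, gamma):
    """Eq. (11) truncated to a finite list rewards = [R^{t+1}, R^{t+2}, ...]."""
    gain = 0.0
    for reward in reversed(rewards):
        gain = reward + gamma * gain
    return gain


def critic_loss(batch, q, pi, gamma):
    """Eqs. (18)-(19): mean squared gap between Bellman target and critic output."""
    total = 0.0
    for x, _y, r, x_next in batch:
        target = r + gamma * q(x_next, pi(x_next))   # one-sample estimate of (18)
        total += (target - q(x, pi(x))) ** 2         # (19) evaluates Q at Pi(x)
    return total / len(batch)
```

In a real setup `q` and `pi` would be differentiable models and the loss would be minimized with an optimizer; the sketch only shows how the target and the loss are assembled from replay tuples.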
update $\theta_\Pi$ with $\nabla_{\theta_\Pi} J_\beta(\theta_\Pi)$, which can be computed by

$$\theta_\Pi = \theta_\Pi - \eta_\Pi \nabla_{\theta_\Pi} J_\beta(\theta_\Pi) \tag{21}$$

in which the gradient $\nabla_{\theta_\Pi} J_\beta(\theta_\Pi)$ can be approximated [25] with

$$\nabla_{\theta_\Pi} J_\beta(\theta_\Pi) \approx E_{x \sim \rho_\beta}\left[\nabla_{\theta_\Pi} \Pi(x; \theta_\Pi) \cdot \nabla_y Q(x, y; \theta_Q)\big|_{y = \Pi(x; \theta_\Pi)}\right]. \tag{22}$$

Simply using networks $\Pi$ and $Q$ results in an unstable training process, because the parameters of $Q$ are updated frequently while they are involved in the gradients of both $Q$ and $\Pi$. We therefore introduce target networks $\Pi'$ and $Q'$, which have the same structure and initial parameters as $\Pi$ and $Q$ but are updated softly with

$$\theta_{\Pi'} = \tau\theta_\Pi + (1 - \tau)\theta_{\Pi'}$$

Output: $Q$: the action-value network;
        $\Pi$: the action network;
1: for each episode do
2:   initialize the service provisioning system;
3:   for each time period $t$ do
4:     select an action $y^t$ with (17);
5:     get new state $x^{t+1}$ with (6);
6:     get trustworthiness gain $R^t$ with (10);
7:     store tuple $(x^t, y^t, R^t, x^{t+1})$ in buffer $M$;
8:     sample $V$ tuples $\{(x^{n_i}, y^{n_i}, R^{n_i}, x^{n_i+1}) \mid 1 \le i \le V\} \subset M$ to compute $\nabla_{\theta_Q} L_Q$ and update $Q$;
9:     compute $\nabla_{\theta_\Pi} J_\beta$ with (22) to update $\Pi$;
10:    softly update $Q'$ and $\Pi'$ with (23);
11:  end for
12: end for
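
Lines 4-7 of the listing and the soft target update of line 10 can be sketched in plain Python as follows. This is an illustrative outline under stated assumptions, not the authors' code: the Gaussian exploration noise, the `reward_fn` callback, and all function names are hypothetical, and the gradient steps of lines 8-9 are omitted because they require an automatic-differentiation library.

```python
import random


def replan(A, P):
    """Eq. (6): P^{t+1} = A^t x P^t as a plain matrix product."""
    return [[sum(A[r][q] * P[q][c] for q in range(len(P)))
             for c in range(len(P[0]))] for r in range(len(A))]


def noisy_action(policy, state, sigma, rng):
    """Eq. (17): deterministic action plus noise (Gaussian noise is an assumption)."""
    return [[a + rng.gauss(0.0, sigma) for a in row] for row in policy(state)]


def soft_update(target_params, params, tau):
    """Soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(params, target_params)]


def run_period(policy, state, reward_fn, buffer, sigma=0.1, rng=None):
    """Lines 4-7 of the listing for a single time period."""
    rng = rng or random.Random(0)
    action = noisy_action(policy, state, sigma, rng)      # line 4
    next_state = replan(action, state)                    # line 5
    reward = reward_fn(state, action)                     # line 6, e.g. T - E^t[T]
    buffer.append((state, action, reward, next_state))    # line 7
    return next_state
```

A small `tau` makes the target parameters trail the trained ones slowly, which is the stabilizing effect the text attributes to the target networks.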
TABLE I
EXAMPLE OF RECORD IN YouTube TRAFFIC DATASET
Fig. 4. (a) Running state of edge servers. (b) Service requests distribution. (c) Results of different approaches. (e) Training process of the approach. (f) Impact of network structure.
due to the insensitivity and the lack of well-adopted platforms and datasets, we generate the configurations of services and servers in a synthetic way. To make the data as close to reality as possible, we refer to the parameters of images on Docker Hub⁴ for simulation data generation. Fig. 4(a) shows the service requests on different servers; the workload on different servers is quite imbalanced. If the resource of server 63.22.65.94 can be used to process service requests, as in the cooperation of servers modeled in this work, the performance will be dramatically improved.

B. Experiments and Analysis
1) Baselines Introduction: Traditionally, there are several approaches to solve this resource allocation problem. They mainly take advantage of the locality of service requests or try to get an accurate request frequency for the next time period. Therefore, we choose the following representative algorithms as baselines:
1) Locality frequency algorithm (LF): This algorithm combines the ideas of the least recently used algorithm and the stable frequency assumption. It counts the frequency of service requests in different time periods and assumes that the frequencies of requests for different services on different servers in the next time period are approximately equal to the former ones. Therefore, in this model we have

$$lf^{t+1}_{i,j} \approx \frac{\omega^t_{i,j}}{\sum_{q=0}^{m}\sum_{p=0}^{n}\omega^t_{q,p}} \tag{24}$$

2) Low inter-reference recency set algorithm (LIRS) [28]: The LIRS algorithm utilizes the reuse distance of a page, which is the number of distinct pages accessed between two consecutive references of the page, to quantify locality [28]. It maintains a hot set and a cold set to replace services in cache dynamically.
3) Long short-term memory algorithm (LSTM): LSTM was first introduced for solving long-term dependency problems [29]. It has the form of a chain of repeating neural network modules, and it has the ability to remove or add information to the cell state, carefully regulated by structures called gates. By using the knowledge controlled by these gates, it works well on a large variety of problems, and is now widely used in time series analysis tasks like signal prediction, stock prediction, or even traffic prediction [30].

2) Performance Comparison: In this section, we compare our approach with the baselines. As introduced in Section VI-B1, we use the request data of the former $N - 1$ days in consecutive days as training data and that of the last day as test data (e.g., we use the data of March 13, 2008–March 17, 2008 as training data, and the data of March 18, 2008 as test data). First, we use the LF to get the local service request frequency and allocate resources to services with this frequency percentage. Second, we use the LIRS to schedule the deployment priority of services in different time periods (1 h = 1 time period) and allocate more resources to services with higher priorities. Third, we use the LSTM to predict the future service request frequency and allocate resources to services with their frequency percentage, as we do with the LF. Finally, we use the dynamic service resource allocation policy trained with our approach to allocate resources.
By denoting $imprv^t(X)$ as the improvement of our approach compared with baseline $X$ ($X \in \{\mathrm{LF}, \mathrm{LIRS}, \mathrm{LSTM}\}$) in time period $t$

$$imprv^t(X) = \frac{E[T]^t_X - E[T]^t_{our}}{E[T]^t_X} \tag{25}$$

then we have $imprv_{min}(X) = \min_{t=0 \to T} imprv^t(X)$ to describe how much our approach can improve at least

⁴ [Online]. Available: https://hub.docker.com/DockerHub
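
The LF estimate in (24) and the improvement metric in (25) reduce to a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and `omega[i][j]` is assumed to hold the request count of service $s_i$ observed on host $h_j$ in the current period.

```python
def lf_allocation(omega):
    """Eq. (24): next-period frequency share estimated from current counts."""
    total = sum(sum(row) for row in omega)
    if total == 0:
        raise ValueError("no requests observed in this period")
    return [[count / total for count in row] for row in omega]


def improvement(tat_baseline, tat_ours):
    """Eq. (25): relative reduction of expected TAT against a baseline."""
    return (tat_baseline - tat_ours) / tat_baseline


def min_improvement(baseline_series, our_series):
    """Worst-case improvement over all evaluated time periods."""
    return min(improvement(b, o) for b, o in zip(baseline_series, our_series))
```

For example, a baseline TAT of 2.0 against 1.5 for the compared approach gives an improvement of 0.25, and `min_improvement` reports the smallest such value across periods.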
TABLE II
PERFORMANCE COMPARISON
Fig. 7. (a) Impact of M2D transmission rate. (b) Impact of M2M transmission rate.
server has a better processing capacity than the edge servers, the execution time will be optimized if requests are routed to $h_0$. However, when $\hat{D}^{in/out}$ becomes bigger, the advantage of processing capacity does not help $h_0$ because the transmission cost starts to dominate the result. We can also find that some of the resources in $h_0$ become idle as $\hat{D}^{in/out}$ increases. The processing capacity and service workload determine the computation cost of service requests, because the former describes how fast a host can fulfill a task and the latter describes how complex the task of a service can be. By changing the average processing capacity of hosts ($\hat{\mu}$) from 500 to 2500, we get the curve in Fig. 6. The bigger $\hat{\mu}$ is, the better the performance will be. We can also find that as $\hat{\mu}$ increases, the data throughput and request flow reduce in $h_0$ while those of edge servers increase. This is because the value of $\frac{1}{\mu_{edge}} - \frac{1}{\mu_{cloud}}$ becomes small as $\hat{\mu}$ increases, so that the difference in execution cost between the cloud server and the edge servers becomes unimportant. On the other hand, by changing the average service workload ($\hat{w}$) from 50 to 250, we find that the average response time increases. This is reasonable because when the task becomes more complex, it costs more time in execution. Besides this, the data throughput and request flow of $h_0$ become bigger. This is because the increasing workload makes edge servers overloaded, and they have to dispatch the requests to the cloud server. This result is valuable because it teaches developers that they must not make the service too complex if they want to make full use of the resources on edges. If they have to fulfill complex tasks on edges, they had better keep them running with good processing capacities.
Quality of network communication: As mentioned in the former section, when the service input and output data size becomes large, the transmission cost becomes the main factor that may impact the performance. Therefore, we explore two types of communications involved in this system. The first one is M2D communication: the IoT devices connect to the edge servers via the wireless network. By changing $B_{M2D}$ from 100 to 300, we find that the average response time decreases. At the same time, the resource allocation schemes do not change much. This is because the request comes from and goes back to the IoT device, so simply changing $B_{M2D}$ only impacts the transmission time of M2D communication, and the M2D communication of different hosts and IoT devices is similar. However, things are different if we focus on the value of $B_{M2M}$. By changing $B_{M2M}$ from 100 to 300, we find that the average response time decreases but the resource allocation schemes and properties of hosts vary with the change. Though the cloud server still has the worst communication quality, the data throughput and request flow of $h_0$ increase with the decreasing of $\frac{1}{B_{edge\text{-}edge}} - \frac{1}{B_{edge\text{-}cloud}}$. This is because the cost of dispatching requests to the cloud server decreases as $B_{M2M}$ increases; if the long-distance connection is not a problem anymore, the cloud server will be the best choice.
VII. CONCLUSION
In this article, we tried to improve the trustworthiness of services in the IoT environment with the EC paradigm from the perspective of how well the SLA is complied with. We investigated the resource limitations of edge servers and the resource consumption of services, and explored the relationship between the resources allocated to services and their expected trustworthiness gain. Then, we highlighted the dynamical service resource allocation problem, and proposed an RL-based approach to determine the service resource allocation scheme in different time periods by modeling the resource allocation problem as an MDP. Finally, we conducted a series of experiments to compare this approach with other baselines and showed the factors that may affect the performance of the service provisioning system. However, as the modeling of the service provisioning system in this article is simple and idealized, there are still many issues that deserve further investigation. For example, we will consider cyclical and seasonal fluctuation in describing the state of the system, try to use a more sophisticated service running model, and consider other useful SLA attributes. Besides these, because services can work together to make a composite service, the cooperation of services is also important in further research.

REFERENCES
[1] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and challenges,” IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Oct. 2016.
[2] H. Wu, S. Deng, W. Li, M. Fu, J. Yin, and A. Y. Zomaya, “Service selection for composition in mobile edge computing systems,” in Proc. IEEE Int. Conf. Web Services, 2018, pp. 355–358.
[3] S. Deng, H. Wu, W. Tan, Z. Xiang, and Z. Wu, “Mobile service selection for composition: An energy consumption perspective,” IEEE Trans. Autom. Sci. Eng., vol. 14, no. 3, pp. 1478–1490, Jul. 2017.
[4] S. Deng, Z. Xiang, J. Yin, J. Taheri, and A. Y. Zomaya, “Composition-driven IoT service provisioning in distributed edges,” IEEE Access, vol. 6, pp. 54258–54269, 2018.
[5] Z. Xiang, S. Deng, S. Liu, B. Cao, and J. Yin, “Camer: A context-aware mobile service recommendation system,” in Proc. IEEE Int. Conf. Web Services, 2016, pp. 292–299.
[6] X. He, K. Wang, H. Huang, and B. Liu, “QoE-driven big data architecture for smart city,” IEEE Commun. Mag., vol. 56, no. 2, pp. 88–93, Feb. 2018.
[7] T. Wang, G. Zhang, A. Liu, M. Z. A. Bhuiyan, and Q. Jin, “A secure IoT service architecture with an efficient balance dynamics based on cloud and edge computing,” IEEE Internet Things J., vol. 6, no. 3, pp. 4831–4843, Jun. 2019.
[8] H. Tao et al., “TrustData: Trustworthy and secured data collection for event detection in industrial cyber-physical system,” IEEE Trans. Ind. Informat., vol. 16, no. 5, pp. 3311–3321, May 2020.
[9] U. Jayasinghe, G. M. Lee, T.-W. Um, and Q. Shi, “Machine learning based trust computational model for IoT services,” IEEE Trans. Sustain. Comput., vol. 4, no. 1, pp. 39–52, Jan./Mar. 2018.
[10] Z. Xu, Y. Wang, J. Tang, J. Wang, and M. C. Gursoy, “A deep reinforcement learning based framework for power-efficient resource allocation in cloud rans,” in Proc. IEEE Int. Conf. Commun., 2017, pp. 1–6.
[11] X. He, K. Wang, H. Huang, T. Miyazaki, Y. Wang, and S. Guo, “Green resource allocation based on deep reinforcement learning in content-centric IoT,” IEEE Trans. Emerg. Topics Comput., vol. PP, no. 99, 2018, doi: 10.1109/TETC.2018.2805718.
[12] L. Qi, M. Zhou, and W. Luan, “A two-level traffic light control strategy for preventing incident-based urban traffic congestion,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 1, pp. 13–24, Jan. 2018.
[13] X.-S. Hua, “The city brain: Towards real-time search for the real-world,” in Proc. 41st Int. Assoc. Comput. Machinery’s Special Interest Group. Inf. Retrieval Conf. Res. Develop. Inf. Retrieval, 2018, pp. 1343–1344.
[14] L. Tianze, W. Muqing, Z. Min, and L. Wenxing, “An overhead-optimizing task scheduling strategy for ad-hoc based mobile edge computing,” IEEE Access, vol. 5, pp. 5609–5622, 2017.
[15] S. Sardellitti, G. Scutari, and S. Barbarossa, “Joint optimization of radio and computational resources for multicell mobile-edge computing,” IEEE Trans. Signal Inf. Process. Over Netw., vol. 1, no. 2, pp. 89–103, Jun. 2015.
[16] C. You, K. Huang, H. Chae, and B.-H. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397–1411, Mar. 2017.
[17] X. Zhang and Q. Zhu, “Spectrum efficiency maximization using primal-dual adaptive algorithm for distributed mobile devices caching over edge computing networks,” in Proc. 51st Annu. Conf. Inf. Sci. Syst., 2017, pp. 1–6.
[18] L. Yang, J. Cao, G. Liang, and X. Han, “Cost aware service placement and load dispatching in mobile cloud systems,” IEEE Trans. Comput., vol. 65, no. 5, pp. 1440–1452, May 2016.
[19] J. Chase and D. Niyato, “Joint optimization of resource provisioning in cloud computing,” IEEE Trans. Services Comput., vol. 10, no. 3, pp. 396–409, May/Jun. 2017.
[20] M. Jia, W. Liang, Z. Xu, and M. Huang, “Cloudlet load balancing in wireless metropolitan area networks,” in Proc. 35th Annu. IEEE Int. Conf. Comput. Commun., 2016, pp. 1–9.
[21] T. X. Tran and D. Pompili, “Joint task offloading and resource allocation for multi-server mobile-edge computing networks,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 856–868, Jan. 2019.
[22] Y. Chen, C. Lin, J. Huang, X. Xiang, and X. S. Shen, “Energy efficient scheduling and management for large-scale services computing systems,” IEEE Trans. Services Comput., vol. 10, no. 2, pp. 217–230, Mar./Apr. 2015.
[23] S. R. Das and R. M. Fujimoto, “An empirical evaluation of performance-memory trade-offs in time warp,” IEEE Trans. Parallel Distrib. Syst., vol. 8, no. 2, pp. 210–224, Feb. 1997.
[24] R. Lowe, Y. Wu, A. Tamar, J. Harb, O. P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Proc. Advances Neural Inf. Process. Syst., 2017, pp. 6379–6390.
[25] A. Vaswani et al., “Attention is all you need,” in Proc. Advances Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[26] M. Zink, K. Suh, Y. Gu, and J. Kurose, “Watch global, cache local: YouTube network traffic at a campus network: Measurements and implications,” in Proc. SPIE - Int. Soc. Opt. Eng., vol. 6818, Jan. 2008, doi: 10.1117/12.774903.
[27] Y.-Y. Fanjiang, Y. Syu, S.-P. Ma, and J.-Y. Kuo, “An overview and classification of service description approaches in automated service composition research,” IEEE Trans. Services Comput., vol. 10, no. 2, pp. 176–189, Mar./Apr. 2017.
[28] W. Liu, F. Shi, and W. Du, “An LIRS-based replica replacement strategy for data-intensive applications,” in Proc. IEEE 10th Int. Conf. Trust, Secur. Privacy Comput. Commun., 2011, pp. 1381–1386.
[29] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222–2232, Oct. 2017.
[30] J. Mackenzie, J. F. Roddick, and R. Zito, “An evaluation of HTM and LSTM for short-term arterial traffic flow prediction,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 5, pp. 1847–1857, May 2018.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.

Shuiguang Deng (Senior Member, IEEE) received the B.S. and Ph.D. degrees, both in computer science, from the College of Computer Science and Technology, Zhejiang University, Hangzhou, China, in 2002 and 2007, respectively.
He worked with the Massachusetts Institute of Technology in 2014 and Stanford University in 2015 as a Visiting Scholar. He is currently a Full Professor of Computer Science with the College of Computer Science and Technology, Zhejiang University. Up till now, he has authored or coauthored more than 100 papers in journals and refereed conferences. His research interests include edge computing, service computing, mobile computing, and business process management.
Dr. Deng serves as the Associate Editor for the journal IEEE ACCESS and IET Cyber-Physical Systems: Theory & Applications. In 2018, he was granted the Rising Star Award by IEEE TCSVC. He is a Fellow of IET.
Zhengzhe Xiang received the bachelor’s degree in computer science and technology in 2013 from Zhejiang University, Hangzhou, China, where he is currently working toward the Ph.D. degree in computer science with the College of Computer Science.
His research interests include the fields of service computing, cloud computing, and edge computing.

Honghao Gao (Senior Member, IEEE) received the Ph.D. degree in computer science, in 2012.
He started his academic career with Shanghai University in 2012. He is currently a Distinguished Professor in computer science with the Key Laboratory of Complex Systems Modeling and Simulation, Ministry of Education, China. His research interests include service computing, model checking-based software verification, wireless network, and intelligent medical image processing, Hangzhou, China.
Dr. Gao is an Institution of Engineering and Technology (IET) Fellow, British Computer Society (BCS) Fellow, European Alliance for Innovation (EAI) Fellow, China Computer Federation (CCF) Senior Member, and Chinese Association For Artificial Intelligence (CAAI) Senior Member.