
Task Offloading and Resource Allocation for Satellite-Terrestrial Integrated Networks

Ting Lyu, Yueqiang Xu, Feifei Liu, Haitao Xu, Member, IEEE, and Zhu Han, Fellow, IEEE

Abstract—Low-Earth orbit (LEO) satellite networks can achieve global network coverage without geographical restrictions and are essential to the future communication network. In this article, we study the computing offloading problem in a satellite-terrestrial integrated network for the Internet of Remote Things (IoRT), which aims to reduce the total cost (a weighted sum of energy consumption and delay) and jointly optimizes offloading node selection, the offloading ratio, and computational resource allocation to achieve dynamic management of network resources. First, we propose a hybrid cloud and satellite multilayer multiaccess edge computing (MEC) network architecture that can provide heterogeneous computing resources to terrestrial users. Subsequently, since the problem under consideration is a mixed-integer nonlinear programming problem, we propose a computing offloading algorithm based on multiagent reinforcement learning, which is an integration of double deep Q learning (DDQN) and deep deterministic policy gradient (DDPG). The algorithm can learn the optimal policy for actions containing a mixture of discrete and continuous variables. Finally, an optimal computational resource allocation scheme is proposed to improve the task computation efficiency. Simulation results show that the proposed task offloading and resource allocation scheme can achieve reasonable scheduling of computational tasks and optimal allocation of computational resources, reducing the cost of task computation.

Index Terms—Computing offloading, deep reinforcement learning (DRL), multiagent, resource allocation, satellite-terrestrial integrated network (STIN).

Received 22 February 2024; revised 17 June 2024 and 22 August 2024; accepted 9 September 2024. Date of publication 23 September 2024; date of current version 25 December 2024. This work was supported in part by the Hebei Natural Science Foundation under Grant F2022402001; in part by the National Natural Science Foundation of China under Grant 62341129 and Grant 6240010820; in part by the China Postdoctoral Science Foundation under Grant 2024M750181; in part by the Postdoctoral Fellowship Program of CPSF under Grant GZB20240061; in part by the NSF under Grant CNS-2107216, Grant CNS-2128368, Grant CMMI-2222810, and Grant ECCS-2302469; in part by the US Department of Transportation, Toyota; and in part by the Amazon and Japan Science and Technology Agency (JST) Adopting Sustainable Partnerships for Innovative Research Ecosystem (ASPIRE) under Grant JPMJAP2326. (Corresponding author: Haitao Xu.)

Ting Lyu, Yueqiang Xu, Feifei Liu, and Haitao Xu are with the Department of Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Zhu Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 446-701, South Korea (e-mail: [email protected]).

Digital Object Identifier 10.1109/JIOT.2024.3465656

I. INTRODUCTION

WITH the unprecedented development of the Internet of Things (IoT), video transmission, and other emerging applications, traditional terrestrial networks, which have difficulty in fully covering some complex terrains, such as mountains and oceans, are no longer able to satisfy the global demand for ubiquitous connectivity for 6G and beyond [1], [2]. To overcome the related challenges, satellite communication networks (SCNs) have emerged as a potential core technology. As an adequate extension of terrestrial cellular networks, SCNs can provide reliable and stable communication services regardless of geographical conditions [3], expanding coverage by providing services via satellite networks to users in areas inaccessible to terrestrial cellular networks or where infrastructure has been compromised. Terrestrial users can connect with a terrestrial cloud center through a satellite network. However, with the development of wireless communication technology, the demand of ground users for services such as augmented reality and virtual reality has increased dramatically. These services require high computational efficiency, and offloading tasks to the cloud may not satisfy real-time requirements. To reduce the transmission delay of tasks, multiaccess edge computing (MEC) [4], [6], [7] has been proposed, in which efficient and flexible computing services are provided by utilizing the computing resources at the network's edge. Therefore, low-Earth orbit (LEO) satellite edge computing networks emerged [5], [8], [9]. Deploying MEC servers on LEO satellites allows ground users to transfer computing tasks to the satellite for execution, which can significantly reduce latency and energy consumption [10], [11], [12].

A. Related Work

Satellite MEC (SMEC) networks have received extensive attention from academia and industry and are considered an important research direction for future networks [13].

1) Network Architecture: Cui et al. [14] proposed a hybrid geostationary Earth orbit (GEO) and LEO network architecture to solve the load imbalance problem of the LEO network. User tasks can be processed in LEO satellites or forwarded to ground gateways via GEO satellites. Di et al. [15] and Deng et al. [16] proposed a satellite-terrestrial network architecture with efficient data offloading, considering a satellite network combined with a terrestrial network, where the satellite has to forward the terrestrial user's data to a terrestrial gateway or terrestrial base station (BS) for processing. Cheng et al. [17] proposed a space-air-ground integrated network (SAGIN) edge/cloud computing architecture to provide computing services for computationally intensive tasks. Kim et al. [18] proposed a satellite edge computing architecture to support IoT and improve network efficiency by optimizing the network slice scheduling scheme.

Zhang et al. [19] proposed a software-defined space-ground-vehicle integrated network (SD-SGVIN) architecture for transport environments, where satellites improve network coverage and facilitate the interconnection of road networks.

2) Resource Allocation: Song et al. [20] proposed a new terrestrial satellite IoT MEC framework to provide task-processing capabilities for mobile devices in remote IoT. An energy-efficient computational offloading and resource allocation algorithm is proposed to minimize the energy consumption of all mobile devices. Cao et al. [21] investigated an LEOS edge-assisted multilayer MEC system to enhance the coverage of a multilayer MEC system and solve the computing problem for users in congested and isolated areas. To achieve high transmission rates and communication robustness in satellite dynamic network environments, an optimal offloading scheme based on deep reinforcement learning (DRL) [22] has been proposed to accomplish computationally intensive and delay-sensitive tasks through joint offloading paths and resource allocation. Gao et al. [23] considered satellite edge computing with virtual network function (VNF) technology and devised a potential game-based resource allocation method for NFV placement in satellite edge computing to maximize network benefits. A collaborative computational offloading strategy for SMEC networks is proposed by integrating computational resources within the coverage of LEO satellites to minimize user-perceived delays. Zhou et al. [24] studied the computational task scheduling problem in SAGIN. They proposed a novel deep risk-sensitive reinforcement learning solving algorithm that minimizes offloading and computational delays for all tasks.

Although the above work has studied the network service architectures and resource allocation schemes of SMEC, it has not fully considered intersatellite collaboration. Different from the above research works, this article designs a multilayer network service architecture and proposes a multiagent reinforcement learning-based approach to jointly optimize the resource allocation strategy and enhance computational collaboration among satellites.

B. Research Motivation and Contributions

The seamless integration of ground stations and LEOS edges enables increased availability, continuity, and scalability of network services, providing solutions for the next-generation wireless ecosystem. In this article, the MEC system concept plays an essential role in satellite-terrestrial integrated networks (STINs) by providing low computational latency, high energy efficiency, and broad coverage. However, how should a reasonable network service architecture be designed in an LEO edge computing network? At the same time, it is important to conceive an efficient network resource management scheme for the designed framework that achieves a rational allocation of overall network resources and thus efficient network access services. Based on the above analyses, we combine the advantages of satellite and edge computing in this article to design a multilayer MEC system for STIN, where the satellite network can complement the terrestrial network to provide efficient computing services to terrestrial users at low cost.

This article studies the dynamic computing offloading problem based on hybrid strategies. In addition, a multiagent reinforcement learning-based computing offloading scheme is proposed, considering the practical environment of different application requirements and random task generation. Specifically, for the computing offloading requirements of varying computing tasks, we propose an algorithm based on DRL to achieve an optimal solution for task offloading node selection and the task offloading ratio. Then, we design an optimal computational resource allocation scheme to achieve the optimal allocation of computational resources in the dynamic environment and reduce the task computation cost. The main contributions of this article are summarized as follows.

1) A multilayer satellite-terrestrial network edge computing architecture is proposed to provide computing services for user tasks with different computational requirements. Collaborative processing of computing tasks is achieved through hybrid clouds and satellites, where intersatellite computing resources can be shared through intersatellite links (ISLs). Computing tasks generated by ground users can be computed locally, on low-orbit satellites, or on the cloud servers.

2) To address the resource requirements of different computing tasks, this article fully uses the computing resources of the cloud and individual satellites to provide heterogeneous computing services for ground users. The objective optimization problem studied in this article is to minimize the task computation cost while satisfying the computation delay and resource constraints that ensure the completion of the task.

3) Considering the service demands of different computing tasks and solving the joint decision problem of a discrete decision (task offloading node selection) and a continuous variable (task offloading ratio) at the same time, this article proposes a multiagent computing offloading strategy with a joint deep Q network (DQN) and deep deterministic policy gradient (DDPG). Extensive simulation results show that the scheme has good convergence performance and achieves better results than the benchmark algorithms.

The remainder of this article is organized as follows. Section II presents the system model and the associated problem. In Section III, we offer the algorithm based on multiagent DRL. In Section IV, we show simulation results and discuss and analyze the corresponding results. We conclude this article in Section V.

II. SYSTEM MODEL

In this section, we propose a multisatellite, hybrid terrestrial cloud-based collaborative computation offloading network, including a network, computing, and cost model. Table I lists the main notations used in this article.

A. Network Model

In this article, we consider an STIN in remote areas to solve the emergency services problem in remote areas, as shown in Fig. 1.


TABLE I. List of Main Notations.

Fig. 1. System model.

The proposed network comprises three main terminals: 1) the ground terminals; 2) the satellite terminals; and 3) the ground cloud. The set of all satellites and the set of ground users can be denoted as L = {1, 2, ..., L} and I = {1, 2, ..., I}, where the set of satellites L includes the set of LEO satellites and the set of MEO satellites. LEO satellites establish ISLs with surrounding satellites [25]. Satellites can communicate with each other via ISLs.

The ground devices are located in remote areas, so there is no terrestrial communication network service. We consider that ground user equipment can establish communication connections directly with LEO satellites [26]. It is assumed that each ground user can access at most one satellite, but each satellite can establish links to multiple users. The satellite terminals are composed of the LEO and MEO satellites. We assume that each satellite is equipped with an MEC server, which can be considered an edge computing node. Satellites can provide computing services to ground users; because of the limited computing power of ground devices, satellites are needed to assist with computationally intensive tasks. In this article, we assume that tasks are splittable [27]. We consider that the input size of the ground user computation task changes over time. In other words, the computational demands of ground users change over time.

In the architecture proposed in this article, the computational power of each plane is progressively increased from the terminal to the cloud. Ground users can choose low-complexity tasks to be executed locally. Ground devices can offload tasks to satellites via wireless links or forward high-complexity tasks to the cloud servers for processing via LEO satellites. For communication between ground users and LEO satellites, LEO satellites allocate the available spectrum resources equally to all users. Similar to [23] and [28], we consider a quasi-static environment for the model proposed in this article, i.e., the overall network topology of the satellite network remains unchanged during the computational service. We assume that the intersatellite links adopt laser communication. The data rate of laser communication is very high (more than 1 Gb/s), much larger than the size of the computational task. In addition, network state information is transmitted via ISLs and satellite feeder links. Therefore, the transmission delay between two neighboring satellites is ignored [29], and only the propagation delay is considered. A satellite feeder link with a high data rate can be achieved by applying a large directional antenna and wide bandwidth in the Ku/Ka band.

B. Computing Model

A tuple ψ_i = (D_i, C_i, T_i^max) is used to model a computing task [30]. Here, D_i denotes the input data size (in bits) of a computing task ψ_i, C_i denotes the amount of CPU cycles required to complete the task ψ_i, and T_i^max denotes the maximum tolerated delay of task ψ_i. In this article, we consider computing tasks that are splittable [31]. Let α_i be the proportion of the task of user i that is offloaded to the associated satellite for computation, i.e., (1 − α_i)D_i is the fraction of the task computed at the user's device.

In addition, the uploaded data is much larger than the downloaded data [32]. Therefore, the download of the calculated results is ignored in this article. For the whole network, the computing capability gradually increases at every level, from the terminal to the cloud. Different computing capabilities will directly affect the demand for real-time performance of computing tasks. To ensure the completion of the computing tasks, the LEO satellites have the option of task migration or onboard task execution as follows.


1) Onboard Task Process: We assume that the MEC servers of the LEO satellites can be used for the simultaneous processing of multiple tasks, but each task may be allocated different computational resources.

2) Task Migration: Since the MEC server of each LEO satellite has limited computing capacity, it may only be able to complete some of the tasks within the required time. Due to the existence of ISLs, when the computational capacity of the access satellite cannot satisfy the demand of the ground equipment, the task can be passed to other satellites for collaborative computation. Therefore, the satellite will offload the tasks to a nearby LEO satellite, an MEO satellite, or the ground cloud server. The relationship of computing capacities among the LEO satellites, the MEO satellites, and the cloud server is given by F_LEO < F_MEO ≪ F_cloud, where F_LEO, F_MEO, and F_cloud denote the computing capability of the LEO satellite, the computing capability of the MEO satellite, and the computing capability in the cloud, respectively.

In summary, there are three computational schemes for uploading the computational task of ground user i to LEO satellite k. Let x_ik denote whether the computational task of ground user i is processed by satellite k, where x_ik = 1 means that the computational task of user i is computed at satellite k; otherwise, x_ik = 0. Let y_im denote whether the computational task of user i is processed by the associated LEO satellite m, where y_im = 1 denotes that the computational task is offloaded to the LEO satellite m for computation; otherwise, y_im = 0. Similarly, let z_i ∈ {0, 1} denote whether the computational task of user i is processed by the terrestrial cloud server, where z_i = 1 denotes that the computational task ψ_i is executed by the cloud server; otherwise, z_i = 0. Considering that each computational task has only one offloading node selection strategy A_i = {x_ik, y_im, z_i} at each time slot, the computational strategy of ground user i satisfies the constraint

x_ik + Σ_{m ∈ L\{k}} y_im + z_i = 1,  i ∈ I.   (1)

C. Cost Model

In the considered satellite networks, the cost of processing the computing tasks is mainly composed of the delay cost and the energy cost.

1) Delay Cost: The delays in computing the tasks mainly depend on the task's offloading decisions. The local computational latency of user i can be expressed as

T_i^loc = (1 − α_i) C_i / f_i^l,  i ∈ I   (2)

where f_i^l denotes the local computing (LC) resources allocated by user i.

If the tasks are processed at the LEO satellite L_k, the computation time of the tasks in the LEO satellite needs to be taken into account. Let r_i denote the transmission rate from user i to satellite k. Then, the delay is given by

T_k^e = t^tran + t^c + d^s   (3)

where d^s denotes the propagation delay between the user and the satellite, and t^c and t^tran = D_i / r_i denote the computation time and the transmission time, respectively. Considering that the computational resources allocated by the satellite or the cloud to the ground user i ∈ I are f_i^e, the computational time t^c can be expressed as

t^c = α_i C_i / f_i^e.   (4)

Similarly, satellite L_m may be selected for task processing, where k ≠ m. The task delay can be expressed as

T_m^e = t^tran + t^c + d̄^s   (5)

where d̄^s denotes the total propagation delay from the user to the satellite L_m. When intersatellite cooperative computation of tasks is required, the task transmission link involves x-hop ISLs. Therefore, the propagation delay d̄^s can be expressed as

d̄^s = d^s + ζ (s / c)   (6)

where s denotes the distance between satellites, c denotes the propagation speed of light, and ζ denotes the number of hops of the ISL. This article considers 1-hop ISLs, i.e., ζ = 1. Let r_k denote the transmission rate between satellite k and the ground cloud. If the ground cloud is selected for task processing, the task latency is expressed as

T_cloud^e = t^tran + t^c + d^c + t^cloud   (7)

where d^c denotes the total propagation delay from the user to the ground cloud, and t^cloud = D_i / r_k denotes the transmission time between satellite k and the ground cloud.

2) Energy Cost: The local energy consumption of user i to complete the task is defined as

E_i^loc = κ (f_i^l)^2 C_i (1 − α_i)   (8)

where κ is a constant that depends on the chip architecture of the user's device. The energy consumption of ground device i to offload the remaining tasks to the satellite is E_i^tran = p_i t_i^tran, where p_i denotes the transmission power of user i. Then, the total energy consumption of device i is E_i^all = E_i^tran + E_i^loc.

D. Problem Formulation

The completion time of the offloaded part of the task depends mainly on the offloading strategy and the allocated computing resources, and the overall task completion time can be expressed as

T_i^all = max{ x_ik T_k^e + Σ_{m ∈ L\{k}} y_im T_m^e + z_i T_cloud^e,  T_i^loc },  i ∈ I.   (9)

Execution delay directly determines the time to mission completion. Energy consumption is important because ground devices have limited battery energy. By the weighted sum method, the cost of ground device i is defined as a weighted sum of energy consumption and delay, which is expressed as

Cost_i = w_1 E_i^all + w_2 T_i^all   (10)

where w_1, w_2 ∈ [0, 1] denote the weights of device energy consumption and task latency, respectively.
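To make the cost model above concrete, the following sketch evaluates (2)-(10) for a single user and one offloading decision. It is an illustrative Python fragment, not the article's simulation code: the parameter values are placeholders, and it assumes that only the offloaded fraction α_i D_i is transmitted over the uplink.

```python
# Illustrative sketch of the cost model in (2)-(10); parameter values are
# placeholders, not the article's simulation settings.

def task_cost(D, C, alpha, f_local, f_edge, rate, p_tx,
              d_prop, kappa=1e-27, w1=0.5, w2=0.5):
    """Weighted energy/delay cost of one splittable task.

    D       : input data size (bits)
    C       : CPU cycles required for the whole task
    alpha   : fraction of the task offloaded to the selected node
    f_local : local CPU frequency (cycles/s)
    f_edge  : CPU frequency allocated by the selected satellite/cloud node
    rate    : uplink transmission rate (bits/s)
    p_tx    : device transmit power (W)
    d_prop  : total propagation delay to the selected node (s)
    """
    # Local execution delay, Eq. (2)
    t_local = (1.0 - alpha) * C / f_local
    # Transmission and remote computation delays, Eqs. (3)-(7);
    # only the offloaded fraction alpha*D is assumed to be uploaded here
    t_tran = alpha * D / rate
    t_comp = alpha * C / f_edge
    t_edge = t_tran + t_comp + d_prop
    # Task completion time is the slower of the two branches, Eq. (9)
    t_all = max(t_local, t_edge)
    # Device-side energy: local computation plus transmission, Eq. (8)
    e_local = kappa * f_local**2 * C * (1.0 - alpha)
    e_tran = p_tx * t_tran
    e_all = e_local + e_tran
    # Weighted sum of energy and delay, Eq. (10)
    return w1 * e_all + w2 * t_all, t_all, e_all


if __name__ == "__main__":
    cost, t_all, e_all = task_cost(D=600e3, C=3e8, alpha=0.7,
                                   f_local=0.5e9, f_edge=1.5e9,
                                   rate=20e6, p_tx=2.0, d_prop=0.03)
    print(f"cost={cost:.3f}, delay={t_all:.3f}s, energy={e_all:.3f}J")
```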


In this work, we propose to minimize the cost of all ground devices while satisfying the maximum tolerable delay constraint. We formulate the joint optimization problem of the task offloading selection policy A, the task offloading size policy α, and the computation resource allocation policy f_i^e as follows:

P1: min_{α_i, A_i, f_i^e} Σ_{i=1}^{I} Cost_i
s.t. C1: α_i ∈ [0, 1]
     C2: x_ik + Σ_{m ∈ L\{k}} y_im + z_i = 1
     C3: x_ik, y_im, z_i ∈ {0, 1}
     C4: 0 ≤ f_{i,j} ≤ f_j^max,  j ∈ L ∪ {cloud}
     C5: Σ_{i ∈ I} f_{i,j} ≤ f_j^max,  j ∈ L ∪ {cloud}
     C6: T_i^all ≤ T_i^max   (11)

where C1–C3 are the offloading constraints, C4 and C5 denote the computation resource constraints of the satellites and the cloud, C6 denotes the maximum tolerated delay constraint, and f_{i,j} denotes the computational resources allocated to user i by the satellite or the cloud, i.e., f_i^e. The formulated optimization problem is a nonlinear programming problem. Each user's optimal task offloading strategy is not solely determined by their own approach but is also significantly influenced by the strategies employed by other users. This interdependence suggests that task offloading strategies are interconnected and mutually influential. In addition, the action space of the joint offloading strategy grows exponentially with the number of users and is difficult to solve in polynomial time using traditional methods. To address the above challenges, we opt for a DRL approach to solve the formulated optimization problem in a distributed and intelligent manner.

III. OPTIMAL COMPUTATION OFFLOADING AND RESOURCE ALLOCATION SCHEME

In this section, we first model the optimization problem as a Markov decision process (MDP) to further describe the process of computational offloading and resource allocation. Second, a computing offloading algorithm based on multiagent reinforcement learning is proposed for solving the optimization problem. Finally, we describe the allocation process of computational resources.

A. MDP Model

In this article, we consider that all ground users transmit their tasks to the LEO satellite first, i.e., the LEO satellite knows the computational offloading policies of all ground users. Ground users are agents interacting with the environment to gain optimization experience for the computing offloading allocation strategies. Meanwhile, consider that the offloading decision of the previous time slot affects the offloading decision of the current time slot. Therefore, the optimization problem can be formulated as an MDP, which can then be solved by reinforcement learning methods. The MDP's state space, action space, and reward function are given below.

1) State: At each time step t, the state s_t of the environment observed by each agent is defined as s_t = [ψ, A^{t−1}, α^{t−1}, T^{t−1}, E^{t−1}], where ψ = {ψ_i, i ∈ I}. The variables in state s_t have different ranges, which requires smaller training rates and careful parameter initialization and slows down the training process. In this article, we use the min-max normalization method to preprocess the state s_t so that the network can be trained more efficiently.

2) Action: At each time step t, the action a_t is a computational offloading policy made based on the observed environmental state information, which mainly includes the task offloading size policy α^t = {α_i^t, i ∈ I} and the task offloading node selection A^t = {A_i^t, i ∈ I} for the task computation of each user during the transmission period. Therefore, the action a_t can be defined as a_t = [A^t, α^t].

3) Reward: The purpose of the reward function is to evaluate the effect of executing a given action. After executing action a_t in state s_t at time step t, the state of the environment changes to s_{t+1} and the corresponding reward r_t is returned. The reward function is not only related to the objective function but also to the corresponding constraints. If the current action can reduce the system task overhead while satisfying all constraints, it receives a greater reward. Therefore, the immediate reward r_t at time step t is defined as

r_t = ω_1^r Cost_i + ω_2^r r_p   (12)

where r_p denotes the penalty for a task processing timeout, and ω_1^r and ω_2^r denote the weight coefficients of the corresponding parts, respectively. As seen from the above equation, in the design of the reward function we consider both the objective function and the constraints to encourage the agent to produce an overall satisfactory resource allocation strategy.
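The reward in (12) and the min-max preprocessing of the state can be sketched as follows. This is a minimal illustration: the penalty value, the weight coefficients, and the per-feature bounds are assumptions rather than the article's exact settings.

```python
import numpy as np

# Minimal sketch of the reward in Eq. (12) and the min-max state
# normalization; weights, penalty value, and feature ranges are
# illustrative assumptions.

def immediate_reward(cost, t_all, t_max, w1_r=1.0, w2_r=1.0, penalty=10.0):
    """Cost-shaped reward with a timeout penalty term r_p."""
    r_p = penalty if t_all > t_max else 0.0   # penalize deadline violations
    return w1_r * cost + w2_r * r_p           # Eq. (12)


def minmax_normalize(state, lo, hi):
    """Scale each state feature into [0, 1] given per-feature bounds."""
    state, lo, hi = map(np.asarray, (state, lo, hi))
    return (state - lo) / np.maximum(hi - lo, 1e-9)


if __name__ == "__main__":
    # state = [task size (bits), required cycles, previous delay, previous energy]
    s = [600e3, 3e8, 0.8, 0.05]
    lo = [300e3, 1.5e8, 0.0, 0.0]
    hi = [900e3, 9e8, 2.0, 1.0]
    print(minmax_normalize(s, lo, hi))
    print(immediate_reward(cost=0.6, t_all=1.3, t_max=1.0))
```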


Fig. 2. Structure of multiagent DRL-based algorithm framework.

B. Solving Algorithm Based on Multiagent Reinforcement Learning

Based on the above analysis, we propose a multiagent DRL framework to solve the problem in a partitioned way, i.e., the framework contains two decision-making computational regions, as shown in Fig. 2. In the proposed solving framework, the final decision consists of two types, i.e., a discrete action and a continuous action. To achieve joint optimization, a computational offloading algorithm based on joint double DQN (DDQN) [33] and DDPG [34] is proposed. Based on the characteristics of DDQN and DDPG, DDQN is used to learn the optimal offloading node selection strategy, and DDPG is used to learn the task offloading size strategy. In the multiagent DRL framework, each ground user device acts as an agent and contains two decision regions within a single agent.

In the proposed algorithm, a divide-and-conquer model is employed, which reduces the coupling of the discrete-continuous joint strategy and improves the learning efficiency of the algorithm. The agent assigns the same observation states to the different decision regions and obtains the computing offloading node selection policy and the task offloading size policy through the decision regions in each round of the decision-making process, respectively. As shown in Fig. 2, while the computing offloading node selection strategy and the task offloading size strategy come from different decision regions, they are not independent of the overall solution process. For the two decision regions, the state input information for each decision step and the sample data required for algorithm training are the same. At the same time, the state information update is performed by the two different types of decisions together. This means the coupling between the computing offloading node selection and the offloading task size is implicit in the state information. During the training process of the algorithm, each decision region abstracts the mutual fitting correspondence from the state information. Therefore, when the agent makes a decision, each decision region can get the best action strategy based on the same state information, thus achieving joint optimization.

To ensure the consistency of state information during the training process, the algorithm adopts the paradigm of training from the same data source with distributed execution to achieve the joint optimization of the two different types of strategies. In each round of policy generation, DDQN and DDPG generate policies based on the state information obtained from the previous round of policies, respectively. To improve the efficiency of the algorithms, the interaction experiences are collected using an experience buffer, and these experiences are used for training. When the agent finishes generating a specific offloading policy, it interacts with the environment, i.e., the device executes the relevant offloading policy. The environment receives decision information from all the agents to compute the optimal computational resource allocation policy. The corresponding action reward information can be obtained according to the computational resource allocation strategy. Based on the reward information from the environment, the agent judges the strengths and weaknesses of its action strategy and further improves it to make a better offloading strategy. Through iterative interactions with the environment, all agents eventually learn the optimal global policy, i.e., in a single agent, DDQN learns the optimal offloading node selection policy, and DDPG learns the optimal offloading task size allocation policy. The following section focuses on the main decision-making elements in the multiagent DRL framework.

1) DDQN for Offload Node Selection: Considering that the DQN [35] algorithm produces an overestimation of the target value, we use the DDQN algorithm instead of the DQN algorithm. In this section, we describe the decision-making process for task offloading node selection based on DDQN. Due to the discrete nature of the task offloading node selection policy, DDQN can be directly used to solve the task offloading node allocation problem.


DDQN is a reinforcement learning algorithm based on an evaluation model consisting mainly of a Q-network and a target Q-network. It is worth noting that the two networks have the same structure. The stability of the algorithm is improved by using the target Q-network. At the beginning of each time slot t, the agent computes the action a_t based on the greedy strategy and the observed state s_t, where the ε-greedy strategy is used to balance action exploration, i.e., to balance the exploration of new actions with the utilization of known actions. Therefore, the action a_i of agent i can be expressed as

a_i^t = { randomly select an existing action,  with probability ε;  arg max_{a ∈ A} Q(s_t, a_t, θ),  with probability 1 − ε }   (13)

where Q(s_t, a_t, θ) denotes the Q-value output from the Q-network given the observation state s_t and the action a_t, and θ denotes the weight of the Q-network.

2) DDPG for Task Offload Size: Considering that the action space of the task offloading size is continuous, DDPG is used to deal with the problem of task offloading size allocation. DDPG is an off-policy reinforcement learning algorithm based on the actor–critic framework, combining the advantages of the deterministic policy gradient (DPG) [36] algorithm and the DQN algorithm. In the DDPG algorithm, the actor network is used to generate strategies, and the critic network is used to evaluate the rewards of state-action pairs. Agent i generates the task offloading ratio decision through the DDPG decision region after obtaining the observation state s_t. Specifically, agent i uses the actor network to compute the task offloading ratio α_i^t from s_i^t, i.e., α_i^t = π(s_i^t, ξ), where π(s_i^t, ξ) denotes the actor network's strategy, and ξ denotes the weighting coefficient of the actor network. Since the DDPG learning strategy is deterministic, random noise is introduced to balance exploration and exploitation. Therefore, the task offloading size allocation decision for agent i is

a_i^t = π(s^t, ξ) + η,  i ∈ I   (14)

where η denotes noise that obeys a normal distribution N(0, 0.2). The lower and upper bounds are used to constrain the task offloading ratio decision, considering the task offloading size constraints.

3) Algorithm Update: After obtaining the action (α_i^t, A_i^t) for the observed state s_i^t at time slot t, the immediate reward r_i^t and the new state s_i^{t+1} are returned to agent i from the environment. After that, the experience (s_i^t, a_i^t, r_i^t, s_i^{t+1}) gained by agent i at time slot t is stored in the experience buffer B through the experience replay policy. Next, a mini-batch of N_s experiences is randomly selected to update the network weights in the decision regions of the algorithm. It is worth noting that the experience buffer contains experiences from different time slots. Meanwhile, both decision regions are updated using the same experience during the algorithm. Specifically, in the update process of the algorithm, the experience exp = (s_i^t, a_i^t, r_i^t, s_i^{t+1}) is first selected from the experience buffer, and the experience exp is used to train the neural networks in the two decision regions. Based on the states s_i^t and s_i^{t+1} in the chosen experience exp, the output Q-values of the Q-network and the target Q-network in the DDQN are denoted as Q(s_t, a_t, θ) and Q(s_t, a_t, θ̂), respectively, where θ̂ is the weight of the target Q-network. The difference in output Q-value between the Q-network and the target Q-network can be measured by using the following loss function:

φ_t(θ) = |Q(s_t, a_t, θ) − z_t|   (15)

where

z_t = r_{a_t}(s_t, s_{t+1}) + γ Q(s_{t+1}, arg max_a Q(s_{t+1}, a, θ), θ̂).   (16)

Then, by minimizing the loss function of agent i using gradient descent, the update of the weights of the Q-network can be expressed as

θ ← θ − ζ_d ∇_θ φ(θ)   (17)

where ζ_d is the learning rate. The target Q-network updates its weights θ̂ by periodically copying the weights θ of the Q-network.

DDPG is also updated based on the selected experience exp. The estimated Q-values of the critic network and the target critic network can be expressed as Q(s_i^t, a_i^t, ρ) and Q(s_i^{t+1}, π̂(s_i^{t+1}, ξ̂), ρ̂), where π̂(s_i^{t+1}, ξ̂) denotes the target actor network, ρ is the weight of the critic network, ξ̂ denotes the weight of the target actor network, and ρ̂ denotes the weight of the target critic network. The following loss function can represent the difference between the critic network and the target critic network:

L_i^c(ρ) = Σ_t (y_i(t) − Q(s_i(t), a_i(t), ρ))^2,  ∀i   (18)

where y_i(t) = r_i(t) + γ Q(s_i(t+1), π̂(s_i(t+1), ξ̂), ρ̂). The gradient descent method is used to minimize (18), and the weight update of the critic network can be expressed as

ρ ← ρ − ζ_p1 ∇_ρ L^c(ρ)   (19)

where ζ_p1 is the learning rate.

According to the DPG theorem, the actor network updates its weights to obtain larger cumulative discounted rewards [37]. Therefore, the loss function of the actor network can be expressed as

L_i^a(ξ) = −Q(s_i(t), π(s_i(t), ξ), ρ),  i ∈ I.   (20)

The gradient descent method is used to minimize (20), and the weight update of the actor network can be expressed as

ξ ← ξ − ζ_p2 ∇_ξ L^a(ξ)   (21)

where ζ_p2 is the learning rate.

For the target actor network and the target critic network, the weight updates depend on the actor network and the critic network, respectively, which are calculated as shown below:

ξ̂ ← τ ξ + (1 − τ) ξ̂
ρ̂ ← τ ρ + (1 − τ) ρ̂   (22)

where τ ∈ [0, 1] is the update frequency factor. Soft updating of the network is achieved by τ controlling the proportion of weights copied from the primary network to the target network.
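A compact PyTorch sketch of the DDQN update in (15)-(17) and the soft target update in (22) is given below. The network sizes, batch shapes, and hyperparameters are assumptions for illustration; the online Q-network selects the arg max action while the target Q-network evaluates it, which is the double-Q correction described above.

```python
import torch
import torch.nn as nn

# Illustrative DDQN target computation (Eqs. (15)-(17)) and soft target
# update (Eq. (22)); dimensions and hyperparameters are assumptions.

def ddqn_target(q_net, q_target, reward, next_state, gamma=0.99):
    """z_t = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        best_a = q_net(next_state).argmax(dim=1, keepdim=True)      # online net picks a
        next_q = q_target(next_state).gather(1, best_a).squeeze(1)  # target net evaluates it
    return reward + gamma * next_q


def soft_update(target_net, online_net, tau=0.01):
    """theta_hat <- tau * theta + (1 - tau) * theta_hat, as in Eq. (22)."""
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.data.copy_(tau * op.data + (1.0 - tau) * tp.data)


if __name__ == "__main__":
    state_dim, n_actions = 8, 4
    make_q = lambda: nn.Sequential(nn.Linear(state_dim, 100), nn.ReLU(),
                                   nn.Linear(100, 100), nn.ReLU(),
                                   nn.Linear(100, n_actions))
    q_net, q_target = make_q(), make_q()
    q_target.load_state_dict(q_net.state_dict())

    s_next, r = torch.randn(32, state_dim), torch.randn(32)
    z = ddqn_target(q_net, q_target, r, s_next)        # training target z_t
    q_sa = q_net(torch.randn(32, state_dim)).gather(
        1, torch.zeros(32, 1, dtype=torch.long)).squeeze(1)
    loss = nn.functional.l1_loss(q_sa, z)              # |Q(s,a) - z_t|, Eq. (15)
    loss.backward()                                    # gradient step per Eq. (17)
    soft_update(q_target, q_net)
```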

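Before the training procedure is summarized, the following sketch shows one way to generate the hybrid action of a single agent: ε-greedy offloading node selection as in (13) and a noise-perturbed, clipped offloading ratio as in (14). The exploration constants and network shapes are illustrative assumptions, not the article's exact implementation.

```python
import numpy as np
import torch
import torch.nn as nn

# Sketch of hybrid action generation: epsilon-greedy node selection
# (Eq. (13)) plus a clipped, noise-perturbed offloading ratio from the
# actor network (Eq. (14)). Shapes and constants are assumptions.

def select_node(q_net, state, epsilon, n_actions):
    if np.random.rand() < epsilon:                    # explore
        return np.random.randint(n_actions)
    with torch.no_grad():                             # exploit: argmax_a Q(s, a)
        return int(q_net(state).argmax().item())


def select_ratio(actor, state, noise_std=0.2):
    with torch.no_grad():
        alpha = actor(state).item() + np.random.normal(0.0, noise_std)
    return float(np.clip(alpha, 0.0, 1.0))            # keep alpha in [0, 1]


if __name__ == "__main__":
    state_dim, n_nodes = 8, 4
    q_net = nn.Sequential(nn.Linear(state_dim, 100), nn.ReLU(),
                          nn.Linear(100, n_nodes))
    actor = nn.Sequential(nn.Linear(state_dim, 400), nn.ReLU(),
                          nn.Linear(400, 300), nn.ReLU(),
                          nn.Linear(300, 1), nn.Sigmoid())
    s = torch.randn(state_dim)
    print(select_node(q_net, s, epsilon=0.1, n_actions=n_nodes),
          select_ratio(actor, s))
```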

The details of the computing offloading algorithm based on the joint DDQN and DDPG are shown in Algorithm 1. In each episode, each agent first obtains the current state of the environment. Then, it feeds the current state into the two decision regions to obtain the multidimensional action. A random selection process is added before executing each action to increase the exploration of the policy with respect to the environment. Specifically, for the discrete action decision region, the ε-greedy policy is employed to generate task offloading node selection decisions. For the continuous action decision region, exploration of the environment is increased by adding action noise to the task offloading size decision. Next, the network environment computes the computational resource allocation policy based on the multidimensional action a and executes it. It is worth noting that the instant action rewards and the new environment states are obtained through Algorithm 2. The network environment then updates the state s_{t+1} and returns the reward r_t simultaneously. Meanwhile, each agent collects the experience (s_t, a_t, r_t, s_{t+1}) and stores it in the experience replay buffer B. A mini-batch of samples is then extracted from B to update the network parameters. Finally, the algorithm keeps iterating the above process until the maximum number of training episodes is reached.

Algorithm 1: Multiagent DRL-Based Computing Offloading Algorithm
  Initialize user tasks ψ;
  Initialize the DDQN decision region's network coefficients, i.e., θ and θ̂;
  Initialize the network coefficients for the DDPG decision region, i.e., ξ, ρ, ξ̂, and ρ̂;
  Initialize replay memory to capacity B_i, i ∈ I;
  for ep = 1 to E_max:
    Initialize the satellite communications scenario;
    for t = 1 to T_max:
      for i = 1 to I:
        Observe the initial state s_i^0;
        Choose the task offloading node selection action A_i(t) based on DDQN;
        Choose the task offloading size ratio action α_i(t) based on DDPG;
        a_i(t) ← {A_i(t), α_i(t)};
      End for
      All agents take the joint action a(t), observe the new state s(t + 1), and obtain the reward r(t) according to Algorithm 2;
      Store the tuple (s_i(t), a_i(t), r_i(t), s_i(t + 1)) in B_i;
      for i = 1 to I:
        Randomly sample a minibatch of N_s tuples (s_i(t), a_i(t), r_i(t), s_i(t + 1)) from B_i;
        Update θ according to (17), and update θ̂ by copying θ;
        Update ρ according to (19), update ξ according to (21), and update ξ̂ and ρ̂ according to (22);
      End for
    End for
  End for

C. Optimal Computing Resource Allocation Policy

In this section, the satellite or the cloud allocates computational resources to the terrestrial devices whose tasks are offloaded. To improve computational efficiency, the optimal computational resource allocation is considered when the device offloads computational tasks to the cloud servers or the MEC server on the satellite. Knowing the offloading policies of the ground equipment (the task offloading node selection policy and the task offloading ratio policy), the optimal computational resource allocation problem can be expressed as

P2: min_{f_i^e} Σ_{i=1}^{I} w_2 α_i C_i / f_i^e
s.t. C4, C5
     C7: t^tran + d(A_i) + α_i C_i / f_i^e ≤ T_i^max   (23)

where

d(A_i) = { d^s, x_ik = 1;  d̄^s, y_im = 1;  d^c, z_i = 1 }.   (24)

To address the above problem, we give the optimal computational resource allocation scheme for the satellites and the cloud, as shown in the following theorem.

Theorem 1: The optimal computational resource allocation policy for satellite k ∈ L or the cloud concerning ground device i ∈ I, with the task offloading actions a of all ground devices known, can be expressed as

f_i^{e*} = max{ α_i C_i / (T_i^max − t^tran − d(A_i)),  √(w_2 α_i C_i) / Σ_{i ∈ I} √(w_2 α_i C_i) }.   (25)

Proof: It can be observed that problem P2 is convex, and the Slater condition is satisfied. First, the Lagrange function of the convex optimization problem P2 can be expressed as

L(f_i^e) = w_2 α_i C_i / f_i^e + λ ( Σ_i f_i^e − f^max )   (26)

where λ ≥ 0 denotes the Lagrange multiplier. The first-order derivative of L with respect to f_i^e can then be written as

∂L(f_i^e) / ∂f_i^e = −w_2 α_i C_i / (f_i^e)^2 + λ.   (27)

Letting ∂L(f_i^e)/∂f_i^e = 0, one obtains

f_i^{e*} = √( w_2 α_i C_i / λ ).   (28)

Observation of the above equation shows that λ ≠ 0; hence Σ_{i} f_{i,j} − f_j^max = 0. Substituting (28) into this equality, λ can be obtained. Based on the above solution, we can obtain the computational resource share allocation policy f_i^e, denoted as

f_i^e = √(w_2 α_i C_i) / Σ_{i ∈ I} √(w_2 α_i C_i).   (29)

Finally, considering C4 and C7, we obtain the optimal computational resource allocation policy f_i^{e*} = max{ α_i C_i / (T_i^max − t^tran − d(A_i)), √(w_2 α_i C_i) / Σ_{i ∈ I} √(w_2 α_i C_i) }. ∎
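A small numerical sketch of the allocation rule in Theorem 1 and (29) for the users served by one node is given below. Scaling the proportional share by the node capacity f_k^max is an interpretation added here so that the result is expressed in cycles/s, and the fallback to the pure share rule when the Theorem 1 demand exceeds the capacity follows the discussion around Algorithm 2; all numbers are illustrative.

```python
import numpy as np

# Sketch of the resource allocation in Theorem 1 / Eq. (29) for the users
# offloading to one node. Scaling the proportional share by f_max is an
# added interpretation; the values below are illustrative only.

def allocate(alpha, C, T_max, t_tran, d_prop, f_max, w2=0.5):
    alpha, C = np.asarray(alpha, float), np.asarray(C, float)
    # Minimum frequency that still meets each task's deadline (C7 in P2)
    f_deadline = alpha * C / (np.asarray(T_max) - np.asarray(t_tran) - np.asarray(d_prop))
    # Proportional share of Eq. (29), scaled by the node capacity
    weight = np.sqrt(w2 * alpha * C)
    f_share = f_max * weight / weight.sum()
    f_star = np.maximum(f_deadline, f_share)   # Theorem 1, Eq. (25)
    if f_star.sum() > f_max:                   # demand exceeds capacity:
        return f_share                         # fall back to the pure share rule
    return f_star


if __name__ == "__main__":
    f = allocate(alpha=[0.6, 0.9, 0.3],
                 C=[2e8, 4e8, 1.5e8],
                 T_max=[1.0, 1.5, 0.8],
                 t_tran=[0.02, 0.03, 0.01],
                 d_prop=[0.03, 0.03, 0.03],
                 f_max=3e9)
    print(f, f.sum())
```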


Based on the analysis of the above theorem, we design an environmental reward feedback algorithm based on the optimal computational resource allocation policy, as shown in Algorithm 2. First, the computing offloading decisions of all ground users are collected. Then, computational resources are allocated to each user according to (29). Finally, the reward value for each user is calculated and returned. It is worth noting that the optimal computational resource allocation obtained according to Theorem 1 may exceed the load of the MEC server, i.e., Σ_i f_i^e > f_k^max. Therefore, we consider using (29) to obtain the computational resource allocation policy. This may result in a user being allocated computational resources that do not meet the user's computational latency requirements, which is a poor strategy for that user. The reward value is then adjusted by the penalty term in the reward function (12), which enables the user to learn a better strategy.

Algorithm 2: Environmental Reward Feedback Algorithm
  for i in I:
    node_list ← node_list ∪ {(A_i, α_i)};
  End for
  for k in L ∪ {cloud}:
    for node in node_list:
      if node.A_i = k:
        Calculate the optimal computational resource allocation policy f_i^{e*} according to (29);
      End if
    End for
  End for
  r = {};
  for i in I:
    Calculate the reward r_i according to (12) and the computational resource allocation policy f_i^{e*};
    r ← r ∪ {r_i};
  End for
  return r;

D. Complexity Analysis

The time complexity of Algorithm 1 depends on three main components: 1) the action generation process; 2) the reward feedback of Algorithm 2; and 3) the network training process. The deep neural network in Algorithm 1 is a fully connected neural network. First, the time complexity of the action generation process can be expressed as O(H_i·H_1 + H_1·H_2 + ··· + H_n·H_o), where H_i, H_n, and H_o are the sizes of the input, hidden, and output layers, respectively. It is worth noting that the size of the input layer of the action network in Algorithm 1 depends on the state dimension, and the size of the output layer depends on the action dimension. In addition, the number of training iterations depends on the number of training episodes E and the period T. Therefore, the time complexity of the action generation process of Algorithm 1 can be expressed as O(E·T·I(H_i·H_1 + H_1·H_2 + ··· + H_n·H_o)), where I denotes the number of users. The time complexity of Algorithm 2 mainly depends on the loop that calculates the reward values and is O(I(L + 1)), where L denotes the number of satellites. Therefore, the time complexity of the reward feedback part in Algorithm 1 is O(E·T·I(L + 1)). During the training of the algorithm, the time complexity of the backpropagation of the network is O(E·T·I·N_s(H_i·H_1 + H_1·H_2 + ··· + H_n·H_o)), where N_s denotes the size of the sampled experience. By comparing the time complexities of the three parts, it can be found that the time complexity of the network training process is higher than that of the other two parts. Therefore, the time complexity of Algorithm 1 is O(E·T·I·N_s(H_i·H_1 + H_1·H_2 + ··· + H_n·H_o)).

Fig. 3. Illustration for the convergence of the proposed algorithms.

IV. NUMERICAL SIMULATIONS

In this section, we first describe in detail the performance simulation settings used to evaluate the algorithms proposed in this article. Then, the simulation results are given and discussed.

A. Simulation Settings

The neural network in Algorithm 1 uses a four-layer fully connected neural network, including one input layer, two hidden layers, and one output layer, where the first hidden layer and the second hidden layer both have 100 neurons in the DQN decision domain. In the DDPG decision domain, the first hidden layer has 400 neurons, and the second hidden layer has 300 neurons. In the neural networks, the ReLU function is used as the activation function, and the Adam optimizer is used for training the networks [38]. Algorithm 1 runs with a number of episodes E_max = 1000 and a time-slot length T_max = 40 to train the model. The size of the memory replay buffer is B = 2000, the mini-batch sample size is N_s = 128, the discount factor is γ = 0.99, and the learning rate is 0.01. The reward value weighting factors are ω_1^r = ω_2^r = 1.
Authorized licensed use limited to: Birla Institute of Technology and Science. Downloaded on February 03,2025 at 17:12:23 UTC from IEEE Xplore. Restrictions apply.
LYU et al.: TASK OFFLOADING AND RESOURCE ALLOCATION 271

Fig. 4. Computation cost under different task input data sizes.

Fig. 5. Computation cost of tasks under different number of CPU cycles required per bit.

Fig. 6. Computational cost of tasks under different number of users.

Fig. 7. Comparison of latency and energy consumption.
B. Performance Evaluation

To reduce the complexity of the simulation model, we refer to [39] and [40] to simplify the number of users and satellites. This is possible because, for the number of satellites, any large constellation can be logically divided into several small satellite networks [41], [42]. In this article, we assume a small satellite subnetwork containing three satellites and a ground cloud center, where the satellite subnetwork has two LEO satellites and one MEO satellite, the altitude of the LEO satellites is 1000 km [43], the distance between the LEO satellites is 800 km, the distance between the LEO satellites and the ground cloud is 2000 km [44], and the distance between the LEO and the MEO satellites is 1000 km. For terrestrial users, we consider that each user device has only one computation task per unit of time, where the input size of the computation task is [300, 900] Kb, the maximum tolerable delay of the computation task is [0.5, 2] s, and the number of CPU cycles required per bit ranges over [0.5, 1] Kcycles/bit. We assume the computing power of the terrestrial user device is 0.5 GHz, the computing resources of the LEO satellite and the cloud are 3 and 10 GHz, respectively [29], and the computing resources of the MEO satellite are 5 GHz. The transmit power of the user-satellite link is 2 W [45]. The user-satellite and satellite-cloud communication resources are 200 and 300 Mb/s, respectively [46]. The propagation delay between the LEO satellites and the ground users is all set to 30 ms [47].

In addition, we compare the algorithm proposed in this article with four benchmark schemes, which are as follows.

1) Local Computing Scheme (LC): In this scheme, the user performs all tasks locally, regardless of whether the user's task computing requirements are satisfied or unsatisfied.

2) Random Baseline Scheme (Random): In this scheme, each ground user randomly selects a strategy for task processing within the constraints of the task offloading strategy.

3) Multiagent DQN Scheme (MA-DQN): Similar to [48], the MA-DQN algorithm is used to solve the offloading policy for ground users. Each ground user collaboratively learns the optimal task offloading policy by training a single DQN. To make the DQN algorithm applicable to the model proposed in this article, we discretize the continuous action space and divide it equally into ten parts. In addition, the network architecture of the DQN algorithm is the same as that of the DDQN, and the ε-greedy strategy is also used in the exploration process.

4) Double DQN (DDQN): Similar to [49], the user generates the offloading policy through the DDQN algorithm. In this scenario, each ground user acts as an agent and trains a single DDQN network to learn the optimal offloading policy collaboratively. It is worth noting that DDQN has the same action space as MA-DQN.

First, the convergence of the algorithm proposed in this article is shown in Fig. 3. As shown in Fig. 3, we can see that the reward value decreases as the number of episodes increases until it reaches a relatively stable value. This is reasonable because the algorithm's reward depends on the task's computational expense. As training iterates, the algorithm's computational offloading strategy is optimized, and the computational cost of the task becomes smaller and smaller. The experimental results show that the algorithm proposed in this article is convergent. To compare with the above benchmark algorithms, we consider the number of ground user devices equal to 10.
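For the MA-DQN and DDQN baselines described above, the continuous offloading ratio is discretized into ten equal parts so that a value-based agent can choose it. The sketch below shows one way to build such a flat discrete action set; pairing the node index with the ratio level into a single action index is an assumed implementation detail, not a construction given in the article.

```python
import numpy as np

# Sketch of the action discretization used by the value-based baselines:
# the offloading ratio is split into ten equal levels and paired with the
# offloading-node choice to form one flat discrete action space.

RATIO_LEVELS = np.linspace(0.1, 1.0, 10)   # ten equal parts of (0, 1]
N_NODES = 4                                # e.g., local, two satellites, cloud

def decode_action(a):
    """Map a flat action index to (node index, offloading ratio)."""
    node, level = divmod(a, len(RATIO_LEVELS))
    return node, float(RATIO_LEVELS[level])

if __name__ == "__main__":
    n_actions = N_NODES * len(RATIO_LEVELS)
    print(n_actions, decode_action(17))    # 40 actions; index 17 -> node 1, ratio 0.8
```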


Fig. 8. Computation cost of tasks under different network architectures.

Fig. 4 illustrates the relationship between the computation cost and the size of the input task. As shown in Fig. 4, the computation cost of all methods gradually increases as the computation task size increases. This is because the larger the task size, the more data is input and the more energy and time are consumed for task execution and data transfer. It can be observed that the proposed scheme has a lower task computation cost than the other schemes because the DDPG and DDQN decision regions learn to coordinate the decisions of the task offloading ratio and the offloading node selection to obtain better performance. Hence, the proposed scheme can provide computing services with a smaller computation cost compared to the LC, Random, MA-DQN, and DDQN schemes. It is worth noting that since the Random scheme selects strategies randomly, there is no guarantee of the superiority of a strategy: when a better strategy is selected, the task cost decreases, and when a worse strategy is selected, the task cost increases. Therefore, there are some bumps in the Random scheme curve.

Fig. 5 illustrates the trend of computation cost versus the number of CPU cycles requested by a task. As the number of requested CPU cycles for a task increases, the required computational power increases. As a result, the cost of all methods increases. As shown in Fig. 5, we can observe that the LC scheme has the fastest growth because users have a fixed and limited LC power that cannot meet the demands of computing tasks in real time. The cost of the other schemes increases as the required computational power increases, with the MA-DQN and DDQN schemes increasing more slowly. However, the computation cost of the algorithm proposed in this article is still the smallest.

Fig. 6 compares the total computation cost of the different schemes versus the number of user devices. It can be seen from Fig. 6 that the entire computation cost of offloading achieved by all the algorithms increases as the number of user devices increases. Both the satellite or LC resources and the communication resources are limited. As a result, as the number of user devices increases, the computing resources available to each user decrease, thereby increasing the total computing cost. We can also observe that the proposed scheme can provide lower-cost computing services than the other schemes. This is because the proposed scheme can intelligently coordinate the offloading ratio and offloading node selection decisions, and thus it achieves the lowest task computation cost among the schemes.

Fig. 9. Computation cost of tasks under different computing resource allocation policies. (a) Different offloading policy. (b) Same offloading policy.

Fig. 7 shows the comparison of latency and energy consumption, where the number of users is 12 and the computing power of the user device is 0.5 GHz. It can be observed that the percentage of delay is greater than that of the energy consumption. This is because only the energy consumption of the terminal device is considered in this article, so the delay is greater than the energy consumption. In Fig. 7, the proposed scheme not only exhibits a lower total cost compared to the other schemes but also demonstrates lower delay and energy consumption within the cost metrics. This shows that the proposed scheme is effective in reducing the delay and energy consumption, indicating the effectiveness of the offloading decisions.


To further analyze the advantages of the network architecture proposed in this
article, we consider the computational cost of the tasks with different
numbers of satellites for the same number of tasks, where the local
computational power is 0.2 GHz and the number of users is 12. We compare the
single satellite network (SSN) and double satellite network (DSN)
architectures, and the specific results are shown in Fig. 8. As shown in
Fig. 8, we can find that the computational expense of the task gradually
decreases as the number of satellites increases. It can be observed that the
proposed scheme still has the smallest computational spend under the different
network architectures. This is because more edge nodes can provide richer
computational resources. Meanwhile, reasonable selection of offloading nodes
and allocation of computing resources can reduce the tasks' data transmission
delay and computation energy consumption, thus reducing the computation cost.
The proposed scheme achieves the lowest task computation cost among all the
compared schemes, which implies that it can provide a better computational
offloading strategy.
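A rough way to see why an additional satellite lowers the cost is to treat the
satellites' CPUs as a shared pool, as in the minimal sketch below. The even
load balancing, the equal CPU split per satellite, and the numbers are
assumptions for illustration only, not the scheduling model of this article.

```python
# Illustrative sketch only: with more satellites, offloaded tasks can be spread
# over more edge CPUs, so each task receives a larger CPU share and finishes
# sooner. Even task balancing and the numbers below are assumptions.

def per_task_edge_delay(num_tasks, num_satellites, f_per_satellite, cycles_per_task):
    tasks_per_sat = -(-num_tasks // num_satellites)       # ceiling division
    f_share = f_per_satellite / tasks_per_sat              # equal split per satellite
    return cycles_per_task / f_share

for n_sat in (1, 2):   # SSN vs. DSN
    d = per_task_edge_delay(num_tasks=12, num_satellites=n_sat,
                            f_per_satellite=2e10, cycles_per_task=1e9)
    print(f"{n_sat} satellite(s): per-task edge delay = {d:.3f} s")
```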
Fig. 9 illustrates the impact of the computational resource allocation policy
on the task computation cost. Fig. 9(a) shows the task computation spend for
each scenario under different computational resource allocation strategies. LC
does not allocate computational resources at the edge, so the LC scheme is not
affected by the computational resource allocation policy. It can be observed
that, for the other four schemes, the optimal computational resource
allocation policy allocates the computational resources reasonably and reduces
the computation cost of the tasks. Meanwhile, the uniform allocation policy
gives the same computing resources to each user, i.e., users under the same
offloading selection policy receive identical computing resources. The scheme
proposed in this article optimizes the task offloading ratio according to the
computational task requirements and balances the task ratio between local
computation and offloading processing. As a result, the proposed algorithm
achieves better performance, i.e., a lower computation cost. Fig. 9(b)
illustrates the impact of different computational resource allocation policies
on the computation cost of a task under the same offloading policy. Since the
computational resource allocation policy does not affect the LC scheme, we do
not consider the LC scheme here. It can be observed that the optimal
computational resource allocation scheme reduces the task spend of every
scheme under the same offloading policy.
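The gap between the uniform and the optimal allocation policies can be
illustrated with a small numerical sketch. If the objective were simply to
minimize the sum of computation delays of tasks sharing one edge node's total
capacity, allocating CPU frequency in proportion to the square root of each
task's workload would be optimal for that toy problem; the workloads and the
capacity below are assumed values rather than the settings of this article.

```python
# Illustrative sketch only: uniform CPU allocation vs. a demand-aware split for
# tasks sharing one edge node's total capacity F. For minimizing the sum of
# computation delays sum(c_i / f_i) subject to sum(f_i) = F, allocating f_i
# proportionally to sqrt(c_i) is optimal. Workloads and F are assumed values.

import math

def sum_delay(cycles, freqs):
    return sum(c / f for c, f in zip(cycles, freqs))

cycles = [2e8, 5e8, 2e9]                      # offloaded workloads (CPU cycles)
F = 1e10                                      # total edge CPU capacity (Hz)

uniform = [F / len(cycles)] * len(cycles)
weights = [math.sqrt(c) for c in cycles]
demand_aware = [F * w / sum(weights) for w in weights]

print("uniform     :", round(sum_delay(cycles, uniform), 3), "s")
print("demand-aware:", round(sum_delay(cycles, demand_aware), 3), "s")
```

The uniform split gives every task the same frequency and therefore penalizes
the heaviest task, which mirrors the behavior of the uniform policy observed
in Fig. 9.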
V. CONCLUSION

In this article, we study the computing offloading problem for STIN and
jointly optimize the computing offloading node selection and the task
offloading ratio to minimize the task computation cost (the weighted sum of
task computation energy consumption and task computation delay). First, a
hybrid satellite and cloud multilayer MEC network architecture is proposed to
provide richer computational resources for users. To solve the task offloading
problem for ground users, we propose a multiagent reinforcement learning-based
approach that integrates DDQN and DDPG and is thus able to help agents take
actions involving both discrete and continuous variables. Finally, we propose
an optimal computational resource allocation scheme for satellite/cloud
resource allocation to improve the computational efficiency of the tasks.
Through simulation experiments, we find that the multiagent reinforcement
learning-based approach achieves a lower computational cost than the existing
benchmark algorithms (including the LC, Random, MA-DQN, and DDQN algorithms).
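For readers who want a concrete picture of such a hybrid action, the sketch
below shows one possible way to combine a DDQN-style discrete choice (the
offloading node) with a DDPG-style continuous output (the offloading ratio).
The linear stand-in networks, the epsilon-greedy exploration, and the sigmoid
squashing are assumptions for illustration; the replay buffers, target
networks, and training updates of the actual algorithm are omitted.

```python
# Illustrative sketch only: forming a hybrid action from a DDQN-style discrete
# choice (which node to offload to) and a DDPG-style continuous output (what
# fraction of the task to offload). The linear "networks" below are stand-ins
# for trained networks; training, replay, and target updates are omitted.

import numpy as np

rng = np.random.default_rng(0)
state_dim, num_nodes = 6, 3                       # assumed sizes for illustration

W_q = rng.normal(size=(num_nodes, state_dim))     # stand-in Q-network weights
w_mu = rng.normal(size=state_dim)                 # stand-in actor weights

def select_action(state, epsilon=0.1, noise_std=0.05):
    # Discrete part: epsilon-greedy over Q-values (offloading node selection).
    if rng.random() < epsilon:
        node = int(rng.integers(num_nodes))
    else:
        node = int(np.argmax(W_q @ state))
    # Continuous part: deterministic policy output plus exploration noise,
    # squashed to [0, 1] so it can be read as an offloading ratio.
    ratio = 1.0 / (1.0 + np.exp(-(w_mu @ state + rng.normal(0.0, noise_std))))
    return node, float(ratio)

print(select_action(rng.normal(size=state_dim)))
```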

REFERENCES

[1] T. de Cola and I. Bisio, "QoS optimisation of eMBB services in converged 5G-satellite networks," IEEE Trans. Veh. Technol., vol. 69, no. 10, pp. 12098–12110, Oct. 2020.
[2] H. Xu, S. Han, X. Li, and Z. Han, "Anomaly traffic detection based on communication-efficient federated learning in space-air-ground integration network," IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 9346–9360, Dec. 2023.
[3] G. Giambene, S. Kota, and P. Pillai, "Satellite-5G integration: A network perspective," IEEE Netw., vol. 32, no. 5, pp. 25–31, Oct. 2018.
[4] J. Chen, Z. Chang, X. Guo, R. Li, Z. Han, and T. Hämäläinen, "Resource allocation and computation offloading for multi-access edge computing with fronthaul and backhaul constraints," IEEE Trans. Veh. Technol., vol. 70, no. 8, pp. 8037–8049, Aug. 2021.
[5] S. Sthapit, S. Lakshminarayana, L. He, G. Epiphaniou, and C. Maple, "Reinforcement learning for security-aware computation offloading in satellite networks," IEEE Internet Things J., vol. 9, no. 14, pp. 12351–12363, Jul. 2022.
[6] J. Du et al., "MADDPG-based joint service placement and task offloading in MEC empowered air–ground integrated networks," IEEE Internet Things J., vol. 11, no. 6, pp. 10600–10615, Mar. 2024.
[7] H. Xu, W. Huang, Y. Zhou, D. Yang, M. Li, and Z. Han, "Edge computing resource allocation for unmanned aerial vehicle assisted mobile network with blockchain applications," IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 3107–3121, May 2021.
[8] J. Du et al., "Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 1, pp. 33–44, Jan. 2022.
[9] T. Lyu, H. Xu, F. Liu, M. Li, L. Li, and Z. Han, "Two layer Stackelberg game-based resource allocation in cloud-network convergence service computing," IEEE Trans. Cogn. Commun. Netw., early access, Apr. 23, 2024, doi: 10.1109/TCCN.2024.3392809.
[10] J. Zheng, Y. Cai, Y. Wu, and X. Shen, "Dynamic computation offloading for mobile cloud computing: A stochastic game-theoretic approach," IEEE Trans. Mobile Comput., vol. 18, no. 4, pp. 771–786, Apr. 2019.
[11] Y. Hao, M. Chen, L. Hu, M. S. Hossain, and A. Ghoneim, "Energy efficient task caching and offloading for mobile edge computing," IEEE Access, vol. 6, pp. 11365–11373, 2018.
[12] S. Yu, R. Langar, X. Fu, L. Wang, and Z. Han, "Computation offloading with data caching enhancement for mobile edge computing," IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 11098–11112, Nov. 2018.
[13] A. Alsharoa and M.-S. Alouini, "Improvement of the global connectivity using integrated satellite-airborne-terrestrial networks with resource optimization," IEEE Trans. Wireless Commun., vol. 19, no. 8, pp. 5088–5100, Aug. 2020.
[14] G. Cui, P. Duan, L. Xu, and W. Wang, "Latency optimization for hybrid GEO–LEO satellite-assisted IoT networks," IEEE Internet Things J., vol. 10, no. 7, pp. 6286–6297, Apr. 2023.
[15] B. Di, H. Zhang, L. Song, Y. Li, and G. Y. Li, "Ultra-dense LEO: Integrating terrestrial-satellite networks into 5G and beyond for data offloading," IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 47–62, Jan. 2019.
[16] R. Deng, B. Di, S. Chen, S. Sun, and L. Song, "Ultra-dense LEO satellite offloading for terrestrial networks: How much to pay the satellite operator?" IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6240–6254, Oct. 2020.
[17] N. Cheng et al., "Space/aerial-assisted computing offloading for IoT applications: A learning-based approach," IEEE J. Sel. Areas Commun., vol. 37, no. 5, pp. 1117–1129, May 2019.
[18] T. Kim, J. Kwak, and J. P. Choi, "Satellite edge computing architecture and network slice scheduling for IoT support," IEEE Internet Things J., vol. 9, no. 16, pp. 14938–14951, Aug. 2022.
[19] L. Zhang, F. Liu, and J. Sun, "Software-defined space-ground-vehicle integrated network architecture for unconscious payment," IEEE Access, vol. 10, pp. 128122–128131, 2022.
[20] Z. Song, Y. Hao, Y. Liu, and X. Sun, "Energy-efficient multiaccess edge computing for terrestrial-satellite Internet of Things," IEEE Internet Things J., vol. 8, no. 18, pp. 14202–14218, Sep. 2021.
[21] X. Cao et al., "Edge-assisted multi-layer offloading optimization of LEO satellite-terrestrial integrated networks," IEEE J. Sel. Areas Commun., vol. 41, no. 2, pp. 381–398, Feb. 2023.
[22] T. Chen et al., "Learning-based computation offloading for IoRT through Ka/Q-band satellite–terrestrial integrated networks," IEEE Internet Things J., vol. 9, no. 14, pp. 12056–12070, Jul. 2022.
[23] X. Gao, R. Liu, and A. Kaushik, "Virtual network function placement in satellite edge computing with a potential game approach," IEEE Trans. Netw. Service Manag., vol. 19, no. 2, pp. 1243–1259, Jun. 2022.
[24] C. Zhou et al., "Deep reinforcement learning for delay-oriented IoT task scheduling in SAGIN," IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 911–925, Feb. 2021.
[25] X. Qi, B. Zhang, Z. Qiu, and L. Zheng, "Using inter-mesh links to reduce end-to-end delay in Walker delta constellations," IEEE Commun. Lett., vol. 25, no. 9, pp. 3070–3074, Sep. 2021.
[26] J. Liu, X. Zhao, P. Qin, S. Geng, and S. Meng, "Joint dynamic task offloading and resource scheduling for WPT enabled space-air-ground power Internet of Things," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 2, pp. 660–677, Mar. 2022.
[27] S. Zhang, H. Gu, K. Chi, L. Huang, K. Yu, and S. Mumtaz, "DRL-based partial offloading for maximizing sum computation rate of wireless powered mobile edge computing network," IEEE Trans. Wireless Commun., vol. 21, no. 12, pp. 10934–10948, Dec. 2022.
[28] X. Gao, R. Liu, A. Kaushik, and H. Zhang, "Dynamic resource allocation for virtual network function placement in satellite edge clouds," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 4, pp. 2252–2265, Jul. 2022.
[29] Q. Tang, Z. Fei, B. Li, and Z. Han, "Computation offloading in LEO satellite networks with hybrid cloud and edge computing," IEEE Internet Things J., vol. 8, no. 11, pp. 9164–9176, Jun. 2021.
[30] J. Zhang et al., "Energy-latency tradeoff for energy-aware offloading in mobile edge computing networks," IEEE Internet Things J., vol. 5, no. 4, pp. 2633–2645, Aug. 2018.
[31] X.-Q. Pham, T. Huynh-The, E.-N. Huh, and D.-S. Kim, "Partial computation offloading in parked vehicle-assisted multi-access edge computing: A game-theoretic approach," IEEE Trans. Veh. Technol., vol. 71, no. 9, pp. 10220–10225, Sep. 2022.
[32] P. Yang, F. Lyu, W. Wu, N. Zhang, L. Yu, and X. S. Shen, "Edge coordinated query configuration for low-latency and accurate video analytics," IEEE Trans. Ind. Informat., vol. 16, no. 7, pp. 4855–4864, Jul. 2020.
[33] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. AAAI, Phoenix, AZ, USA, 2016, pp. 2094–2100.
[34] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.
[35] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[36] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. Int. Conf. Mach. Learn., Beijing, China, 2014, pp. 1–9.
[37] Y. Yuan, G. Zheng, K.-K. Wong, and K. B. Letaief, "Meta-reinforcement learning based resource allocation for dynamic V2X communications," IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 8964–8977, Sep. 2021.
[38] J. Chen, L. Guo, J. Jia, J. Shang, and X. Wang, "Resource allocation for IRS assisted SGF NOMA transmission: A MADRL approach," IEEE J. Sel. Areas Commun., vol. 40, no. 4, pp. 1302–1316, Apr. 2022.
[39] S. Zhang, G. Cui, Y. Long, and W. Wang, "Joint computing and communication resource allocation for satellite communication networks with edge computing," China Commun., vol. 18, no. 7, pp. 236–252, Jul. 2021.
[40] C. Qiu, H. Yao, F. R. Yu, F. Xu, and C. Zhao, "Deep Q-learning aided networking, caching, and computing resources allocation in software-defined satellite-terrestrial networks," IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5871–5883, Jun. 2019.
[41] Z. Han, C. Xu, Z. Xiong, G. Zhao, and S. Yu, "On-demand dynamic controller placement in software defined satellite-terrestrial networking," IEEE Trans. Netw. Service Manag., vol. 18, no. 3, pp. 2915–2928, Sep. 2021.
[42] X. Gao, R. Liu, A. Kaushik, J. Thompson, H. Zhang, and Y. Ma, "Dynamic resource management for neighbor-based VNF placement in decentralized satellite networks," in Proc. 1st Int. Conf. 6G Netw. (6GNet), Paris, France, 2022, pp. 1–5.
[43] G. Cui, X. Li, L. Xu, and W. Wang, "Latency and energy optimization for MEC enhanced SAT-IoT networks," IEEE Access, vol. 8, pp. 55915–55926, 2020.
[44] H. Zhang, R. Liu, A. Kaushik, and X. Gao, "Satellite edge computing with collaborative computation offloading: An intelligent deep deterministic policy gradient approach," IEEE Internet Things J., vol. 10, no. 10, pp. 9092–9107, May 2023.
[45] G. Cui, Y. Long, L. Xu, and W. Wang, "Joint offloading and resource allocation for satellite assisted vehicle-to-vehicle communication," IEEE Syst. J., vol. 15, no. 3, pp. 3958–3969, Sep. 2021.
[46] Y. Wang, J. Zhang, X. Zhang, P. Wang, and L. Liu, "A computation offloading strategy in satellite terrestrial networks with double edge computing," in Proc. IEEE Int. Conf. Commun. Syst. (ICCS), Chengdu, China, 2018, pp. 450–455.
[47] S. Yu, X. Gong, Q. Shi, X. Wang, and X. Chen, "EC-SAGINs: Edge computing-enhanced space–air–ground integrated networks for Internet of Vehicles," IEEE Internet Things J., vol. 9, no. 8, pp. 5742–5754, Apr. 2022.
[48] Q. Ye, W. Shi, K. Qu, H. He, W. Zhuang, and X. S. Shen, "Learning-based computing task offloading for autonomous driving: A load balancing perspective," in Proc. IEEE Int. Conf. Commun., Montreal, QC, Canada, 2021, pp. 1–6.
[49] Y. Ju et al., "Joint secure offloading and resource allocation for vehicular edge computing network: A multi-agent deep reinforcement learning approach," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 5, pp. 5555–5569, May 2023.

Ting Lyu received the B.S. degree in software engineering from Jiangxi University of Science and Technology, Ganzhou, China, in 2017, and the M.S. degree from the School of Artificial Intelligence, Guangxi Minzu University, Nanning, China, in 2020. He is currently pursuing the Ph.D. degree with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.
His current research interests include wireless resource allocation and management, edge computing, game theory, and reinforcement learning.

Yueqiang Xu received the M.E. degree from Xi'an University of Posts and Telecommunications, Xi'an, China, in 2017, and the Ph.D. degree from the College of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, in 2023.
He is currently a Postdoctoral Fellow with the University of Science and Technology Beijing, Beijing. His current research interests include future wireless networks, mobile edge computing, blockchain, and machine learning.

Feifei Liu received the M.S. degree from Inner Mongolia University of Technology, Hohhot, China. She is currently pursuing the Ph.D. degree with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.
Her current research interests include artificial intelligence, optical communication, and optical soliton.


Haitao Xu (Member, IEEE) received the B.S. degree in communication engineering from Sun Yat-sen University, Guangzhou, China, in 2007, the M.S. degree in communication system and signal processing from the University of Bristol, Bristol, U.K., in 2009, and the Ph.D. degree in communication and information system from the University of Science and Technology Beijing, Beijing, China, in 2014.
He is currently a Professor with the Department of Communication Engineering, University of Science and Technology Beijing. His research interests include wireless resource allocation and management, wireless communications and networking, dynamic game and mean field game theory, big data analysis, and security.
Dr. Xu has co-edited a book titled Security in Cyberspace and co-authored over 50 technical papers.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, MD, USA, in 1999 and 2003, respectively.
From 2000 to 2002, he was a Research and Development Engineer of JDSU, Germantown, MD. From 2003 to 2006, he was a Research Associate with the University of Maryland. From 2006 to 2008, he was an Assistant Professor with Boise State University, Boise, ID, USA. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department and the Computer Science Department, University of Houston, Houston, TX, USA. His main research targets the novel game-theory-related concepts critical to enabling efficient and distributive use of wireless networks with limited resources. His other research interests include wireless resource allocation and management, wireless communications and networking, quantum computing, data science, smart grid, security, and privacy.
Dr. Han received an NSF Career Award in 2010, the Fred W. Ellersick Prize of the IEEE Communication Society in 2011, the EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the field of communications systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards in IEEE conferences. He is also the winner of the 2021 IEEE Kiyo Tomiyasu Award (an IEEE Technical Field Award) for outstanding early- to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: "For Contributions to Game Theory and Distributed Management of Autonomous Communication Networks." He has been a 1% highly cited researcher since 2017, according to Web of Science. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018, and has been an AAAS Fellow since 2019 and an ACM Distinguished Member since 2019.
