Task Offloading and Resource Allocation For Satellite-Terrestrial Integrated Networks
Abstract—Low-Earth orbit (LEO) satellite networks can achieve global network coverage without geographical restrictions and are essential to the future communication network. In this article, we study the computing offloading problem in a satellite-terrestrial integrated network for the Internet of Remote Things (IoRT), which aims to reduce the total cost (a weighted sum of energy consumption and delay) by jointly optimizing offloading node selection, the offloading ratio, and computational resource allocation, so as to achieve dynamic management of network resources. First, we propose a hybrid cloud and satellite multilayer multiaccess edge computing (MEC) network architecture that can provide heterogeneous computing resources to terrestrial users. Subsequently, since the problem under consideration is a mixed-integer nonlinear programming problem, we propose a computing offloading algorithm based on multiagent reinforcement learning, which is an integration of double deep Q learning (DDQN) and deep deterministic policy gradient (DDPG). The algorithm can learn the optimal policy for actions containing a mixture of discrete and continuous variables. Finally, an optimal computational resource allocation scheme is proposed to improve the task computation efficiency. Simulation results show that the proposed task offloading and resource allocation scheme can achieve reasonable scheduling of computational tasks and optimal allocation of computational resources, reducing the cost of task computation.

Index Terms—Computing offloading, deep reinforcement learning (DRL), multiagent, resource allocation, satellite-terrestrial integrated network (STIN).

I. INTRODUCTION

WITH the unprecedented development of the Internet of Things (IoT), video transmission, and other emerging applications, traditional terrestrial networks, which have difficulty in fully covering some complex terrains, such as mountains and oceans, are no longer able to satisfy the global demand for ubiquitous connectivity for 6G and beyond [1], [2].

To overcome the related challenges, satellite communication networks (SCNs) have emerged as a potential core technology. As an adequate extension of terrestrial cellular networks, SCNs can provide reliable and stable communication services regardless of geographical conditions [3], expanding coverage by providing services via satellite networks to users in areas inaccessible to terrestrial cellular networks or where the infrastructure has been compromised. Terrestrial users can connect with a terrestrial cloud center through a satellite network. However, with the development of wireless communication technology, the demand of ground users for services such as augmented reality and virtual reality has increased dramatically. These services require high computational efficiency, and offloading tasks to the cloud may not satisfy real-time requirements. To reduce the transmission delay of tasks, multiaccess edge computing (MEC) [4], [6], [7] has been proposed, in which efficient and flexible computing services are provided by utilizing the computing resources at the network's edge. Therefore, low-Earth orbit (LEO) satellite edge computing networks have emerged [5], [8], [9]. Deploying MEC servers on LEO satellites allows ground users to transfer computing tasks to the satellite for execution, which can significantly reduce latency and energy consumption [10], [11], [12].
A. Related Work
Satellite MEC (SMEC) networks have received extensive
attention from academia and industry and are considered an
important research direction for future networks [13].

1) Network Architecture: Cui et al. [14] proposed a hybrid geostationary Earth orbit (GEO) and LEO network architecture to solve the load imbalance problem of the LEO network. User tasks can be processed in LEO satellites or forwarded to ground gateways via GEO satellites. Di et al. [15] and Deng et al. [16] proposed a satellite-terrestrial network architecture with efficient data offloading, considering a satellite network combined with a terrestrial network, where the satellite has to forward the terrestrial user's data to a terrestrial gateway or terrestrial base station (BS) for processing. Cheng et al. [17] proposed a space-air-ground integrated network (SAGIN) edge/cloud computing architecture to provide computing services for computationally intensive tasks. Kim et al. [18] proposed a satellite edge computing architecture to support IoT and improve network efficiency by optimizing the network slice scheduling scheme. Zhang et al. [19]
proposed a software-defined space-ground-vehicle integrated network (SD-SGVIN) architecture for transport environments, where satellites improve network coverage and facilitate the interconnection of road networks.

2) Resource Allocation: Song et al. [20] proposed a new terrestrial-satellite IoT MEC framework to provide task-processing capabilities for mobile devices in remote IoT. An energy-efficient computational offloading and resource allocation algorithm is proposed to minimize the energy consumption of all mobile devices. Cao et al. [21] investigated an LEO satellite edge-assisted multilayer MEC system to enhance the coverage of a multilayer MEC system and solve the computing problem for users in congested and isolated areas. To achieve high transmission rates and communication robustness in satellite dynamic network environments, an optimal offloading scheme based on deep reinforcement learning (DRL) [22] has been proposed to accomplish computationally intensive and delay-sensitive tasks through joint offloading paths and resource allocation. Gao et al. [23] considered satellite edge computing with virtual network function (VNF) technology and devised a potential game-based resource allocation method for NFV placement in satellite edge computing to maximize network benefits. A collaborative computational offloading strategy for SMEC networks is proposed by integrating computational resources within the coverage of LEO satellites to minimize user-perceived delays. Zhou et al. [24] studied the computational task scheduling problem in SAGIN. They proposed a novel deep risk-sensitive reinforcement learning solving algorithm that minimizes offloading and computational delays for all tasks.

Although the above work has studied the network service architecture and resource allocation schemes of SMEC, it has not fully considered intersatellite collaboration. Different from the above research works, this article designs a multilayer network service architecture and proposes a multiagent reinforcement learning-based approach to jointly optimize the resource allocation strategy and enhance computational collaboration among satellites.

B. Research Motivation and Contributions

The seamless integration of ground stations and LEO satellite edges enables increased availability, continuity, and scalability of network services, providing solutions for the next-generation wireless ecosystem. In this article, the MEC system concept plays an essential role in satellite-terrestrial integrated networks (STINs) by providing low computational latency, high energy efficiency, and broad coverage. However, two questions arise: how to design a reasonable network service architecture for an LEO edge computing network, and how to conceive an efficient network resource management scheme for the designed framework that achieves a rational allocation of overall network resources and efficient network access services. Based on the above analyses, we combine the advantages of satellite and edge computing in this article to design a multilayer MEC system for STIN, where the satellite network can complement the terrestrial network to provide efficient computing services to terrestrial users at low cost.

This article studies the dynamic computing offloading problem based on hybrid strategies. In addition, a multiagent reinforcement learning-based computing offloading scheme is proposed, considering the practical environment of different application requirements and random task generation. Specifically, for the computing offloading requirements of varying computing tasks, we propose an algorithm based on DRL to achieve an optimal solution for task offloading node selection and the task offloading ratio. Then, we design an optimal computational resource allocation scheme to achieve the optimal allocation of computational resources in the dynamic environment and reduce the task computation cost. The main contributions of this article are summarized as follows.

1) A multilayer satellite-terrestrial network edge computing architecture is proposed to provide computing services for user tasks with different computational requirements. Collaborative processing of computing tasks is achieved through hybrid clouds and satellites, where intersatellite computing resources can be shared through intersatellite links (ISLs). Computing tasks generated by ground users can be computed locally, on low-orbit satellites, or on the cloud servers.

2) To address the resource requirements of different computing tasks, this article fully uses the computing resources of the cloud and individual satellites to provide heterogeneous computing services for ground users. The objective optimization problem studied in this article is to minimize the task computation cost while satisfying the computation delay and resource constraints to ensure the completion of the task.

3) Considering the service demands of different computing tasks and solving the joint decision problem of a discrete decision (task offloading node selection) and a continuous variable (task offloading ratio) at the same time, this article proposes a multiagent computing offloading strategy with joint deep Q network (DQN) and deep deterministic policy gradient (DDPG). Extensive simulation results show that the scheme has good convergence performance and achieves better results than the benchmark algorithms.

The remainder of this article is organized as follows. Section II presents the system model and the associated problem. In Section III, we offer the algorithm based on multiagent DRL. In Section IV, we show simulation results and discuss and analyze the corresponding results. We conclude this article in Section V.

II. SYSTEM MODEL

In this section, we propose a multisatellite hybrid terrestrial cloud-based collaborative computation offloading network, including a network, computing, and cost model. Table I lists the main notations used in this article.

A. Network Model

In this article, we consider an STIN in remote areas to solve the emergency services problem in remote areas, as shown in
TABLE I: List of Main Notations.
1) Onboard Task Process: We assume that the MEC servers of the LEO satellites can be used for the simultaneous processing of multiple tasks, but each task may be allocated different computational resources.

2) Task Migration: Since the MEC server of each LEO satellite has limited computing capacity, it may only be able to complete some of the tasks within the required time. Due to the existence of ISLs, when the computational capacity of the access satellite L_k cannot satisfy the demand of the ground equipment, the task can be passed to other satellites for collaborative computation. Therefore, the satellite L_k will offload the tasks to a nearby LEO satellite, an MEO satellite, or the ground cloud server. The relationship of the computing capacities of the LEO satellites, the MEO satellites, and the cloud server is given by F_LEO < F_MEO << F_cloud, where F_LEO, F_MEO, and F_cloud denote the computing capability of the LEO satellite, the computing capability of the MEO satellite, and the computing capability of the cloud, respectively.

In summary, there are three computational schemes for the computational task that ground user i uploads to LEO satellite k. Let x_ik denote whether the computational task of ground user i is processed by satellite k, where x_ik = 1 means that the computational task of user i is computed at satellite k; otherwise, x_ik = 0. Let y_im denote whether the computational task of user i is processed by the associated LEO satellite m, where y_im = 1 denotes that the computational task is offloaded to the LEO satellite m for computation; otherwise, y_im = 0. Similarly, let z_i ∈ {0, 1} denote whether the computational task of user i is processed by the terrestrial cloud server, where z_i = 1 denotes that the computational task ψ_i is executed by the cloud server; otherwise, z_i = 0. Considering that each computational task has only one offloading node selection strategy A_i = {x_ik, y_im, z_i} at each time slot, the computational strategy of ground user i satisfies the constraint

    x_ik + Σ_{m ∈ L\{k}} y_im + z_i = 1,  i ∈ I.    (1)

C. Cost Model

In the considered satellite networks, the cost for processing the computing tasks is mainly composed of the delay cost and the energy cost.

1) Delay Cost: The delays in calculating the tasks mainly depend on the task's offloading decisions. The local computational latency of user i can be expressed as

    T_i^loc = (1 - α_i) C_i / f_i^l,  i ∈ I    (2)

where f_i^l denotes the local computing (LC) resources allocated by user i.

If the tasks are processed at the LEO satellite L_k, the computation time of the tasks in the LEO satellite needs to be taken into account. Let r_i denote the transmission rate from user i to satellite k. Then, the delay is given by

    T_k^e = t_tran + t_c + d_s    (3)

where d_s denotes the propagation delay between the user and the satellite, and t_c and t_tran = D_i / r_i denote the computation time and the transmission time, respectively. Considering that the computational resources allocated by the satellite or the cloud to ground user i ∈ I are f_i^e, the computational time t_c can be expressed as

    t_c = α_i C_i / f_i^e.    (4)

Similarly, when satellite L_m is selected for task processing, where k ≠ m, the task delay can be expressed as

    T_m^e = t_tran + t_c + d_s'    (5)

where d_s' denotes the total propagation delay from the user to the satellite L_m. When intersatellite cooperative computation of tasks is required, the task transmission link involves x-hop ISLs. Therefore, the propagation delay d_s' can be expressed as

    d_s' = d_s + ζ (s / c)    (6)

where s denotes the distance between satellites, c denotes the propagation speed of light, and ζ denotes the number of hops of the ISL. This article considers 1-hop ISLs, i.e., ζ = 1. Let r_k denote the transmission rate between satellite k and the ground cloud. If the ground cloud is selected for task processing, the task latency is expressed as

    T_cloud^e = t_tran + t_c + d_c + t_cloud    (7)

where d_c denotes the total propagation delay from the user to the ground cloud, and t_cloud = D_i / r_k denotes the transmission time between satellite k and the ground cloud.

2) Energy Cost: The local energy consumption of user i to complete its part of the task is defined as

    E_i^loc = κ (f_i^l)^2 C_i (1 - α_i)    (8)

where κ is a constant that depends on the chip architecture of the user's device. The energy consumption of ground device i to offload the remaining tasks to the satellite is E_i^tran = p_i t_i^tran, where p_i denotes the transmission power of user i. Then, the total energy consumption of device i is E_i^all = E_i^tran + E_i^loc.

D. Problem Formulation

The completion time of the offloading part of the task depends mainly on the offloading strategy and the allocated computing resources, so the overall completion time can be expressed as

    T_i^all = max{ x_ik T_k^e + y_im T_m^e + z_i T_cloud^e, T_i^loc },  i ∈ I.    (9)

The execution delay directly determines the time to mission completion, and energy consumption is important because ground devices have limited battery energy. By the weighted sum method, the cost of ground device i is defined as a weighted sum of energy consumption and delay, which is expressed as

    Cost_i = w_1 E_i^all + w_2 T_i^all    (10)

where w_1, w_2 ∈ [0, 1] denote the weights of device energy consumption and task latency, respectively.
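To make the cost model concrete, the following minimal Python sketch evaluates (2)-(10) for one device. The function name and argument list are ours, and the numeric values in the example call are illustrative placeholders rather than parameters taken from this article.

    # Minimal sketch of the per-device cost model in (2)-(10).
    # All numbers in the example call are illustrative placeholders.

    def device_cost(D_i, C_i, alpha_i, f_loc, f_edge, r_i, p_i,
                    d_prop, t_extra=0.0, kappa=1e-27, w1=0.5, w2=0.5):
        """Weighted energy/delay cost of ground device i for one task.

        D_i     : task input data size (bits)
        C_i     : CPU cycles required by the whole task
        alpha_i : offloading ratio (fraction of cycles executed remotely)
        f_loc   : local CPU frequency f_i^l (cycles/s)
        f_edge  : CPU frequency f_i^e allocated by the chosen edge node (cycles/s)
        r_i     : uplink rate to the access satellite (bits/s)
        p_i     : transmission power of the device (W)
        d_prop  : total propagation delay on the chosen path (s), e.g. d_s, d_s', or d_c
        t_extra : extra forwarding time, e.g. D_i/r_k for the satellite-to-cloud hop
        """
        t_loc = (1.0 - alpha_i) * C_i / f_loc          # local latency, (2)
        t_tran = D_i / r_i                              # uplink transmission time
        t_comp = alpha_i * C_i / f_edge                 # remote computation time, (4)
        t_off = t_tran + t_comp + d_prop + t_extra      # offloading latency, (3)/(5)/(7)

        e_loc = kappa * f_loc ** 2 * C_i * (1.0 - alpha_i)   # local energy, (8)
        e_tran = p_i * t_tran                                 # offloading energy
        e_all = e_loc + e_tran

        t_all = max(t_off, t_loc)                       # overall completion time, (9)
        return w1 * e_all + w2 * t_all                  # weighted cost, (10)

    # Example: 1 Mbit task, 1 Gcycles, half offloaded to a satellite MEC server.
    print(device_cost(D_i=1e6, C_i=1e9, alpha_i=0.5, f_loc=0.2e9,
                      f_edge=2e9, r_i=20e6, p_i=0.5, d_prop=0.01))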
In this work, we propose to minimize the cost of all ground devices while satisfying the maximum tolerable delay constraint. We formulate the joint optimization problem of the task offloading selection policy A, the task offloading size policy α, and the computation resource allocation policy f^e as follows:

    P1: min_{α_i, A_i, f_i^e}  Σ_{i=1}^{I} Cost_i
    s.t.  C1: α_i ∈ [0, 1]
          C2: x_ik + Σ_{m ∈ L\{k}} y_im + z_i = 1
          C3: x_ik, y_im, z_i ∈ {0, 1}
          C4: 0 ≤ f_{i,j} ≤ f_j^max,  j ∈ L ∪ {cloud}
          C5: Σ_i f_{i,j} ≤ f_j^max,  j ∈ L ∪ {cloud}
          C6: T_i^all ≤ T_i^max    (11)

where C1-C3 are the offloading constraints, C4 and C5 denote the computation resource constraints of the satellites and the cloud, C6 denotes the maximum tolerated delay constraint, and f_{i,j} denotes the computational resources allocated to user i by satellite j or the cloud, i.e., f_i^e. The formulated optimization problem is a mixed-integer nonlinear programming problem. Each user's optimal task offloading strategy is not solely determined by its own approach but is also significantly influenced by the strategies employed by other users; the task offloading strategies are therefore interconnected and mutually influential. In addition, the action space of the joint offloading strategy grows exponentially with the number of users, and the problem is difficult to solve in polynomial time using traditional methods. To address the above challenges, we opt for a DRL approach to solve the formulated optimization problem in a distributed and intelligent manner.

III. OPTIMAL COMPUTATION OFFLOADING AND RESOURCE ALLOCATION SCHEME

In this section, we first model the optimization problem as a Markov decision process (MDP) to further describe the process of computational offloading and resource allocation. Second, a computing offloading algorithm based on multiagent reinforcement learning is proposed for solving the optimization problem. Finally, we describe the allocation process of computational resources.

A. MDP Model

In this article, we consider that all ground users transmit their tasks to the LEO satellite first, i.e., the LEO satellite knows the computational offloading policies of all ground users. Ground users are agents interacting with the environment to gain optimization experience in computing offloading allocation strategies. Meanwhile, the offloading decision of the previous time slot affects the offloading decision of the current time slot. Therefore, the optimization problem can be formulated as an MDP, which can then be solved by reinforcement learning methods. The MDP's state space, action space, and reward function are defined below.

1) State: At each time step t, the state s_t of the environment observed by each agent is defined as s_t = [ψ, A^{t-1}, α^{t-1}, T^{t-1}, E^{t-1}], where ψ = {ψ_i, i ∈ I}. The variables in state s_t have different ranges, which requires smaller training rates and careful parameter initialization and slows down the training process. In this article, we use the min-max normalization method to preprocess the state s_t so that the network can be trained more efficiently.

2) Action: At each time step t, the action a_t is a computational offloading policy made based on the observed environmental state information, which mainly includes the task offloading size policy α^t = {α_i^t, i ∈ I} and the task offloading node selection A^t = {A_i^t, i ∈ I} for each user during the transmission period. Therefore, the action a_t can be defined as a_t = [A^t, α^t].

3) Reward: The purpose of the reward function is to evaluate the effect of the execution of a given action. After executing action a_t in state s_t at time step t, the state of the environment changes to s_{t+1} and the corresponding reward r_t is returned. The reward function is related not only to the objective function but also to the corresponding constraints. If the current action can reduce the system task overhead while satisfying all constraints, it receives a greater reward. Therefore, the immediate reward r_t at time step t is defined as

    r_t = ω_1^r Cost_i + ω_2^r r_p    (12)

where r_p denotes the penalty for a task processing timeout, and ω_1^r and ω_2^r denote the weight coefficients of the corresponding parts, respectively. As seen from the above equation, in the design of the reward function we consider both the objective function and the constraints to encourage the agent to produce an overall satisfactory resource allocation strategy.

B. Solving Algorithm Based on Multiagent Reinforcement Learning

Based on the above analysis, we propose a multiagent DRL framework to solve the problem in a partitioned way, i.e., the framework contains two decision-making computational regions, as shown in Fig. 2. In the proposed solving framework, the final decision consists of two types, i.e., a discrete action and a continuous action. To achieve joint optimization, a computational offloading algorithm based on joint double DQN (DDQN) [33] and DDPG [34] is proposed. Based on the characteristics of DDQN and DDPG, DDQN is used to learn the optimal offloading node selection strategy, and DDPG is used to learn the task offloading size strategy. In the multiagent DRL framework, each ground user device acts as an agent, and a single agent contains two decision regions.

In the proposed algorithm, the divide-and-conquer model is employed, which reduces the coupling of the discrete-continuous joint strategy and improves the learning efficiency of the algorithm. The agent feeds the same observation state to the different decision regions and obtains the computing offloading node selection policy and the task offloading size policy
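The state preprocessing and reward shaping described above can be sketched as follows. The helper names, the exact layout of the state vector, and the sign convention (negating the weighted cost so that cheaper, feasible actions earn larger rewards) are our assumptions; the article only specifies min-max normalization and the form of (12).

    import numpy as np

    def normalize(x, x_min, x_max):
        # Min-max normalization used to preprocess each state component,
        # so all entries of s_t lie in [0, 1].
        return (np.asarray(x, dtype=float) - x_min) / (x_max - x_min + 1e-9)

    def build_state(task_sizes, prev_A, prev_alpha, prev_T, prev_E, bounds):
        # s_t = [psi, A^{t-1}, alpha^{t-1}, T^{t-1}, E^{t-1}], each block normalized.
        return np.concatenate([
            normalize(task_sizes, *bounds["psi"]),
            np.asarray(prev_A, dtype=float),      # previous node-selection decisions
            np.asarray(prev_alpha, dtype=float),  # previous offloading ratios, already in [0, 1]
            normalize(prev_T, *bounds["T"]),
            normalize(prev_E, *bounds["E"]),
        ])

    def reward(cost_i, t_all, t_max, w1r=1.0, w2r=1.0, penalty=1.0):
        # Immediate reward built from (12): weighted cost plus a timeout penalty r_p.
        # Negating the sum so that lower cost yields higher reward is our assumption.
        r_p = penalty if t_all > t_max else 0.0
        return -(w1r * cost_i + w2r * r_p)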
through the decision regions in each round of the decision-making process, respectively. As shown in Fig. 2, while the computing offloading node selection strategy and the task offloading size strategy come from different decision regions, they are not independent in the overall solution process. For the two decision regions, the state input information for each decision step and the sample data required for algorithm training are the same. At the same time, the state information update is performed by the two different types of decisions together. This means the coupling between the computing offloading node selection and the offloading task size is implicit in the state information. During the training process of the algorithm, each decision region abstracts the mutual fitting correspondence from the state information. Therefore, when the agent makes a decision, each decision region can obtain the best action strategy based on the same state information, thus achieving joint optimization.

To ensure the consistency of state information during the training process, the algorithm adopts the paradigm of training from the same data source and distributed execution to achieve the joint optimization of the two different types of strategies. In each round of policy generation, DDQN and DDPG generate policies based on the state information obtained from the previous round of policies, respectively. To improve the efficiency of the algorithm, the interaction experiences are collected using an experience buffer, and these experiences are used for training. When the agent finishes generating a specific offloading policy, it interacts with the environment, i.e., the device executes the relevant offloading policy. The environment receives decision information from all the agents to compute the optimal computational resource allocation policy. The corresponding action reward information can be obtained according to the computational resource allocation strategy. Based on the reward information from the environment, the agent judges the strengths and weaknesses of its action strategy and further improves it to make a better offloading strategy. This iterative interaction with the environment continues until all agents learn the optimal global policy, i.e., in a single agent, DDQN learns the optimal offloading node selection policy and DDPG learns the optimal offloading task size allocation policy. The following section focuses on the main decision-making elements in the multiagent DRL framework.

1) DDQN for Offloading Node Selection: Considering that the DQN [35] algorithm produces an overestimation of the target value, we use the DDQN algorithm instead of the DQN algorithm. In this section, we describe the decision-making process for task offloading node selection based on DDQN. Due to the discrete nature of the task offloading node selection policy, DDQN can be directly used to solve the task offloading node allocation problem.

DDQN is a reinforcement learning algorithm based on value estimation, consisting mainly of a Q-network and a target Q-network. It is worth noting that the two networks
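A structural sketch of one agent with the two decision regions described above is given below. The class and attribute names are illustrative, and the Q-network and actor are assumed to be supplied as callables rather than defined here.

    import random
    import numpy as np

    class HybridAgent:
        """Sketch of one ground-user agent with two decision regions:
        a DDQN head for the discrete offloading-node choice and a DDPG
        actor for the continuous offloading ratio. Names are illustrative."""

        def __init__(self, q_net, actor, n_nodes, eps=0.1, noise_std=0.2):
            self.q_net = q_net          # callable: state -> Q-values over offloading nodes
            self.actor = actor          # callable: state -> offloading ratio in [0, 1]
            self.n_nodes = n_nodes
            self.eps = eps
            self.noise_std = noise_std
            self.buffer = []            # shared replay buffer feeding both decision regions

        def act(self, state):
            # Discrete region: epsilon-greedy over the Q-values (offloading node).
            if random.random() < self.eps:
                node = random.randrange(self.n_nodes)
            else:
                node = int(np.argmax(self.q_net(state)))
            # Continuous region: deterministic actor plus exploration noise (offloading ratio).
            ratio = float(np.clip(self.actor(state) + np.random.normal(0.0, self.noise_std),
                                  0.0, 1.0))
            return node, ratio

        def remember(self, state, action, reward, next_state):
            # Both decision regions are later trained from the same stored experiences.
            self.buffer.append((state, action, reward, next_state))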
have the same structure. The stability of the algorithm is improved by using the target Q-network. At the beginning of each time slot t, the agent computes the action a_t based on the greedy strategy and the observed state s_t, where the ε-greedy strategy is used to balance the exploration of new actions with the utilization of known actions. Therefore, the action a_i^t of agent i can be expressed as

    a_i^t = { a randomly selected action from A,   with probability ε
            { arg max_{a ∈ A} Q(s_t, a, θ),        with probability 1 - ε    (13)

where Q(s_t, a_t, θ) denotes the Q-value output by the Q-network given the observation state s_t and the action a_t, and θ denotes the weights of the Q-network.

2) DDPG for Task Offloading Size: Considering that the action space of the task offloading size is continuous, DDPG is used to deal with the task offloading size allocation problem. DDPG is an off-policy reinforcement learning algorithm based on the actor-critic framework, combining the advantages of the deterministic policy gradient (DPG) [36] algorithm and the DQN algorithm. In the DDPG algorithm, the actor network is used to generate strategies, and the critic network is used to evaluate the rewards of state-action pairs. Agent i generates the task offloading ratio decision through the DDPG decision region after obtaining the observation state s_t. Specifically, agent i uses the actor network to compute the task offloading ratio α_i^t from s_i^t, i.e., α_i^t = π(s_i^t, ξ), where π(s_i^t, ξ) denotes the actor network's strategy and ξ denotes the weighting coefficients of the actor network. Since the DDPG learning strategy is deterministic, random noise is introduced to balance exploration and exploitation. Therefore, the task offloading size allocation decision of agent i is

    a_i^t = π(s_i^t, ξ) + η_0,  i ∈ I    (14)

where η_0 denotes noise that obeys the normal distribution N(0, 0.2). Lower and upper bounds are used to constrain the task offloading ratio decision, considering the task offloading size constraints.

3) Algorithm Update: After the action (α_i^t, A_i^t) is obtained for the observed state s_i^t at time slot t, the immediate reward r_i^t and the new state s_i^{t+1} are returned to agent i from the environment. After that, the experience (s_i^t, a_i^t, r_i^t, s_i^{t+1}) gained by agent i at time slot t is stored in the experience buffer B through the experience replay policy. Next, a mini-batch of N_s experiences is randomly selected to update the network weights in the decision regions of the algorithm. It is worth noting that the experience buffer contains experiences from different time slots. Meanwhile, both decision regions are updated using the same experience during the algorithm.

Specifically, in the update process of the algorithm, the experience exp = (s_i^t, a_i^t, r_i^t, s_i^{t+1}) is first selected from the experience buffer, and the experience exp is used to train the neural networks in the two decision regions. Based on the states s_i^t and s_i^{t+1} in the chosen experience exp, the output Q-values of the Q-network and the target Q-network in the DDQN are denoted as Q(s_t, a_t, θ) and Q(s_t, a_t, θ̂), respectively, where θ̂ is the weight of the target Q-network. The difference in the output Q-values between the Q-network and the target Q-network can be measured by the following loss function:

    φ_t(θ) = |Q(s_t, a_t, θ) - z_t|    (15)

where

    z_t = r_t(s_t, a_t, s_{t+1}) + γ Q(s_{t+1}, arg max_a Q(s_{t+1}, a, θ), θ̂).    (16)

Then, by minimizing the loss function of agent i using gradient descent, the update of the weights of the Q-network can be expressed as

    θ ← θ - ζ_d ∇_θ φ(θ)    (17)

where ζ_d is the learning rate. The target Q-network updates its weights θ̂ by periodically copying the weights θ of the Q-network.

DDPG is likewise updated based on the selected experience exp. The estimated Q-values of the critic network and the target critic network can be expressed as Q(s_i^t, a_i^t, ρ) and Q(s_i^{t+1}, π̂(s_i^{t+1}, ξ̂), ρ̂), where π̂(s_i^{t+1}, ξ̂) denotes the target actor network, ρ is the weight of the critic network, ξ̂ denotes the weight of the target actor network, and ρ̂ denotes the weight of the target critic network. The following loss function represents the difference between the critic network and the target critic network:

    L_i^c(ρ) = Σ_t ( y_i(t) - Q(s_i(t), a_i(t), ρ) )^2,  ∀i    (18)

where y_i(t) = r_i(t) + γ Q(s_i(t+1), π̂(s_i(t+1), ξ̂), ρ̂). The gradient descent method is used to minimize (18), and the weight update of the critic network can be expressed as

    ρ ← ρ - ζ_{p1} ∇_ρ L^c(ρ)    (19)

where ζ_{p1} is the learning rate.

According to the DPG theorem, the actor network updates its weights to obtain larger cumulative discounted rewards [37]. Therefore, the loss function of the actor network can be expressed as

    L_i^a(ξ) = -Q(s_i(t), π(s_i(t), ξ), ρ),  i ∈ I.    (20)

The gradient descent method is used to minimize (20), and the weight update of the actor network can be expressed as

    ξ ← ξ - ζ_{p2} ∇_ξ L^a(ξ)    (21)

where ζ_{p2} is the learning rate.

For the target actor network and the target critic network, the weight updates depend on the actor and critic networks, respectively, and are calculated as

    ξ̂ ← τ ξ + (1 - τ) ξ̂
    ρ̂ ← τ ρ + (1 - τ) ρ̂    (22)

where τ ∈ [0, 1] is the update frequency factor. Soft updating of the networks is achieved by τ controlling the proportion of weights copied from the primary network to the target network.
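The two key update steps, the double-Q target of (16) and the soft target update of (22), can be sketched in PyTorch-style Python as follows. The function names and tensor shapes (a batch of states and integer-encoded node actions) are assumptions on our part.

    import torch

    def ddqn_target(q_net, target_q_net, reward, next_state, gamma):
        # Double-DQN target z_t from (16): the online Q-network selects the
        # greedy next action, the target Q-network evaluates it.
        with torch.no_grad():
            a_star = q_net(next_state).argmax(dim=1, keepdim=True)
            return reward + gamma * target_q_net(next_state).gather(1, a_star).squeeze(1)

    def ddqn_loss(q_net, target_q_net, state, action, reward, next_state, gamma):
        # Absolute-error form of (15); minimizing it drives the update in (17).
        q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
        z_t = ddqn_target(q_net, target_q_net, reward, next_state, gamma)
        return (q_sa - z_t).abs().mean()

    def soft_update(net, target_net, tau):
        # Soft target update of (22); tau controls how much of the primary
        # network's weights is copied into the target network.
        for p, tp in zip(net.parameters(), target_net.parameters()):
            tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)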
The details of the computing offloading algorithm based on the joint DDQN and DDPG are shown in Algorithm 1. In each episode, each agent first obtains the current state of the environment. Then, it feeds the state to the two decision regions to obtain the multidimensional action. A random selection process is added before executing each action to increase the exploration of the policy with respect to the environment. Specifically, for the discrete action decision region, the ε-greedy policy is employed to generate the task offloading node selection decision. For the continuous action decision region, exploration of the environment is increased by adding action noise to the task offloading size decision. Next, the network environment obtains the computational resource allocation policy based on the multidimensional action a and executes it. It is worth noting that the instant action rewards and the new environment states are obtained through Algorithm 2. The network environment then updates the state s_{t+1} and returns the reward r_t simultaneously. Meanwhile, each agent collects the experience (s_t, a_t, r_t, s_{t+1}) and stores it in the experience replay buffer B. Then, a mini-batch of samples is extracted from B to update the network parameters. Finally, the algorithm keeps iterating the above process until the maximum number of training episodes is reached.

Algorithm 1: Multiagent DRL-Based Computing Offloading Algorithm
    Initialize user tasks ψ;
    Initialize the DDQN decision region's network coefficients, i.e., θ and θ̂;
    Initialize the network coefficients for the DDPG decision region, i.e., ξ, ρ, ξ̂, and ρ̂;
    Initialize the replay memory to capacity B_i, i ∈ I;
    for ep = 1 to E_max:
        Initialize the satellite communications scenario;
        for t = 1 to T_max:
            for i = 1 to I:
                Observe the current state s_i(t);
                Choose the task offloading node selection action A_i(t) based on DDQN;
                Choose the task offloading size ratio action α_i(t) based on DDPG;
                a_i(t) ← {A_i(t), α_i(t)};
            end for
            All agents take the joint action a(t), observe the new state s(t+1), and obtain the reward r(t) according to Algorithm 2;
            Store the tuple (s_i(t), a_i(t), r_i(t), s_i(t+1)) in B_i;
            for i = 1 to I:
                Randomly sample a mini-batch of N_s tuples (s_i(t), a_i(t), r_i(t), s_i(t+1)) from B_i;
                Update θ according to (17), and update θ̂ by copying θ;
                Update ρ according to (19), update ξ according to (21), and update ξ̂ and ρ̂ according to (22);
            end for
        end for
    end for

C. Optimal Computing Resource Allocation Policy

In this section, the satellite or the cloud allocates computational resources to the terrestrial devices whose tasks are offloaded. To improve computational efficiency, the optimal computational resource allocation is considered when the device offloads computational tasks to the cloud servers or the MEC server on the satellite. Knowing the offloading policies of the ground equipment (the task offloading node selection policy and the task offloading ratio policy), the optimal computational resource allocation problem can be expressed as

    P2: min_{f_i^e}  Σ_{i=1}^{I} w_2 (α_i C_i / f_i^e)
    s.t.  C4, C5
          C7: t_tran + d(A_i) + α_i C_i / f_i^e ≤ T_i^max.    (23)

To address the above problem, we give the optimal computational resource allocation scheme for the satellites and the cloud, as shown in the following theorem.

Theorem 1: The optimal computational resource allocation policy for satellite k ∈ L or the cloud with respect to ground device i ∈ I, with the task offloading actions a of all ground devices known, can

Proof: It can be observed that problem P2 is convex, and the Slater condition is satisfied. First, the Lagrange function of the convex optimization problem P2 can be expressed as

    L(f_i^e) = w_2 (α_i C_i / f_i^e) + λ ( Σ_i f_i^e - f^max ).    (26)

Letting ∂L(f_i^e)/∂f_i^e = 0, one obtains

    f_i^{e*} = sqrt( w_2 α_i C_i / λ ).    (28)

Observation of the above equation shows that λ ≠ 0; hence, Σ_i f_{i,j} - f_j^max = 0. Substituting (28) into C6 can be solved
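Although the closed-form expression of Theorem 1 is not reproduced above, the stationarity condition (28) together with the active capacity constraint implies an allocation proportional to sqrt(α_i C_i), scaled so that the node's capacity is exhausted. The sketch below implements that reading; the function name and dictionary interface are illustrative, and this is our reconstruction of the truncated derivation rather than the theorem's stated formula.

    from math import sqrt

    def allocate_compute(offloaded, f_max):
        """Sketch of the allocation implied by (26)-(28): with the capacity
        constraint active, f_i^e* is proportional to sqrt(alpha_i * C_i), and the
        proportions are scaled so that the node's capacity f_max is fully used.
        `offloaded` maps user id -> (alpha_i, C_i) for users served by this node.
        This is a reconstruction of the truncated proof, not the article's formula."""
        weights = {i: sqrt(alpha * C) for i, (alpha, C) in offloaded.items()}
        total = sum(weights.values())
        if total == 0.0:
            return {i: 0.0 for i in offloaded}
        return {i: f_max * w / total for i, w in weights.items()}

    # Example: three users offloading to one satellite with 10 GHz of compute.
    print(allocate_compute({1: (0.5, 1e9), 2: (0.8, 2e9), 3: (0.3, 0.5e9)}, f_max=10e9))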
Fig. 4. Computation cost under different task input data sizes.
Fig. 6. Computational cost of tasks under different number of users.
cost of the tasks with different numbers of satellites for the same number of tasks, where the local computational power is 0.2 GHz and the number of users is 12. We compare a single satellite network (SSN) architecture and a double satellite network (DSN) architecture, respectively, and the specific results are shown in Fig. 8. As shown in Fig. 8, we can find that the computational cost of the task gradually decreases as the number of satellites increases. It can be observed that the scheme proposed in this article still has the smallest computational cost for the task under the different network architectures. This is because more edge nodes can provide richer computational resources. Meanwhile, reasonable selection of offloading nodes and allocation of computing resources can reduce the tasks' data transmission delay and computation energy consumption, thus reducing the computation cost. The proposed scheme has the smallest task computation cost compared to the other schemes, which implies that the scheme proposed in this article can provide a better computational offloading strategy.

Fig. 9 illustrates the impact of computational resource allocation policies on the task computation cost. Fig. 9(a) illustrates the task computation cost of each scenario under different computational resource allocation strategies. LC does not consider allocating computational resources at the edge end, so the LC scheme is not affected by the computational resource allocation policy. It can be observed that, among the other four schemes, the optimal computational resource allocation policy can reasonably allocate the computational resources and reduce the computation cost of the tasks. Meanwhile, the uniform allocation policy gives the same computing resources to each user, i.e., users under the same offloading selection policy have the same computing resources. The scheme proposed in this article optimizes the task offloading ratio according to the computational task requirements and balances the task ratio between local computation and offloading processing. As a result, the algorithm proposed in this article achieves better performance, i.e., lower computation cost. Fig. 9(b) illustrates the impact of different computational resource allocation policies on the computation cost of a task under the same offloading policy. Since the computational resource allocation policy does not affect the LC scheme, we do not consider the LC scheme here. It can be observed that the optimal computational resource allocation scheme reduces the task cost of each scheme under the same offloading policy.

V. CONCLUSION

In this article, we study the computing offloading problem for STIN and jointly optimize the computing offloading node selection and the task offloading ratio to minimize the task computation cost (a weighted sum of task computation energy consumption and task computation delay). First, a hybrid satellite and cloud multilayer MEC network architecture is proposed to provide richer computational resources for users. To solve the task offloading problem for ground users, we propose a multiagent reinforcement learning-based approach, which integrates DDQN and DDPG and is able to help agents take actions involving discrete and continuous variables. Finally, we propose an optimal computational resource allocation scheme used for satellite/cloud resource allocation to improve the computational efficiency of the task. Through simulation experiments, we find that the multiagent reinforcement learning-based approach achieves lower computational cost than the existing benchmark algorithms (including the LC, Random, MA-DQN, and DDQN algorithms).
REFERENCES

[1] T. de Cola and I. Bisio, "QoS optimisation of eMBB services in converged 5G-satellite networks," IEEE Trans. Veh. Technol., vol. 69, no. 10, pp. 12098–12110, Oct. 2020.
[2] H. Xu, S. Han, X. Li, and Z. Han, "Anomaly traffic detection based on communication-efficient federated learning in space-air-ground integration network," IEEE Trans. Wireless Commun., vol. 22, no. 12, pp. 9346–9360, Dec. 2023.
[3] G. Giambene, S. Kota, and P. Pillai, "Satellite-5G integration: A network perspective," IEEE Netw., vol. 32, no. 5, pp. 25–31, Oct. 2018.
[4] J. Chen, Z. Chang, X. Guo, R. Li, Z. Han, and T. Hämäläinen, "Resource allocation and computation offloading for multi-access edge computing with fronthaul and backhaul constraints," IEEE Trans. Veh. Technol., vol. 70, no. 8, pp. 8037–8049, Aug. 2021.
[5] S. Sthapit, S. Lakshminarayana, L. He, G. Epiphaniou, and C. Maple, "Reinforcement learning for security-aware computation offloading in satellite networks," IEEE Internet Things J., vol. 9, no. 14, pp. 12351–12363, Jul. 2022.
[6] J. Du et al., "MADDPG-based joint service placement and task offloading in MEC empowered air–ground integrated networks," IEEE Internet Things J., vol. 11, no. 6, pp. 10600–10615, Mar. 2024.
[7] H. Xu, W. Huang, Y. Zhou, D. Yang, M. Li, and Z. Han, "Edge computing resource allocation for unmanned aerial vehicle assisted mobile network with blockchain applications," IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 3107–3121, May 2021.
[8] J. Du et al., "Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 1, pp. 33–44, Jan. 2022.
[9] T. Lyu, H. Xu, F. Liu, M. Li, L. Li, and Z. Han, "Two layer Stackelberg game-based resource allocation in cloud-network convergence service computing," IEEE Trans. Cogn. Commun. Netw., early access, Apr. 23, 2024, doi: 10.1109/TCCN.2024.3392809.
[10] J. Zheng, Y. Cai, Y. Wu, and X. Shen, "Dynamic computation offloading for mobile cloud computing: A stochastic game-theoretic approach," IEEE Trans. Mobile Comput., vol. 18, no. 4, pp. 771–786, Apr. 2019.
[11] Y. Hao, M. Chen, L. Hu, M. S. Hossain, and A. Ghoneim, "Energy efficient task caching and offloading for mobile edge computing," IEEE Access, vol. 6, pp. 11365–11373, 2018.
[12] S. Yu, R. Langar, X. Fu, L. Wang, and Z. Han, "Computation offloading with data caching enhancement for mobile edge computing," IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 11098–11112, Nov. 2018.
[13] A. Alsharoa and M.-S. Alouini, "Improvement of the global connectivity using integrated satellite-airborne-terrestrial networks with resource optimization," IEEE Trans. Wireless Commun., vol. 19, no. 8, pp. 5088–5100, Aug. 2020.
[14] G. Cui, P. Duan, L. Xu, and W. Wang, "Latency optimization for hybrid GEO–LEO satellite-assisted IoT networks," IEEE Internet Things J., vol. 10, no. 7, pp. 6286–6297, Apr. 2023.
[15] B. Di, H. Zhang, L. Song, Y. Li, and G. Y. Li, "Ultra-dense LEO: Integrating terrestrial-satellite networks into 5G and beyond for data offloading," IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 47–62, Jan. 2019.
[16] R. Deng, B. Di, S. Chen, S. Sun, and L. Song, "Ultra-dense LEO satellite offloading for terrestrial networks: How much to pay the satellite operator?" IEEE Trans. Wireless Commun., vol. 19, no. 10, pp. 6240–6254, Oct. 2020.
[17] N. Cheng et al., "Space/aerial-assisted computing offloading for IoT applications: A learning-based approach," IEEE J. Sel. Areas Commun., vol. 37, no. 5, pp. 1117–1129, May 2019.
[18] T. Kim, J. Kwak, and J. P. Choi, "Satellite edge computing architecture and network slice scheduling for IoT support," IEEE Internet Things J., vol. 9, no. 16, pp. 14938–14951, Aug. 2022.
[19] L. Zhang, F. Liu, and J. Sun, "Software-defined space-ground-vehicle integrated network architecture for unconscious payment," IEEE Access, vol. 10, pp. 128122–128131, 2022.
[20] Z. Song, Y. Hao, Y. Liu, and X. Sun, "Energy-efficient multiaccess edge computing for terrestrial-satellite Internet of Things," IEEE Internet Things J., vol. 8, no. 18, pp. 14202–14218, Sep. 2021.
[21] X. Cao et al., "Edge-assisted multi-layer offloading optimization of LEO satellite-terrestrial integrated networks," IEEE J. Sel. Areas Commun., vol. 41, no. 2, pp. 381–398, Feb. 2023.
[22] T. Chen et al., "Learning-based computation offloading for IoRT through Ka/Q-band satellite–terrestrial integrated networks," IEEE Internet Things J., vol. 9, no. 14, pp. 12056–12070, Jul. 2022.
[23] X. Gao, R. Liu, and A. Kaushik, "Virtual network function placement in satellite edge computing with a potential game approach," IEEE Trans. Netw. Service Manag., vol. 19, no. 2, pp. 1243–1259, Jun. 2022.
[24] C. Zhou et al., "Deep reinforcement learning for delay-oriented IoT task scheduling in SAGIN," IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 911–925, Feb. 2021.
[25] X. Qi, B. Zhang, Z. Qiu, and L. Zheng, "Using inter-mesh links to reduce end-to-end delay in walker delta constellations," IEEE Commun. Lett., vol. 25, no. 9, pp. 3070–3074, Sep. 2021.
[26] J. Liu, X. Zhao, P. Qin, S. Geng, and S. Meng, "Joint dynamic task offloading and resource scheduling for WPT enabled space-air-ground power Internet of Things," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 2, pp. 660–677, Mar. 2022.
[27] S. Zhang, H. Gu, K. Chi, L. Huang, K. Yu, and S. Mumtaz, "DRL-based partial offloading for maximizing sum computation rate of wireless powered mobile edge computing network," IEEE Trans. Wireless Commun., vol. 21, no. 12, pp. 10934–10948, Dec. 2022.
[28] X. Gao, R. Liu, A. Kaushik, and H. Zhang, "Dynamic resource allocation for virtual network function placement in satellite edge clouds," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 4, pp. 2252–2265, Jul. 2022.
[29] Q. Tang, Z. Fei, B. Li, and Z. Han, "Computation offloading in LEO satellite networks with hybrid cloud and edge computing," IEEE Internet Things J., vol. 8, no. 11, pp. 9164–9176, Jun. 2021.
[30] J. Zhang et al., "Energy-latency tradeoff for energy-aware offloading in mobile edge computing networks," IEEE Internet Things J., vol. 5, no. 4, pp. 2633–2645, Aug. 2018.
[31] X.-Q. Pham, T. Huynh-The, E.-N. Huh, and D.-S. Kim, "Partial computation offloading in parked vehicle-assisted multi-access edge computing: A game-theoretic approach," IEEE Trans. Veh. Technol., vol. 71, no. 9, pp. 10220–10225, Sep. 2022.
[32] P. Yang, F. Lyu, W. Wu, N. Zhang, L. Yu, and X. S. Shen, "Edge coordinated query configuration for low-latency and accurate video analytics," IEEE Trans. Ind. Informat., vol. 16, no. 7, pp. 4855–4864, Jul. 2020.
[33] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. AAAI, Phoenix, AZ, USA, 2016, pp. 2094–2100.
[34] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.
[35] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[36] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proc. Int. Conf. Mach. Learn., Beijing, China, 2014, pp. 1–9.
[37] Y. Yuan, G. Zheng, K.-K. Wong, and K. B. Letaief, "Meta-reinforcement learning based resource allocation for dynamic V2X communications," IEEE Trans. Veh. Technol., vol. 70, no. 9, pp. 8964–8977, Sep. 2021.
[38] J. Chen, L. Guo, J. Jia, J. Shang, and X. Wang, "Resource allocation for IRS assisted SGF NOMA transmission: A MADRL approach," IEEE J. Sel. Areas Commun., vol. 40, no. 4, pp. 1302–1316, Apr. 2022.
[39] S. Zhang, G. Cui, Y. Long, and W. Wang, "Joint computing and communication resource allocation for satellite communication networks with edge computing," China Commun., vol. 18, no. 7, pp. 236–252, Jul. 2021.
[40] C. Qiu, H. Yao, F. R. Yu, F. Xu, and C. Zhao, "Deep Q-learning aided networking, caching, and computing resources allocation in software-defined satellite-terrestrial networks," IEEE Trans. Veh. Technol., vol. 68, no. 6, pp. 5871–5883, Jun. 2019.
[41] Z. Han, C. Xu, Z. Xiong, G. Zhao, and S. Yu, "On-demand dynamic controller placement in software defined satellite-terrestrial networking," IEEE Trans. Netw. Service Manag., vol. 18, no. 3, pp. 2915–2928, Sep. 2021.
[42] X. Gao, R. Liu, A. Kaushik, J. Thompson, H. Zhang, and Y. Ma, "Dynamic resource management for neighbor-based VNF placement in decentralized satellite networks," in Proc. 1st Int. Conf. 6G Netw. (6GNet), Paris, France, 2022, pp. 1–5.
[43] G. Cui, X. Li, L. Xu, and W. Wang, "Latency and energy optimization for MEC enhanced SAT-IoT networks," IEEE Access, vol. 8, pp. 55915–55926, 2020.
[44] H. Zhang, R. Liu, A. Kaushik, and X. Gao, "Satellite edge computing with collaborative computation offloading: An intelligent deep deterministic policy gradient approach," IEEE Internet Things J., vol. 10, no. 10, pp. 9092–9107, May 2023.
[45] G. Cui, Y. Long, L. Xu, and W. Wang, "Joint offloading and resource allocation for satellite assisted vehicle-to-vehicle communication," IEEE Syst. J., vol. 15, no. 3, pp. 3958–3969, Sep. 2021.
[46] Y. Wang, J. Zhang, X. Zhang, P. Wang, and L. Liu, "A computation offloading strategy in satellite terrestrial networks with double edge computing," in Proc. IEEE Int. Conf. Commun. Syst. (ICCS), Chengdu, China, 2018, pp. 450–455.
[47] S. Yu, X. Gong, Q. Shi, X. Wang, and X. Chen, "EC-SAGINs: Edge computing-enhanced space–air–ground integrated networks for Internet of Vehicles," IEEE Internet Things J., vol. 9, no. 8, pp. 5742–5754, Apr. 2022.
[48] Q. Ye, W. Shi, K. Qu, H. He, W. Zhuang, and X. S. Shen, "Learning-based computing task offloading for autonomous driving: A load balancing perspective," in Proc. IEEE Int. Conf. Commun., Montreal, QC, Canada, 2021, pp. 1–6.
[49] Y. Ju et al., "Joint secure offloading and resource allocation for vehicular edge computing network: A multi-agent deep reinforcement learning approach," IEEE Trans. Intell. Transp. Syst., vol. 24, no. 5, pp. 5555–5569, May 2023.

Ting Lyu received the B.S. degree in software engineering from Jiangxi University of Science and Technology, Ganzhou, China, in 2017, and the M.S. degree from the School of Artificial Intelligence, Guangxi Minzu University, Nanning, China, in 2020. He is currently pursuing the Ph.D. degree with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China. His current research interests include wireless resource allocation and management, edge computing, game theory, and reinforcement learning.

Yueqiang Xu received the M.E. degree from Xi'an University of Posts and Telecommunications, Xi'an, China, in 2017, and the Ph.D. degree from the College of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, in 2023. He is currently a Postdoctoral Fellow with the University of Science and Technology Beijing, Beijing. His current research interests include future wireless networks, mobile edge computing, blockchain, and machine learning.

Feifei Liu received the M.S. degree from Inner Mongolia University of Technology, Hohhot, China. She is currently pursuing the Ph.D. degree with the School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China. Her current research interests include artificial intelligence, optical communication, and optical soliton.
Haitao Xu (Member, IEEE) received the B.S. degree in communication engineering from Sun Yat-sen University, Guangzhou, China, in 2007, the M.S. degree in communication system and signal processing from the University of Bristol, Bristol, U.K., in 2009, and the Ph.D. degree in communication and information system from the University of Science and Technology Beijing, Beijing, China, in 2014. He is currently a Professor with the Department of Communication Engineering, University of Science and Technology Beijing. His research interests include wireless resource allocation and management, wireless communications and networking, dynamic game and mean field game theory, big data analysis, and security. Dr. Xu has co-edited a book titled Security in Cyberspace and co-authored over 50 technical papers.

Zhu Han (Fellow, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1997, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Maryland, College Park, MD, USA, in 1999 and 2003, respectively. From 2000 to 2002, he was a Research and Development Engineer with JDSU, Germantown, MD. From 2003 to 2006, he was a Research Associate with the University of Maryland. From 2006 to 2008, he was an Assistant Professor with Boise State University, Boise, ID, USA. He is currently a John and Rebecca Moores Professor with the Electrical and Computer Engineering Department and the Computer Science Department, University of Houston, Houston, TX, USA. His main research targets the novel game-theory-related concepts critical to enabling efficient and distributive use of wireless networks with limited resources. His other research interests include wireless resource allocation and management, wireless communications and networking, quantum computing, data science, smart grid, security, and privacy. Dr. Han received an NSF CAREER Award in 2010, the Fred W. Ellersick Prize of the IEEE Communication Society in 2011, the EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, the IEEE Leonard G. Abraham Prize in the field of communications systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards at IEEE conferences. He is also the winner of the 2021 IEEE Kiyo Tomiyasu Award (an IEEE Technical Field Award) for outstanding early- to mid-career contributions to technologies holding the promise of innovative applications, with the following citation: "For Contributions to Game Theory and Distributed Management of Autonomous Communication Networks." He has been a 1% highly cited researcher since 2017, according to Web of Science. He was an IEEE Communications Society Distinguished Lecturer from 2015 to 2018, and he has been an AAAS Fellow since 2019 and an ACM Distinguished Member since 2019.