Deep Reinforcement Learning Mechanism For Deadline
https://doi.org/10.1007/s11276-022-03135-1
Abstract
Caching the most likely to be requested content at the mobile devices in a cooperative manner can facilitate direct content
delivery without fetching content from the remote content server and thus alleviate the user-perceived latency, reduce the
burden on backhaul and minimize the duplicated content transmissions. In addition to content popularity, it is also essential
to consider the users’ dynamic behaviour for real-time applications, which can further improve communication between
user devices, leading to efficient content service time. Most previous studies consider stationary network topologies, in
which all users remain stationary during data transmission, and the user can receive desired content from the corresponding
base station. In this work, we study an essential issue: caching content by taking advantage of user mobility and the
randomness of user interaction time. We consider a realistic scenario in a cooperative caching problem with user devices
moving at various velocities. We formulate the cache placement problem as the maximization of saved delay with capacity
and deadline constraints by considering the contact duration and inter-contact time among the user devices. A deep
reinforcement learning-based caching scheme is presented to solve the high dimensionality of the proposed integer linear
programming problem. The proposed caching scheme improves the long-term reward and achieves a faster convergence rate than the
Q-learning mechanism. Extensive simulation results demonstrate that the proposed cooperative caching mechanism significantly
improves the hit ratio by up to 23%, the acceleration ratio by up to 24% and the offloading ratio by up to 25% compared with existing
mechanisms.
Keywords Device-to-Device mobile networks · Cooperative cache placement · Deep reinforcement learning · DDPG · User mobility
1 Introduction
burden and enhancing user Quality of Experience (QoE) [3]. So, the edge nodes can serve a massive amount of duplicate content requests, which reduces the service delay and the content delivery distance. Caching content at mobile devices with the use of device-to-device (D2D) communications [5] can alleviate the stress on the backhaul links by offloading network traffic to D2D links, as contrary to base station caching [6]. In addition to providing content near to users, caching at mobile devices even reduces network latency [7]. Therefore, this can support latency-critical mobile applications in a mobile edge computing framework.

The effective cache utilization is reduced when the individual nodes with limited storage make independent decisions, since they may redundantly cache popular content. A practical solution is to facilitate cooperation among edge nodes by sharing the content, which is an important consideration in this work. Moreover, different edge nodes share their content in cooperative caching, which forms more extensive cache storage and enables cache diversity [8]. Since the user demands follow a skewed distribution, most of the users' demands are for a few contents [9]. Hence, caching the content proactively during the off-peak time enhances the network performance. Due to the enormous amount of content and limited cache capacity, optimizing the proactive caching strategy by utilizing the user requests is crucial in obtaining the advantage of edge caching [10]. Most proactive caching mechanisms cache the content cooperatively at base stations (BS) to reduce fetching content from a central server [11]. Some earlier works [5, 12–15] focus on cooperatively caching data at mobile users for static D2D mobile networks. In a dense network, the assumption stated by previous publications [5, 12–15] is unreasonable. This study looks at the cache placement problem in a real setting where users of varying speeds connect to other users and BSs at irregular intervals. Users will consistently move among users and will be able to download only portions of the requested content from the various users they encounter throughout their path. If the user cannot receive the entire content from the contacted user, the requested content is retrieved from a base station, which ultimately increases the delay and adversely impacts QoS. Even though [16–18] assume user mobility, the randomization of contact duration is not taken into account. Data transmission is related to contact duration, according to [19]. The user travels at a fast speed if the contact time is short and at a low speed if the contact time is long. As a result, contact time randomization produced by user mobility has an impact on data transmission, which in turn has an impact on content placement. Hence, we want to build a caching approach that takes into account limited resources, content popularity, deadlines, contact length randomization (user speed), and user mobility.

A few challenges need to be handled when it comes to efficient caching in D2D enabled MEN. First, the user's mobility, short contact time, and inter-contact time make it challenging for mobile devices to cache content proactively. Second, determining eligible mobile devices for content caching, as devices with long durations of communication coverage may be enabled to cache acceptable content. Third, determining the number of content segments that need to be cached at the eligible mobile devices. In order to achieve real-time device selection, some learnt patterns produced by resource availability in a dynamic mobile edge network environment must be exploited. As a result, real-time content caching on mobile devices improves user experience while maintaining a tolerable response latency.

In this work, we aim to maximize the saved delay by considering the capacity and deadline constraints for accessing a large volume of data. We consider the content request deadline for generality and practicality, which is reasonable in latency-sensitive mobile and IoT applications. The novelty of this work lies in designing a cache mechanism for a dynamic environment with the randomness of user mobility by considering the limited storage at each edge node. The mobility aware cooperative content caching using D2D communications is modeled as an integer linear programming problem. As the number of mobile devices grows, it becomes more challenging to locate possible mobile devices in real time using exhaustive search algorithms. Hence, we propose a DRL (deep reinforcement learning) based cooperative caching mechanism to identify the mobile devices at which content should be cached. We apply deep deterministic policy gradient (DDPG) to speed up the learning process of the proposed approach because of the slower convergence of traditional learning methods induced by the vast action space and higher dimensionality.

The contributions of this work are as follows:

– Design an integer linear programming problem for the cooperative content caching problem: maximization of the saved download delay subject to constraints, namely the cache capacity and the deadline of the content, in D2D enabled mobile edge networks.
– Formulate the designed cooperative cache placement problem as a Markov decision process problem to maximize the cumulative reward by ensuring the coordination of the mobile users.
– Design a cooperative caching scheme based on a deep reinforcement learning mechanism, DDPG, to speed up the learning rate and enhance content placement at mobile users to maximize the saved delay.
– Extensive simulations have been performed to show the efficacy of the proposed deep reinforcement learning based cooperative caching algorithm by considering the acceleration ratio, hit ratio, offloading ratio and caching reward.
The rest of this paper is organized as follows. The summary of related work is discussed in Sect. 2, and the network model, user mobility model, content request model and problem formulation are presented in Sect. 3. The deep reinforcement learning algorithm, DDPG, is presented in Sect. 4. The simulation environment and the results are discussed in Sect. 5. Concluding remarks are provided in Sect. 6.

2 Related work

Content caching has been studied widely in the literature. Li et al. [9] have presented a survey on content placement and delivery mechanisms in various cellular networks. The advantages of using content caching in mobile networks were highlighted by Zhou et al. [20]. Practically, users request a small number of frequently accessed top-ranked contents. Content pre-fetching that depends on content popularity has been investigated in the literature [12, 21]. Shanmugam et al. [12] have proposed a caching mechanism for each helper node to reduce the expected download delay. The authors in [12] studied a content assignment problem in an uncoded scenario, showed the proposed problem is NP-hard, and presented a 2-approximation algorithm. Poularakis et al. [21] presented an approximation approach for content cache placement in small cell networks (SCN) using content popularity knowledge. To minimize the load on the backhaul, the authors proposed a mobility framework by modeling random walks on a Markov chain. Collaborative cache placement has been investigated in SCN to handle the limitation of cache capacity at each node and to improve the user QoS [8, 22, 23]. The works mentioned above consider the content popularity known in advance. Moreover, popularity prediction based caching strategies were also studied in [10, 24, 25]. In [24], the authors proposed a learning-theoretic perspective for content caching in heterogeneous networks with time-varying and unknown popularity profiles. Further, the authors presented a cache update mechanism showing better performance for periodic updates. Chen et al. [10] have presented an echo state network to estimate the content popularity and node mobility to maximize the user QoE in unmanned aerial vehicle placement. However, the works mentioned above consider content popularity prediction and neglect the device caching capabilities, the dynamic user requests and environment complexities. In contrast, our work considers reinforcement learning to handle the dynamic environment.

The increasing capabilities of mobile devices make it possible to exploit user devices as caching nodes to bring the content near users. Utilizing the user devices as caching nodes reduces user-perceived latency and enhances user QoS. There have been some studies focusing on device-to-device communications. Wang et al. [26] have introduced a D2D big data framework for intelligent content dissemination and offloading. Li et al. [13] have investigated a distributed caching mechanism to select appropriate user nodes and allocate content at selected nodes in cellular networks using D2D communications. The authors aim to maximize social welfare and minimize the delay at the same time. Based on the user's social relationship, the social welfare is modeled as a many-to-one game. Further, the authors show that the presented scheme performs better than existing schemes and is stable. Pan et al. [5] have studied data offloading using D2D communication, assuming the physical transmission requirements and cooperation among users, to maximize the offloading gain. The users request the desired content from trustworthy users in the D2D proximity. Further, the authors presented an iterative scheme depending on the asymptotic approximation of success probability. Prerana et al. [6] have discussed the classification, challenges and solutions of employing device-to-device communications in a 5G environment. Wu et al. [27] have presented a content sharing framework with placement and delivery stages and proposed a cooperative caching mechanism in a 5G environment. Yang et al. [14] have presented a cost-aware energy-efficient data offloading technique to trade off between cost, energy efficiency and delay. The data offloading problem is formulated as an optimal control problem and solved with an approximation scheme. Liu et al. [28] studied an optimal caching mechanism by modelling a multi-objective optimization problem to maximize the hit rate and minimize the ergodic delay and outage probability. Fu et al. [29] have studied a caching mechanism to improve the caching quality of device-to-device facilitated mobile networks. Nevertheless, the above-mentioned earlier studies only consider fixed network architectures, do not consider user mobility, and do not employ machine learning mechanisms to place the content.

Some works in the literature consider user mobility. Wang et al. [16] have presented an efficient D2D offloading mechanism using the predicted contact period metric, expected available duration, to assess the opportunity that an object may be retrieved via a D2D link. Qiao et al. [18] studied a cache mechanism founded on the Markov decision process to maximize the QoS in vehicular networks with highway network scenarios. Due to user mobility, there may be frequent interconnections with the users, who may suffer from connection delays and long hand-offs. A decomposition-based scheme is presented to
dynamically solve the dynamic programming problem of allocating the available storage space to the users. Zhao et al. [30] have presented user mobility and interest prediction based cache placement and cache-aware replacement mechanisms to maximize the utility in IoT networks. Zhang et al. [31] have investigated a user interest request prediction based, mobility aware D2D caching mechanism to maximize the opportunistic cache utility. In [32], the authors presented a network slicing mechanism to efficiently handle heterogeneous resources in a vehicular network with assured QoS. Ibrahim et al. [33] have presented a coded caching mechanism to reduce the device-to-device communication-based data dissemination load. Sun et al. [34] have investigated a caching mechanism in a D2D environment by considering the user mobility to minimize the network latency. In [23, 35], the authors have designed the content placement in a small cell environment using mobility and contact duration to reduce the traffic load. Zhou et al. [36] have investigated a reverse auction-based, incentive-driven deep reinforcement learning scheme to reduce the burden on backhaul links. However, the studies mentioned above consider mobility but the authors have not considered the learning perspective in D2D networks. In contrast, we utilise a machine learning scheme to solve the high dimensionality problem in a cache-enabled D2D network.

Moreover, some recent works have exploited ML-based techniques in wireless communications [13, 25, 30, 36–40]. In [38], Qiu et al. have presented a model-free DRL based online offloading scheme for blockchain empowered MEC networks. Bajpai et al. [37] have presented content caching in the device-to-device network using deep learning-based recurrent neural networks and transformers for better performance. He et al. [40] have presented a Deep RL (DRL) mechanism to make resource allocation decisions optimally. Li et al. [13] have presented two deep reinforcement learning-based caching schemes to design efficient content placement and a delivery scheme. Furthermore, some familiar RL approaches, including SARSA [30] and deep Q-network (DQN), are explored and employed in real-world communication networks. Zeng et al. [41] have studied a DRL based caching mechanism in a D2D scenario to maximize the hit ratio. Jiang et al. [42] have formulated the content caching in D2D networks as a multi-armed bandit problem and presented two Q-learning based multi-agent learning mechanisms. The Q-learning based multi-agent learning mechanism maintains the Q-values in memory because the massive state-action space may exhaust the storage of an individual BS. Li et al. [43] have presented two DRL mechanisms for content placement and a content delivery mechanism in IoT networks. Chakraborty et al. [44] have presented a cache mechanism to enhance the cache hit rate by utilizing LSTM and CNN. In [28], the authors proposed a DRL based caching mechanism to minimize the outage probability. The authors in [36] present a Deep Q-Network based incentive mechanism. However, the works mentioned above consider the learning perspective in D2D networks but have not taken into account environment complexities, user mobility, user velocity and the randomness of contact duration. This inspires us to utilise machine learning methodologies to solve the exciting problem of designing an efficient content placement mechanism in a cache-enabled D2D network while considering the contact duration and inter-contact time.

3 System model and problem formulation

In this section, the network model, user mobility model, content delivery model and problem formulation are presented in detail.

3.1 Network model

Mobile edge computing improves users' capabilities by providing cache capacity (i.e., storage), network resources and computing near to the users. Consider a mobile edge network containing a macro base station (MBS) equipped with a MEC server, a set $\mathcal{U}$ of U mobile users with limited cache capacity and a content server, as shown in Fig. 1. Each mobile device $u \in \mathcal{U}$ has a limited cache $S_u$ called local storage. The storage of each mobile device is used for content caching. The content server acts as an origin server that stores all contents. A user may be in the communication range of more than one user, but a user can communicate with only one user at a particular time. The user may get the desired content from more than one mobile user in the communication range of the user. The mobile users communicate with each other using device-to-device communication. A user is directly connected to a base station, and the user may be in the communication range of more than one BS at any point in time. However, any user can communicate with only one BS at a particular time. Mobile users are attached to the base stations according to a cellular network protocol. The connected base stations are accountable for serving user requests. Each user receives content requests from multiple users in the communication range without knowing their popularities. The user can serve the requests in three ways:

– Local storage The mobile user checks the local storage of the user. If the requested content is available on the mobile device, then the content is received within the deadline. Otherwise, the content is obtained from either
a neighbour user or the base station, based on the availability of the content.
– Neighbour user If the requested content is not available in the local storage, then the content can be obtained from the nearby users in the communication range, based on the availability of the content at the neighbouring users.
– Base station If the requested content does not exist with any users in the communication range, the content would be served from the associated base station by fetching the content from the central server through a backhaul link. (An illustrative sketch of this three-way service policy is given below.)
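To make the three-way service policy above concrete, the following sketch (our own illustration, not code from the paper; the class, function names and delay values are assumptions) resolves a single request in the order local cache, in-range neighbours, and finally the associated base station, while respecting the content deadline.

```python
# Illustrative sketch of the three-way content service described above.
# All names (User, serve_request, ...) and numbers are hypothetical.

def serve_request(user, content_id, deadline, neighbours, bs_delay, d2d_delay):
    """Return (source, delay) for one content request."""
    # 1) Local storage: deliver immediately if cached.
    if content_id in user.cache:
        return "local", 0.0

    # 2) Neighbour users: any in-range neighbour that caches the content and
    #    can deliver it over a D2D link before the deadline.
    for v in neighbours:
        if content_id in v.cache and d2d_delay[(user.uid, v.uid)] <= deadline:
            return "neighbour", d2d_delay[(user.uid, v.uid)]

    # 3) Base station: fall back to the associated BS, which fetches the
    #    content from the content server over the backhaul.
    return "base_station", bs_delay


class User:
    def __init__(self, uid, cache=None):
        self.uid = uid
        self.cache = set(cache or [])


# Example: user 4 requests content 1 with a 10 s deadline; user 3 caches it.
u4, u3 = User(4), User(3, cache={1, 2})
print(serve_request(u4, 1, deadline=10.0, neighbours=[u3],
                    bs_delay=80.0, d2d_delay={(4, 3): 2.0}))
```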
It is depicted in Fig. 1 that each content, shown in a distinct colour, is divided into multiple segments. Each segment of the content is denoted with a number. A user drawn in a colour is requesting the content of the same colour, and the coverage area of a user is depicted with a circle of the same colour. The user needs to acquire all the numbered segments of the same colour to get the desired content. A user U4 is moving across the MBS starting from U3 and requesting content f1. Since U4 is in the coverage of U3, U4 gets segments 1 and 2 of content f1 and moves to U5. U4 gets segment 4 from U5 and segment 5 from U6. The leftover segment (3) is obtained from the MBS. U4 thus gets the desired content by collecting the content segments from the different users and the MBS.

The mobile users in the communication range of a particular user are called the neighbours, and these neighbours meet with a particular average time, as illustrated in Fig. 2b. The mobile users in contact may exchange content based on the contact time and inter-contact time shown in Fig. 2a. The contact and inter-contact pattern between two users in the given network is considered a set of independent Poisson processes, and these random variables follow an exponential distribution with parameters $\lambda_{u,v}$. The time between two sequential contacts of two users is denoted as the inter-contact time. The average contact rate of
users u and v is denoted as $\lambda_{u,v}$, determined from historical information.

Definition 1 (Contact time) The time during which two mobile devices are in the communication range of each other and can share content with each other is known as the contact time.

Definition 2 (Inter-contact time) The time between each consecutive contact of two mobile devices is known as the inter-contact time.

Definition 3 (Saved delay) The difference between the download delay from the content server and from the mobile device is defined as the saved delay.

Definition 4 (Deadline) The requested content needs to be served within the given time limit, which describes the deadline.

Table 1 List of notations

3.5 Problem formulation

The caching of encoded segments at a mobile user is indicated as $X_f^v$, where $x_f^v \in X$ represents the number of coded segments of content f cached at user v. Because of the user mobility, a user may contact the same user multiple times and communicate with multiple users on its moving path. The valuable content retrieved by the user in the first contact with other users is denoted as

$$y_f^{v,1} = \min\left\{ x_f^v,\ \frac{r_{u,v}}{S_f}\, C_{u,v} \right\} \qquad (1)$$

Retrieving the valuable content f by user in the
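As a worked illustration of Eq. (1), the sketch below (our own, with assumed example numbers) shows that the number of coded segments retrievable in a single contact is capped both by the $x_f^v$ segments cached at the contacted user and by what the D2D link can transfer within a contact duration drawn from the exponential contact model above. The reading of $r_{u,v}$ as the transmission rate, $S_f$ as the segment size and $C_{u,v}$ as the contact duration follows the surrounding text; the exact definitions sit in the notation table that did not survive extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def segments_retrieved(x_f_v, rate_uv, seg_size, contact_duration):
    """Eq. (1): y = min(x_f^v, (r_uv / S_f) * C_uv), floored to whole segments."""
    transferable = (rate_uv / seg_size) * contact_duration
    return int(min(x_f_v, np.floor(transferable)))

# Assumed example: 5 cached segments of 8 MB each, a 2 MB/s D2D link, and a
# contact duration sampled from an exponential distribution with mean 30 s.
lambda_uv = 1 / 30.0
contact_duration = rng.exponential(1.0 / lambda_uv)
print(segments_retrieved(x_f_v=5, rate_uv=2.0, seg_size=8.0,
                         contact_duration=contact_duration))
```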
The caching decision at a mobile user depends on the number of devices contacted, average contact time, data transmission rate, and node storage capacity. Devices encountered: the number of devices encountered on the mobile user's path is denoted as E. Contact time: the average contact time of the mobile device is denoted as T. Transmission rate: the amount of content transmitted in a contact time is denoted as R. The efficient content caching mechanism using the DRL mechanism, DDPG, is presented in subsection 4.1.

4.1 Content caching using deep reinforcement learning

It is hard to ensure the maximum saved delay in Eq. (6) since different users have different needs for limited resources. The proposed problem needs to gather a massive amount of network state information. Due to the dynamic nature of the problem, and to take an optimal caching decision, we cannot adopt conventional optimization methods. The huge success of reinforcement learning (RL) and deep learning (DL) in recent years motivated us to use deep reinforcement learning (DRL) in wireless networks, which is considered an effective technique to handle complex problems. Inspired by the benefits of DRL in resource management in wireless networks, we use a deep deterministic policy gradient (DDPG) mechanism to attain an efficient solution for Eq. (6). The agent in DRL gathers the necessary information about the mobile device resources as well as the various requests of users. After that, the agent handles the caching decisions by taking actions.

4.1.1 Problem formulation based on DRL

Implementing DRL in this mechanism aims to enhance the system's adaptability in a challenging (dynamic) environment. The caching decisions are determined based on the present state and do not depend on the previous state information. Hence, we can model the proposed problem as a Markov decision process (MDP). An MDP is defined as a tuple $\{S, A, R, P, \gamma\}$, where S defines the state space, A defines the action space, R defines the system reward, P defines the state transition probability and $\gamma$ denotes the discount factor.

– Let S be the set of system states, where $S = \{ s_i \mid s_i = (N_i^t, K_i^t, B_i^t, w_i^t, n_i^t) \}$. In each time slot t, the state $s_i^t$ contains the set of user requests $K_i^t$, the cache state $N_i^t$ of MEC i, the content delivery deadline $B_i^t$, the number of devices encountered $w_i^t$, the average contact duration $n_i^t$ and the available cache size $C_i^t$ at user i, where $K_i^t = \{k_{i,1}^t, k_{i,2}^t, \ldots, k_{i,U}^t\}$; $k_{i,u}^t$ is the contents requested by user u at user i in time t, $B_i^t = \{b_{i,1}^t, b_{i,2}^t, \ldots, b_{i,F}^t\}$, and $b_{i,f}^t$ is the content delivery deadline of user i for accessing the requested content f in time t.
– A is the set of actions, where $A = \{a_1, a_2, \ldots, a_n\}$. An action represents a set of mobile devices over which the content is cached to maximize the saved delay. The number of mobile devices in each action is equal to the number of wireless channels. The actions entirely depend on the state information (i.e., each state may have different actions). $A = \bigcup_{t \in T} a_i^t,\ \forall s_i \in S$ represents the action space analogous to the state space S. The agent selects the actions depending on the current policy.
– R is the immediate reward $r_i$ obtained by performing an action $a_i$ on the environment with state $s_i$. The reward also contains penalties in case sufficient resources are not available. This work maximises the saved delay by obtaining the desired content at a low transmission delay within the fetching deadline. Each mobile device replaces the cached content if the cache is full; otherwise it caches the content in the local storage. In the cooperative environment, based on the availability of the content, either neighbouring mobile users or the BS serves the user requests. The user's local storage is denoted as the local user, the nearby users in the communication range of mobile users are called neighbouring users, and the BS associated with the user is called the central server.

1. Suppose the content requested by a user is available in the local storage; the content can be delivered immediately with low latency. The cost of delivering content from the local user is denoted as $c_i$. Let the number of contents user i fetches from its local storage in time t be indicated as $r^t_{i,l}$. Therefore, the cost of the local user service is represented as $c_i r^t_{i,l}$.
2. Suppose some of the contents requested by the user are not served by the local storage i. Consider that the content requested by the user is available at neighbour user j, and the content is served by j to the user i. The cost of fetching content from j to i is denoted as $c_{i,j}$. Let the number of contents fetched from j to i in time t be denoted as $r^t_{i,j}$. Therefore, the cost of the neighbouring user service is represented as $\sum_{j \in M,\, j \neq i} c_{i,j}\, r^t_{i,j}$.
3. Suppose the content requested by the user is unavailable at any of the users. The corresponding BS obtains the content from the content server. Consider the cost to get the content from the content server to the user i via BS h, denoted as $c_{i,h}$. Let the number of contents fetched from the content server h to i in time t be denoted as $r^t_{i,h}$. Therefore, the cost of the content server service is represented as $c_{i,h}\, r^t_{i,h}$.
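The MDP elements above can be written down compactly. The sketch below is our own illustration rather than the authors' implementation: the field names mirror the notation $(N_i^t, K_i^t, B_i^t, w_i^t, n_i^t, C_i^t)$, while the concrete data types and example values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class CacheState:
    """State s_i^t of user i, mirroring (N_i^t, K_i^t, B_i^t, w_i^t, n_i^t, C_i^t)."""
    cache_state: List[int]          # N_i^t: cached segment count per content
    requests: List[int]             # K_i^t: contents requested at user i in slot t
    deadlines: Dict[int, float]     # B_i^t: delivery deadline b_{i,f}^t per content f
    devices_encountered: int        # w_i^t: devices met on the moving path
    avg_contact_duration: float     # n_i^t
    free_cache: float               # C_i^t: remaining cache capacity

# An action selects the mobile devices over which content is cached; its size is
# bounded by the number of wireless channels, as stated above.
Action = Set[int]

def is_valid_action(action: Action, num_channels: int) -> bool:
    return len(action) <= num_channels

state = CacheState(cache_state=[2, 0, 5], requests=[1, 2],
                   deadlines={1: 12.0, 2: 30.0},
                   devices_encountered=4, avg_contact_duration=25.0,
                   free_cache=0.05)
print(is_valid_action({3, 7}, num_channels=4), state.requests)
```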
The overall cost of the service in time t is represented as

$$c_i r^t_{i,l} + \sum_{j \in M,\, j \neq i} c_{i,j}\, r^t_{i,j} + c_{i,h}\, r^t_{i,h} \qquad (11)$$

The content server serves the contents missed at the local storage and the neighbouring users. Hence, the user replaces the newly fetched content with less popular content. Therefore, the cost should contain the replacement cost along with the delivery cost. Let the cost of replacing content at user i be denoted by $a_i$. The number of content segments replaced by user i at time t is indicated as $d^{t}_{f^{+},i} = f_i\,(x^{t}_{f,i} \setminus x^{t-1}_{f,i})$, where $x^{t}_{f,i}$ indicates content f cached at user i in time t, $x^{t-1}_{f,i}$ indicates content f cached at user i at time t-1 and $f_i$ indicates the content requests at user i. Therefore, the replacement cost is defined as

$$\sum_{f \in F} a_i\, d^{t}_{f^{+},i} \qquad (12)$$

The total cost is represented as the sum of (11) and (12). That is

$$c_i r^t_{i,l} + \sum_{j \in M,\, j \neq i} c_{i,j}\, r^t_{i,j} + c_{i,h}\, r^t_{i,h} + \sum_{f \in F} a_i\, d^{t}_{f^{+},i} \qquad (13)$$

Both the immediate and long-term rewards impact agent actions. Hence, the cooperative content replacement problem is expressed so as to maximize the cumulative discounted reward. The value function $V^{\pi}(s)$ is defined as

$$V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R^{t} \,\Big|\, s(0) = s,\ \pi \right] \qquad (16)$$

where $0 \le \gamma < 1$ is the discount factor; $\gamma$ decides the effectiveness of future rewards on the present decision. Lower $\gamma$ values give more weight to the immediate reward. We need to find the optimal caching policy $\pi^{*}$, which follows Bellman's equations

$$V^{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} P_{s's}\, V^{\pi}(s') \qquad (17)$$

where $P_{s's}$ is the state transition probability. Bellman's equations are usually solved by either value or policy iteration methods. Assume that there is a list of all acceptable policies $P$. The optimal policy is then determined as

$$\pi^{*} = \underset{\pi \in P}{\operatorname{arg\,max}}\ V^{\pi}(s) \qquad (18)$$
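To make Eqs. (11)-(13) and (16) concrete, the following sketch (ours, with illustrative numbers) evaluates the per-slot delivery-plus-replacement cost and the discounted return that the caching policy is trained against; the discount factor 0.9 matches the value listed later in Table 2.

```python
def slot_cost(c_l, r_l, c_n, r_n, c_h, r_h, a_i, d_repl):
    """Eq. (13): local + neighbour + BS delivery cost plus replacement cost.
    c_n, r_n are dicts over neighbour ids; d_repl is a dict over content ids."""
    delivery = c_l * r_l + sum(c_n[j] * r_n[j] for j in r_n) + c_h * r_h   # Eq. (11)
    replacement = sum(a_i * d_repl[f] for f in d_repl)                     # Eq. (12)
    return delivery + replacement

def discounted_return(rewards, gamma=0.9):
    """Eq. (16): sum_t gamma^t R^t, evaluated on a finite reward trace."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Illustrative slot: 3 local hits, 2 contents from neighbour 5, 1 from the BS,
# and one replaced content segment.
print(slot_cost(c_l=1.0, r_l=3, c_n={5: 5.0}, r_n={5: 2},
                c_h=80.0, r_h=1, a_i=0.5, d_repl={7: 1}))
print(discounted_return([1.0, 2.0, 0.5]))
```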
Owing to its great features, excellent performance, and sufficient processing time, deep reinforcement learning, a combination of RL and deep neural networks, has been widely employed in wireless communication.

We apply DRL to address the presented optimization problem by characterizing the state, action, and reward. Q-learning is a classic instance of reinforcement learning. Q-learning employs a Q-table to keep track of Q-values for the various state and action orders. Nevertheless, this might not be appropriate for cases having enormous state and action spaces. Deep Q-Network (DQN) calculates Q-values using a deep neural network, accommodating higher dimensional states and action spaces but having a slow convergence speed. We use the deep deterministic policy gradient (DDPG), a model-free and actor-critic approach, to solve the caching problem.

Deep neural networks (DNNs) approximate the policy and value functions in the DDPG algorithm, which is an expanded version of the actor-critic method. Compared to traditional RL algorithms, DDPG can solve optimization problems with huge state and action spaces and high dimensional concerns. Further, with continuous action spaces, DDPG may ensure efficient decisions. The structure of the DDPG learning algorithm is shown in Fig. 3. The DDPG network consists of three essential components: replay memory, primary and target networks. The actor network and the critic network each have two neural networks, making four neural networks for the primary network and the target network. The actor network is used to investigate policies, whereas the critic network is used to evaluate policies. In order to enhance the policy gradient, the critic network also provides critic values. The state-action pairings, the associated reward, and the future state are stored in the replay memory. To reduce data correlation effects, the agent randomly chooses these samples during the training process. The proposed DDPG technique consists of two primary DNNs:

– An actor, parameterized by $\omega$, produces an action in response to a state, such as $a_t = \pi(s_t; \omega)$.
– Given a state, a critic parameterized by $\phi$ gives the Q-value $Q(s_t, a_t; \phi)$ of the performed action.

The policy parameter $\omega$ is updated using the primary actor network by interacting with the environment using the current state s and action a to produce the reward r and the next state s'. The critic parameter $\phi$ and the current Q-values $Q(s_t, a_t; \phi)$ are updated using the primary critic network. DDPG further employs a target actor, $\pi(s_t; \omega')$, and a target critic, $Q(s_t, a_t; \phi')$, to enhance the training stability of the network, where $\omega'$ and $\phi'$ are the target actor and target critic parameters, respectively. The values of $\omega'$ and $\phi'$ are updated slowly towards the primary network parameters (a soft update) to stabilize learning.
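The actor-critic structure described above can be sketched as follows. This is a minimal PyTorch illustration written for this description rather than the authors' code: the learning rates (0.001/0.0005), soft-update rate (0.01), discount (0.9), batch size (256) and replay capacity (10^5) are taken from Table 2, while the network sizes, state/action dimensions and the sigmoid output are assumptions.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim), nn.Sigmoid())
    def forward(self, s):                      # a_t = pi(s_t; omega)
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, s, a):                   # Q(s_t, a_t; phi)
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.01):     # omega' <- tau*omega + (1-tau)*omega'
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

state_dim, action_dim, gamma = 8, 4, 0.9
actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
target_actor, target_critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_opt = optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = optim.Adam(critic.parameters(), lr=5e-4)
replay = deque(maxlen=100_000)                 # replay memory of (s, a, r, s') tensors

def update(batch_size=256):
    if len(replay) < batch_size:
        return
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():                      # critic target from the target networks
        y = r.unsqueeze(-1) + gamma * target_critic(s2, target_actor(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_loss = -critic(s, actor(s)).mean()   # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    soft_update(target_actor, actor); soft_update(target_critic, critic)
```

During interaction, transitions (s, a, r, s') are appended to the replay memory as 1-D tensors, and update() is called once per step after the memory holds at least one mini-batch.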
5 Performance evaluation
5.1 Simulation environment

We conduct our experiments with the following settings in order to assess the effectiveness of the proposed caching technique. We consider a cellular network with a single cell scenario, where the cell consists of a macro base station (MBS) and 20 mobile users distributed randomly across the coverage of the MBS. The mobile users (user devices) move across the network, randomly contact the other mobile users, and communicate using D2D communication in the considered scenario. The contact rate among the various devices follows a Gamma distribution [17]. The central library contains 600 contents with a size of 40 MB [25, 35], and each content is encoded into five segments [34]. In each contact, a mobile user can successfully transmit two encoded segments. The cache capacity of an individual mobile user is 0.2 GB, accommodating at most five contents. The content popularity follows a Zipf distribution [12, 23]. Content needed by the mobile users is served by other mobile users through D2D communications before the deadline of 800 s, and the MBS serves the failed segments after the deadline. All results of the simulation shown here are the average of 50 runs. The values of the simulation parameters are presented in Table 2.

Table 2 Simulation parameters

Parameters                                  Values
Simulation area                             500 m × 500 m
Number of users                             90
Number of contents                          600
Number of base stations                     15
Content size                                (10, 100] MB
The delay between BS and user               (5, 25] s
The delay between BSs                       20 s
The delay between content server and BS     80 s
The deadline of the content                 (10, 30] s
Actor and critic learning rate              0.001, 0.0005
Network update rate                         0.01
Discount                                    0.9
Mini batch size                             256
Replay memory capacity                      10^5
Number of episodes                          1500
Number of steps in each episode             100

5.2 Performance metrics

To compare the performance of the cache replacement schemes, we consider the following metrics:

1. Cache Hit Ratio The fraction of requests served over the total requests.
2. Acceleration Ratio The ratio of the saved delay to the overall delay (from the controller).
3. Offloading Ratio The offloading ratio measures the capacity of the base station to offload data by assessing the volume of information offloaded by D2D content delivery relative to the total demanded data in the cell.
4. Cache Reward The reward measures the cumulative long-term reward collected from caching (i.e., the sum of the intermediate rewards of all mobile devices) using Eq. (15).

5.3 Reference algorithms

In this section, we compare the proposed scheme with the following caching approaches: Most Popular Caching (MPC) [23, 47, 48], Random Caching (RC) [10, 49], Deep Q-Network (DQN) [36] and Greedy Caching Policy (GCP) [17].

1. MPC: Each device caches the most popular content fragments based on user request statistics until the cache is full.
2. RC: A random caching scheme ensures that every device stores data randomly until the cache is full, regardless of content popularity.
3. GCP: [17] proposes a greedy mobility aware caching approach to maximise the D2D offloading ratio while considering user mobility. The average number of successfully delivered file segments via D2D communications to the total number of file segments is known as the D2D offloading ratio.
4. DQN: In this scheme the caching policy is determined based on the algorithm used in [36].

The first three cache replacement strategies update the content individually based on popularity, randomly and greedily, whereas the other strategies consider deep reinforcement learning to place the contents. The fourth cache replacement strategy (DQN) is different from the proposed strategy (DDPGCP) because the former is value-based RL and the latter is policy-based RL.

5.4 Demand model

We use the real-world MovieLens 1M dataset [50] in our simulations to generate content requests. The MovieLens dataset consists of 3952 movies, 1,000,209 user ratings that take integer values from 1 (worst) to 5 (best), and 6040 users. Each row of the dataset consists of userid, movieid, rating and timestamp. The rating information is considered as the content request, since a user rates a movie after watching it [51]. We consider the rating
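Following the metric definitions in Sect. 5.2, the snippet below (our own sketch; the counters shown would be accumulated during a simulation run, and the example numbers are arbitrary) computes the cache hit ratio, acceleration ratio and D2D offloading ratio.

```python
def cache_hit_ratio(hits, total_requests):
    """Fraction of requests served from local or D2D caches over all requests."""
    return hits / total_requests if total_requests else 0.0

def acceleration_ratio(saved_delay, overall_delay):
    """Fraction of saved delay over the overall delay."""
    return saved_delay / overall_delay if overall_delay else 0.0

def d2d_offloading_ratio(d2d_bytes, total_requested_bytes):
    """Volume offloaded via D2D delivery over the total demanded volume."""
    return d2d_bytes / total_requested_bytes if total_requested_bytes else 0.0

# Illustrative run: 780 of 1000 requests hit a cache, 3.1e4 s of delay saved out
# of 5.2e4 s, and 18 GB of 30 GB demand delivered over D2D links.
print(cache_hit_ratio(780, 1000), acceleration_ratio(3.1e4, 5.2e4),
      d2d_offloading_ratio(18, 30))
```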
In Fig. 5b, the impact of the number of user devices on the acceleration ratio is presented. The acceleration ratio displays an upward trend with the growing number of users. This is because the increasing number of users cumulatively provides more cache storage and enough space to cache more appropriate content at the user devices, which leads to serving the content nearer to users. We can notice that the performance gap between the DRL based schemes, namely DDPGCP and DQN, increases with the increasing number of user devices. The growth of the curves becomes slower with a higher number of users since the popular content is already cached at the nearby user devices. Most importantly, the proposed mechanism achieves a higher acceleration ratio than other caching schemes due to the utilization of mobility information. The proposed DDPGCP mechanism provides an improvement of up to 17, 17, 13 and 5.3% on acceleration ratio compared with RC, MPC, GCP and DQN, respectively.

In Fig. 5c, the impact of the number of mobile devices on the offloading ratio is presented. The proposed caching scheme outperforms other caching methods because it exploits the user mobility information. With the increasing number of users, the performance gap between the proposed mechanism and the other caching mechanisms becomes wider than that of DQN. This shows that more users make more room to cache appropriate content by utilizing the mobility information. The proposed DDPGCP mechanism provides an improvement of up to 27, 18.6, 11.4 and 3.2% on D2D offloading ratio compared with RC, MPC, GCP and DQN, respectively.

5.6 Impact of cache size of user devices

The impact of cache size on hit ratio, D2D offload ratio, and acceleration ratio is presented in this subsection (Fig. 6). The number of users in this scenario is 20, the number of contents is 600, the deadline considered is 800 s, the Zipf parameter is 0.6, and the cache capacity ranges from 0.1 to 0.3 GB with a step size of 0.05 GB.

In Fig. 6a, the impact of cache size on the cache hit rate is shown. The cache hit rate displays an upward trend with the expansion of the cache size. The rationale for this is that mobile users will be able to cache more contents with higher storage sizes, which will be successfully shared via D2D connections, leading to faster content delivery. We can notice that the proposed mechanism outperforms other caching schemes by achieving a higher cache hit ratio. The DDPGCP and DQN mechanisms show superiority over the other caching schemes due to the mobility information. The proposed DDPGCP mechanism provides an improvement of up to 24.3, 20.7, 14 and 3.9% on hit ratio compared with RC, MPC, GCP and DQN, respectively.

In Fig. 6b, the impact of cache size on the acceleration ratio is presented. The acceleration ratio grows with larger cache sizes. With the increase in cache capacity, all the mechanisms improve quickly compared to lower cache sizes. This is because of getting more storage to accommodate more popular content at each user device. The proposed DDPGCP outperforms other existing schemes with a significant improvement in the acceleration ratio. By utilizing the mobility information, DQN and DDPGCP attain a higher acceleration ratio than the other schemes. The proposed DDPGCP mechanism provides an improvement of up to 24.8, 14.6, 11 and 4% on acceleration ratio compared with RC, MPC, GCP and DQN, respectively.

In Fig. 6c, the impact of the cache size on the offloading ratio is presented. As expected, the D2D offloading rate of all caching mechanisms increases as cache size increases. The DRL based caching schemes, namely DDPGCP and DQN, produce a greater offloading rate than the existing cache approaches. The reason is that exploiting the device contact information in the mobility-based cache approaches provides an advantage over the non-mobility based caching schemes. The proposed mechanism has a significant advantage with limited cache sizes since it exploits the user contact information and transmission rate. The proposed DDPGCP mechanism provides an improvement of up to 24.4, 21, 13 and 2.5% on D2D offloading ratio compared with RC, MPC, GCP and DQN, respectively.
Fig. 12 Comparison of the proposed and existing schemes using training episode vs (a) cache hit ratio and (b) acceleration ratio

The curves of the learning-based algorithms indicate an upward trend and stabilize after that. The GCP has a higher hit ratio than DQN and DDPGCP initially, but as the episodes increase, it slowly diminishes. That is because the GCP is a greedy cooperative cache replacement mechanism where each user greedily caches the content based on local information, not considering the other agents' knowledge in caching decisions. Therefore, each agent may cache content redundantly, which leads to obtaining more content from the content server. In DQN, the agents cache the content based on the central controller, which cooperates with communication

6 Conclusion

In this paper, the cooperative cache placement problem has been analyzed in device-to-device mobile edge networks by placing the content to maximize the saved delay with deadline and capacity constraints. The saved latency has been calculated analytically using the inter-contact movement pattern. We formulate the problem as an integer linear programming problem for cooperative cache placement. Since the proposed problem is NP-hard, we designed a caching scheme for large-sized D2D enabled mobile edge networks by integrating a deep reinforcement learning based deep deterministic policy gradient mechanism to enhance the long-term reward and speed up the learning process. It has been demonstrated that exploiting user cooperation, content deadlines, and the randomness of user interaction information results in a considerable performance gain in D2D enabled mobile edge networks. The simulation results show that the proposed cooperative cache placement improves the cache hit rate, acceleration ratio and offload ratio by up to 23, 24 and 25 per cent, respectively, compared with RC, MPC, GCP and DQN. For future work, we will investigate optimized user association and D2D link quality to improve the user quality of experience.

Data availability The MovieLens 1M dataset was used to support this study and its application details are available at https://grouplens.org/datasets/movielens/1m/. The data set is cited at relevant places within the text as references.
Code availability The program of this paper is supported by custom code. It can be obtained from the corresponding author on reasonable request.

Declarations

Conflict of interest The authors declare that they do not have any known competing interests.

Ethical statement The work submitted by the authors is their own work and it is neither published nor considered for publication elsewhere.

References

1. Wang, X., Han, Y., Wang, C., Zhao, Q., Chen, X., & Chen, M. (2019). In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning. IEEE Network, 33(5), 156–165.
2. Cisco Systems Inc. (2019). Cisco visual networking index: Global mobile data traffic forecast update, 2017–2022.
3. Yao, J., Han, T., & Ansari, N. (2019). On mobile edge caching. IEEE Communications Surveys & Tutorials, 21(3), 2525–2553.
4. Qiu, L., & Cao, G. (2019). Popularity-aware caching increases the capacity of wireless networks. IEEE Transactions on Mobile Computing, 19(1), 173–187.
5. Pan, Y., Pan, C., Yang, Z., Chen, M., & Wang, J. (2019). A caching strategy towards maximal D2D assisted offloading gain. IEEE Transactions on Mobile Computing, 19(11), 2489–2504.
6. Prerna, D., Tekchandani, R., & Kumar, N. (2020). Device-to-device content caching techniques in 5G: A taxonomy, solutions, and challenges. Computer Communications, 153, 48–84.
7. Yu, S., Dab, B., Movahedi, Z., Langar, R., & Wang, L. (2019). A socially-aware hybrid computation offloading framework for multi-access edge computing. IEEE Transactions on Mobile Computing, 19(6), 1247–1259.
8. Tran, T. X., Le, D. V., Yue, G., & Pompili, D. (2018). Cooperative hierarchical caching and request scheduling in a cloud radio access network. IEEE Transactions on Mobile Computing, 17(12), 2729–2743.
9. Li, L., Zhao, G., & Blum, R. S. (2018). A survey of caching techniques in cellular networks: Research issues and challenges in content placement and delivery strategies. IEEE Communications Surveys & Tutorials, 20(3), 1710–1732.
10. Chen, M., Saad, W., Yin, C., & Debbah, M. (2017). Echo state networks for proactive caching in cloud-based radio access networks with mobile users. IEEE Transactions on Wireless Communications, 16(6), 3520–3535.
11. Zhu, H., Cao, Y., Wang, W., Jiang, T., & Jin, S. (2018). Deep reinforcement learning for mobile edge caching: Review, new features, and open issues. IEEE Network, 32(6), 50–57.
12. Shanmugam, K., Golrezaei, N., Dimakis, A. G., Molisch, A. F., & Caire, G. (2013). Femtocaching: Wireless content delivery through distributed caching helpers. IEEE Transactions on Information Theory, 59(12), 8402–8413.
13. Li, J., Liu, M., Lu, J., Shu, F., Zhang, Y., Bayat, S., & Jayakody, D. N. K. (2019). On social-aware content caching for D2D-enabled cellular networks with matching theory. IEEE Internet of Things Journal, 6(1), 297–310.
14. Yang, C., & Stoleru, R. (2020). CEO: Cost-aware energy efficient mobile data offloading via opportunistic communication. In 2020 International Conference on Computing, Networking and Communications (ICNC) (pp. 548–554). IEEE.
15. Dai, X., Xiao, Z., Jiang, H., Alazab, M., Lui, J., Dustar, S., & Liu, J. (2022). Task co-offloading for D2D-assisted mobile edge computing in industrial internet of things. IEEE Transactions on Industrial Informatics.
16. Wang, Z., Shah-Mansouri, H., & Wong, V. W. (2016). How to download more data from neighbors? A metric for D2D data offloading opportunity. IEEE Transactions on Mobile Computing, 16(6), 1658–1675.
17. Wang, R., Zhang, J., Song, S., & Letaief, K. B. (2017). Mobility-aware caching in D2D networks. IEEE Transactions on Wireless Communications, 16(8), 5001–5015.
18. Qiao, J., He, Y., & Shen, X. S. (2016). Proactive caching for mobile video streaming in millimeter wave 5G networks. IEEE Transactions on Wireless Communications, 15(10), 7187–7198.
19. Lu, Z., Sun, X., & La Porta, T. (2016). Cooperative data offloading in opportunistic mobile networks. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications (pp. 1–9). IEEE.
20. Zhou, H., Wang, H., Li, X., & Leung, V. C. (2018). A survey on mobile data offloading technologies. IEEE Access, 6, 5101–5111.
21. Poularakis, K., Iosifidis, G., & Tassiulas, L. (2014). Approximation algorithms for mobile data caching in small cell networks. IEEE Transactions on Communications, 62(10), 3665–3677.
22. Baştuğ, E., Kountouris, M., Bennis, M., & Debbah, M. (2016). On the delay of geographical caching methods in two-tiered heterogeneous networks. In 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (pp. 1–5). IEEE.
23. Somesula, M. K., Rout, R. R., & Somayajulu, D. (2021). Contact duration-aware cooperative cache placement using genetic algorithm for mobile edge networks. Computer Networks, 193, 108062.
24. Bharath, B., Nagananda, K. G., Gündüz, D., & Poor, H. V. (2018). Caching with time-varying popularity profiles: A learning-theoretic perspective. IEEE Transactions on Communications, 66(9), 3837–3847.
25. Somesula, M. K., Rout, R. R., & Somayajulu, D. (2021). Deadline-aware caching using echo state network integrated fuzzy logic for mobile edge networks. Wireless Networks, 27(4), 2409–2429.
26. Wang, X., Zhang, Y., Leung, V. C., Guizani, N., & Jiang, T. (2018). D2D big data: Content deliveries over wireless device-to-device sharing in large-scale mobile networks. IEEE Wireless Communications, 25(1), 32–38.
27. Wu, D., Zhou, L., Cai, Y., & Qian, Y. (2018). Collaborative caching and matching for D2D content sharing. IEEE Wireless Communications, 25(3), 43–49.
28. Liu, Z., Song, H., & Pan, D. (2020). Distributed video content caching policy with deep learning approaches for D2D communication. IEEE Transactions on Vehicular Technology, 69(12), 15644–15655.
29. Fu, Y., Salaün, L., Yang, X., Wen, W., & Quek, T. Q. (2021). Caching efficiency maximization for device-to-device communication networks: A recommend to cache approach. IEEE Transactions on Wireless Communications, 20(10), 6580–6594.
30. Zhao, D., Wang, H., Shao, K., & Zhu, Y. (2016). Deep reinforcement learning with experience replay based on SARSA. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1–6). IEEE.
31. Zhang, W., Wu, D., Yang, W., & Cai, Y. (2019). Caching on the move: A user interest-driven caching strategy for D2D content sharing. IEEE Transactions on Vehicular Technology, 68(3), 2958–2971.
32. Zhang, S., Quan, W., Li, J., Shi, W., Yang, P., & Shen, X. (2018). Air-ground integrated vehicular network slicing with content pushing and caching. IEEE Journal on Selected Areas in Communications, 36(9), 2114–2127.
33. Ibrahim, A. M., Zewail, A. A., & Yener, A. (2020). Device-to-device coded-caching with distinct cache sizes. IEEE Transactions on Communications, 68(5), 2748–2762.
34. Sun, R., Wang, Y., Lyu, L., Cheng, N., Zhang, S., Yang, T., & Shen, X. (2020). Delay-oriented caching strategies in D2D mobile networks. IEEE Transactions on Vehicular Technology, 69(8), 8529–8541.
35. Poularakis, K., & Tassiulas, L. (2016). Code, cache and deliver on the move: A novel caching paradigm in hyper-dense small-cell networks. IEEE Transactions on Mobile Computing, 16(3), 675–687.
36. Zhou, H., Wu, T., Zhang, H., & Wu, J. (2021). Incentive-driven deep reinforcement learning for content caching and D2D offloading. IEEE Journal on Selected Areas in Communications, 39(8), 2445–2460.
37. Bajpai, R., Chakraborty, S., & Gupta, N. (2022). Adapting deep learning for content caching frameworks in device-to-device environments. IEEE Open Journal of the Communications Society.
38. Qiu, X., Liu, L., Chen, W., Hong, Z., & Zheng, Z. (2019). Online deep reinforcement learning for computation offloading in blockchain-empowered mobile edge computing. IEEE Transactions on Vehicular Technology, 68(8), 8050–8062.
39. Somesula, M. K., Rout, R. R., & Somayajulu, D. V. (2022). Cooperative cache update using multi-agent recurrent deep reinforcement learning for mobile edge networks. Computer Networks, 209, 108876.
40. He, Y., Liang, C., Yu, F. R., & Leung, V. C. (2018). Integrated computing, caching, and communication for trust-based social networks: A big data DRL approach. In 2018 IEEE Global Communications Conference (GLOBECOM) (pp. 1–6). IEEE.
41. Zeng, S., Ren, Y., Wang, Y., Zhao, T., & Qian, Z. (2019). Caching strategy based on deep Q-learning in device-to-device scenario. In 2019 12th International Symposium on Computational Intelligence and Design (ISCID) (Vol. 1, pp. 175–179). IEEE.
42. Jiang, W., Feng, G., Qin, S., Yum, T. S. P., & Cao, G. (2019). Multi-agent reinforcement learning for efficient content caching in mobile D2D networks. IEEE Transactions on Wireless Communications, 18(3), 1610–1622.
43. Li, L., Xu, Y., Yin, J., Liang, W., Li, X., Chen, W., & Han, Z. (2019). Deep reinforcement learning approaches for content caching in cache-enabled D2D networks. IEEE Internet of Things Journal, 7(1), 544–557.
44. Chakraborty, S., Bajpai, R., & Gupta, N. (2021). R2-D2D: A novel deep learning based content-caching framework for D2D networks. In 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring) (pp. 1–5). IEEE.
45. Jiang, W., Feng, G., & Qin, S. (2017). Optimal cooperative content caching and delivery policy for heterogeneous cellular networks. IEEE Transactions on Mobile Computing, 16(5), 1382–1393.
46. Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6), 26–38.
47. Ahlehagh, H., & Dey, S. (2014). Video-aware scheduling and caching in the radio access network. IEEE/ACM Transactions on Networking, 22(5), 1444–1462.
48. Peng, X., Shen, J. C., Zhang, J., & Letaief, K. B. (2015). Backhaul-aware caching placement for wireless networks. arXiv preprint arXiv:1509.00558.
49. Blaszczyszyn, B., & Giovanidis, A. (2015). Optimal geographic caching in cellular networks. In 2015 IEEE International Conference on Communications (ICC) (pp. 3358–3363). IEEE.
50. Harper, F. M., & Konstan, J. A. (2015). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5(4), 1–19. https://doi.org/10.1145/2827872
51. Garg, N., Sellathurai, M., Bhatia, V., Bharath, B., & Ratnarajah, T. (2019). Online content popularity prediction and learning in wireless edge caching. IEEE Transactions on Communications, 68(2), 1087–1100.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Manoj Kumar Somesula received his PhD from the National Institute of Technology Warangal, India. He completed the M.Tech. degree in Computer Science from the School of Information Technology, JNTU, Hyderabad, India and the B.Tech. degree in Computer Science and Engineering from Mahaveer Institute of Science and Technology, JNTU, Hyderabad, India. He is currently working as an Assistant Professor in the Department of Computer Science and Engineering at BVRIT Hyderabad College of Engineering for Women, India. He has published several research papers on mobile edge networks and D2D networks in reputed international journals. His primary research areas include Edge computing, Wireless networks, Internet of Things and Unmanned aerial vehicles. He is a member of IEEE, IEEE ComSoc, IEEE Computer Society and ACM.

Sai Krishna Mothku received the B.Tech degree in Computer Science and Engineering in 2009 from Kakatiya Institute of Technology and Science (KITS), Warangal, Telangana, India and the M.Tech degree in Computer Science and Engineering - Information Security in 2012 from the National Institute of Technology, Surathkal, Karnataka, India. He received the PhD degree from the National Institute of Technology, Warangal, Telangana, India in 2019. Currently, he is working as an Assistant Professor in the Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, India. His primary research includes Wireless ad hoc networks, Wireless sensor networks, Internet of Things, Cloud, Fog and Edge computing.