
Received: 8 April 2024 Revised: 22 August 2024 Accepted: 20 October 2024 IET Intelligent Transport Systems

DOI: 10.1049/itr2.12588

ORIGINAL RESEARCH

Spatio-temporal dynamic navigation for electric vehicle charging using deep reinforcement learning

Ali Can Erüst1 Fatma Yıldız Taşcıkaraoğlu1,2

1 Department of Electrical and Electronics Engineering, Mugla Sitki Kocman University, Mugla, Türkiye
2 Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, California, USA

Correspondence
Fatma Yıldız Taşcıkaraoğlu, Department of Electrical and Electronics Engineering, Mugla Sitki Kocman University, 48000, Mugla, Türkiye.
Email: [email protected]

Funding information
University of California, San Diego; Turkiye Bilimsel ve Teknolojik Araştırma Kurumu under 2219 International Fellowship Program, Grant/Award Number: 1059B192301247

Abstract
This paper considers the real-time spatio-temporal electric vehicle charging navigation problem in a dynamic environment by utilizing a shortest path-based reinforcement learning approach. In a data sharing system including the transportation network, an electric vehicle (EV) and EV charging stations (EVCSs), the aim is to determine the most convenient EVCS and the optimal path for reducing the travel, charging and waiting costs. To estimate the waiting times at EVCSs, a Gaussian process regression algorithm is integrated using a real-time dataset comprising state-of-charge and arrival-departure times of EVs. The optimization problem is modelled as a Markov decision process with unknown transition probability to overcome the uncertainties arising from time-varying variables. A recently proposed on-policy actor-critic method, phasic policy gradient (PPG), which extends the proximal policy optimization algorithm with an auxiliary optimization phase to improve training by distilling features from the critic to the actor network, is used to make EVCS decisions on the network, where the EV travels through the optimal path from the origin node to the EVCS by considering dynamic traffic conditions, the unit value of the EV owner and the time-of-use charging price. Three case studies are carried out for the 24-node Sioux-Falls benchmark network. It is shown that phasic policy gradient achieves an average of 9% better reward compared to proximal policy optimization, and the total time decreases by 7-10% when the EV owner cost is considered.

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2024 The Author(s). IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

1 INTRODUCTION

Nowadays, the utilization of electric vehicles (EVs) plays a vital role in reducing carbon emissions, energy consumption and environmental pollution. In this direction, policymakers take regulatory measures over time horizons of 10 to 20 years, such as banning internal combustion engine vehicles and promoting zero-emission EVs [1]. However, the growing number of EVs on the road introduces new challenges to EV charging processes due to charging time and the change of the energy consumption profile at a large scale [2]. Moreover, the irregular charging activities of EV owners without centralized communication may lead to many drawbacks in the traffic and electricity infrastructure. Hence, intelligent transportation systems (ITS) are being established to obtain real-time data such as traffic flow, EV charging costs and the waiting time of the EV at EV charging stations (EVCSs) [3, 4].

In the literature, various studies have analysed the EV charging navigation (EVCN) problem from different perspectives. In [5], a weighted planning technique was proposed, which considers the EV user requests, traffic flow, the number of EVs charging at the station and the charging load. It was proved through case studies that this strategy can reduce the charging queue while also improving the EVCS operational efficiency. Besides, Zhong et al. [6] proposed a study to minimize EV charging time and dynamic charging cost while considering the status of the network, traffic conditions and the power grid. In [7], a similar approach was developed by using the trip-chain and grid world path selection model to indicate the effects of EVs on the grid. To demonstrate the effectiveness of the simulation method for varying travel and load demand, the model was tested for both workday and holiday scenarios.





Another approach in [8] analysed EV owner time utility for the EVCN problem, and Dijkstra-based path planning was used to obtain the optimal path for the EV. In that study, the authors stated that different EV owner time utilities have an effect on the results. A similar methodology is used in [9] for multiple EVs on the transportation network to find energy-efficient optimal paths.

Furthermore, the main efforts of the studies in [10-12] have been devoted to EVCN while considering the transportation network conditions. Tan et al. [10] presented a hierarchical game approach for EV-to-charging-station navigation by considering both the transportation and power systems. Consequently, they observed that the proposed method enhances the robustness of the power network and the economic profit of the EVCSs. In [11], an EV en-route charging navigation was proposed based on deterministic and stochastic traffic flows. In that study, a dynamic programming approach was used to derive the charging navigation based on the time-dependent electricity price. As a result, the travel cost of the EV was minimized during long trips. Similarly, Cerna et al. [12] conducted a study on the charging navigation of a fleet of EVs. The authors modelled delay uncertainties and randomness on the transportation system based on predefined probabilities of traffic signals, public work and school areas. The model is solved using mixed integer linear programming (MILP) solvers. Finally, the results indicate that the model is able to optimize the maintenance cost and navigation scheduling.

While the EVCN methodologies mentioned above yield convincing decision-making results, the traditional mathematical optimization algorithms rely on a deterministic environment and states. Furthermore, the primary drawbacks of mathematically modelled optimization methods include their low computational power and a lack of robustness against randomness. Therefore, these methods are not applicable to EVCN problems in large-scale, real-time dynamic networks.

In the recent literature, researchers' interest has shifted towards deep reinforcement learning (DRL) to solve decision-making problems for EVCN and to overcome the randomness and dynamic complexity of large-scale networks. Furthermore, to deal with real-time variations and uncertainties, such as EV owner priorities, transportation system instabilities and EVCS conditions, DRL provides significant advantages, as this method requires no prior knowledge about the environment [13-15].

In this context, several studies have been carried out to optimize the sequential decision-making problems for EVCN strategies in a dynamic environment. In [16], a DRL method was proposed to design retail prices in terms of an EV aggregator by taking into account the discrete nature of the EV charging and discharging levels. With a similar objective, an EV charging/discharging scheduling problem was defined in [17] in terms of EV users. Qian et al. [18] presented a DRL-based EV charging navigation with the aim of minimizing the total travel time and charging cost at an EVCS. In that study, a MILP-based feature extraction model was obtained and used with the deep Q-learning (DQN) algorithm. The authors observed that the traditional optimization and DQN algorithm results are close to each other. In addition, the results stated that the DQN algorithm is able to tolerate dynamic uncertainties. Another approach is presented by using double DQN (DDQN) to minimize the energy consumption of EVs in the transportation system [19]. Besides, Wan et al. [20] proposed a model-free DRL to reduce the charging cost of EVs at the EVCS. The problem was modelled by using a Markov decision process (MDP) with unknown state-action probabilities. The effectiveness of the designed DRL model was ensured with different test cases. Similarly, Jiang et al. [21] used a DRL method for charging station selection for EVs by considering distance, travel time and charging duration. The MDP formulation was used for the EVCN problem by considering traffic flow and dynamic user requests. Similarly, the authors of [22] presented a multi-agent spatio-temporal RL framework to provide intelligent EV charging for public EVCSs. With the goal of minimizing the waiting time, charging price and charging failure rate, the problem was formulated as a multi-objective task. With similar objectives, Xing et al. [23] presented a graph DRL method for EVCN by considering shortest route feature extraction. The authors designed a modified rainbow algorithm, similar to the DQN algorithm, to minimize the charging cost, road cost, waiting time and driving time. The effectiveness of the algorithm was tested with various test cases and also failure cases. Similarly, in [24], a DRL approach is used to minimize the travel cost and charging cost of an EV. In that study, a shortest path based two-level optimization is used to obtain features of the EVCN problem. Then, an actor-critic method is considered to make the path decision of the EV. Results showed that the shortest path-based DRL approach reduced the total cost of the EV.

Besides the aforementioned DRL EVCN methodologies, various considerations must be addressed for a realistic EVCN approach. The DRL-based EVCN studies given in [18, 19, 23, 24] prioritize reducing the number of complex dynamic states, including time information and the mathematical model of the transportation network. Therefore, modelling the behaviour of EV users fails due to time-independent randomized states such as electricity prices and EVCS waiting times. Furthermore, most of the above-mentioned EVCN strategies avoid utilizing transportation network features such as road demand, traffic capacity and flow. Consequently, the objective of the EVCN problem is only partially analysed. Finally, the proposed model needs to be robust against dynamic environment states, such as unexpected traffic flow on the transportation network and waiting times at the EVCS.

1.1 Contribution and paper organization

This study introduces a novel EVCN strategy with a multi-objective cost function to minimize the driving time, EVCS waiting time and charging cost of an EV. The presented method utilizes a DRL method for EVCS selection, the Dijkstra algorithm taking traffic conditions into account to extract the optimal path, and a data-driven method, GPR, to predict the EVCS queue times. The main contributions of this study are as follows:

∙ The deterministic EVCN problems presented in [16] and [22] (mixed-integer problem) are reformulated, taking into account traffic demand, flow, and capacity variables to achieve a more realistic transportation network.
∙ The Gaussian process regression (GPR) algorithm is introduced to estimate the waiting time at the EVCS using real-time data.
∙ A novel shortest path based phasic policy gradient (PPG) algorithm is developed to determine the optimal EVCS selection policy without requiring any prior information. The Dijkstra algorithm solves the optimal shortest path problem under dynamic traffic conditions, and this is further evaluated in the decision-making phase of the PPG algorithm.
∙ The performance of the proposed model is analysed through various test scenarios, incorporating EV owner priorities and random traffic flow patterns using the Sioux-Falls transportation network [25].
∙ The effectiveness of the proposed shortest path-based PPG model is compared with the shortest path-based proximal policy optimization (PPO) algorithm [26] in terms of reducing the travel time and charging cost of the EV.

The remainder of the article is organized as follows: The system architecture and mathematical formulation of the deterministic EVCN problem are given in Section 2. The MDP formulation incorporating the proposed transportation network features and the shortest path-based PPG algorithm are introduced in Section 3. Real-time input data, case studies, and numerical simulation results are presented in Section 4. Finally, the conclusion and future directions of the paper are reported in Section 5.

2 METHODOLOGY

2.1 System overview

The proposed model addresses the challenge encountered by EV owners in determining the optimal EVCS in terms of minimizing both charging and travelling costs to meet their charging requirements at any hour of the day and in challenging transportation network scenarios. In this context, a smart real-time data sharing center, similar to that proposed in [18, 23] and [24], is utilized to incorporate position and state-of-charge (SOC) data from the EV, time-of-use (ToU) tariff-based charging prices and estimated waiting times from the EVCSs, as well as road distance, capacity, and real-time traffic flow data from the network. The overall data acquisition system architecture for the EVCN problem is illustrated in Figure 1. First, the EV is initialized with a random position and SOC value in the network and selects an EVCS location. Then, the EV takes a step in the network according to the solution of the weighted shortest path approach. At each node, the communication network parameters and the charging-traveling costs are updated based on the time-of-day. The EVCS selection process in the network is handled by a DRL-based approach to overcome the stochastic data in the system. Finally, this process is repeated until the EV reaches the optimal EVCS.

FIGURE 1 Smart communication network.

2.2 Mathematical formulation of EVCN problem

The multi-objective function is calculated as the linear sum of three objectives: the path cost $J^p$, the charging cost $J^c$, and the time spent by the EV owner, $t_k^p + t_k^{\text{wait}}$, which is

$$\min_{x_{ij},\, l_k} \; J^p + J^c + \psi \left( t_k^p + t_k^{\text{wait}} \right) \tag{1}$$

where the EV owner's unit value factor, $\psi$, is a weight to penalize the sum of the time spent during travel, $t_k^p$, and waiting at the EVCS node, $t_k^{\text{wait}}$. Thus, the importance of the priorities between the EV owner's unit value and the total time spent can be determined while simultaneously minimizing the cost functions $J^p$ and $J^c$ [18]. The EV battery usage based on the power consumption on the road, $\omega$, and the electricity price, $\eta$, is constrained by $J^p$ as given in Equation (2). Similarly, $J^c$ aims to minimize the battery usage until the EV arrives at the EVCS while constraining the charging cost by using the SOC values of the EV and the charging price at the kth EVCS, as given in Equation (3).

$$J^p \geq \omega \eta \sum_{i,j} d_{ij} x_{ij} \tag{2}$$

$$J^c \geq \sum_{k \in L} \left( e^{\max} - e_k^{\text{end}} \right) E^{\max} \alpha_t^{c,k} l_k \tag{3}$$
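As a hedged illustration of the cost structure in Equations (1)-(3), the objective can be evaluated numerically for one candidate EVCS and path. The code below is a sketch, not the authors' implementation; all parameter values are placeholders chosen for readability.

```python
# Illustrative sketch of the objective in Equations (1)-(3); all numbers are placeholders.
def path_cost(omega, eta, segment_lengths_km):
    """J^p: energy cost of driving, Eq. (2) with x_ij = 1 on the chosen path."""
    return omega * eta * sum(segment_lengths_km)

def charging_cost(e_max, e_end, E_max_kwh, alpha_c):
    """J^c: cost of recharging from arrival SOC e_end to e_max at price alpha_c, Eq. (3)."""
    return (e_max - e_end) * E_max_kwh * alpha_c

def total_cost(J_p, J_c, psi, t_travel_h, t_wait_h):
    """Objective of Eq. (1): monetary costs plus time weighted by the owner's unit value psi."""
    return J_p + J_c + psi * (t_travel_h + t_wait_h)

# Example with assumed values: 12 km path, 80 kWh battery, 50% -> 90% SOC, 0.2 $/kWh, psi = 3 $/h.
J_p = path_cost(omega=0.18, eta=0.03, segment_lengths_km=[4.0, 5.0, 3.0])
J_c = charging_cost(e_max=0.9, e_end=0.5, E_max_kwh=80.0, alpha_c=0.2)
print(total_cost(J_p, J_c, psi=3.0, t_travel_h=0.4, t_wait_h=0.25))
```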

The following equations model the travel time from the initial position to the EVCS node. The studies [18] and [24] determine the travel time of the EV using only the distance and EV velocity parameters without considering traffic density. Inspired by [27], to achieve a more realistic representation of the transportation network, this study formulates the travel time $t_{ij}^p(f_{ij}^p)$ by considering the traffic flow and road capacity parameters. This formulation highlights the importance of the travel time parameter in selecting the EVCS location. The formulation of the required time from node i to j is given as

$$t_k^p \geq \sum_{i,j} t_{ij}^p \left( f_{ij}^p \right) x_{ij} \tag{4}$$

$$t_{ij}^p \left( f_{ij}^p \right) = t_{ij}^{p,0} \left( 1 + b \left( \frac{f_{ij}^p}{c_{ij}} \right)^{\xi} \right) \tag{5}$$

where $t_{ij}^{p,0}$ represents the free-flow time of the road, b is the parameter of the BPR function, and $\xi$ determines the function shape and utilizes the threshold value for the BPR function. The waiting time required before the EV starts its charging process at EVCS k is represented as

$$t_k^{\text{wait}} = \sum_{k \in L} t_k^c \, l_k . \tag{6}$$

$t_k^c$ denotes the estimated waiting time at the EVCS location. The SOC value of the battery when the EV arrives at the EVCS is calculated as in Equation (7), utilizing the initial SOC level, total distance and battery capacity parameters. Furthermore, the final SOC on arrival is constrained by a minimum critical level as in Equation (8).

$$e_k^{\text{end}} = e^{\text{init}} - \omega \sum_{i,j} \frac{d_{ij} x_{ij}}{E^{\max}} \tag{7}$$

$$e_k^{\text{end}} \geq e_k^{\min} \tag{8}$$

The following equations ensure that only a single EVCS node is suggested and a sequential path is selected from the initial node to the destination node

$$\sum_{k \in L} l_k = 1 \tag{9}$$

$$\sum_{i,j} x_{ij} - \sum_{i,j} x_{ji} = \begin{cases} 1, & i = q \\ 0, & i \neq q,\; i \notin L \\ l_i, & i \in L \end{cases} \tag{10}$$

where the binary decision variable $x_{ij}$ identifies the path selection on the transportation network when its value is equal to 1. Otherwise, a path from node i to j is not included. Similarly, $l_k$ is the binary decision variable for the EVCS, identifying the selected EVCS node, and q is the origin node of the optimal path in the optimization problem.

The deterministic formulation of the EVCN problem requires full-state knowledge about the transportation network, charging time, electricity price and SOC level of the EV. However, in practice, real-time systems fail to capture all the state knowledge. Moreover, the state data usually changes over a time period.

2.3 Weighted shortest path formulation

In this study, a graph-based transportation network is considered. $D(G)$ is the weight matrix that includes all the paths in the transportation network and shows the weights between the nodes. A smaller portion of a transportation network with the weights $d_{nm}$ is given as follows:

$$D(G) = \begin{bmatrix} 0 & d_{12} & \infty & d_{14} & \infty \\ d_{12} & 0 & d_{23} & d_{24} & d_{25} \\ \infty & d_{23} & 0 & d_{34} & d_{35} \\ d_{14} & d_{24} & d_{34} & 0 & \infty \\ \infty & d_{25} & d_{35} & \infty & 0 \end{bmatrix} \tag{11}$$

where $\infty$ denotes that no direct connection exists between two nodes. The weights of the transportation network are obtained utilizing the cost of distance and travel time from node i to j by considering Equation (4), as below.

$$d_{nm} = \omega \eta \, d_{ij} + \psi \, t_{ij}^p \left( f_{ij}^p \right) \tag{12}$$

Finally, the formulated weights given in Equation (12) are solved using Dijkstra's algorithm, as presented in [28].
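To make the weighted shortest path construction concrete, the sketch below combines the BPR travel time of Equation (5) with the edge weights of Equation (12) and runs a plain Dijkstra search. The toy network, the BPR constants b and ξ, and the cost parameters are illustrative assumptions and are not taken from the paper.

```python
import heapq

def bpr_travel_time(t0, flow, capacity, b=0.15, xi=4.0):
    """BPR travel time of Eq. (5); b and xi are typical BPR values, assumed here."""
    return t0 * (1.0 + b * (flow / capacity) ** xi)

def edge_weight(d_ij, t_ij, omega=0.18, eta=0.03, psi=3.0):
    """Edge weight of Eq. (12): energy cost of the segment plus travel time weighted by psi."""
    return omega * eta * d_ij + psi * t_ij

def dijkstra(adj, source, target):
    """Standard Dijkstra over an adjacency dict {node: [(neighbour, weight), ...]}."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[target]

# Toy 4-node network: (i, j, distance_km, free_flow_time_h, flow, capacity) -- assumed values.
edges = [(1, 2, 4.0, 0.08, 300, 500), (2, 3, 5.0, 0.10, 450, 500),
         (1, 4, 6.0, 0.12, 100, 600), (4, 3, 3.0, 0.06, 200, 600)]
adj = {}
for i, j, d, t0, f, c in edges:
    w = edge_weight(d, bpr_travel_time(t0, f, c))
    adj.setdefault(i, []).append((j, w))
    adj.setdefault(j, []).append((i, w))

print(dijkstra(adj, source=1, target=3))
```

In the proposed scheme, such a search would be re-run whenever the traffic flows $f_{ij}^p$ change, so that the weights of Equation (12) always reflect the current network conditions.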

2.4 GPR-based EVCS waiting time model

Most of the DRL-based EVCN solutions rely on randomly distributed waiting time models without considering the time-of-day and EV owner behaviours. Therefore, these studies fail to analyse the daily characteristics of the EVCS waiting time. Thus, this section introduces a GPR-based learning model for estimating the EVCS waiting times by using real-world data.

GPR, detailed in [29], is one of the popular regression algorithms in machine learning (ML) that utilizes a non-parametric Bayesian model for solving such problems. Moreover, EVCS waiting time prediction is a complex task due to the uncertainties in the transportation network and EV owner behaviours. Hence, GPR-based models have the advantage of being able to resist uncertainties, which allows the daily charging attitudes of the EV owners to be captured thanks to their probabilistic predictions [30].

The GPR model output function y is obtained from a one-dimensional input vector x by utilizing the training dataset $Z = (x_i, y_i)^N$, where the training and prediction points are represented by $x_i$ and $y_i$ respectively, and N denotes the number of data samples for training. In the GPR approach, an unknown function is used to transition between the input data and the output prediction model, as given below:

$$y = f(x) + \epsilon \tag{13}$$

where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ represents the Gaussian noise over the output prediction model. The GP function $f(x)$ expresses a probability distribution given as:

$$f(x) \sim \mathcal{GP}\left( m(x), \kappa(x, x') \right) \tag{14}$$

where $m(x)$ represents the mean function and $\kappa(x, x')$ denotes the covariance function. Together, these functions describe the probability distribution for the training points and are detailed in Equations (15) and (16), respectively.

$$m(x) = \mathbb{E}[f(x)] \tag{15}$$

$$\kappa(x, x') = \mathbb{E}\left[ \left( f(x) - m(x) \right)\left( f(x') - m(x') \right)^{T} \right] \tag{16}$$

The aforementioned covariance function $\kappa(\cdot)$ is commonly referred to as the kernel function of the GPR process. Various kernel functions have been proposed in the literature, taking into account attributes such as smoothness and expected patterns within the dataset's training points [31]. In this study, the radial basis function (RBF) kernel is chosen, as follows, due to its capability to represent smooth and consistent functions.

$$\kappa(x, x') = \sigma_f^2 \exp\left( -\frac{\lVert x - x' \rVert^2}{2\lambda^2} \right) \tag{17}$$

where $\lambda$ denotes the length-scale and $\sigma_f^2$ represents the noise variance of the data points. The input function, x, includes the arrival/departure time and SOC level parameters of the EVs, and the output function y represents the waiting time model during the charging process. By considering the past training data points, the GPR model predicts the waiting time for EVs at any given time. The input data is extensively detailed in Section 4.1.
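A minimal sketch of such a GPR waiting-time model is given below, assuming scikit-learn and a synthetic time-of-day dataset in place of the parking-lot data described in Section 4.1; the kernel corresponds to Equations (13) and (17), with the hyperparameters left to the library's default optimizer.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Synthetic stand-in for the real dataset: time-of-day (hours) -> observed waiting time (min).
rng = np.random.default_rng(0)
hours = rng.uniform(7.0, 24.0, size=200).reshape(-1, 1)
wait = 10 + 8 * np.sin((hours.ravel() - 7.0) / 17.0 * np.pi) + rng.normal(0, 1.5, 200)

# RBF kernel of Eq. (17) (sigma_f^2 as a constant factor, lambda as the length-scale),
# plus a white-noise term for the epsilon in Eq. (13).
kernel = ConstantKernel(1.0) * RBF(length_scale=2.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(hours, wait)

# Probabilistic prediction: mean waiting time and its standard deviation at 08:30.
mean, std = gpr.predict(np.array([[8.5]]), return_std=True)
print(f"estimated wait ~ {mean[0]:.1f} min (+/- {std[0]:.1f})")
```

The predictive standard deviation is what makes the GPR estimate useful here: it quantifies how uncertain the waiting-time feedback is at a given time of day before it enters the state vector.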

FIGURE 2 Complete overview of EVCN approach.

2.5 MDP formulation

This section introduces the MDP formulation of the EVCN problem from the perspective of EV owners, considering the randomness of the received data. This data includes the estimated waiting times at the EVCSs, the ToU electricity price, the time-of-day and the traffic flow conditions on the road. Initially, this data is merged with the EV variables to construct the environment's full state $s_t$, covering the EV battery level condition and its position on the map. In this direction, the EV takes action $a_t$ to select an EVCS in the environment at time step t. Based on this action, the optimal path and estimated arrival time are computed using the weighted shortest path algorithm. Afterwards, the environment feedback is obtained from the reward function $r_t$, and the next states $s_{t+1}$ are reconstructed based on the first node of the optimal path $\delta_o$ at time step t+1. A detailed explanation of the MDP formulation is provided below as a tuple $(s_t, a_t, r_t, s_{t+1})$.

States: The system states $s_t$ are determined based on the constructed environment and derived as

$$s_t = \left( p_t,\, e_t^d,\, \alpha_t^c,\, J_t^p,\, J_t^c,\, t,\, t_k^p,\, t_k^{\text{wait}} \right) \tag{18}$$

where $p_t$ denotes the current position of the EV, $e_t^d$ represents the current SOC level at each travelled node, and $\alpha_t^c$ is the charging price determined according to the time index of the day, t, with the ToU electricity price data. $J_t^p$ is calculated at each time step based on the approach outlined in Equation (2). $J_t^c$ is the charging cost of the EV and is formulated as shown in Equation (3). Finally, $t_k^{\text{wait}}$ represents the estimated waiting time at the selected EVCS, obtained from the GPR-based waiting time model.

Action: The EV agent's action $a_t$, given in Equation (19), is the decision-making between EVCS nodes with respect to the environment states at the current time t. In addition, at each first node of the optimal path, the EV agent takes sequential actions until the EV reaches the EVCS node.

$$a_t = k, \quad k \in L \tag{19}$$

Reward: The reward function $r_t$ is the feedback of the system and is obtained from the perspective of the EV agent as described in Equation (20). Moreover, the reward function is derived in two stages based on the objectives of the EVCN problem given in Equation (1).

$$r_t = \begin{cases} -\eta \omega d_{ij} - \psi\, t_{ij}^p(f_{ij}^p), & i = \delta_0,\; j = \delta_1,\; p_t \neq \delta_{\text{end}} \\ -\left( e^{\max} - e_t \right) \alpha_t^c E^{\max} - \psi\, t_k^{\text{wait}}, & p_t = \delta_{\text{end}} \end{cases} \tag{20}$$

where $e_t$ represents the current SOC level of the EV, and $\delta_0$ and $\delta_1$ denote the origin and first node of the optimal path taken by the EV agent to the EVCS. The first part of the reward function is calculated while the EV is traveling on the network. In Equation (20), the feedback is computed based on the EV's road time, distance and the EV owner's cost. The second part of the reward function, which is the charging cost at the EVCS, is determined by using the charging price and SOC level of the EV. This study assumes that all EVCSs are identical and have the same power rating values for the charging process.

Transition: The transition illustrates the mapping of the state variables defined in Equation (21) from time step t to t+1 by utilizing the action $a_t$. The transition probability between states is a challenging problem because it is stochastic. Therefore, a DRL-based approach is developed to seek the optimal solution as follows,

$$s_{t+1} = f(s_t, a_t, \tau_t) \tag{21}$$

where $\tau_t$ represents the disturbance data utilized to introduce randomness into the state transition. The EV position is updated as

$$p_{t+1} = \delta_1 \tag{22}$$

The updated SOC level during the EV's travel is calculated as

$$e_{t+1} = e_t^{\text{init}} - \omega \frac{d_{\delta_{1\text{st}}\delta_{2\text{nd}}}}{e^{\max}} \tag{23}$$

Value-function: A policy $\pi(s_t)$ describes the probability of selecting an action based on the current state observation. The value-function $V^{\pi}(s_t)$ represents the effectiveness of the EV agent's policy as follows:

$$V^{\pi}(s_t) = r_t(s_t, \pi_t) + \mathbb{E}_{\pi}\left( \sum_{m=t+1}^{T} \gamma^{m-t} r_m(s_k, \pi_k) \right) \tag{24}$$

where $\gamma \in (0, 1]$ denotes the discount factor and illustrates the impact of immediate and future rewards. T represents the terminal phase, when the EV agent reaches one of the EVCS nodes. The goal here is to obtain the optimal EV agent policy $\pi^*$ that produces the optimal value-function $V^{\pi^*}(s_t)$ for the best EVCS selection as follows

$$V^{\pi^*}(s_t) = \max V(s_t) \tag{25}$$
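For illustration only, the transition and reward pieces in Equations (20), (22) and (23) can be wrapped in a small environment-style step function. This is a sketch under assumed placeholder values, not the authors' simulator; in the actual formulation the transition additionally depends on the traffic disturbance $\tau_t$ and on the GPR waiting-time estimate.

```python
# Hedged sketch of one environment step, following Eqs. (20), (22) and (23); values are placeholders.
ETA, OMEGA, PSI, E_MAX_KWH = 0.03, 0.18, 3.0, 80.0

def step_travel(pos, soc, next_node, d_km, t_travel_h):
    """EV moves to the first node of the current shortest path (p_t != delta_end)."""
    reward = -ETA * OMEGA * d_km - PSI * t_travel_h          # first branch of Eq. (20)
    soc_next = soc - OMEGA * d_km / E_MAX_KWH                # SOC update, Eq. (23)
    return next_node, soc_next, reward                       # position update, Eq. (22)

def step_charge(soc, alpha_c, t_wait_h, e_target=1.0):
    """EV has arrived at the selected EVCS (p_t == delta_end)."""
    return -(e_target - soc) * alpha_c * E_MAX_KWH - PSI * t_wait_h   # second branch of Eq. (20)

pos, soc = 16, 0.5
pos, soc, r_travel = step_travel(pos, soc, next_node=18, d_km=4.2, t_travel_h=0.12)
r_charge = step_charge(soc, alpha_c=0.2, t_wait_h=0.3)
print(pos, round(soc, 3), round(r_travel, 3), round(r_charge, 2))
```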

3 SHORTEST PATH BASED PPG ALGORITHM

The proposed PPG-based algorithm to solve the MDP problem is illustrated in Figure 2.

3.1 Complete scheme of the PPG approach

It is a quite challenging problem to solve optimal path planning problems with high-dimensional uncertainties and no prior data. Thus, one possible approach is to use DRL algorithms to optimize the aforementioned EV agent policy [32]. In this study, the PPG algorithm is employed to address the EVCN problem due to its effectiveness, which is detailed in [33].

The PPG is an extended architecture of the PPO algorithm [26], representing one of the state-of-the-art model-free policy gradient based DRL methods in the current literature. The PPG algorithm optimizes the objective of the problem in two distinct phases: the policy phase and the auxiliary phase. During the policy phase, the PPO algorithm [26] is employed to train the EV agent. In the auxiliary phase, valuable features of the value-function are distilled into the policy network to enhance the subsequent policy phase.

3.1.1 Policy phase

The policy phase of the PPG is constructed with two networks, namely the policy and value-function networks. The policy network is denoted by $\theta$ and its parameters are updated based on an objective that utilizes a clipped surrogate function. The objective function of the policy network is given as follows:

$$L^{\text{clip}} = \hat{\mathbb{E}}_t \left[ \min\left( r_t(\theta)\hat{A}_t,\; \operatorname{clip}\left( r_t(\theta), 1-\epsilon, 1+\epsilon \right)\hat{A}_t \right) \right] \tag{26}$$

where $\epsilon$ represents the clipping parameter utilized to prevent local maxima during the exploration phase, and $\hat{A}_t$ represents the effectiveness of the action at time t and is calculated by the generalized advantage estimate (GAE) approach proposed in [34]. The surrogate function denotes the ratio between the current $\pi_{\theta}$ and previous $\pi_{\theta_{\text{old}}}$ network parameters, serving to avoid instantaneous parameter updates during training, as follows:

$$r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)} \tag{27}$$

where $a_t$ and $s_t$ represent the current action and state pair. The value-function network, denoted by $\phi$, updates its network parameters using the temporal difference (TD) error. The objective function for optimizing the value-function network is provided as

$$L^{\text{vf}} = \hat{\mathbb{E}}_t \left[ \frac{1}{2} \left( V_{\phi}(s_t) - \hat{V}_t^{\text{target}} \right)^2 \right] \tag{28}$$

where $\hat{V}_t^{\text{target}}$ represents the target for the value-function network and is calculated by the GAE approach [34].

3.1.2 Auxiliary phase

In the auxiliary phase, a joint objective function is utilized to optimize the policy network. It combines a behavioural cloning loss and an arbitrary auxiliary loss as follows:

$$L^{\text{joint}} = L^{\text{aux}} + \beta_{\text{clone}} \cdot \hat{\mathbb{E}}_t \left[ \mathrm{KL}\left[ \pi_{\theta_{\text{old}}}(\cdot \mid s_t), \pi_{\theta}(\cdot \mid s_t) \right] \right] \tag{29}$$

where $\mathrm{KL}(\pi_{\theta_{\text{old}}} \mid \pi_{\theta})$ represents the Kullback-Leibler divergence, which measures the discrepancy between the probability distributions of the old and current policies. $\pi_{\theta_{\text{old}}}$ is considered as the policy at the beginning of the auxiliary phase, and the original policy is preserved by utilizing the parameter $\beta_{\text{clone}}$. Finally, $L^{\text{aux}}$ denotes the auxiliary objective function, which is selected to be the same as the value-function network objective function, as suggested in [33].

3.2 Training of the shortest path based PPG algorithm

This section introduces the training process and simulation parameters of the proposed PPG algorithm. Algorithm 1 below outlines the training of the EV agent using the PPG algorithm for the above-discussed EVCN problem. The parameters of the EV $(p_t, \omega, \psi, e_t^d)$, the transportation network $(d_{ij}, t_{ij}^p(f_{ij}^p))$ and the EVCS $(\eta, \alpha_t^c, t_k^{\text{wait}})$ are utilized as input to the algorithm. The output of the algorithm is the trained policy, which can effectively guide the EV through the transportation network with optimal EVCS selection.

ALGORITHM 1 Shortest path based PPG algorithm training.

1   Initialization of the policy network parameter θ,
2   the value-function network φ and the buffer B
3   Receive the initial data: (ω, ψ, e_t^d, p_t, d_ij, t_ij^p(f_ij^p), η, α_t^c, t_k^wait)
4   Solve the shortest path by utilizing the transportation network features, obtain the initial state s_0 and road features
5   for t = 1, 2, ..., T do
6       Initialize a buffer B
7       // Policy phase
8       for iter = 1, 2, ..., N_π do
9           Perform roll-outs under the policy π
10          Compute the target value function V̂(s_t) for all states
11          // Policy network training
12          for count1 = 1, 2, ..., E_π do
13              Optimize the L^clip function w.r.t. θ
14          end
15          // Value-function network training
16          for count2 = 1, 2, ..., E_V do
17              Optimize the L^vf function w.r.t. φ
18          end
19          Store all (s_t, V̂(s_t)) in B
20      end
21      Compute and store the policy π_old(⋅|s_t) for all states in B
22      // Auxiliary phase
23      for count3 = 1, 2, ..., E_aux do
24          Optimize the L^joint function w.r.t. θ on all data in B
25          Optimize the L^vf function w.r.t. φ on all data in B
26      end
27  end

The training process begins with the initialization of the policy network $\theta$, the value-function network $\phi$, and the buffer B. Then, the shortest path algorithm is executed based on the received initial data to obtain the initial state $s_0$ for the learning process. In the policy phase, the policy and value-function networks are trained based on the objective functions $L^{\text{clip}}$ and $L^{\text{vf}}$ to update the current EVCS selection policy according to the action $a_t$ and state $s_t$. The last policy is then stored in the buffer for all states to initialize the auxiliary phase. Finally, in the auxiliary phase, $L^{\text{joint}}$ and $L^{\text{vf}}$ are optimized according to the data stored in the buffer B. The parameters controlling the overall policy phase, policy network, value-function network, and overall auxiliary phase iterations are denoted by $N_{\pi}$, $E_{\pi}$, $E_V$, and $E_{\text{aux}}$, respectively.

Remark 1. Let $|\mathbb{X}|$ denote the number of roads, $|\mathbb{N}|$ represent the number of nodes and $|\Omega_{\text{EVCS}}|$ denote the number of EVCS nodes in the transportation network. The computational complexity analysis of the proposed algorithm is performed in two parts. In the first part, considering Line 4, the computational complexity of the shortest path based on Dijkstra's algorithm is $O((|\mathbb{X}| + (|\mathbb{N}| + |\Omega_{\text{EVCS}}|)) \log(|\mathbb{N}| + |\Omega_{\text{EVCS}}|))$. In the second part of the algorithm, the computational complexity of the DRL approach is determined by the state $s_t$ and action $a_t$ [35], whose dimensions are generally smaller than the number of roads, $O(|\mathbb{X}|)$. Hence, the proposed PPG approach does not significantly increase complexity, particularly when the number of roads $|\mathbb{X}|$ in the network is large.
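As a rough, framework-agnostic sketch of how the three objectives in Equations (26), (28) and (29) could be expressed in code (this is not the authors' implementation; the networks, optimizers, roll-out collection and GAE computation are assumed to exist elsewhere), consider:

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective of Eq. (26), written as a loss (negated for gradient descent)."""
    ratio = torch.exp(logp_new - logp_old)                      # r_t(theta), Eq. (27)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

def value_loss(v_pred, v_target):
    """Value-function objective of Eq. (28)."""
    return 0.5 * (v_pred - v_target).pow(2).mean()

def joint_loss(aux_value_pred, v_target, logits_new, logits_old, beta_clone=1.0):
    """Auxiliary-phase objective of Eq. (29): auxiliary value loss plus a KL behavioural-cloning term."""
    l_aux = value_loss(aux_value_pred, v_target)
    kl = F.kl_div(F.log_softmax(logits_new, dim=-1),
                  F.softmax(logits_old, dim=-1), reduction="batchmean")
    return l_aux + beta_clone * kl
```

In the schedule of Algorithm 1, the first two losses drive the policy phase, while the joint loss is applied to the data stored in the buffer B during the auxiliary phase.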

4 TEST AND RESULTS

4.1 Input data

In the simulation studies, the Sioux-Falls transportation network, including 24 nodes and 76 roads, is considered to test the effectiveness of the proposed approach. It is assumed that there are 3 EVCSs located at nodes 4, 7, and 20, as shown in Figure 3. The complete distance, road capacity and demand dataset of the network is available in [25, 36]. Additionally, the traffic flow parameters are randomly distributed across the network within the range of the road capacity values to analyse the traffic awareness performance.

FIGURE 3 24-Node Sioux-Falls transportation network [25].

Inspired by [37], time-dependent ToU tariff electricity price data from an energy company, shown in Figure 4, is used to calculate the charging price $\alpha_t^c$ at the EVCS. In this context, the time index is included to analyse the effectiveness of the proposed method. Similar to the study in [24], the average traveling cost $\eta$ and power consumption $\omega$ are set to 0.03 $/km and 0.18 kWh/km respectively, which are used to calculate the EV power consumption on the path. Besides, the EV has an 80 kWh maximum battery capacity for a Tesla Model 3 [38], denoted as $E^b$, its initial energy randomly varies between 32 and 48 kWh [24], and the average EV owner cost per unit, $\psi$, is set to 0.75-5 $/h according to the average wage in China [39].

FIGURE 4 ToU electricity price tariff.

The waiting time characteristics of the EV owners are obtained from a GPR-based prediction model using a dataset that includes the initial SOC and arrival-departure times of 10,000 EVs in a parking lot [40]. The raw data is recorded from 07:00 to 00:00 with 15-min time intervals. Then, inspired by the studies [18] and [24], the short-term parking durations are considered as the data samples for the GPR prediction model. In this direction, uncertainties in the waiting time and the time-of-day based EVCS selection characteristics of the EV owners are included in the system. The GPR-based waiting time prediction model is evaluated by utilizing the mean squared error (MSE) metric, which is calculated as 1.96%. In Figure 5, the x axis is the time-of-day and the y axis represents the estimated waiting time.

FIGURE 5 GPR based time-of-day waiting time model.

In the training phase of the PPG algorithm, a multilayer perceptron (MLP) structure is used to construct the neural networks (NN). In this NN structure, the ReLU (rectified linear unit) activation function is used to connect neurons. Inspired by [33], the simulation hyperparameters of the PPG algorithm are chosen as given in Table 1.
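For reference, a policy/value network pair matching the MLP description above (two hidden layers of 256 neurons with ReLU, as listed in Table 1) might be sketched as follows; the state and action dimensions are assumptions for illustration, since they depend on how the state vector of Equation (18) is encoded.

```python
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256, layers=2):
    """Two hidden layers of 256 ReLU units, as described for the policy and value networks."""
    mods, d = [], in_dim
    for _ in range(layers):
        mods += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

state_dim, n_evcs = 8, 3             # assumed: 8 state features (Eq. 18), 3 candidate EVCSs
policy_net = mlp(state_dim, n_evcs)  # outputs logits over EVCS choices
value_net = mlp(state_dim, 1)        # outputs the state value V_phi(s_t)
```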

TABLE 1 PPG algorithm simulation parameters.

Parameter                                               Value
N_π                                                     32
E_π                                                     1
E_V                                                     1
E_aux                                                   6
Learning rate                                           0.001
Batch size                                              32
Epoch size                                              20k
Size of buffer B                                        32
Discounting factor                                      0.99
Extra discount factor                                   0.95
Entropy weight                                          0.005
Clipping parameter                                      0.2
Policy network, number of layers and neurons            2, 256
Value-function network, number of layers and neurons    2, 256

4.2 Simulation and results

In the simulation studies, the algorithms are implemented on a computer equipped with a CPU i7 12659H processor, an NVIDIA RTX4070 GPU, and 16 GB of RAM. In the first case study, the proposed PPG algorithm is compared with the PPO algorithm in terms of reducing the overall cost without prior information on the environment parameters. Firstly, the states are initialized with randomly generated data: the EV's initial position is assigned across the nodes N, the time-of-day and the initial SOC level of the EV are initialized at the beginning of the simulation, and the traffic flow information is obtained from the smart transportation network. Furthermore, the aforementioned GPR-based waiting time estimation model and the ToU charging price data are included in the simulation.

The performance comparison of the PPO-based DRL studied in [24] and the shortest path-based PPG algorithm in terms of episodic cumulative reward is presented in Figure 6. It is observed that the proposed PPG agent policy performs 9% better on average than the PPO agent in terms of reducing the total costs associated with the given EVCN optimization problem, as defined in Equation (1). This indicates that distilling valuable features from the actor-critic network through auxiliary optimization during the training process enhances sample efficiency and improves the performance of the optimal policy. As a result, the PPG approach can be said to be more convenient than the state-of-the-art PPO algorithm in systems with dynamic state variables.

FIGURE 6 PPG versus PPO: the reward of EVCN.

In the second case study, three different EV owner cost values are considered according to the behaviour of EV owners and their priorities. A lower EV owner cost implies that the weight given to the time spent on the network is less than the weight given to the charging cost; a higher cost indicates the opposite. Similar to the previous test case, the initial position of the EV is randomly distributed over the nodes.

The effectiveness of the PPG-based EV agent model is analysed in terms of the total time spent on the network and is presented in Table 2 for three different EV owner costs. In the first case, ψ is set to 0 $/h and the total time observed on the road is 132.17 min. When ψ is set to 3 $/h and 5 $/h, it is seen that the total times decrease to 122.69 and 119.55 min, respectively. Furthermore, Figure 7 shows that the corresponding average rewards are −38.52, −52.56, and −61.37, respectively. There is a reduction of 7-10% in the total time spent on the network when comparing the ψ values of 0 $/h and 5 $/h. Additionally, increasing the EV owner value causes a decrease in the average reward value according to the EV user time and cost importance. Thus, this comparative case study implies that the proposed model is able to effectively adapt itself to EV owner priorities.

TABLE 2 Total time spent based on different EV owner costs.

EV owner cost        0 $/h         3 $/h         5 $/h
Total time spent     132.17 min    122.69 min    119.55 min

FIGURE 7 The reward of EVCN under three different EV owner costs.

The last case study shows the traffic flow adaptation of the EV agent. The case is divided into four phases in which usual and high traffic conditions are denoted as Normal and High, as presented in Table 3. In all experiments, the initial EV position, the EV owner cost, the time-of-day, and the SOC level are set to Node-16, 5 $/h, 8:30 a.m., and 50%, respectively. Figure 8 shows the evolution of the episodic cumulative reward for these experiments. In Phase-1, the traffic flow of the network is set to its normal condition for all roads, and the total costs for the three EVCSs are observed as −40.51, −34.76, and −38.96. The EV agent selects EVCS-2 (Node-7) with the least cost by following the nodes 16-18-7, as indicated by arrows in Figure 9. Table 3 shows the EVCS selections according to the traffic flow variations, and Figure 9 displays the paths followed by the EV agent to the selected EVCS for each experiment. As a result, the proposed model has the ability to adapt itself to changes in traffic flow and effectively determines the optimal EVCS location.

TABLE 3 Traffic flow demands on the roads from Node-16 and the resulting total costs per EVCS.

Case       16-8      16-10     16-17     16-18     EVCS-1    EVCS-2    EVCS-3
Phase-1    Normal    Normal    Normal    Normal    −40.51    −34.76    −38.96
Phase-2    Normal    Normal    Normal    High      −40.51    −36.33    −38.96
Phase-3    High      Normal    Normal    High      −40.51    −49.38    −38.96
Phase-4    High      Normal    High      High      −40.51    −45.17    −57.78

FIGURE 8 The reward of EVCN under dynamic traffic conditions.

FIGURE 9 EV agent traffic flow based path tracking performance.

5 CONCLUSION

This paper proposes a shortest path-based DRL approach with dynamic traffic flows for the optimal EV navigation problem, considering time-dependent path and charging cost objectives. Numerical results are verified on a benchmark network, the 24-node Sioux-Falls network, which is considered to have a single EV and three EVCSs. The proposed architecture incorporates a communication network between the EV, the EVCSs and the transportation network to collect the state information. A ToU electricity price tariff is considered for the EV charging price. Important findings are summarized below:

∙ The GPR machine learning method is introduced for a stochastic waiting time estimation model at the EVCSs. Thus, compared to estimation based on queueing models, the proposed method is more adaptable to changing conditions and uncertainties.
∙ An MDP with unknown transition probability is formulated for the EVCN problem to minimize the total travel cost and the charging cost. A state-of-the-art PPG approach is employed to make the EVCS decisions. In this direction, the PPG-based EV agent learns the optimal policy for EVCN in a dynamic environment. Besides, a shortest path-based algorithm is employed to determine the optimal travelling path from the origin node to the selected EVCS.
∙ A novel shortest path based phasic policy gradient (PPG) algorithm is developed to determine the optimal EVCS selection policy without requiring any prior information. The Dijkstra algorithm solves the optimal shortest path problem under dynamic traffic conditions, and this is further evaluated in the decision-making phase of the PPG algorithm.

∙ Comparative analysis of the training results reveals that the proposed shortest path-based PPG algorithm outperforms the PPO algorithm by an average of 9% in terms of reducing the overall travel and charging costs. Besides, the traffic-aware performance of the PPG agent is evaluated under different traffic flows, demonstrating the EV agent's ability to adapt to dynamic system states.
∙ Finally, the EV agent is trained by considering user preferences based on EV owner costs. The case study shows a decrease of 7-10% in the total time spent on the network. This implies that the proposed strategy facilitates weighting the time spent and the charging cost in the objective function.

The overall case study results demonstrate that the proposed EVCN strategy achieves promising outcomes in minimizing the total time spent on the network and the charging cost by considering the time-of-day and EV owner preference parameters. Future studies aim to incorporate real-time distribution grid parameters, such as peak power demand, into the EVCN problem to investigate the effectiveness of the proposed DRL approach. PPG-based DRL allows more efficient use of data for high-dimensional systems in which data collection can be cumbersome. Besides, a reduced number of iterations is sufficient to achieve similar or better performance compared to the PPO-based DRL method in high-dimensional systems.

AUTHOR CONTRIBUTIONS
Ali Can Erüst: Methodology; software; writing—original draft. Fatma Yıldız Taşcıkaraoğlu: Supervision; validation; writing—review and editing.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

ORCID
Fatma Yıldız Taşcıkaraoğlu https://orcid.org/0000-0003-1866-2515

REFERENCES
1. Jannati, J., Nazarpour, D.: Optimal energy management of the smart parking lot under demand response program in the presence of the electrolyser and fuel cell as hydrogen storage system. Energy Convers. Manage. 138, 659–669 (2017)
2. Ji, Z., Huang, X.: Plug-in electric vehicle charging infrastructure deployment of China towards 2020: policies, methodologies, and challenges. Renewable Sustainable Energy Rev. 90, 710–727 (2018)
3. Cheng, X., Zhang, R., Yang, L.: Wireless toward the era of intelligent vehicles. IEEE Internet of Things J. 6, 188–202 (2018)
4. Tascikaraoglu, F.Y., Aksoy, G.: Identification of sensor location and link flow reconstruction using turn ratio and flow sensors in an arterial network. J. Intell. Transp. Syst. 28(2), 163–173 (2024)
5. Luo, Y., Feng, G., Wan, S., Zhang, S., Li, V., Kong, W.: Charging scheduling strategy for different electric vehicles with optimization for convenience of drivers, performance of transport system and distribution network. Energy 194, 116807 (2020)
6. Zhong, J., Liu, J., Zhang, X.: Charging navigation strategy for electric vehicles considering empty-loading ratio and dynamic electricity price. Sustainable Energy Grids Networks 34, 100987 (2023)
7. Xiang, Y., Jiang, Z., Gu, C., Teng, F., Wei, X., Wang, Y.: Electric vehicle charging in smart grid: a spatial-temporal simulation method. Energy 189, 116221 (2019)
8. Zhong, J., Yang, N., Zhang, X., Liu, J.: A fast-charging navigation strategy for electric vehicles considering user time utility differences. Sustainable Energy Grids Networks 30, 100646 (2022)
9. Shao, S., Guan, W., Bi, J.: Electric vehicle-routing problem with charging demands and energy consumption. IET Intell. Transp. Syst. 12(3), 202–212 (2018)
10. Tan, J., Wang, L.: Real-time charging navigation of electric vehicles to fast charging stations: a hierarchical game approach. IEEE Trans. Smart Grid 8(2), 846–856 (2015)
11. Liu, C., Zhou, M., Wu, J., Long, C., Wang, Y.: Electric vehicles en-route charging navigation systems: joint charging and routing optimization. IEEE Trans. Control Syst. Technol. 27(2), 906–914 (2017)
12. Cerna, F.V., Pourakbari-Kasmaei, M., Romero, R.A., Rider, M.J.: Optimal delivery scheduling and charging of EVs in the navigation of a city map. IEEE Trans. Smart Grid 9(5), 4815–4827 (2017)
13. Sharif, M., Seker, H.: Smart EV charging with context-awareness: enhancing resource utilization via deep reinforcement learning. IEEE Access 12, 7009–7027 (2024)
14. Zhang, C., Liu, Y., Wu, F., Tang, B., Fan, W.: Effective charging planning based on deep reinforcement learning for electric vehicles. IEEE Trans. Intell. Transp. Syst. 22(1), 542–554 (2020)
15. Miletić, M., Ivanjko, E., Gregurić, M., Kušić, K.: A review of reinforcement learning applications in adaptive traffic signal control. IET Intell. Transport Syst. 16(10), 1269–1285 (2022)
16. Qiu, D., Ye, Y., Papadaskalopoulos, D., Strbac, G.: A deep reinforcement learning method for pricing electric vehicles with discrete charging levels. IEEE Trans. Ind. Appl. 56(5), 5901–5912 (2020)
17. Li, H., Wan, Z., He, H.: Constrained EV charging scheduling based on safe deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2427–2439 (2019)
18. Qian, T., Shao, C., Wang, X., Shahidehpour, M.: Deep reinforcement learning for EV charging navigation by coordinating smart grid and intelligent transportation system. IEEE Trans. Smart Grid 11(2), 1714–1723 (2019)
19. Aljohani, T.M., Ebrahim, A., Mohammed, O.: Real-time metadata-driven routing optimization for electric vehicle energy consumption minimization using deep reinforcement learning and Markov chain model. Electr. Power Syst. Res. 192, 106962 (2021)
20. Wan, Z., Li, H., He, H., Prokhorov, D.: Model-free real-time EV charging scheduling based on deep reinforcement learning. IEEE Trans. Smart Grid 10(5), 5246–5257 (2018)
21. Jiang, C., Zhou, L., Zheng, J., Shao, Z.: Electric vehicle charging navigation strategy in coupled smart grid and transportation network: a hierarchical reinforcement learning approach. Int. J. Electr. Power Energy Syst. 157, 109823 (2024)
22. Zhang, W., Liu, H., Wang, F., Xu, T., Xin, H., Dou, D., Xiong, H.: Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning. In: Proceedings of the Web Conference 2021, pp. 1856–1867. ACM, New York, NY (2021)
23. Xing, Q., Xu, Y., Chen, Z., Zhang, Z., Shi, Z.: A graph reinforcement learning-based decision-making platform for real-time charging navigation of urban electric vehicles. IEEE Trans. Ind. Inf. 19(3), 3284–3295 (2023)
24. Jin, J., Xu, Y.: Shortest-path-based deep reinforcement learning for EV charging routing under stochastic traffic condition and electricity prices. IEEE Internet Things J. 9(22), 22571–22581 (2022)
25. Ukkusuri, S.V., Yushimito, W.F.: A methodology to assess the criticality of highway transportation networks. J. Transp. Secur. 2, 29–46 (2009)
26. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
27. Xiang, Y., Liu, J., Li, R., Li, F., Gu, C., Tang, S.: Economic planning of electric vehicle charging stations considering traffic constraints and load profile templates. Appl. Energy 178, 647–659 (2016)
28. Dijkstra, E.W.: A note on two problems in connexion with graphs. In: Edsger Wybe Dijkstra: His Life, Work, and Legacy, pp. 287–290. ACM, New York, NY (2022)

29. Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, Vol. 2. MIT Press, Cambridge, MA (2006)
30. Li, W., Fan, Y., Ringbeck, F., Jöst, D., Sauer, D.U.: Unlocking electrochemical model-based online power prediction for lithium-ion batteries via Gaussian process regression. Appl. Energy 306, 118114 (2022)
31. Schulz, E., Speekenbrink, M., Krause, A.: A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018)
32. Erüst, A.C., Beyazıt, M.A., Taşcıkaraoğlu, F.Y., Taşcıkaraoğlu, A.: Deep reinforcement learning-based navigation strategy for a mobile charging station in a dynamic environment. In: 2023 International Conference on Smart Energy Systems and Technologies (SEST), pp. 1–6. IEEE, Piscataway, NJ (2023)
33. Cobbe, K.W., Hilton, J., Klimov, O., Schulman, J.: Phasic policy gradient. In: International Conference on Machine Learning, pp. 2020–2027. Microtome Publishing, Brookline, MA (2021)
34. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 (2015)
35. Rasheed, F., Yau, K.-L.A., Noor, R.M., Wu, C., Low, Y.-C.: Deep reinforcement learning for traffic signal control: a review. IEEE Access 8, 208016–208044 (2020)
36. Ukkusuri, S.V., Yushimito, W.F.: A methodology to assess the criticality of highway transportation networks. J. Transp. Secur. 2, 29–46 (2009)
37. Beyazıt, M.A., Taşcıkaraoğlu, A.: Electric vehicle charging through mobile charging station deployment in coupled distribution and transportation networks. Sustainable Energy Grids Networks 35, 101102 (2023)
38. Kane, M.: 2021 Tesla Model 3 LR AWD with 82 kWh battery: charging analysis. Available: https://insideevs.com/news/519382/teslamodel3-82kwh-charging-analysis/. Accessed 30 Oct 2024
39. National Bureau of Statistics of China: The average level of wages in main industries in China in 2017. Available: https://data.stats.gov.cn/english/easyquery.htm?cn=C01. Accessed 30 Oct 2024
40. Şengör, İ., Güner, S., Erdinç, O.: Real-time algorithm based intelligent EV parking lot charging management strategy providing PLL type demand response program. IEEE Trans. Sustainable Energy 12(2), 1256–1264 (2020)

How to cite this article: Erüst, A.C., Taşcıkaraoğlu, F.Y.: Spatio-temporal dynamic navigation for electric vehicle charging using deep reinforcement learning. IET Intell. Transp. Syst. 18, 2520–2531 (2024). https://doi.org/10.1049/itr2.12588
