Spatio-temporal dynamic navigation for electric vehicle charging using deep reinforcement learning
DOI: 10.1049/itr2.12588
ORIGINAL RESEARCH
Ali Can Erüst | Fatma Yıldız Taşcıkaraoğlu

1 Department of Electrical and Electronics Engineering, Mugla Sitki Kocman University, Mugla, Türkiye
2 Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, California, USA

Correspondence
Fatma Yıldız Taşcıkaraoğlu, Department of Electrical and Electronics Engineering, Mugla Sitki Kocman University, 48000, Mugla, Türkiye. Email: [email protected]

Funding information
University of California, San Diego; Turkiye Bilimsel ve Teknolojik Araştırma Kurumu under 2219 International Fellowship Program, Grant/Award Number: 1059B192301247

Abstract
This paper considers the real-time spatio-temporal electric vehicle charging navigation problem in a dynamic environment by utilizing a shortest path-based reinforcement learning approach. In a data-sharing system including a transportation network, an electric vehicle (EV) and EV charging stations (EVCSs), the aim is to determine the most convenient EVCS and the optimal path for reducing the travel, charging and waiting costs. To estimate the waiting times at EVCSs, a Gaussian process regression (GPR) algorithm is integrated using a real-time dataset comprising state-of-charge and arrival-departure times of EVs. The optimization problem is modelled as a Markov decision process with unknown transition probability to overcome the uncertainties arising from time-varying variables. A recently proposed on-policy actor–critic method, phasic policy gradient (PPG), which extends the proximal policy optimization algorithm with an auxiliary optimization phase to improve training by distilling features from the critic to the actor network, is used to make EVCS decisions on the network, where the EV travels through the optimal path from the origin node to the EVCS by considering dynamic traffic conditions, the unit value of the EV owner and the time-of-use charging price. Three case studies are carried out for the 24-node Sioux Falls benchmark network. It is shown that phasic policy gradient achieves an average of 9% better reward compared to proximal policy optimization and that the total time decreases by 7–10% when the EV owner cost is considered.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the
original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2024 The Author(s). IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
used to obtain the optimal path for the EV. In the study, the authors stated that different EV owner time utilities have an effect on the results. A similar methodology is used in [9] for multiple EVs on the transportation network to find the energy-efficient optimal paths.

Furthermore, the main efforts of the studies in [10–12] have been devoted to EVCN while considering the transportation network conditions. Tan et al. [10] presented a hierarchical game approach for EV-to-charging-station navigation by considering both the transportation and power systems. Consequently, they observed that the proposed method enhances the robustness of the power network and the economic profit of the EVCSs. In [11], an EV en-route charging navigation method was proposed based on deterministic and stochastic traffic flows. In that study, a dynamic programming approach was used to derive the charging navigation based on the time-dependent electricity price. As a result, the travel cost of the EV was minimized during long trips. Similarly, Cerna et al. [12] conducted a study on the charging navigation of a fleet of EVs. The authors modelled delay uncertainties and randomness on the transportation system based on predefined probabilities of traffic signals, public works and school areas. The model was solved using mixed-integer linear programming (MILP) solvers. Finally, the results indicate that the model is able to optimize the maintenance cost and navigation scheduling.

While the EVCN methodologies mentioned above yield convincing decision-making results, the traditional mathematical optimization algorithms rely on deterministic environments and states. Furthermore, the primary drawbacks of mathematically modelled optimization methods include their limited computational efficiency and a lack of robustness against randomness. Therefore, these methods are not applicable to EVCN problems in large-scale, real-time dynamic networks.

In the recent literature, researchers' interest has shifted towards deep reinforcement learning (DRL) to solve decision-making problems for EVCN and to overcome the randomness and dynamic complexity of large-scale networks. Furthermore, to deal with real-time variations and uncertainties, such as EV owner priorities, transportation system instabilities and EVCS conditions, DRL provides significant advantages, as this method requires no prior knowledge about the environment [13–15].

In this context, several studies have been carried out to optimize the sequential decision-making problems for EVCN strategies in a dynamic environment. In [16], a DRL method was proposed to design retail prices from the perspective of an EV aggregator by taking into account the discrete nature of the EV charging and discharging levels. With a similar objective, an EV charging/discharging scheduling problem was defined in [17] from the perspective of EV users. Qian et al. [18] presented a DRL-based EV charging navigation with the aim of minimizing the total travel time and charging cost at an EVCS. In that study, a MILP-based feature extraction model was obtained and used with the deep Q-learning (DQN) algorithm. The authors observed that the traditional optimization and DQN algorithm results are close to each other. In addition, the results indicated that the DQN algorithm is able to tolerate dynamic uncertainties. Another approach is presented by using double DQN (DDQN) to minimize the energy consumption of EVs in the transportation system [19]. Besides, Wan et al. [20] proposed a model-free DRL method to reduce the charging cost of EVs at the EVCS. The problem was modelled by using a Markov decision process (MDP) with unknown state-action probabilities. The effectiveness of the designed DRL model was verified with different test cases. Similarly, Jiang et al. [21] used a DRL method for charging station selection for EVs by considering distance, travel time and charging duration. The MDP formulation was used for the EVCN problem by considering traffic flow and dynamic user requests. Similarly, the authors of [22] presented a multi-agent spatio-temporal RL framework to provide intelligent EV charging for public EVCSs. With the goal of minimizing the waiting time, charging price and charging failure rate, the problem was formulated as a multi-objective task. With similar objectives, Xing et al. [23] presented a graph DRL method for EVCN by considering shortest-route feature extraction. The authors designed a modified Rainbow algorithm, similar to the DQN algorithm, to minimize the charging cost, road cost, waiting time and driving time. The effectiveness of the algorithm was tested with various test cases, including failure cases. Similarly, in [24] a DRL approach was used to minimize the travel cost and charging cost of an EV. In that study, a shortest path-based two-level optimization was used to obtain features of the EVCN problem. Then, an actor–critic method was employed to make the path decision of the EV. The results showed that the shortest path-based DRL approach reduced the total cost of the EV.

Besides the aforementioned DRL EVCN methodologies, various considerations must be addressed for a realistic EVCN approach. The DRL-based EVCN studies given in [18, 19, 23, 24] prioritize reducing the number of complex dynamic states, including the time information and the mathematical model of the transportation network. Therefore, modelling the behaviour of EV users fails due to time-independent randomized states such as electricity prices and EVCS waiting times. Furthermore, most of the above-mentioned EVCN strategies avoid utilizing transportation network features such as road demand, traffic capacity and flow. Consequently, the objective of the EVCN problem is only partially analysed. Finally, the proposed model needs to be robust against dynamic environment states, such as unexpected traffic flow on the transportation network and waiting times at the EVCS.

1.1 Contribution and paper organization

This study introduces a novel EVCN strategy with a multi-objective cost function to minimize the driving time, EVCS waiting time and charging cost of an EV. The presented method utilizes a DRL method for EVCS selection, the Dijkstra algorithm, taking traffic conditions into account, to extract the optimal path, and a data-driven method, GPR, to predict the EVCS queue times. The main contributions of this study are as follows:
$$t_k^{p} \geq \sum_{i,j} t_{ij}^{p}\!\left(f_{ij}^{p}\right) x_{ij} \tag{4}$$

$$t_{ij}^{p}\!\left(f_{ij}^{p}\right) = t_{ij}^{p,0}\left(1 + b\left(\frac{f_{ij}^{p}}{c_{ij}}\right)^{\xi}\right) \tag{5}$$

where $t_{ij}^{p,0}$ represents the free-flow time of the road, $b$ is the parameter of the BPR function and $\xi$ determines the function shape and the threshold value for the BPR function.
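As a quick numerical illustration of Equations (4) and (5), the following sketch evaluates the BPR travel time on a short two-segment path; the parameter values ($b$, $\xi$) and the segment data are illustrative assumptions, not the values used in the case studies.

```python
def bpr_travel_time(t_free, flow, capacity, b=0.15, xi=4.0):
    """Travel time on one road segment from the BPR relation in Equation (5).

    t_free   : free-flow travel time t_ij^{p,0}
    flow     : current traffic flow f_ij^p
    capacity : road capacity c_ij
    b, xi    : BPR shape parameters (illustrative defaults)
    """
    return t_free * (1.0 + b * (flow / capacity) ** xi)


# Path travel time as in Equation (4): sum of segment times along the chosen roads.
segments = [  # (t_free [min], flow [veh/h], capacity [veh/h]) -- assumed values
    (5.0, 900.0, 1200.0),
    (8.0, 400.0, 1000.0),
]
path_time = sum(bpr_travel_time(t0, f, c) for t0, f, c in segments)
print(f"Estimated path travel time: {path_time:.2f} min")
```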
The waiting time required before the EV starts its charging process at EVCS $k$ is represented as

$$t_k^{wait} = \sum_{k \in L} t_k^{c}\, l_k \tag{6}$$

where $t_k^{c}$ denotes the estimated waiting time at the EVCS location. The SOC value of the battery when the EV arrives at the EVCS is calculated as in Equation (7), utilizing the initial SOC level, total distance and battery capacity parameters. Furthermore, the final SOC on arrival is constrained by a minimum critical level as in Equation (8).

$$e_k^{end} = e^{init} - \omega \sum_{i,j} \frac{d_{ij}\, x_{ij}}{E^{max}} \tag{7}$$

$$e_k^{end} \geq e_k^{min} \tag{8}$$

The following equations ensure that only a single EVCS node is suggested and that a sequential path is selected from the initial node to the destination node:

$$\sum_{k \in L} l_k = 1 \tag{9}$$

$$\sum_{i,j} x_{ij} - \sum_{i,j} x_{ji} = \begin{cases} 1, & i = q \\ 0, & i \neq q,\ i \notin L \\ l_i, & i \in L \end{cases} \tag{10}$$

where the binary decision variable $x_{ij}$ identifies the path selection on the transportation network when its value is equal to 1; otherwise, a path from node $i$ to $j$ is not included. Similarly, $l_k$ is the binary decision variable for the EVCS, identifying the selected EVCS node, and $q$ denotes the origin node of the optimal path in the optimization problem.

The deterministic formulation of the EVCN problem requires full-state knowledge about the transportation network, charging time, electricity price and SOC level of the EV. However, in practice, real-time systems fail to capture all the state knowledge. Moreover, the state data usually changes over a time period.

The smaller portion of a transportation network with the weights $d_{nm}$ is given as follows:

$$D(G) = \begin{bmatrix} 0 & d_{12} & \infty & d_{14} & \infty \\ d_{12} & 0 & d_{23} & d_{24} & d_{25} \\ \infty & d_{23} & 0 & d_{34} & d_{35} \\ d_{14} & d_{24} & d_{34} & 0 & \infty \\ \infty & d_{25} & d_{35} & \infty & 0 \end{bmatrix} \tag{11}$$

where $\infty$ denotes that no direct connection exists between two nodes. The weights of the transportation network are obtained utilizing the cost of distance and travel time from node $i$ to $j$ by considering Equation (4), as below:

$$d_{nm} = \omega\eta\, d_{ij} + \psi\, t_{ij}^{p}\!\left(f_{ij}^{p}\right) \tag{12}$$

Finally, the formulated weights given in Equation (12) are solved using Dijkstra's algorithm, as presented in [28].
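A minimal sketch of this weighted shortest-path step is given below on a small graph in the spirit of Equation (11); the edge data and the coefficients $\omega$, $\eta$, $\psi$ are assumed for illustration, and networkx is used only as a convenient Dijkstra implementation rather than the authors' code.

```python
import networkx as nx

# Illustrative coefficients for Equation (12): distance vs. travel-time weighting.
OMEGA, ETA, PSI = 0.2, 1.0, 0.5  # assumed values

# Each edge carries its length d_ij [km] and BPR travel time t_ij^p(f_ij^p) [min].
edges = [
    (1, 2, {"d": 2.0, "t": 5.5}),
    (2, 3, {"d": 1.5, "t": 4.0}),
    (2, 4, {"d": 3.0, "t": 7.2}),
    (2, 5, {"d": 2.5, "t": 6.0}),
    (1, 4, {"d": 4.0, "t": 9.0}),
    (3, 4, {"d": 1.0, "t": 3.1}),
    (3, 5, {"d": 2.2, "t": 5.8}),
]

G = nx.Graph()
G.add_edges_from(edges)

# Combine distance and travel-time costs into a single edge weight d_nm (Equation (12)).
for _, _, attr in G.edges(data=True):
    attr["w"] = OMEGA * ETA * attr["d"] + PSI * attr["t"]

# Shortest path from the EV's current node to a candidate EVCS node via Dijkstra.
path = nx.dijkstra_path(G, source=1, target=5, weight="w")
cost = nx.dijkstra_path_length(G, source=1, target=5, weight="w")
print(path, round(cost, 2))
```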
2.4 GPR-based EVCS waiting time model

Most of the DRL-based EVCN solutions rely on randomly distributed waiting time models without considering the time-of-day and EV owner behaviours. Therefore, these studies fail to analyse the daily characteristics of EVCS waiting times. Thus, this section introduces a GPR-based learning model for estimating the EVCS waiting times by using real-world data.

GPR, detailed in [29], is one of the popular regression algorithms in machine learning (ML) that utilizes a non-parametric Bayesian model for solving such problems. Moreover, EVCS waiting time prediction is a complex task due to the uncertainties in the transportation network and EV owner behaviours. Hence, GPR-based models have the advantage of being able to resist uncertainties, which allows the daily charging attitudes of the EV owners to be captured thanks to their probabilistic predictions [30].

The GPR model output function $y$ is obtained from a one-dimensional input vector $x$ by utilizing the training dataset $Z = \{(x_i, y_i)\}_{i=1}^{N}$, where the training and prediction points are represented by $x_i$ and $y_i$, respectively, and $N$ denotes the number of data samples for training. In the GPR approach, an unknown function is used to transition between the input data and the output prediction model, as given below:

$$y = f(x) + \epsilon \tag{13}$$

where $\epsilon \sim \mathcal{N}(0, \sigma^{2})$ represents the Gaussian noise over the output prediction model. The GP function $f(x)$ expresses a probability distribution given as:

$$f(x) \sim \mathcal{GP}\!\left(m(x), \kappa(x, x')\right) \tag{14}$$

$$m(x) = \mathbb{E}[f(x)] \tag{15}$$

$$\kappa(x, x') = \mathbb{E}\!\left[(f(x) - m(x))(f(x') - m(x'))^{T}\right] \tag{16}$$

The aforementioned covariance function $\kappa(\cdot)$ is commonly referred to as the kernel function of the GPR process. Various kernel functions have been proposed in the literature, taking into account attributes such as smoothness and expected patterns within the dataset's training points [31]. In this study, the radial basis function (RBF) kernel is chosen as follows due to its capability to represent smooth and consistent functions:

$$\kappa(x, x') = \sigma_f^{2} \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\lambda^{2}}\right) \tag{17}$$

where $\lambda$ denotes the length-scale and $\sigma_f^{2}$ represents the noise variance of the data points. The input function $x$ includes the arrival/departure time and SOC level parameters of the EVs, and the output function $y$ represents the waiting time model during the charging process. By considering the past training data points, the GPR model predicts the waiting time for EVs at any given time. The input data is extensively detailed in Section 4.1.
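A minimal sketch of such a waiting-time regressor, assuming an RBF kernel as in Equation (17) plus a white-noise term for $\epsilon$ in Equation (13), is given below using scikit-learn; the feature layout (arrival time, SOC) and the toy samples are illustrative and are not the dataset described in Section 4.1.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy training set: each row is [arrival time of day in hours, SOC on arrival in %].
X_train = np.array([[8.0, 35.0], [9.5, 50.0], [12.0, 20.0], [17.5, 40.0], [19.0, 60.0]])
# Observed waiting times at the EVCS in minutes (illustrative values).
y_train = np.array([12.0, 8.0, 25.0, 30.0, 18.0])

# RBF kernel (Equation (17)) plus a white-noise term for epsilon in Equation (13).
kernel = 1.0 * RBF(length_scale=[2.0, 15.0]) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, y_train)

# Predict the waiting time (with uncertainty) for an EV arriving at 18:00 with 45% SOC.
mean, std = gpr.predict(np.array([[18.0, 45.0]]), return_std=True)
print(f"Expected wait: {mean[0]:.1f} min (+/- {std[0]:.1f} min)")
```

The probabilistic output (mean and standard deviation) is what allows the strategy to account for uncertainty in the queue-time estimates rather than using a single point prediction.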
2.5 MDP formulation

This section introduces the MDP formulation of the EVCN problem from the perspective of EV owners, considering the randomness of the received data. This data includes the estimated waiting times at EVCSs, the ToU electricity price, the time-of-day and the traffic flow conditions on the road. Initially, this data is merged with the EV variables to construct the environment's full state $s_t$, covering the EV battery level condition and its position on the map. In this direction, the EV takes action $a_t$ to select an EVCS in the environment at time step $t$. Based on this action, the optimal path and estimated arrival time are computed using the weighted shortest path algorithm. Afterwards, the environment feedback is obtained from the reward function $r_t$, and the next states $s_{t+1}$ are reconstructed based on the first node of the optimal path $\delta_o$ at time step $t+1$. A detailed explanation of the MDP formulation is provided below as a tuple $(s_t, a_t, r_t, s_{t+1})$.

States: The system states $s_t$ are determined based on the constructed environment and derived as

$$s_t = \left(p_t,\, e_t^{d},\, \alpha_t^{c},\, J_t^{p},\, J_t^{c},\, t,\, t_k^{p},\, t_k^{wait}\right) \tag{18}$$

where $p_t$ denotes the current position of the EV, $e_t^{d}$ represents the current SOC level at each travelled node, $\alpha_t^{c}$ is the charging price determined according to the time index of the day, which is $t$, with the ToU electricity price data, $J_t^{p}$ is calculated at each time step based on the approach outlined in Equation (2), $J_t^{c}$ is the charging cost of the EV and is formulated as shown in Equation (3), and finally, $t_k^{wait}$ represents the estimated waiting time at the selected EVCS, obtained from the GPR-based waiting time model.

Action: The EV agent's action $a_t$, given in Equation (19), is the decision-making between EVCS nodes with respect to the environment states at the current time $t$. In addition, at each first node of the optimal path, the EV agent takes sequential actions until the EV reaches the EVCS node.

$$a_t = k, \quad k \in L \tag{19}$$

Reward: The reward function $r_t$ is the feedback of the system and is obtained from the perspective of the EV agent as described in Equation (20). Moreover, the reward function is derived in two stages based on the objectives of the EVCN problem given in Equation (1).

$$r_t = \begin{cases} -\eta\omega\, d_{ij} - \psi\, t_{ij}^{p}(f_{ij}^{p}), & i = \delta_0,\ j = \delta_1,\ p_t \neq \delta_{end} \\ -(e^{max} - e_t)\,\alpha_t^{c}\, E^{max} - \psi\, t_k^{wait}, & p_t = \delta_{end} \end{cases} \tag{20}$$

where $e_t$ represents the current SOC level of the EV, and $\delta_0$ and $\delta_1$ denote the origin and first node of the optimal path taken by the EV agent to the EVCS. The first part of the reward function is calculated while the EV is travelling on the network. In Equation (20), the feedback is computed based on the EV's road time, distance and the EV owner's cost. The second part of the reward function, which is the charging cost at the EVCS, is determined by using the charging price and SOC level of the EV. This study assumes that all EVCSs are identical and have the same power rating values for the charging process.

Transition: The transition illustrates the mapping of the state variables as defined in Equation (21) from time step $t$ to $t+1$ by utilizing the action $a_t$. The transition probability between states is a challenging problem because it is stochastic. Therefore, a DRL-based approach is developed to seek the optimal solution as follows:

$$s_{t+1} = f(s_t, a_t, \tau_t) \tag{21}$$

where $\tau_t$ represents the disturbance data utilized to introduce randomness in the state transition. The EV position is updated as

$$p_{t+1} = \delta_1 \tag{22}$$

The updated SOC level during the EV's travel is calculated as

$$e_{t+1} = e_t^{init} - \omega\, \frac{d_{\delta_{1st}\,\delta_{2nd}}}{e^{max}} \tag{23}$$
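To make the tuple $(s_t, a_t, r_t, s_{t+1})$ concrete, the snippet below sketches the two-stage reward of Equation (20) for one environment step; the class, field names and coefficient values are illustrative assumptions rather than the exact simulator used in the case studies.

```python
from dataclasses import dataclass

@dataclass
class EVState:
    position: int        # p_t, current node
    soc: float           # e_t, state of charge in [0, 1]
    price: float         # alpha_t^c, ToU charging price [$/kWh]
    wait_time: float     # t_k^wait, predicted queue time at the chosen EVCS [min]

# Illustrative weighting coefficients (eta, omega, psi) and battery size E_max [kWh].
ETA, OMEGA, PSI, E_MAX, SOC_MAX = 1.0, 0.15, 0.5, 82.0, 1.0

def step_reward(state: EVState, d_ij: float, t_ij: float, at_evcs: bool) -> float:
    """Two-stage reward in the spirit of Equation (20)."""
    if not at_evcs:
        # Travelling stage: penalize the distance-related energy cost and the road time.
        return -ETA * OMEGA * d_ij - PSI * t_ij
    # Terminal stage: penalize the charging cost for the missing SOC plus the queue time.
    return -(SOC_MAX - state.soc) * state.price * E_MAX - PSI * state.wait_time

s = EVState(position=3, soc=0.4, price=0.32, wait_time=14.0)
print(step_reward(s, d_ij=2.5, t_ij=6.0, at_evcs=False))  # reward while driving one link
print(step_reward(s, d_ij=0.0, t_ij=0.0, at_evcs=True))   # reward on arrival at the EVCS
```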
Value-function: A policy $\pi(s_t)$ describes the probability of selecting an action based on the current state observation. The value-function $V^{\pi}(s_t)$ represents the effectiveness of the EV agent's policy as follows:

$$V^{\pi}(s_t) = r_t(s_t, \pi_t) + E_{\pi}\!\left(\sum_{m=t+1}^{T} \gamma^{m-t}\, r_m(s_m, \pi_m)\right) \tag{24}$$
where $\gamma \in (0, 1]$ denotes the discount factor and illustrates the impact of immediate and future rewards, and $T$ represents the terminal phase when the EV agent reaches one of the EVCS nodes. The goal here is to obtain the optimal EV agent policy $\pi^{*}$ that produces the optimal value-function $V^{\pi^{*}}(s_t)$ for the best EVCS selection as follows:

$$V^{\pi^{*}}(s_t) = \max V(s_t) \tag{25}$$

It is a quite challenging problem to solve optimal path planning problems with high-dimensional uncertainties and no prior data. Thus, one possible approach is to use DRL algorithms to optimize the aforementioned EV agent policy [32]. In this study, the PPG algorithm is employed to address the EVCN problem due to its effectiveness, which is detailed in [33].

The PPG is an extended architecture of the PPO algorithm [26], representing one of the state-of-the-art model-free policy gradient-based DRL methods in the current literature. The PPG algorithm optimizes the objective of the problem in two distinct phases: the policy phase and the auxiliary phase. During the policy phase, the PPO algorithm [26] is employed to train the EV agent. In the auxiliary phase, valuable features of the value-function are distilled into the policy network to enhance the subsequent policy phase.

$$L^{clip} = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right] \tag{26}$$

where $\epsilon$ represents the clipping parameter utilized to prevent local maxima during the exploration phase and $\hat{A}_t$ represents the effectiveness of the action at time $t$ and is calculated by the generalized advantage estimate (GAE) approach proposed in [34].
The surrogate function denotes the ratio between the current $\pi_{\theta}$ and previous $\pi_{\theta_{old}}$ network parameters, serving to avoid instantaneous parameter updates during training, as follows:

$$r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \tag{27}$$

where $a_t$ and $s_t$ represent the current action and state pair. The value-function network, denoted by $\phi$, updates its network parameters using the temporal difference (TD) error. The objective function for optimizing the value-function network is provided as

$$L^{vf} = \hat{\mathbb{E}}_t\!\left[\frac{1}{2}\left(V_{\phi}(s_t) - \hat{V}_t^{target}\right)^{2}\right] \tag{28}$$

where $\hat{V}_t^{target}$ represents the targets for the value-function network and is calculated by the GAE approach [34].

3.1.2 Auxiliary phase

In the auxiliary phase, the joint objective function is utilized to optimize the policy network. It combines a behavioural cloning loss and an arbitrary auxiliary loss as follows:

$$L^{joint} = L^{aux} + \beta_{clone} \cdot \hat{\mathbb{E}}_t\!\left[KL\!\left[\pi_{\theta_{old}}(\cdot \mid s_t), \pi_{\theta}(\cdot \mid s_t)\right]\right] \tag{29}$$

where $KL(\pi_{\theta_{old}} \mid \pi_{\theta})$ represents the Kullback–Leibler divergence function, which measures the divergence between the probability distributions of the current and old policies. $\pi_{\theta_{old}}$ is considered as the policy at the beginning of the auxiliary phase, and the original policy is preserved by utilizing the parameter $\beta_{clone}$. Finally, $L^{aux}$ denotes the auxiliary objective function, which is selected to be the same as the value-function network objective function, as suggested in [33].
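The objectives in Equations (26)–(29) can be written compactly as in the PyTorch-style sketch below; the tensor shapes and coefficient values are assumed for illustration, and the auxiliary loss is taken equal to the value loss as suggested above, so this is a generic PPG loss computation rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def ppg_losses(logp_new, logp_old, advantages, values, value_targets,
               logits_new, logits_old, eps=0.2, beta_clone=1.0):
    """Clipped policy loss (26)-(27), value loss (28) and auxiliary joint loss (29)."""
    # Probability ratio r_t(theta) between the current and old policies, Equation (27).
    ratio = torch.exp(logp_new - logp_old)
    # Clipped surrogate objective, Equation (26) (negated, since optimizers minimize).
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Value-function regression towards GAE-based targets, Equation (28).
    value_loss = 0.5 * (values - value_targets).pow(2).mean()
    # Auxiliary phase: auxiliary (value) loss plus a behavioural-cloning KL term, Equation (29).
    kl = F.kl_div(F.log_softmax(logits_new, dim=-1),
                  F.softmax(logits_old, dim=-1), reduction="batchmean")
    joint_loss = value_loss + beta_clone * kl
    return policy_loss, value_loss, joint_loss

# Illustrative batch of 4 transitions with 3 candidate EVCS actions.
B, A = 4, 3
out = ppg_losses(torch.randn(B), torch.randn(B), torch.randn(B),
                 torch.randn(B), torch.randn(B),
                 torch.randn(B, A), torch.randn(B, A))
print([round(float(v), 3) for v in out])
```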
3.2 Training of shortest path-based PPG algorithm

This section introduces the training process and simulation parameters of the proposed PPG algorithm. Algorithm 1 below outlines the training of the EV agent using the PPG algorithm for the above-discussed EVCN problem. The parameters of the EV $(p_t, \omega, \psi, e_t^{d})$, the transportation network $(d_{ij}, t_{ij}^{p}(f_{ij}^{p}))$ and the EVCS $(\eta, \alpha_t^{c}, t_k^{wait})$ are utilized as input to the algorithm. The output of the algorithm is the trained policy, which can effectively guide the EV through the transportation network with optimal EVCS selection. The training process begins with the initialization of the policy network $\theta$, value-function network $\phi$, and buffer $B$. Then, the shortest path algorithm is executed based on the received initial data to obtain the initial state $s_0$ for the learning process. In the policy phase, the policy and value-function networks are trained based on the objective functions $L^{clip}$ and $L^{vf}$ to update the current EVCS selection policy according to the action $a_t$ and state $s_t$. The last policy is then stored in the buffer for all states to initialize the auxiliary phase. Finally, in the auxiliary phase, $L^{joint}$ and $L^{vf}$ are optimized according to the data stored in the buffer $B$. The parameters controlling the overall policy phase, policy network, value-function network, and overall auxiliary phase iterations are denoted by $N_{\pi}$, $E_{\pi}$, $E_{V}$, and $E_{aux}$, respectively.

ALGORITHM 1 Shortest path-based PPG algorithm training.

1  Initialize the policy network parameter $\theta$,
2  the value-function network $\phi$ and the buffer $B$
3  Receive the initial data: $(\omega, \psi, e_t^{d}, p_t, d_{ij}, t_{ij}^{p}(f_{ij}^{p}), \eta, \alpha_t^{c}, t_k^{wait})$
4  Solve the shortest path by utilizing the transportation network features, obtain the initial state $s_0$ and the road features
5  for t = 1, 2, ..., T do
6      Initialize a buffer $B$
7      // Policy phase part
8      for iter = 1, 2, ..., $N_{\pi}$ do
9          Perform roll-outs under the policy $\pi$
10         Compute the target value function $\hat{V}(s_t)$ for all states
11         // Policy network training part
12         for count1 = 1, 2, ..., $E_{\pi}$ do
13             Optimize the $L^{clip}$ function w.r.t. $\theta$
14         end
15         // Value-function network training part
16         for count2 = 1, 2, ..., $E_{V}$ do
17             Optimize the $L^{vf}$ function w.r.t. $\phi$
18         end
19         Store all $(s_t, \hat{V}(s_t))$ to $B$
20     end
21     Compute and store the policy $\pi_{old}(\cdot \mid s_t)$ for all states in $B$
22     // Auxiliary phase part
23     for count3 = 1, 2, ..., $E_{aux}$ do
24         Optimize the $L^{joint}$ function w.r.t. $\theta$, on all data in $B$
25         Optimize the $L^{vf}$ function w.r.t. $\phi$, on all data in $B$
26     end
27 end

Remark 1. Let $|\mathbb{X}|$ denote the number of roads, $|\mathbb{N}|$ represent the number of nodes and $|\Omega_{EVCS}|$ denote the number of EVCS nodes in the transportation network. The computational complexity analysis of the proposed algorithm is performed in two parts. In the first part, considering Line 4, the computational complexity of the shortest path based on Dijkstra's algorithm is defined as $O((|\mathbb{X}| + (|\mathbb{N}| + |\Omega_{EVCS}|)) \log(|\mathbb{N}| + |\Omega_{EVCS}|))$. In the second part of the algorithm, the computational complexity of the DRL approach is defined by the state $s_t$ and action $a_t$ [35], which are generally less than the number of roads $O(|\mathbb{X}|)$. Hence, the proposed PPG approach does not significantly increase the complexity, particularly when the number of roads $|\mathbb{X}|$ in the network is large.
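Line 10 of Algorithm 1 computes GAE-based value targets [34]; a self-contained sketch is given below. The discount values 0.99 and 0.95 plausibly correspond to the discounting factor and extra discount factor listed in Table 1, but this correspondence, like the toy roll-out numbers, is an assumption for illustration.

```python
import numpy as np

def gae_targets(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates and value targets (Line 10 of Algorithm 1, [34]).

    rewards : r_t collected during a roll-out (length T)
    values  : V_phi(s_t) for t = 0..T (length T+1; last entry bootstraps the final state)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    targets = advantages + values[:-1]   # V_hat^target used in Equation (28)
    return advantages, targets

# Toy roll-out of 4 steps (illustrative numbers only).
adv, tgt = gae_targets(rewards=np.array([-1.0, -0.5, -0.8, -3.0]),
                       values=np.array([-4.0, -3.2, -2.5, -3.1, 0.0]))
print(adv.round(3), tgt.round(3))
```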
TABLE 1 PPG algorithm simulation parameters.

$E_{\pi}$: 1
$E_{V}$: 1
$E_{aux}$: 6
Learning rate: 0.001
Batch size: 32
Epoch size: 20k
Size of buffer $B$: 32
Discounting factor: 0.99
Extra discount factor: 0.95
Entropy weight: 0.005
Clipping parameter: 0.2
Policy network size of layers and neurons: 2, 256
Value-function network size of layers and neurons: 2, 256

TABLE 2 Total time spent based on different EV owner cost.
traffic flows, demonstrating the EV agent's ability to adapt to dynamic system states.
∙ Finally, the EV agent is trained by considering user preferences based on EV owner costs. The case study shows a decrease of 7–10% in the total time spent on the network. This implies that the proposed strategy facilitates weighting the time spent and the charging cost in the objective function.

The overall case study results demonstrate that the proposed EVCN strategy achieves promising outcomes in minimizing the total time spent on the network and the charging cost by considering the time-of-day and EV owner preference parameters. Future studies aim to incorporate real-time distribution grid parameters, such as peak power demand, into the EVCN problem to investigate the effectiveness of the proposed DRL approach. PPG-based DRL allows more efficient use of data for high-dimensional systems in which data collection can be cumbersome. Besides, a reduced number of iterations is sufficient to achieve similar or better performance compared to the PPO-based DRL method in high-dimensional systems.

AUTHOR CONTRIBUTIONS
Ali Can Erüst: Methodology; software; writing—original draft. Fatma Yıldız Taşcıkaraoğlu: Supervision; validation; writing—review and editing.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

ORCID
Fatma Yıldız Taşcıkaraoğlu https://orcid.org/0000-0003-1866-2515

REFERENCES
1. Jannati, J., Nazarpour, D.: Optimal energy management of the smart parking lot under demand response program in the presence of the electrolyser and fuel cell as hydrogen storage system. Energy Convers. Manage. 138, 659–669 (2017)
2. Ji, Z., Huang, X.: Plug-in electric vehicle charging infrastructure deployment of China towards 2020: policies, methodologies, and challenges. Renewable Sustainable Energy Rev. 90, 710–727 (2018)
3. Cheng, X., Zhang, R., Yang, L.: Wireless toward the era of intelligent vehicles. IEEE Internet of Things J. 6, 188–202 (2018)
4. Tascikaraoglu, F.Y., Aksoy, G.: Identification of sensor location and link flow reconstruction using turn ratio and flow sensors in an arterial network. J. Intell. Transp. Syst. 28(2), 163–173 (2024)
5. Luo, Y., Feng, G., Wan, S., Zhang, S., Li, V., Kong, W.: Charging scheduling strategy for different electric vehicles with optimization for convenience of drivers, performance of transport system and distribution network. Energy 194, 116807 (2020)
6. Zhong, J., Liu, J., Zhang, X.: Charging navigation strategy for electric vehicles considering empty-loading ratio and dynamic electricity price. Sustainable Energy Grids Networks 34, 100987 (2023)
7. Xiang, Y., Jiang, Z., Gu, C., Teng, F., Wei, X., Wang, Y.: Electric vehicle charging in smart grid: a spatial-temporal simulation method. Energy 189, 116221 (2019)
8. Zhong, J., Yang, N., Zhang, X., Liu, J.: A fast-charging navigation strategy for electric vehicles considering user time utility differences. Sustainable Energy Grids Networks 30, 100646 (2022)
9. Shao, S., Guan, W., Bi, J.: Electric vehicle-routing problem with charging demands and energy consumption. IET Intell. Transp. Syst. 12(3), 202–212 (2018)
10. Tan, J., Wang, L.: Real-time charging navigation of electric vehicles to fast charging stations: a hierarchical game approach. IEEE Trans. Smart Grid 8(2), 846–856 (2015)
11. Liu, C., Zhou, M., Wu, J., Long, C., Wang, Y.: Electric vehicles en-route charging navigation systems: joint charging and routing optimization. IEEE Trans. Control Syst. Technol. 27(2), 906–914 (2017)
12. Cerna, F.V., Pourakbari-Kasmaei, M., Romero, R.A., Rider, M.J.: Optimal delivery scheduling and charging of EVs in the navigation of a city map. IEEE Trans. Smart Grid 9(5), 4815–4827 (2017)
13. Sharif, M., Seker, H.: Smart EV charging with context-awareness: enhancing resource utilization via deep reinforcement learning. IEEE Access 12, 7009–7027 (2024)
14. Zhang, C., Liu, Y., Wu, F., Tang, B., Fan, W.: Effective charging planning based on deep reinforcement learning for electric vehicles. IEEE Trans. Intell. Transp. Syst. 22(1), 542–554 (2020)
15. Miletić, M., Ivanjko, E., Gregurić, M., Kušić, K.: A review of reinforcement learning applications in adaptive traffic signal control. IET Intell. Transport Syst. 16(10), 1269–1285 (2022)
16. Qiu, D., Ye, Y., Papadaskalopoulos, D., Strbac, G.: A deep reinforcement learning method for pricing electric vehicles with discrete charging levels. IEEE Trans. Ind. Appl. 56(5), 5901–5912 (2020)
17. Li, H., Wan, Z., He, H.: Constrained EV charging scheduling based on safe deep reinforcement learning. IEEE Trans. Smart Grid 11(3), 2427–2439 (2019)
18. Qian, T., Shao, C., Wang, X., Shahidehpour, M.: Deep reinforcement learning for EV charging navigation by coordinating smart grid and intelligent transportation system. IEEE Trans. Smart Grid 11(2), 1714–1723 (2019)
19. Aljohani, T.M., Ebrahim, A., Mohammed, O.: Real-time metadata-driven routing optimization for electric vehicle energy consumption minimization using deep reinforcement learning and Markov chain model. Electr. Power Syst. Res. 192, 106962 (2021)
20. Wan, Z., Li, H., He, H., Prokhorov, D.: Model-free real-time EV charging scheduling based on deep reinforcement learning. IEEE Trans. Smart Grid 10(5), 5246–5257 (2018)
21. Jiang, C., Zhou, L., Zheng, J., Shao, Z.: Electric vehicle charging navigation strategy in coupled smart grid and transportation network: a hierarchical reinforcement learning approach. Int. J. Electr. Power Energy Syst. 157, 109823 (2024)
22. Zhang, W., Liu, H., Wang, F., Xu, T., Xin, H., Dou, D., Xiong, H.: Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning. In: Proceedings of the Web Conference 2021, pp. 1856–1867. ACM, New York, NY (2021)
23. Xing, Q., Xu, Y., Chen, Z., Zhang, Z., Shi, Z.: A graph reinforcement learning-based decision-making platform for real-time charging navigation of urban electric vehicles. IEEE Trans. Ind. Inf. 19(3), 3284–3295 (2023)
24. Jin, J., Xu, Y.: Shortest-path-based deep reinforcement learning for EV charging routing under stochastic traffic condition and electricity prices. IEEE Internet Things J. 9(22), 22571–22581 (2022)
25. Ukkusuri, S.V., Yushimito, W.F.: A methodology to assess the criticality of highway transportation networks. J. Transp. Secur. 2, 29–46 (2009)
26. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
27. Xiang, Y., Liu, J., Li, R., Li, F., Gu, C., Tang, S.: Economic planning of electric vehicle charging stations considering traffic constraints and load profile templates. Appl. Energy 178, 647–659 (2016)
28. Dijkstra, E.W.: A note on two problems in connexion with graphs. In: Edsger Wybe Dijkstra: His Life, Work, and Legacy, pp. 287–290. ACM, New York, NY (2022)
29. Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, Vol. 2. MIT Press, Cambridge, MA (2006)
30. Li, W., Fan, Y., Ringbeck, F., Jöst, D., Sauer, D.U.: Unlocking electrochemical model-based online power prediction for lithium-ion batteries via Gaussian process regression. Appl. Energy 306, 118114 (2022)
31. Schulz, E., Speekenbrink, M., Krause, A.: A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018)
32. Erüst, A.C., Beyazıt, M.A., Taşcıkaraoğlu, F.Y., Taşcıkaraoğlu, A.: Deep reinforcement learning-based navigation strategy for a mobile charging station in a dynamic environment. In: 2023 International Conference on Smart Energy Systems and Technologies (SEST), pp. 1–6. IEEE, Piscataway, NJ (2023)
33. Cobbe, K.W., Hilton, J., Klimov, O., Schulman, J.: Phasic policy gradient. In: International Conference on Machine Learning, pp. 2020–2027. Microtome Publishing, Brookline, MA (2021)
34. Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 (2015)
35. Rasheed, F., Yau, K.-L.A., Noor, R.M., Wu, C., Low, Y.-C.: Deep reinforcement learning for traffic signal control: a review. IEEE Access 8, 208016–208044 (2020)
36. Ukkusuri, S.V., Yushimito, W.F.: A methodology to assess the criticality of highway transportation networks. J. Transp. Secur. 2, 29–46 (2009)
37. Beyazıt, M.A., Taşcıkaraoğlu, A.: Electric vehicle charging through mobile charging station deployment in coupled distribution and transportation networks. Sustainable Energy Grids Networks 35, 101102 (2023)
38. Kane, M.: 2021 Tesla Model 3 LR AWD with 82 kWh battery: charging analysis. Available: https://insideevs.com/news/519382/teslamodel3-82kwh-charging-analysis/. Accessed 30 Oct 2024
39. National Bureau of Statistics of China: The average level of wages in main industry in China in 2017. Available: https://data.stats.gov.cn/english/easyquery.htm?cn=C01. Accessed 30 Oct 2024
40. Şengör, İ., Güner, S., Erdinç, O.: Real-time algorithm based intelligent EV parking lot charging management strategy providing PLL type demand response program. IEEE Trans. Sustainable Energy 12(2), 1256–1264 (2020)

How to cite this article: Erüst, A.C., Taşcıkaraoğlu, F.Y.: Spatio-temporal dynamic navigation for electric vehicle charging using deep reinforcement learning. IET Intell. Transp. Syst. 18, 2520–2531 (2024). https://doi.org/10.1049/itr2.12588