Learning To Operate An Electric Vehicle Charging Station Considering Vehicle-Grid Integration
Abstract—The rapid adoption of electric vehicles (EVs) calls for the widespread installation of EV charging stations. To maximize the profitability of charging stations, intelligent controllers that provide both charging and electric grid services are in great need. However, it is challenging to determine the optimal charging schedule due to the uncertain arrival time and charging demands of EVs. In this paper, we propose a novel centralized allocation and decentralized execution (CADE) reinforcement learning (RL) framework to maximize the charging station's profit. In the centralized allocation process, EVs are allocated to either the waiting or charging spots. In the decentralized execution process, each charger makes its own charging/discharging decision while learning the action-value functions from a shared replay memory. This CADE framework significantly improves the scalability and sample efficiency of the RL algorithm. Numerical results show that the proposed CADE framework is both computationally efficient and scalable, and significantly outperforms the baseline model predictive control (MPC). We also provide an in-depth analysis of the learned action-value function to explain the inner working of the reinforcement learning agent.

Index Terms—Electric vehicle, charging station, vehicle-grid integration, reinforcement learning.

NOMENCLATURE

β_it   A binary variable indicating if the ith EV is in the charging station at time t.
Δa   Interval between neighboring values in action set A.
Δt   Length of one time step.
η_it   A binary variable indicating if the ith EV is connected to a charger at time t.
A   The set of all discrete actions of a reinforcement learning agent.
a   a ∈ A, one of the actions, i.e., the charging/discharging power selected by a charger for an EV.
a^lower/a^upper   The lower/upper bound of charging power constrained by battery energy level.
a^min/a^max   The minimal/maximum charging power. The minimal charging power corresponds to the maximum discharging power.
B   Net revenue of the charging station by charging and discharging EV batteries.
C^l   Demand charge of the charging station.
C^p   Penalty paid by the charging station for not charging EVs to their target energy levels.
e^r   Remaining energy to be charged for an EV, i.e., the difference between e and e^tgt.
e^ini/e^fnl/e^tgt   The initial/final/target energy level of an EV.
e^min/e^max   The minimum/maximum energy level of an EV.
E^{r,w}   Sum of e^r of the EVs in the waiting area.
e_it   Energy level of the ith EV at time t.
H   The set of time-of-use periods.
h   h ∈ H, one of the time-of-use periods.
I   The set of EVs that have arrived at the charging station in T.
J   The set of all chargers in the charging station.
L_ht   Recorded peak power of a charging station in time-of-use period h at time t.
m_t   The net revenue of charging/discharging an EV with 1 kWh of electricity.
N   The total number of parking spots in a charging station.
N^c/N^w   The total number of chargers/waiting spots.
p^c   The price paid by an EV customer to the charging station for 1 kWh of electricity.
p^d   The price the charging station pays to EV customers for discharging 1 kWh of electricity.
p^e   The price of electricity charged by the electric utility.
p^l   The price the charging station pays to the electric utility for 1 kW of recorded peak power.
Q(s, a)   The state-action value function of taking action a at state s.
r   The reward received by a charger.
r^b   The reward received by a charger through performing charging/discharging of EVs.
r^l   The reward received by a charger for increasing the recorded peak power.
r^p   The reward received by a charger for not charging an EV to its target energy level.
s   The state of environment of a charger.
T   A discrete time horizon.
t   t ∈ T, one of the time steps.
t^a/t^d   The arrival/departure time of an EV.
t^r   The remaining dwelling time of an EV.
T_B   The billing period of demand charge C^l.
Z   Profit of the charging station.

Manuscript received November 8, 2021; revised February 15, 2022; accepted March 24, 2022. Date of publication April 6, 2022; date of current version June 21, 2022. This work was supported by the University of California Institute of Transportation Studies from the State of California through the Public Transportation Account and the Road Repair and Accountability Act of 2017 (Senate Bill 1). Paper no. TSG-01793-2021. (Corresponding author: Nanpeng Yu.)
The authors are with the Department of Electrical and Computer Engineering, University of California at Riverside, Riverside, CA 92521 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TSG.2022.3165479.
Digital Object Identifier 10.1109/TSG.2022.3165479
I. INTRODUCTION

IN THE past decade, the global electric vehicle (EV) market has been growing exponentially thanks to the rapid advancement of battery technologies. To support further penetration of EVs, it is critical to develop smart charging stations that can satisfy the charging needs in a cost-effective manner.

The topic of charging station operation optimization has been widely researched with model-based algorithms. The pricing and scheduling of EV charging are jointly considered in the charging station profit maximization problem and solved by an enhanced Lyapunov optimization algorithm [1]. By jointly considering the admission control, pricing, and charging scheduling, a charging station profit maximization problem is formulated and solved by the combination of multi-sub-process based admission control and a gradient-based optimization algorithm [2]. The planning and operation problem of single output multiple cables charging spots is formulated and solved by two-stage stochastic programming [3]. The charging station scheduling problem that considers vehicle-to-vehicle charging is formulated as a constrained mixed-integer linear program and solved by dual and Benders decomposition [4]. The scheduling of battery replacement in charging stations is solved by a multistage stochastic programming algorithm [5]. By considering renewable generation and energy storage systems in the design [6]-[8] and operation of charging stations [9], [10], significant economic benefits can be gained.

Model-based control approaches have also been applied to schedule EV charging considering vehicle-grid integration (VGI). A decentralized control algorithm to optimally schedule EV charging is developed to fill the valleys in electric load profiles [11]. To mitigate the impact on distribution networks, the EV charging and discharging coordination problem is solved with a coalition formation strategy [12]. To smooth out the power fluctuations caused by distributed renewable energy sources, a joint iterative optimization method is developed to schedule the charging of EVs and minimize the cost of load regulation [13]. The problem of routing and charging scheduling of EVs considering VGI is formulated as an optimization problem solved by an approximation algorithm [14]. VGI, vehicle-to-vehicle energy transfer, and demand-side management in vehicle-to-building settings have also been studied with mixed integer programming [15] and a distributed control algorithm [16].

The aforementioned model-based methods rely heavily on sophisticated algorithm designs that are tailored for specific charging scenarios. Most of the model-based methods cast the charging station scheduling problem as a complex optimization problem, which is time-consuming to solve when the scale of the charging station increases. To address the scalability issues of model-based methods, researchers are turning to reinforcement learning (RL), which excels at solving sequential decision-making problems in real time. The RL agent aims at learning a good control policy by interacting with the environment or a simulation system to achieve the best long-term rewards. Combining RL algorithms with deep neural networks has led to breakthroughs in solving complex problems with large-scale state and action spaces [17].

The application of RL to schedule the charging of a single EV is widely covered in the literature. An early attempt leverages the tabular Q-learning method [18], which can only deal with small-scale state and action spaces. To overcome the limitations of tabular methods, kernel averaging regression functions are used to approximate the action-value functions in scheduling the charging of an individual EV [19]. Reference [20] further extends the charging scheduling problem for a single EV by considering bidirectional energy transfer and utilizing deep neural networks to approximate state-action values. Apart from reducing the charging cost, drivers' anxiety about the EVs' range [21] and photovoltaic self-consumption [22] can also be quantified and included as training targets. To ensure the EV is sufficiently charged upon departure, [23] introduces an exponential penalty term to encourage the charger to finish the charging task on time, while [24] utilizes constrained policy optimization to learn a safe policy without the need to manually design a penalty term. Long short-term memory (LSTM) is also widely used to predict the electricity price as an input to the RL algorithm [20], [25]-[27].

Compared to the single EV algorithms, it is much more challenging to apply RL to coordinate the charging of multiple EVs in a charging station. This is because the dimensionality of the state space changes with the stochastic arrival and departure of EVs. To address this problem, a two-stage heuristic RL algorithm is proposed [28]. In the first stage, an RL agent learns the collective EV fleet charging strategy. In the second stage, a simple heuristic is used to translate the collective control action back to individual charger control actions. The drawback of this approach is that the simple heuristic cannot guarantee optimal charging scheduling. An alternative approach to tackle the difficulty of dimension-varying state and action spaces is to represent the state-action function using a linear combination of a set of carefully designed feature functions [29]. However, designing good feature functions is a manual process that can be challenging. To overcome this challenge, [30] develops a novel grid representation of EVs' required and remaining charging time such that the dimensions of state and action are constants. The problem with this grid approach is that the charger can only select to charge with full power or to not charge, without any intermediate power outputs. While in most of the research papers RL agents are deployed to determine output charging powers, [31] instead uses RL to predict future conditions, and a model-based method is then used to determine charging powers. The drawback of this approach is that the prediction can hardly be accurate. Researchers have also studied the problem of charging scheduling of multiple individual EVs in a local area to reduce the load on the grid [32], [33]. In these works, each charger is an RL agent and collective features (e.g., the total load) are usually introduced to the state vector. These pioneering studies demonstrate the potential of RL in charging scheduling and pave the way for further research.
Other interesting applications of RL related to charging include the scheduling of multiple charging stations where each charging station is treated as a unit [34], [35], the dynamic pricing for charging services [36]-[38], and the scheduling of battery swapping stations [39], [40].

In this paper, we propose an innovative centralized allocation and decentralized execution (CADE) framework to schedule the charging activities at the charging station. Compared to previous work [28]-[31], the problem formulation in this paper considers waiting area, VGI, and demand charge, making it more realistic and comprehensive. By synergistically combining the merits of RL and advanced optimization, our proposed CADE framework is highly scalable and yields competitive charging station profit compared to state-of-the-art baseline algorithms. The centralized allocation module leverages an optimization technique to determine which EVs should be connected to the chargers. To address the challenge of the dimension-varying state and action space of the charging station, we propose a decentralized execution scheme, where each charger is represented by an RL agent and determines its own charging/discharging actions. The individual chargers will submit operational experiences to a shared replay memory, which provides training samples to update the policies of the RL agents. Most of the existing algorithms neglect the demand charge cost as it is extremely difficult to predict the peak demand of a charging station ahead of time. In our proposed framework, an online algorithm is designed to assign the charging station-wide penalty associated with demand charge to the individual chargers.

The contributions of this paper are summarized below.
1) We propose a scalable centralized allocation and decentralized execution (CADE) framework to operate an EV charging station considering waiting area, vehicle-grid integration, and demand charge. By synergistically combining advanced optimization with reinforcement learning algorithms, our proposed approach identifies both the EV assignment and the chargers' outputs that yield high charging station profit.
2) By modeling each charger as an intelligent RL agent, our proposed algorithm tackles the challenge of dealing with dimension-varying state and action space associated with stochastic EV arrivals and charging needs. To improve the learning efficiency, we allow the chargers to exploit a shared replay memory of operational experience in learning decentralized charger control policies.
3) Comprehensive numerical studies demonstrate that our proposed CADE framework is not only extremely scalable but also outperforms state-of-the-art baseline algorithms in terms of charging station profit.

The rest of this paper is structured as follows. Section II formulates the charging scheduling problem as an optimization problem. Then, Section III reformulates the charging scheduling problem as a Markov Decision Process and proposes the CADE framework to solve the sequential decision making problem. Case studies are included in Section IV to demonstrate the effectiveness of our proposed method. Section V provides a brief conclusion.

II. PROBLEM FORMULATION

We will first formulate the charging station scheduling problem as an optimization problem. Let us consider a charging station with N parking spots, among which N^c are charging spots and N^w are waiting spots, such that N = N^w + N^c. When the number of EVs in the charging station is larger than the number of chargers, the charging station needs to determine which EVs shall be connected to the chargers. Here, we assume that the time of switching an EV from a waiting spot to a charger or vice versa is negligible.

The goal of the charging station's scheduling system is to maximize its profit while satisfying EV customers' charging needs. The operating profit of a charging station Z consists of three components: the net revenue of charging and discharging EVs, B, the penalty of failing to satisfy the customers' charging needs, C^p, and the demand charge of the station, C^l, as shown in (1).

Z = B - C^p - C^l    (1)

We assume that Z is measured over a time horizon T, which consists of intervals of equal length Δt. All of the incoming EVs during T are denoted by a set I. We will describe each component of Z in detail in the next few subsections.

A. Net Revenue From Charging and Discharging EVs

The EV owners pay the charging station p_t^c for each kWh of electricity charged and receive p_t^d for each kWh of electricity discharged from the batteries, which is sold to the electric grid. The charging station pays or receives p_t^e for each kWh of energy bought from or sold to the electric utility.

Suppose at time t, the ith EV is charged/discharged at a power level a_it, where a_it > 0 means charging and a_it < 0 means discharging. The net revenue from charging and discharging EVs, B, can be calculated as:

B = \sum_{i \in I} \sum_{t \in T} m_t |a_{it}| \Delta t,    (2)

where m_t is the net revenue of charging/discharging an EV with 1 kWh of energy at time t and can be calculated by:

m_t = \begin{cases} p_t^c - p_t^e & \text{if } a_{it} \ge 0 \\ p_t^e - p_t^d & \text{if } a_{it} < 0 \end{cases}    (3)

Note that the price paid to the customer for discharging the EV battery is higher than the EV charging cost, i.e., p_t^d > p_t^c, ∀t. This is necessary to cover the degradation cost of the EV battery [41], [42].
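To make (2) and (3) concrete, the sketch below computes m_t and the net revenue B for a given charging/discharging profile of one EV. It is only an illustration: the price values, the quarter-hour step, and the helper names (net_margin, net_revenue) are assumptions made for the example, not part of the paper.

```python
# Illustrative sketch of (2)-(3); prices and the power profile are hypothetical.
def net_margin(p_c: float, p_d: float, p_e: float, a: float) -> float:
    """m_t: net revenue of moving 1 kWh at power a (>0 charging, <0 discharging)."""
    return (p_c - p_e) if a >= 0 else (p_e - p_d)

def net_revenue(powers, p_c, p_d, p_e, dt=0.25):
    """B = sum_t m_t * |a_t| * dt for one EV; powers in kW, dt in hours."""
    return sum(net_margin(p_c[t], p_d[t], p_e[t], a) * abs(a) * dt
               for t, a in enumerate(powers))

# Example: charge at 6 kW for two steps, then discharge at 6 kW for one step.
print(net_revenue([6.0, 6.0, -6.0],
                  p_c=[0.30] * 3, p_d=[0.35] * 3, p_e=[0.10] * 3))
```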
B. Penalty of Failing to Satisfy Customers' Charging Needs

Upon arrival, an EV customer i will specify the departure time t_i^d and the desired target energy level e_i^tgt at departure. The charging station will try to ensure that the EV's energy level is not lower than e_i^tgt at time t_i^d. If the target energy level is not met, a penalty will be imposed on the charging station to compensate the customer. The penalty is denoted as c_i^p. Such a penalty shall reflect the gap between the final energy level e_i^fnl and the target energy level e_i^tgt. For simplicity, we use a linear function to calculate c_i^p as suggested by [33]:

c_i^p = \mu \left( e_i^{tgt} - e_i^{fnl} \right)^+,    (4)

where μ is a constant coefficient and x^+ = max(x, 0). Note that a penalty is triggered only when e_i^fnl < e_i^tgt. e_i^fnl can be derived by (5):

e_i^{fnl} = e_i^{ini} + \sum_{t=t_i^a}^{t_i^d} a_{it} \Delta t,    (5)

where e_i^ini is the initial energy level of the ith EV and t_i^a is its arrival time. Also note that an EV customer cannot specify an e^tgt higher than a threshold that is physically achievable with the charging infrastructure. We explicitly express this requirement as shown in (6):

e_i^{tgt} \le \min \left( e^{max},\; e_i^{ini} + a^{max} \left( t_i^d - t_i^a \right) \right),    (6)

where e^max is the maximum capacity of the battery, and a^max is the maximal charging power.

The total penalty for a charging station, C^p, is the sum of c_i^p over all EVs in I.

D. Summary of the Charging Station Optimization Problem

The charging station scheduling problem can be formulated as the following optimization problem, which aims at maximizing the profit Z (10) while satisfying the operational constraints (11a)-(11g):

\max_{a_{it}, \eta_{it}} Z = \sum_{i \in I} \sum_{t \in T} m_t |a_{it}| \Delta t - \sum_{i \in I} \mu \left[ e_i^{tgt} - \left( e_i^{ini} + \sum_{t=t_i^a}^{t_i^d} a_{it} \Delta t \right) \right]^+ - \frac{T}{T_B} \sum_{h \in H} p_h^l \cdot \max_{t \in T_h} \sum_{i \in I} a_{it},    (10)

subject to:

a^{min} \le a_{it} \le a^{max}, \quad \forall i \in I, \forall t \in T    (11a)

a_{it} = 0, \text{ if } \eta_{it} = 0, \quad \forall i \in I, \forall t \in T    (11b)

\eta_{it} \in \{0, 1\}, \quad \forall i \in I, \forall t \in T    (11c)

\sum_{i \in I} \eta_{it} \le N^c, \quad \forall t \in T    (11d)

\sum_{i \in I} \beta_{it} - \sum_{i \in I} \eta_{it} \le N^w, \quad \forall t \in T    (11e)
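The next sketch evaluates the objective (10) for a fixed charging schedule, assembling the revenue term from (2)-(3), the penalty term from (4)-(5), and the demand-charge term. It is a simplified illustration: a single time-of-use period is assumed, and the per-EV data structure and all numeric values are hypothetical.

```python
# Hypothetical data: 2 EVs, 4 time steps of dt hours, a single time-of-use period.
dt, mu, p_l, horizon_over_billing = 0.25, 0.5, 20.0, 1.0 / 30.0  # last value plays the role of T/T_B
p_c, p_d, p_e = 0.30, 0.35, 0.10                                  # $/kWh
evs = [  # each EV: initial/target energy (kWh) and its charging profile (kW)
    {"e_ini": 20.0, "e_tgt": 30.0, "a": [7.0, 7.0, 7.0, 7.0]},
    {"e_ini": 35.0, "e_tgt": 38.0, "a": [7.0, 0.0, -7.0, 7.0]},
]

def margin(a):            # m_t from (3), assuming time-invariant prices here
    return p_c - p_e if a >= 0 else p_e - p_d

revenue = sum(margin(a) * abs(a) * dt for ev in evs for a in ev["a"])          # B in (2)
penalty = sum(mu * max(ev["e_tgt"] - (ev["e_ini"] + sum(ev["a"]) * dt), 0.0)   # (4)-(5)
              for ev in evs)
station_load = [sum(ev["a"][t] for ev in evs) for t in range(4)]
demand_charge = horizon_over_billing * p_l * max(station_load)                 # last term of (10)
print("Z =", revenue - penalty - demand_charge)                                # (1)/(10)
```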
However, the effectiveness of the MPC-based algorithm heavily relies on accurate predictions of future EV arrivals and does not scale well with the size of the problem. This motivates us to develop model-free methods, such as reinforcement learning-based algorithms, which are capable of making real-time decisions in complex environments.

III. TECHNICAL METHODS

We propose a centralized allocation and decentralized execution (CADE) framework to solve the charging station scheduling problem formulated in Section II. The overall framework of CADE for charging station scheduling is shown in Fig. 1. EVs stochastically arrive at the charging station. Immediately after observing the actual EV arrivals at time step t, a centralized allocation algorithm determines which EVs shall be connected to chargers or assigned to the waiting area. Then each charger acts as an individual agent and makes charging/discharging decisions based on the state of the EV connected with it. The sequential decision-making problem of each charger is formulated as a Markov Decision Process (MDP). The operational experiences of all chargers will be collected and used in a learning process. In this process, the parameters of the state-action value function's neural networks will be updated and then broadcast to all chargers.

Fig. 1. Overview of the centralized allocation and decentralized execution (CADE) framework.

A. Overview of Markov Decision Process

In this subsection, we briefly introduce the MDP, which is the mathematical foundation for sequential decision-making problems. In an MDP, an agent interacts with an environment over a sequence of discrete time steps t = {1, 2, . . . , T}. At each time step t, the agent senses the state s ∈ S of the environment and takes an action a ∈ A accordingly. Once the action is taken, the environment transitions to a new state s' with probability Pr(s'|s, a) and provides a reward r to the agent. The agent's goal is to maximize the expected sum of discounted rewards R_t = E[\sum_{\tau=t}^{T} \gamma^{\tau-t} r_\tau], where γ ∈ [0, 1] is a discount factor that balances the relative importance of near-term and future rewards.

Suppose an agent follows a policy π to make decisions. We can define the state-action value function of the policy in (12) to quantify how good it is to take an action a in a state s:

Q^\pi(s, a) = E[R_t \mid s_t = s, a_t = a, \pi]    (12)

The state-action value function must satisfy the following Bellman equation (13):

Q^\pi(s, a) = E_{s'}\left[ r + \gamma\, E_{a' \sim \pi(s')} Q^\pi(s', a') \mid s, a, \pi \right]    (13)

We define the optimal state-action value function Q^*(s, a) as Q^*(s, a) = \max_\pi Q^\pi(s, a). Then the optimal deterministic policy π^* is the one that chooses actions a^* = \arg\max_{a \in A} Q^*(s, a).

B. Formulate the Charger Control Problem as an MDP

In the individual charger control problem, each charger is treated as an agent that interacts with the environment, which consists of the EV connected to the charger and the charging station.

Suppose the set of chargers is J. For an individual charger j ∈ J, its state of environment at time t is defined as s_jt = {δ_jt, t, t_j^r, e_jt, e_jt^r, N_t^{EV,w}, E_t^{r,w}, h̃_t, L_ht}. δ_jt ∈ {0, 1} indicates if there is an EV connected to charger j. t ∈ T denotes the global time. t_j^r = t_j^d - t is the remaining dwelling time of the EV connected to the charger. e_jt denotes the current energy level of the EV battery. e_jt^r = e_j^tgt - e_jt is the remaining energy to be charged for the EV battery. N_t^{EV,w} is the total number of EVs in the waiting area at time t. E_t^{r,w} is the sum of remaining energy to be charged for all EVs in the waiting area, which can be calculated as E_t^{r,w} = \sum_{k=1}^{N_t^{EV,w}} e_{kt}^r. h̃_t is a one-hot encoded vector indicating the current time-of-use period. The dimension of h̃_t is dim(H). For example, if there are three time-of-use periods and we are currently in the second period, then h̃_t = [0, 1, 0]. L_ht is the maximum load that the charging station has seen so far for time-of-use period h, which corresponds to global time t.

Note that t_j^r and e_jt^r quantify the urgency of the need to charge the EV connected to charger j. N_t^{EV,w} and E_t^{r,w} quantify the urgency of the need to charge the EVs in the waiting area. The agent can obtain the current time-of-use pricing for electricity given the global time t.
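As an illustration of the charger state defined above, the snippet below packs s_jt into a fixed-length numeric vector that a Q-network could consume. The function name, argument names, and the three-period time-of-use encoding are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def charger_state(connected, t, t_remain, e, e_remain,
                  n_wait, e_remain_wait, tou_period, peak_load, n_tou=3):
    """Assemble s_jt = {delta, t, t^r, e, e^r, N^{EV,w}, E^{r,w}, h_tilde, L_h}."""
    h_tilde = np.zeros(n_tou)          # one-hot time-of-use indicator
    h_tilde[tou_period] = 1.0
    return np.concatenate(([float(connected), t, t_remain, e, e_remain,
                            n_wait, e_remain_wait], h_tilde, [peak_load]))

# Example: an EV with 8 kWh still to charge, leaving in 8 steps, during period 1.
s = charger_state(connected=True, t=36, t_remain=8, e=22.0, e_remain=8.0,
                  n_wait=3, e_remain_wait=21.5, tou_period=1, peak_load=45.0)
print(s.shape)   # (11,) for three time-of-use periods
```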
The action for a charger j at time t is its output power a_jt. In this paper, we discretize the action space using discrete charging rates as suggested by [45], [46]. The upper bound of the charger power output action a_jt^upper can be calculated as the minimum of the physical maximum charger output and the maximum charging rate allowed to fill up the battery of the EV: a_jt^upper = min(a^max, (e^max - e_jt)/Δt). The lower bound of the charger power output action a_jt^lower can be calculated as the maximum of the physical minimum charger output and the maximum discharging rate allowed to deplete the battery of the EV: a_jt^lower = max(a^min, (e^min - e_jt)/Δt). Finally, the feasible action space for a charger is defined as an ordered set of discrete and increasing charger outputs A_jt = {a_jt^lower, . . . , a_jt^upper} with uniform difference Δa between adjacent actions.

The design of the reward function for the chargers shall be consistent with the charging station operation objective in (1), which is maximizing the net revenue B from charging/discharging EVs and minimizing the charging station demand charge C^l and the penalty of not satisfying customers' charging needs C^p. The reward function of a charger equals the sum of three components: r_jt = r_jt^b + r_jt^p + r_jt^l. The first component r_jt^b = m_t |a_jt| Δt represents the net revenue of a charger for charging/discharging an EV in the period of Δt.

The second component r_jt^p = -(c_jt^p + c_jt^{p,w}) accounts for the penalty of failing to satisfy the customer's charging needs. It has two parts. The first part c_jt^p is a penalty assigned to charger j when the EV connected to it does not reach its desired energy level upon leaving, which can be calculated as follows:

c_{jt}^{p} = \begin{cases} \mu \left( e_j^{tgt} - e_{jt} \right)^+ & \text{if } t^r = 0 \\ 0 & \text{if } t^r > 0 \end{cases}    (14)

The second part c_jt^{p,w} is a penalty assigned to charger j when EVs in the waiting area do not reach their desired energy level upon leaving. Since EVs in the waiting area are not directly associated with a specific charger, the corresponding penalty will be split among all chargers in the charging station. The aggregated penalty R_t^{p,w} for all EVs in the waiting area can be calculated as R_t^{p,w} = \sum_{k=1}^{N_t^{EV,w}} c_{kt}^p. To encourage chargers to increase charging power when the EVs in the waiting area have not reached their desired energy level, we propose splitting R_t^{p,w} in a way that assigns chargers fewer penalties if their power output levels are higher:

c_{jt}^{p,w} = \frac{a^{max} - a_{jt}}{\sum_{j \in J} \left( a^{max} - a_{jt} \right)} R_t^{p,w}    (15)

The third component of the reward function, r_jt^l, is associated with the demand charge of the charging station. The aggregated additional demand charge to be paid by the charging station, R_t^l, takes a non-zero value at time t when the total charging power \sum_{j \in J} a_jt exceeds the maximum load of the charging station L_ht up to time t. R_t^l can be calculated by following Algorithm 1.

Algorithm 1: Online Calculation Procedure for R_t^l
1 Obtain maximum charging station load L_ht for time-of-use period h up to time t;
2 Initialize R_t^l = 0;
3 if \sum_{j \in J} a_jt > L_ht then
4   R_t^l ← -(T/T_B) p_h^l (\sum_{j \in J} a_jt - L_ht);
5   L_ht ← \sum_{j \in J} a_jt;
6 end

If a charger j selects a higher charging power a_jt, then it contributes more to the increase in demand charge. Thus, we design r_jt^l in a way that assigns a charger with higher power output a greater penalty, as shown in (16):

r_{jt}^{l} = \frac{a_{jt}}{\sum_{j \in J} a_{jt}} R_t^l,    (16)

Note that if the study time horizon T is different from the electricity billing period T_B, then the increase in demand charge is scaled accordingly in Algorithm 1.

C. Decentralized Execution at the Charger Level

The chargers in the charging station determine the charging/discharging power for each time step by following an RL-based decentralized execution framework. Individual chargers collect historical operational experiences, which include the current state, the action, the next state, and the reward, and store them in a shared replay memory M. The action value functions corresponding to the chargers will be learned through a deep Q-learning (DQN) algorithm [47] using the shared replay memory. DQN is selected due to its ability to handle large state and action spaces. The learned parameters of the Q-networks will be periodically broadcast to the individual chargers. In real-time operations, the chargers will take charging/discharging actions based on their own observations of the state in a decentralized manner.

In the DQN algorithm, a deep neural network parameterized by θ is used to approximate the action value function Q. To stabilize the learning process, a second neural network parameterized by θ^-, called the target network, is introduced to reduce the strong correlations between the function approximator and its target. The parameters of the Q-network are iteratively updated by stochastic gradient descent on the following loss function:

L(\theta) = E\left[ \left( r + \gamma (1 - d) \max_{a' \in A} Q\left(s', a'; \theta^-\right) - Q(s, a; \theta) \right)^2 \right],    (17)

where s and s' are the current state and the next state of the environment, respectively. d ∈ {0, 1} is an indicator of the terminal state for each episode, and d = 1 indicates that s' is the terminal state. The target network's parameters θ^- are replaced by θ after every certain number of training episodes. We design the Q-network with the following structure. The input layer of the neural network includes the state and action variables (s, a) of the individual chargers. The output layer has a single neuron representing the value of Q(s, a). This design is suitable for the charger decision-making problem as the charging powers are ordinal variables.
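Returning to the reward design, the sketch below implements the demand-charge bookkeeping of Algorithm 1 together with the penalty splits in (15) and (16). It is a minimal illustration; the price, the T/T_B scaling value, and the convention of passing station-wide penalties as negative reward contributions are assumptions made for the example.

```python
# Minimal sketch of Algorithm 1 and the splits (15)-(16); all values are hypothetical.
def demand_charge_step(powers, peak_load, p_l, horizon_over_billing):
    """Return (R^l_t, updated recorded peak) for one step given per-charger powers (kW)."""
    total = sum(powers)
    if total > peak_load:
        r_l_total = -horizon_over_billing * p_l * (total - peak_load)
        return r_l_total, total          # new recorded peak L_ht
    return 0.0, peak_load

def split_rewards(powers, r_l_total, r_pw_total, a_max):
    """Split station-wide terms across chargers: (16) by power share, (15) by power slack.
    Here both totals are passed as negative reward contributions."""
    total = sum(powers)
    slack = sum(a_max - a for a in powers)
    r_l = [(a / total) * r_l_total if total else 0.0 for a in powers]
    r_pw = [((a_max - a) / slack) * r_pw_total if slack else 0.0 for a in powers]
    return r_l, r_pw

powers = [7.0, 3.0, 0.0]                 # current charger outputs
r_l_total, peak = demand_charge_step(powers, peak_load=8.0, p_l=20.0,
                                     horizon_over_billing=1.0 / 30.0)
print(split_rewards(powers, r_l_total, r_pw_total=-1.2, a_max=7.0))
```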
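A compact sketch of the update in (17) is given below, written with PyTorch. The single-output Q(s, a) network follows the structure described above, but the hidden-layer sizes, the discrete action list, and the toy data are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a; theta): a single output neuron over the concatenated (state, action)."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, s, a):                       # a has shape (batch, 1)
        return self.net(torch.cat([s, a], dim=1)).squeeze(-1)

def dqn_loss(q, q_target, batch, actions, gamma=0.99):
    """Loss (17): (r + gamma*(1-d)*max_a' Q(s', a'; theta^-) - Q(s, a; theta))^2."""
    s, a, r, s_next, d = batch                     # tensors sampled from the replay memory
    with torch.no_grad():                          # evaluate the target net on every discrete action
        q_next = torch.stack([q_target(s_next, torch.full_like(a, act))
                              for act in actions], dim=1)
        target = r + gamma * (1.0 - d) * q_next.max(dim=1).values
    return nn.functional.mse_loss(q(s, a), target)

# Toy usage with random data: 5-dimensional state, actions in {-6, 0, 6} kW.
q, q_target = QNet(5), QNet(5)
q_target.load_state_dict(q.state_dict())
batch = (torch.randn(32, 5), torch.zeros(32, 1), torch.randn(32),
         torch.randn(32, 5), torch.zeros(32))
loss = dqn_loss(q, q_target, batch, actions=[-6.0, 0.0, 6.0])
loss.backward()
```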
D. Centralized Allocation at the Charging Station Level

At each time step, the charging station scheduler needs to allocate the EVs to the chargers and waiting spots to maximize its own profit. Instead of directly solving the optimization problem formulated in Section II-D, we propose a centralized EV allocation algorithm that leverages the action value function Q learned for the chargers in the previous time step.

Suppose at time t, we have N_t^EV EVs in the charging station. An EV can either be connected to one of the chargers or parked in the waiting area. Let a binary variable α_k denote whether the kth EV is connected to a charger. α_k = 1 indicates that the kth EV is connected to a charger, and such a connection will create an action value q_k^c = max_a Q(s(α_k = 1), a), where Q is the action value function learned in the previous time step and s(α_k = 1) denotes the state of the charger when it is connected to the kth EV. On the other hand, when the kth EV is parked on a waiting spot, it is equivalent to being connected to a charger with zero power output, in which case the connection will create an action value q_k^w = Q(s(α_k = 1), a = 0). In summary, the action value created by the connection of the kth EV is α_k q_k^c + (1 - α_k) q_k^w. Note that for the state vector s(α_k = 1), ∀k ∈ {1, 2, . . . , N_t^EV}, the remaining energy to be charged for all EVs in the waiting area can be approximated by its value in the previous time step.

The charging station's scheduling problem can be solved by finding the EV allocation that maximizes the summation of the action values created by the connections of all EVs while satisfying the space limitations, as shown in (18), (19a), and (19b):

\max_{\alpha_k} \sum_{k=1}^{N_t^{EV}} \alpha_k q_k^c + (1 - \alpha_k) q_k^w    (18)

subject to:

\sum_{k=1}^{N_t^{EV}} \alpha_k = \min\left(N^c, N_t^{EV}\right),    (19a)

\alpha_k \in \{0, 1\}, \quad \forall k \in \{1, 2, \ldots, N_t^{EV}\},    (19b)

Equation (19a) enforces the constraint that at most N^c EVs are connected to chargers if N_t^EV > N^c. If N_t^EV ≤ N^c, then all EVs should be allocated to the charging area.

E. Summary of CADE Framework

By synergistically combining the centralized allocation and decentralized execution modules, the CADE framework can be summarized by Algorithm 2, in which line 10 corresponds to the centralized allocation module and lines 11-25 correspond to the decentralized execution module in the training phase. It is worth noting that in the deployment phase, when no more exploration or training is required, the decentralized execution module only includes lines 11, 17, and 21. Since we assume that the chargers in the charging station are homogeneous, they all share the same parameter vector θ. The experiences from different chargers are stored in the same replay memory. The ε-greedy technique is deployed during the training process to ensure sufficient exploration.

Algorithm 2: Summary of the CADE Framework
1 Initialize exploration rate ε = 1;
2 Initialize replay memory M;
3 Initialize state-action value function network Q(s, a; θ) with random weights θ;
4 Initialize target network Q(s, a; θ^-) with θ^- = θ;
5 for episode = 1 : n do
6   Initialize t = 0 and N_t^EV = 0;
7   Initialize maximum loads L_ht = 0, ∀h ∈ H;
8   while t < T do
9     If N_t^EV < N, admit arriving EVs to the charging station;
10    Allocate EVs to chargers and the waiting area by solving optimization problem (18)-(19b), where Q(s, a; θ) is utilized to generate q_k^c and q_k^w, ∀k ∈ {1, 2, . . . , N_t^EV};
11    for j ∈ J do
12      Obtain s_jt for charger j;
13      Draw x from uniform distribution U(0, 1);
14      if x < ε then
15        Randomly select a_jt ∈ A_jt;
16      else
17        Select a_jt^* = arg max_{a_jt} Q(s_jt, a_jt; θ);
18      end
19      Calculate r_jt^b;
20      The environment transitions to the next state s_j(t+1);
21    end
22    for j ∈ J do
23      Calculate r_jt^p, r_jt^l, and finally r_jt;
24      Add (s_jt, a_jt, s_j(t+1), r_jt) to M;
25    end
26    Sample a mini-batch m from M;
27    Update θ by using the gradient descent method to minimize the loss function in (17);
28    Decrease exploration rate ε;
29    Set θ^- = θ every K episodes;
30    t += Δt;
31  end
32 end
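Because each α_k enters (18) independently and (19a) only fixes how many EVs are connected, one way to solve the allocation problem of Section III-D is to rank the EVs by the gain q_k^c - q_k^w and connect the top min(N^c, N_t^EV) of them, as sketched below. The Q values in the example are hypothetical numbers, and the sketch is an illustrative solution rather than the authors' code.

```python
def allocate(q_charge, q_wait, n_chargers):
    """Solve (18)-(19b): connect the EVs with the largest gain q^c_k - q^w_k."""
    n_connect = min(n_chargers, len(q_charge))
    order = sorted(range(len(q_charge)),
                   key=lambda k: q_charge[k] - q_wait[k], reverse=True)
    alpha = [0] * len(q_charge)
    for k in order[:n_connect]:
        alpha[k] = 1
    return alpha

# Hypothetical action values for 5 EVs and 3 chargers.
q_c = [4.2, 1.1, 3.8, 0.9, 2.5]   # max_a Q(s(alpha_k = 1), a)
q_w = [0.5, 0.8, 3.6, 0.7, 0.3]   # Q(s(alpha_k = 1), a = 0)
print(allocate(q_c, q_w, n_chargers=3))   # -> [1, 1, 0, 0, 1]
```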
IV. NUMERICAL STUDY

In this section, we evaluate the performance of the proposed CADE framework in managing the operations of charging stations. A comprehensive comparison between the proposed and baseline algorithms in terms of optimality and scalability is conducted. We also examine the learned action value function under different operation scenarios. The training of the proposed RL algorithm is performed on a GPU (Nvidia GeForce RTX 2080 Ti). We leveraged a desktop with an AMD Ryzen 7 8-core CPU and the Gurobi solver to execute the MPC-based baseline algorithm.

A. Setup of the Numerical Study

It is assumed that the arrival of EVs at the charging station follows the Poisson process. The EV arrival patterns in four different locations are evaluated: office zone, residential area, highway, and retail stores. The arrival rates per hour λ of the four locations are shown in Fig. 2. For instance, a charging station near the residential area may expect higher EV arrival rates in the afternoon. Highway charging stations may see several peaks of arrival rates in a typical day [48] and the retail

TABLE I: ELECTRICITY PRICE AND DEMAND CHARGE

TABLE II: CHARGING STATION AND ENVIRONMENT PARAMETERS
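For the arrival model described in Section IV-A, the snippet below draws one day of EV arrivals from a nonhomogeneous Poisson process with piecewise-constant hourly rates. The rate values are placeholders, not the rates shown in Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder hourly arrival rates (EVs/hour) for a 24-hour day.
hourly_rates = [1, 1, 1, 1, 2, 3, 5, 8, 9, 7, 5, 4,
                4, 4, 5, 6, 8, 9, 8, 6, 4, 3, 2, 1]

def sample_arrivals(rates, dt=0.25):
    """Number of arriving EVs per time step; Poisson(lambda_h * dt) within hour h."""
    steps_per_hour = int(round(1.0 / dt))
    return [rng.poisson(lam * dt)
            for lam in rates for _ in range(steps_per_hour)]

arrivals = sample_arrivals(hourly_rates)
print(len(arrivals), sum(arrivals))   # 96 steps, total EVs for the day
```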
C. Profit Comparison With Baseline Methods

We compare the charging station profit obtained by the proposed CADE framework and four baseline methods. The first baseline method is the MPC-based approach, where the one-hour and two-hour ahead EV arrivals are predicted based on the ground-truth Poisson distribution. The second baseline is the MPC-ideal method, which assumes the charging station has perfect prediction of two-hour-ahead EV arrivals. For both the MPC and MPC-ideal methods, at each time step, the optimization problem (10)-(11g) is solved by replacing the original time horizon T with the prediction horizon, and the solution for this time step is executed. The third and fourth baselines are greedy methods. The GRD method assigns EVs to chargers based on the urgency of their charging needs. The urgency is measured by e^r/t^r, the minimum charging power required to fulfill the energy demand upon an EV's departure. If an EV has a higher required minimum charging power, it has a higher urgency, which leads to a higher priority to be allocated to a charger. The chargers then charge EVs greedily at maximal power during off-peak hours, and discharge EVs evenly during on-peak hours to a level (no less than e^tgt) that avoids penalty. If discharging the EV battery is not allowed, then the method is denoted as GRD-noVGI. All results are obtained with N^c = 10 and N^w = 5. The proposed and baseline methods are evaluated using 30 randomly generated trajectories from each of the 4 different EV arrival patterns. As shown in Fig. 5, the proposed CADE framework outperforms all baseline methods.

D. Scalability Analysis

In this subsection, we analyze the scalability of the proposed method. We evaluate the performance of CADE under 4 different charging station sizes with the number of chargers N^c ∈ {10, 20, 50, 100}. For each case, N^w is kept at 0.5N^c. The office zone EV arrival pattern is used and scaled linearly with the number of chargers. The average charging station and per-charger profit under the 4 different cases are reported in Fig. 6. As the charging station size increases, the average daily profit per charger stays in a stable range from $7.8 to $8.4. This result shows that the proposed CADE framework can efficiently manage the operations of a large group of chargers.

The proposed CADE framework not only outperforms baseline methods in terms of profit but also achieves high computation efficiency. The computation times of the proposed and MPC-based methods are listed in Table III. As shown in the table, the computation time of the proposed CADE framework is much shorter than that of the MPC-based method. The decentralized execution scheme of CADE enables its computation time to increase linearly with the number of chargers rather than exponentially.

TABLE III: COMPARISON OF COMPUTATION TIME PER STEP

E. Interpretation of the Learned State-Action Values

In our proposed CADE framework, the state-action value functions are represented by neural networks, which are difficult to interpret. To better understand how the trained RL-based chargers make decisions under different scenarios, we analyze the learned state-action values. Specifically, we change one component in the state vector at a time while fixing the rest to quantify how this component impacts the distribution of Q values.
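The sensitivity analysis described above can be reproduced with a short sweep: vary one state component over a grid, evaluate Q(s, a) for every discrete action, and record the greedy action. The sketch below uses a toy stand-in for the trained Q-network and the state layout from the earlier example; both are illustrative assumptions.

```python
import numpy as np

def q_sensitivity(q_fn, base_state, component, values, actions):
    """For each value of one state component, return (Q values, greedy action)."""
    results = []
    for v in values:
        s = np.array(base_state, dtype=float)
        s[component] = v
        q_vals = [q_fn(s, a) for a in actions]
        results.append((q_vals, actions[int(np.argmax(q_vals))]))
    return results

def toy_q(s, a):
    """Toy stand-in Q: prefer the power that exactly finishes charging on time."""
    needed = s[4] / max(s[2] * 0.25, 0.25)   # remaining energy / remaining hours
    return -(a - needed) ** 2

base = [1.0, 36.0, 8.0, 22.0, 8.0, 3.0, 21.5, 0.0, 1.0, 0.0, 45.0]
sweep = q_sensitivity(toy_q, base, component=2, values=[1, 4, 8, 16],
                      actions=[-6.0, -3.0, 0.0, 3.0, 6.0])
for t_r, (_, best_a) in zip([1, 4, 8, 16], sweep):
    print(f"t^r = {t_r:>2}: greedy action = {best_a} kW")
```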
Fig. 7. Q values of four scenarios with different (a) time-of-use period h, (b) remaining charging time t^r, (c) number of EVs in the waiting area N_t^{EV,w}, and (d) historical maximum loads L_h.

Fig. 7(a)-(d) illustrate the Q values of four scenarios with different (a) time-of-use period h, (b) remaining charging time t^r, (c) number of EVs in the waiting area N_t^{EV,w} = N_t^EV - N^c, and (d) historical maximum loads L_h. The vertical blue lines indicate the maximum Q values and the associated actions. As shown in Fig. 7(a), the RL-based charger prefers discharging in on-peak hours and charging in off-peak hours to increase operating profit. Fig. 7(b) shows that when the remaining charging time t^r is close to 0, i.e., the EV is about to leave, the RL agent prefers higher charging powers in order to meet the target battery energy level. In Fig. 7(c), we can see that higher charging powers are preferred when there are more EVs in the waiting area. As shown in Fig. 7(d), if a higher historical maximum load L_h is recorded, the chargers are less concerned about incurring an even higher demand charge, which leads to a preference for larger charging power.

V. CONCLUSION

In this paper, a centralized allocation and decentralized execution (CADE) framework is developed to operate a charging station considering vehicle-grid integration. The objective of the proposed algorithm is to maximize the profit of the charging station by considering the net revenue of charging and discharging, as well as the operational costs associated with demand charge and penalty. A centralized optimization module handles the allocation of EVs among waiting and charging spots. Enabled by the reinforcement learning algorithm, the chargers collect and share operational experiences, which are used to learn charger control policies. Comprehensive numerical studies with different EV arrival patterns show that the proposed CADE framework outperforms state-of-the-art baseline algorithms. The scalability analysis shows that the CADE framework is more computationally efficient than the baseline model-based control algorithm. Detailed analysis of the learned state-action value function provides insights into how an RL-based charger makes charging and discharging decisions under different operational scenarios.

REFERENCES

[1] Y. Kim, J. Kwak, and S. Chong, "Dynamic pricing, scheduling, and energy management for profit maximization in PHEV charging stations," IEEE Trans. Veh. Technol., vol. 66, no. 2, pp. 1011-1026, Feb. 2017.
[2] S. Wang, S. Bi, Y.-J. A. Zhang, and J. Huang, "Electrical vehicle charging station profit maximization: Admission, pricing, and online scheduling," IEEE Trans. Sustain. Energy, vol. 9, no. 4, pp. 1722-1731, Oct. 2018.
[3] H. Zhang, Z. Hu, Z. Xu, and Y. Song, "Optimal planning of PEV charging station with single output multiple cables charging spots," IEEE Trans. Smart Grid, vol. 8, no. 5, pp. 2119-2128, Sep. 2017.
[4] P. You and Z. Yang, "Efficient optimal scheduling of charging station with multiple electric vehicles via V2V," in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), 2014, pp. 716-721.
[5] Q. Dong, D. Niyato, P. Wang, and Z. Han, "The PHEV charging scheduling and power supply optimization for charging stations," IEEE Trans. Veh. Technol., vol. 65, no. 2, pp. 566-580, Feb. 2016.
[6] Q. Huang, Q.-S. Jia, Z. Qiu, X. Guan, and G. Deconinck, "Matching EV charging load with uncertain wind: A simulation-based policy improvement approach," IEEE Trans. Smart Grid, vol. 6, no. 3, pp. 1425-1433, May 2015.
[7] J. Domínguez-Navarro, R. Dufo-López, J. Yusta-Loyo, J. Artal-Sevil, and J. Bernal-Agustín, "Design of an electric vehicle fast-charging station with integration of renewable energy and storage systems," Int. J. Electr. Power Energy Syst., vol. 105, pp. 46-58, Feb. 2019.
[8] Y. Deng, Y. Zhang, F. Luo, and Y. Mu, "Operational planning of centralized charging stations utilizing second-life battery energy storage systems," IEEE Trans. Sustain. Energy, vol. 12, no. 1, pp. 387-399, Jan. 2021.
[9] Q. Yan, B. Zhang, and M. Kezunovic, "Optimized operational cost reduction for an EV charging station integrated with battery energy storage and PV generation," IEEE Trans. Smart Grid, vol. 10, no. 2, pp. 2096-2106, Mar. 2019.
[10] A. S. Awad, M. F. Shaaban, T. H. El-Fouly, E. F. El-Saadany, and M. M. Salama, "Optimal resource allocation and charging prices for benefit maximization in smart PEV-parking lots," IEEE Trans. Sustain. Energy, vol. 8, no. 3, pp. 906-915, Jul. 2017.
[11] L. Gan, U. Topcu, and S. H. Low, "Optimal decentralized protocol for electric vehicle charging," IEEE Trans. Power Syst., vol. 28, no. 2, pp. 940-951, May 2013.
[12] R. Yu, J. Ding, W. Zhong, Y. Liu, and S. Xie, "PHEV charging and discharging cooperation in V2G networks: A coalition game approach," IEEE Internet Things J., vol. 1, no. 6, pp. 578-589, Dec. 2014.
[13] C. Wei, J. Xu, S. Liao, and Y. Sun, "Aggregation and scheduling models for electric vehicles in distribution networks considering power fluctuations and load rebound," IEEE Trans. Sustain. Energy, vol. 11, no. 4, pp. 2755-2764, Oct. 2020.
[14] X. Tang, S. Bi, and Y.-J. A. Zhang, "Distributed routing and charging scheduling optimization for Internet of Electric Vehicles," IEEE Internet Things J., vol. 6, no. 1, pp. 136-148, Feb. 2019.
[15] A.-M. Koufakis, E. S. Rigas, N. Bassiliades, and S. D. Ramchurn, "Towards an optimal EV charging scheduling scheme with V2G and V2V energy transfer," in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), 2016, pp. 302-307.
[16] H. K. Nguyen and J. B. Song, "Optimal charging and discharging for multiple PHEVs with demand side management in vehicle-to-building," J. Commun. Netw., vol. 14, no. 6, pp. 662-671, 2012.
[17] G. Dulac-Arnold et al., "Deep reinforcement learning in large discrete action spaces," 2015, arXiv:1512.07679.
[18] S. Dimitrov and R. Lguensat, "Reinforcement learning based algorithm for the maximization of EV charging station revenue," in Proc. Int. Conf. Math. Comput. Sci. Ind., 2014, pp. 235-239.
[19] A. Chiş, J. Lundén, and V. Koivunen, "Reinforcement learning-based plug-in electric vehicle charging with forecasted price," IEEE Trans. Veh. Technol., vol. 66, no. 5, pp. 3674-3684, May 2017.
[20] Z. Wan, H. Li, H. He, and D. Prokhorov, "Model-free real-time EV charging scheduling based on deep reinforcement learning," IEEE Trans. Smart Grid, vol. 10, no. 5, pp. 5246-5257, Sep. 2019.
[21] L. Yan, X. Chen, J. Zhou, Y. Chen, and J. Wen, "Deep reinforcement learning for continuous electric vehicles charging control with dynamic user behaviors," IEEE Trans. Smart Grid, vol. 12, no. 6, pp. 5124-5134, Nov. 2021.
[22] M. Dorokhova, Y. Martinson, C. Ballif, and N. Wyrsch, "Deep reinforcement learning control of electric vehicle charging in the presence of photovoltaic generation," Appl. Energy, vol. 301, Nov. 2021, Art. no. 117504.
[23] F. Chang, T. Chen, W. Su, and Q. Alsafasfeh, "Charging control of an electric vehicle battery based on reinforcement learning," in Proc. 10th Int. Renew. Energy Congr. (IREC), 2019, pp. 1-63.
[24] H. Li, Z. Wan, and H. He, "Constrained EV charging scheduling based on safe deep reinforcement learning," IEEE Trans. Smart Grid, vol. 11, no. 3, pp. 2427-2439, May 2020.
[25] F. Zhang, Q. Yang, and D. An, "CDDPG: A deep-reinforcement-learning-based approach for electric vehicle charging control," IEEE Internet Things J., vol. 8, no. 5, pp. 3075-3087, Mar. 2021.
[26] F. Wang, J. Gao, M. Li, and L. Zhao, "Autonomous PEV charging scheduling using Dyna-Q reinforcement learning," IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 12609-12620, Nov. 2020.
[27] S. Li et al., "Electric vehicle charging management based on deep reinforcement learning," J. Mod. Power Syst. Clean Energy, early access, Jun. 25, 2021, doi: 10.35833/MPCE.2020.000460.
[28] S. Vandael, B. Claessens, D. Ernst, T. Holvoet, and G. Deconinck, "Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market," IEEE Trans. Smart Grid, vol. 6, no. 4, pp. 1795-1805, Jul. 2015.
[29] S. Wang, S. Bi, and Y. J. A. Zhang, "Reinforcement learning for real-time pricing and scheduling control in EV charging stations," IEEE Trans. Ind. Informat., vol. 17, no. 2, pp. 849-859, Feb. 2021.
[30] N. Sadeghianpourhamami, J. Deleu, and C. Develder, "Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning," IEEE Trans. Smart Grid, vol. 11, no. 1, pp. 203-214, Jan. 2020.
[31] T. Ding, Z. Zeng, J. Bai, B. Qin, Y. Yang, and M. Shahidehpour, "Optimal electric vehicle charging strategy with Markov decision process and reinforcement learning technique," IEEE Trans. Ind. Appl., vol. 56, no. 5, pp. 5811-5823, Sep./Oct. 2020.
[32] F. L. Da Silva, C. E. H. Nishida, D. M. Roijers, and A. H. R. Costa, "Coordination of electric vehicle charging through multiagent reinforcement learning," IEEE Trans. Smart Grid, vol. 11, no. 3, pp. 2347-2356, May 2020.
[33] F. Tuchnitz, N. Ebell, J. Schlund, and M. Pruckner, "Development and evaluation of a smart charging strategy for an electric vehicle fleet based on reinforcement learning," Appl. Energy, vol. 285, Mar. 2021, Art. no. 116382.
[34] M. Shin, D.-H. Choi, and J. Kim, "Cooperative management for PV/ESS-enabled electric vehicle charging stations: A multiagent deep reinforcement learning approach," IEEE Trans. Ind. Informat., vol. 16, no. 5, pp. 3493-3503, May 2019.
[35] S. Lee and D.-H. Choi, "Dynamic pricing and energy management for profit maximization in multiple smart electric vehicle charging stations: A privacy-preserving deep reinforcement learning approach," Appl. Energy, vol. 304, Dec. 2021, Art. no. 117754.
[36] Z. Zhao and C. K. M. Lee, "Dynamic pricing for EV charging stations: A deep reinforcement learning approach," IEEE Trans. Transp. Electrific., early access, Dec. 30, 2021, doi: 10.1109/TTE.2021.3139674.
[37] A. Abdalrahman and W. Zhuang, "Dynamic pricing for differentiated PEV charging services using deep reinforcement learning," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 2, pp. 1415-1427, Feb. 2022.
[38] V. Moghaddam, A. Yazdani, H. Wang, D. Parlevliet, and F. Shahnia, "An online reinforcement learning approach for dynamic pricing of electric vehicle charging stations," IEEE Access, vol. 8, pp. 130305-130313, 2020.
[39] Y. Gao, J. Yang, M. Yang, and Z. Li, "Deep reinforcement learning based optimal schedule for a battery swapping station considering uncertainties," IEEE Trans. Ind. Appl., vol. 56, no. 5, pp. 5775-5784, Sep./Oct. 2020.
[40] X. Wang, J. Wang, and J. Liu, "Vehicle to grid frequency regulation capacity optimal scheduling for battery swapping station using deep Q-network," IEEE Trans. Ind. Informat., vol. 17, no. 2, pp. 1342-1351, Feb. 2021.
[41] B. Foggo and N. Yu, "Improved battery storage valuation through degradation reduction," IEEE Trans. Smart Grid, vol. 9, no. 6, pp. 5721-5732, Nov. 2018.
[42] N. Yu and B. Foggo, "Stochastic valuation of energy storage in wholesale power markets," Energy Econ., vol. 64, pp. 177-185, May 2017.
[43] P. Kou, D. Liang, L. Gao, and F. Gao, "Stochastic coordination of plug-in electric vehicles and wind turbines in microgrid: A model predictive control approach," IEEE Trans. Smart Grid, vol. 7, no. 3, pp. 1537-1551, May 2016.
[44] W. Tang and Y. J. Zhang, "A model predictive control approach for low-complexity electric vehicle charging scheduling: Optimality and scalability," IEEE Trans. Power Syst., vol. 32, no. 2, pp. 1050-1063, Mar. 2017.
[45] G. Binetti, A. Davoudi, D. Naso, B. Turchiano, and F. L. Lewis, "Scalable real-time electric vehicles charging with discrete charging rates," IEEE Trans. Smart Grid, vol. 6, no. 5, pp. 2211-2220, Sep. 2015.
[46] B. Sun, Z. Huang, X. Tan, and D. H. K. Tsang, "Optimal scheduling for electric vehicle charging with discrete charging levels in distribution grid," IEEE Trans. Smart Grid, vol. 9, no. 2, pp. 624-634, Mar. 2018.
[47] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[48] M. B. Arias, M. Kim, and S. Bae, "Prediction of electric vehicle charging-power demand in realistic urban traffic networks," Appl. Energy, vol. 195, pp. 738-753, Jun. 2017.

Zuzhao Ye (Graduate Student Member, IEEE) received the B.E. degree in thermal energy and power engineering from the University of Science and Technology of China, Hefei, China, in 2015. He is currently pursuing the Ph.D. degree in electrical and computer engineering with the University of California at Riverside, Riverside, CA, USA. His research interests include big data analytics, machine learning, and optimization, particularly in their applications to the planning and operation of electric-vehicle charging infrastructure.

Yuanqi Gao (Member, IEEE) received the B.E. degree in electrical engineering from Donghua University, Shanghai, China, in 2015, and the Ph.D. degree in electrical engineering from the University of California at Riverside, Riverside, CA, USA, in 2020, where he is currently a Postdoctoral Scholar with the Department of Electrical and Computer Engineering. His research interests include big data analytics and machine learning applications in smart grids.

Nanpeng Yu (Senior Member, IEEE) received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 2006, and the M.S. and Ph.D. degrees in electrical engineering from Iowa State University, Ames, IA, USA, in 2007 and 2010, respectively. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, University of California at Riverside, Riverside, CA, USA. His current research interests include machine learning in smart grid, electricity market design and optimization, and smart energy communities. He is an Editor of IEEE TRANSACTIONS ON SMART GRID and IEEE TRANSACTIONS ON SUSTAINABLE ENERGY.