
This article has been accepted for publication in IEEE Transactions on Transportation Electrification. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/TTE.2025.3549747.

Spatiotemporal Optimized Dispatch of Electric Vehicles under Electricity-Carbon Joint Market
Dong Han, Huarui Zhang, Jiawen Peng, Zhuoxin Lu, Xijun Ren

Abstract—The electrification of urban transportation systems is a critical step toward achieving low-carbon transportation and meeting climate commitments. With the development of Vehicle-to-Grid technology, electric vehicles (EVs) have become a vital component of the power-transportation network. To perform the optimal control of EVs with low-carbon and spatiotemporal characteristics, this paper proposes a real-time dispatch method under the electricity-carbon joint market based on two-layer multi-agent deep reinforcement learning (MADRL). Firstly, the EVs are grouped into fleets, and a dispatch model is constructed that considers the charge/discharge arbitrage benefits, carbon trading benefits, spatial transfer costs, and capacity degradation costs of EVs. Secondly, the dispatch problem is described as a Markov game, and the model is solved through a two-layer MADRL framework that obtains the discrete spatial transfer decisions and the continuous charging/discharging decisions synergistically. Finally, extensive case studies are developed with real-world locational marginal price data and the location information of 30 charging stations in San Diego, California, US, to verify the validity of the proposed scheme. Simulation results show that the proposed method facilitates arbitrage strategies that exploit the spatiotemporal flexibility of EVs to obtain an average daily revenue of $1,968.7.

Index Terms—electric vehicle, electricity-carbon joint market, multi-agent deep reinforcement learning, spatiotemporal dispatch.

Dong Han, Huarui Zhang, Jiawen Peng, and Zhuoxin Lu are with the Department of Electrical Engineering, University of Shanghai for Science and Technology, Yangpu District, Shanghai 200093, China (email: [email protected]; [email protected]; [email protected]; [email protected]). Xijun Ren is with the Institute of Economy and Technology of State Grid Anhui Electric Power Co., Ltd., Hefei 230022, Anhui Province, China (email: [email protected]). Corresponding author: Huarui Zhang (Address: No. 516 Jungong Road, University of Shanghai for Science and Technology, Yangpu District, Shanghai, China). The paper has not been presented at a conference or submitted elsewhere previously.

NOMENCLATURE
Acronyms:
EV          Electric vehicle
PDN         Power distribution network
TN          Transportation network
V2G         Vehicle-to-Grid
PTN         Power-transportation network
LMP         Locational marginal price
DRL         Deep reinforcement learning
MADRL       Multi-agent deep reinforcement learning
MAPPO       Multi-agent proximal policy optimization
LME         Locational marginal emission
MG          Markov game
MATD3       Multi-agent twin delayed deep deterministic policy gradient
SOC         State of charge
DOD         Depth of discharge
CS          Charging station
Indices and sets:
t ∈ T       Index and set of time steps
i ∈ I       Index and set of EVs
n, m ∈ N    Index and set of charging stations
d ∈ D       Index and set of battery degradation cycles
RO          Set of roads
W           Set of weights for roads
j ∈ J       Index and set of decision steps
L^arr_{i,j}, C^arr_{i,j}   Sets of LMP and LME at all charging stations upon the ith EV's arrival
S^ch_i, A^ch_i    Sets of states and actions of the charging/discharging decision layer for the ith agent
S^m_i, A^m_i      Sets of states and actions of the spatial transfer decision layer for the ith agent
Parameters:
Δt          Interval of a time step
c^battery   Cost coefficient of capacity degradation
c^tra       Cost coefficient of spatial transfer
P_max       Maximum power of the batteries
E^ini       Initial capacity of the batteries
α_sei, β_sei   Coefficients of the solid electrolyte interface film
SOC_max     Upper limit of the SOC
SOC_min     Lower limit of the SOC
α_tra, β_tra   Retardation coefficients
c^t         Carbon quota price
L_ev        Maximum mileage of the EV per unit of electric energy
E_r         Maximum carbon emissions of fuel vehicles driving 1 km
c^SOC       Penalty coefficient of SOC constraint violation
Variables:
R^a         Arbitrage benefit from charging and discharging
R^E         Carbon trading benefit
C^battery   Battery capacity degradation cost


C^tra       Spatial transfer cost
λ_{n,t,i}   Binary variable indicating whether the ith EV stays at charging station n at time step t
pt_{n,t}    LMP of charging station n at time step t
P_{n,t,i}   Charging/discharging power of the ith EV at charging station n at time step t
E^loss_i    Degradation capacity of the ith EV
t^tra_i     Transfer time of the ith EV
η^ch, η^dis   Charging and discharging efficiency
P^ch, P^dis   Charging and discharging power
f_d         Comprehensive stress factor of cycle d
E^loss', E^loss   Capacity loss of the battery before and after the current period
SOC_{t,i}   State of charge of the ith EV at time step t
r_nm        Road from station n to station m
tr^0_nm     Free-flow travel time of road r_nm
Cap_{r_nm}, x_{r_nm,t}   Capacity and real-time traffic volume of road r_nm at time step t
γ_{nm,t,i}  Binary variable indicating whether the ith EV travels from charging station n to charging station m at time step t
M^{ev1}_{i,t}   Carbon emission quota of the ith EV at time step t
M^{ev2}_{i,t}   Carbon emission generated by the ith EV at time step t
e_{n,t}     LME of charging station n at time step t
CS_{i,j}    Charging station of the ith EV at decision step j
t_j         Current time step at decision step j
SOC_{i,j}   Current SOC of the ith EV at decision step j
λ^s_{i,j}, e^s_{i,j}   LMP and LME of the charging station at which the ith EV is located at decision step j
P_{i,j}     Charging/discharging power of the ith EV at decision step j
η^ch_{i,j}, η^dis_{i,j}   Charging and discharging efficiency of the ith EV at decision step j
r^m_{i,j}   Reward of the spatial transfer decision layer for the ith EV at decision step j
r^ch_{i,j}  Reward of the charging/discharging decision layer for the ith EV at decision step j
r^tra_{i,j} Spatial transfer cost of the ith EV at decision step j
r^oc_{i,j}, r^oc'_{i,j}   Punishment of constraint violation of the ith EV at decision step j
r^a_{i,j}   Arbitrage benefit of the ith EV at decision step j
r^E_{i,j}   Carbon trading benefit of the ith EV at decision step j
r^battery_{i,j}   Capacity degradation cost of the ith EV at decision step j
α^battery_i   Cost coefficient related to the charging and discharging power of the ith EV
E^start_d, E^end_d   Remaining capacity before and after the dth cycle

I. INTRODUCTION

Regarding the continuing increase of carbon emissions and the growth of energy consumption, the low-carbon economy has become a focus of governments and research institutions worldwide. In this regard, the decarbonization of the transportation sector is particularly essential, and achieving low-carbon vehicle behaviors is significant for the transportation area. The electric vehicle (EV), as a low-pollution and spatially flexible transportation technology, has gradually received much attention [1]. In 2023, global EV sales reached 14 million, and they are projected to reach 17 million in 2024 [4].

A. Motivation
The widespread adoption of EVs has intensified the interdependence between the power distribution network (PDN) and the transportation network (TN) [5]. At the same time, the rapid proliferation of EVs exposes weaknesses in the structure and operation of the PDN, which is unable to adequately meet the sharply increasing charging demands of users [6]. On the one hand, EVs can perform spatial transfers owing to their transportation characteristics. On the other hand, the development of Vehicle-to-Grid (V2G) technology enables EVs not only to consume electricity but also to act as storage and providers of electricity through bidirectional charging and discharging. Therefore, leveraging coordinated scheduling in the power-transportation network (PTN), aggregators manage EVs as fleets, integrating and optimizing their charging/discharging and spatial transfer behaviors across various scenarios. This approach effectively meets the operational needs of the PTN and achieves its low-carbon and efficient operation [7].

B. Literature review
Current research has already explored various applications of EVs across multiple scenarios. Due to their spatiotemporal flexibility and the development of V2G technology, EVs are primarily applied in fields such as renewable energy integration [10], enhancing the resilience of the PDN [12], and providing auxiliary services to the grid, including peak shaving [14], voltage regulation [15], and local congestion relief [16]. Reference [17] proposes a heuristic-algorithm-based discrete charging and discharging dispatch method for EVs, which effectively fills the valley load of the grid. In [18], the authors introduce a mutually beneficial operational framework for virtual power plants and EV CSs, coordinating multiple stakeholders to reduce EV charging costs. The above studies have delved into the flexible charging and discharging strategies of EVs in the PDN, but EVs generally also need to operate in a complex TN. Therefore, in order to fully explore the spatial flexibility of EVs and achieve more efficient energy utilization and more stable grid operation, it is necessary to study the collaborative control strategy of EVs in the PTN. The authors in [19] design a fast-charging navigation strategy for EVs based on weighted pricing in coupled networks, achieving coordinated economic operation of EVs in the PTN.


Reference [20] develops a genetic-algorithm-based method for the planning of charging infrastructure, vehicle dispatch, and charging management of battery electric buses, which improves the economic benefits of battery electric bus operation. Great achievements have been made in the above works. Nonetheless, some limitations still exist. In practical scenarios, variations in EV charging demand, electricity prices, battery status, and traffic conditions are much more complex, which requires real-time scheduling strategies that respond to dynamic charging demands and time-varying electricity prices. A real-time dispatch strategy for EVs in the PTN that accurately perceives environmental changes should therefore be considered.
The flexible charging services of EVs have brought considerable benefits. Reference [21] develops a hierarchical energy trading framework to induce and coordinate EV charging demand and distributed energy resource (DER) generation in local distribution networks, which benefits both EV owners and DER investors through secure local energy trading. In [22], the authors propose a dynamic control strategy for charging EVs in response to regulation signals, and the results show that this strategy can significantly improve the profitability of EV charging control. Another work, [23], designs a coordinated operation strategy between EV CSs and distribution system operators and integrates a peer-to-peer (P2P) trading model based on the Nash bargaining game to maximize the profits of the participating entities. These studies investigate the economic benefits of EVs in the electricity market in depth. However, the impact of carbon quotas on the spatiotemporal dispatch strategy of EVs has not been considered. With the continuous improvement of carbon trading mechanisms, the economic viability of EVs in the electricity-carbon joint market should be studied. The authors in [24] develop a probabilistic carbon footprint management strategy, in which direct and indirect carbon emissions are restricted by a chance-constrained carbon footprint management model on both the supply and demand sides. In [25], a multistage low-carbon EV charging facility planning model is adopted for the PTN, in which the carbon emission amount on the consumption side is calculated by carbon emission flow. The works above set carbon caps to restrict the carbon emissions of individual power lines, which may overlook uncertainty and system dynamics. Therefore, we adopt a carbon emission estimation method based on the LMP, which more directly characterizes the spatiotemporal differences in carbon emissions.
In terms of model solving, the charging/discharging behaviors as well as the spatial transfer behaviors of EVs exhibit randomness and unpredictability [26]. The spatiotemporal dispatch problem of EVs in the PTN requires accurate characterization of uncertainties such as traffic conditions, load distribution, renewable energy distribution, and electricity prices. Model-driven solution methods use scenario-based stochastic programming to deal with uncertainty, but they capture only a small number of representative scenarios, resulting in insufficiently representative results. Moreover, due to the large number of integer variables, the solution complexity of model-based algorithms increases exponentially with the size of the problem, resulting in low efficiency. Given this bottleneck of model-driven solving methods, previous studies have shown that deep reinforcement learning (DRL), as a data-driven method, presents excellent performance in dealing with uncertainty and complex modeling problems [27]. For instance, one study proposes an EV fleet charging strategy based on DRL to prevent grid overload caused by disorderly EV fleet charging [28]. Another work, [29], designs a joint charging and order dispatch scheme for large-scale shared EV fleets based on DRL, aiming to maximize the benefits of fleet operators. In practice, multiple EV spatiotemporal dispatch strategies must be carefully characterized. Therefore, multi-agent deep reinforcement learning (MADRL), as an extension of DRL, is widely used for routing problems in the PTN. The authors in [30] introduce a hierarchical MARL to coordinate the dispatch of repair crews effectively towards system resilience. Currently, most DRL algorithms for EVs consider only a single action space. However, multiple decision-making actions, such as discrete spatial transfer actions and continuous charging/discharging actions, are required to make precise spatiotemporal decisions for EVs in PTNs. Therefore, a multi-action-space MADRL algorithm is needed to achieve comprehensive optimization of EV spatiotemporal dispatch.

TABLE I
COMPARATIVE FEATURES OF RECENT SIMILAR RESEARCH

Ref.           | PDN | TN | Electricity market | Electricity-carbon joint market | Real-time dispatch
[17,18,22,23]  |  √  |    |         √          |                                 |
[19,20]        |  √  | √  |                    |                                 |
[21,29]        |  √  |    |         √          |                                 |         √
[24]           |  √  |    |         √          |                √                |
[25]           |  √  | √  |         √          |                √                |
[28]           |  √  |    |         √          |                                 |         √
[30]           |  √  | √  |                    |                                 |
This paper     |  √  | √  |         √          |                √                |         √

C. Contributions
To fill the aforementioned research gaps, this paper aims to solve the real-time spatiotemporal dispatch problem of EVs with V2G in the electricity-carbon joint market, so as to maximize the benefits of their aggregator. The differences between this paper and recent research works are summarized in Table I. The major contributions of this paper are threefold:
1) A traffic information network, an energy information network, and a carbon emission information network are constructed in this paper to meet the information needs of real-time low-carbon spatiotemporal dispatch of EV fleets in the PTN. We develop an optimization model considering electricity-carbon joint trading, routing, and the battery characteristics of EVs, with the objective of maximizing the net benefits of EV aggregators.


2) A collaborative solution framework for the discrete action space and the continuous action space based on two-layer MADRL is designed. A sequential training strategy is proposed to ensure training sufficiency and model convergence.
3) Through a simulation in a real-world scenario, the stability of the proposed method in making real-time decisions in response to electricity price changes, traffic congestion, etc., is evaluated, and the scalability of the method over a long-term scale is verified.

D. Organization of the paper
The rest of this paper is organized as follows. Section II presents the spatiotemporal dispatch mechanism of EV fleets in the electricity-carbon joint market and explains the spatiotemporal dispatch model of EVs, which takes into account carbon trading benefits, spatial transfer costs, capacity degradation costs, and arbitrage benefits. Section III describes the two-layer MADRL solution framework and the sequential training process. Section IV provides the case study, and conclusions are drawn in Section V.

II. Mathematical models

A. Problem setting
In the electricity-carbon joint market, the mechanism for the low-carbon spatiotemporal optimized dispatch of EVs is shown in Fig. 1. Within the PTN, EVs can obtain spatial transfer time information from the traffic information network, LMP data from the energy information network, and locational marginal emission (LME) data from the carbon emission information network, as well as their own energy storage battery information. All EVs transmit global observations to the aggregator, which utilizes a two-layer MADRL algorithm to generate the spatial transfer decision and the charging/discharging decision, thereby guiding the behavior of EVs in both the TN and the PDN. The spatial transfer decision determines the charging station (CS) where the EV will be located at the next time step, while the charging/discharging decision determines the EV's charging/discharging amount at the current time step. The aggregator can not only direct EVs to discharge at CSs with high LMP and charge at CSs with low LMP, thus participating in the electricity market to gain arbitrage benefits, but also participate in the carbon trading market according to the allocated carbon emission quotas of EVs and their real-time carbon emissions.
Fig. 1. Spatiotemporal dispatch mechanism of EVs.


The basic settings of the spatiotemporal dispatch of EVs are assumed as follows: 1) Several EVs form an EV fleet, and EVs assigned to the same fleet execute the same policy. EV fleets can share observation information and participate in the electricity-carbon joint market through the centralized dispatch of the aggregator, whose goal is to maximize the total benefits. 2) EVs support V2G technology. When stationed at a CS, EVs can perform charging, discharging, or idle operations. While moving within the TN, EVs experience energy loss. 3) All roads connecting CSs in the TN are bidirectional, and EVs are not allowed to change their destination while moving. 4) The CSs are connected to the PDN, and the charging and discharging prices are measured by the LMP. Carbon emissions are generated only during the charging of EVs and are measured by the LME, which is the incremental carbon emission of the system brought by increasing a unit of load. 5) Both the LMP and the LME come from the power flow; therefore, the LME can be estimated from the LMP as a precondition for the calculation.

B. Model formulation
1) Objective function
Considering the arbitrage benefits from EV charging and discharging, the carbon market trading benefits, the battery capacity degradation costs, and the spatial transfer costs, the objective function is established to maximize the net benefits of EVs in the electricity-carbon joint market:

\max f = R^a + R^E - C^{battery} - C^{tra}    (1)

Eq. (1) represents the net benefits of EVs over a single dispatch period. R^a represents the arbitrage benefits of EVs from charging and discharging, calculated with Eq. (2). R^E represents the carbon trading benefits of EVs, calculated with Eq. (3). C^{battery} represents the battery capacity degradation costs of EVs, calculated with Eq. (4). C^{tra} represents the spatial transfer costs of EVs, calculated with Eq. (5).

R^a = \sum_{t=0}^{T} \sum_{i \in I} \sum_{n \in N} \lambda_{n,i,t}\, pt_{n,t}\, P_{n,i,t}\, \Delta t    (2)

R^E = \sum_{t=0}^{T} \sum_{i \in I} R^E_{i,t}    (3)

C^{battery} = c^{battery} \sum_{i \in I} E^{loss}_i    (4)

C^{tra} = c^{tra} \sum_{i \in I} t^{tra}_i    (5)

where \lambda_{n,i,t} \in \{0,1\} indicates whether the ith EV stays at CS n at time step t; pt_{n,t} represents the LMP at CS n at time step t; P_{n,i,t} represents the charging/discharging power of the ith EV at time step t, with P_{n,i,t} > 0 representing discharging and P_{n,i,t} < 0 representing charging; R^E_{i,t} represents the carbon market trading benefit of the ith EV at time step t; E^{loss}_i represents the degradation capacity of the ith EV; c^{battery} represents the cost coefficient of capacity degradation; c^{tra} represents the spatial transfer cost per unit time of the ith EV; and t^{tra}_i represents the transfer time of the ith EV.

horized licensed use limited to: Ballari Institute of Technology & Management (formerly Bellary Eng College). Downloaded on March 17,2025 at 09:04:18 UTC from IEEE Xplore. Restrictions app
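As a quick illustration of how Eqs. (1)–(5) combine, the sketch below evaluates the net benefit of a completed dispatch period from array-valued inputs. The function and variable names are illustrative assumptions for this example, not the authors' implementation.

```python
import numpy as np

def net_benefit(stay, lmp, power, dt, carbon_benefit, c_battery, e_loss, c_tra, t_tra):
    """Net benefit of the EV fleet over one dispatch period, Eqs. (1)-(5).

    stay[n, i, t]        : 1 if EV i is parked at CS n at step t, else 0
    lmp[n, t]            : locational marginal price at CS n and step t ($/kWh)
    power[n, i, t]       : discharging (>0) / charging (<0) power of EV i (kW)
    dt                   : length of one time step (h)
    carbon_benefit[i, t] : carbon trading benefit R^E_{i,t} of EV i at step t ($)
    e_loss[i]            : capacity degradation E_i^loss of EV i over the period (kWh)
    t_tra[i]             : total transfer time t_i^tra of EV i over the period (h)
    """
    r_a = dt * np.einsum('nit,nt,nit->', stay, lmp, power)  # Eq. (2): arbitrage benefit
    r_e = carbon_benefit.sum()                              # Eq. (3): carbon trading benefit
    c_bat = c_battery * e_loss.sum()                        # Eq. (4): degradation cost
    c_transfer = c_tra * t_tra.sum()                        # Eq. (5): spatial transfer cost
    return r_a + r_e - c_bat - c_transfer                   # Eq. (1): net benefit f
```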
© 2025 IEEE. All rights reserved, including rights for text and data mining and training of artificial intelligence and similar technologies. Personal use is permitted,
but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Transactions on Transportation Electrification. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TTE.2025.3549747

2) Constraints
In order to meet the information requirements of the real-time spatiotemporal dispatch of EVs, this paper constructs an energy information network, a traffic information network, and a carbon emission information network. The three networks and the associated constraints are as follows.
a) Energy information network
The energy information network provides the LMP of all CSs in the PDN for the EVs. The energy constraints are as follows.
The constraint on the charging and discharging power of the ith EV at time step t is

-\lambda_{n,i,t} P_{max} \le P_{n,i,t} \le \lambda_{n,i,t} P_{max}    (6)

where P_{max} represents the maximum power of the EVs.
The capacity degradation of the battery depends on the environmental temperature, the depth of discharge (DOD), the SOC, and the battery operating time. Inspired by reference [31], a semi-empirical battery capacity degradation model is used to calculate the capacity loss of the energy storage batteries within one cycle:

E^{loss} =
\begin{cases}
E^{ini}\left[1 - \alpha_{sei}\exp\!\left(-\beta_{sei}\sum_{d} f_d\right) - (1-\alpha_{sei})\exp\!\left(-\sum_{d} f_d\right)\right], & E^{loss'} = 0 \\
E^{ini} - \left(E^{ini} - E^{loss'}\right)\exp\!\left(-\sum_{d} f_d\right), & E^{loss'} \neq 0
\end{cases}    (7)

where E^{loss'} and E^{loss} represent the capacity loss of the energy storage battery before and after the current period, respectively; \alpha_{sei} and \beta_{sei} are the coefficients of the solid electrolyte interface film formed when the battery is manufactured; and f_d is the comprehensive stress factor of cycle d, related to the temperature, DOD, SOC, and battery operating time, which is obtained by the method proposed in reference [32].
To ensure the proper operation of the EVs, the SOC of any EV at any time step must not exceed the upper and lower limits of the battery SOC. The SOC constraints of the ith EV at time step t are

SOC_{t,i} \ge SOC_{min}    (8)
SOC_{t,i} \le SOC_{max}    (9)

where SOC_{t,i} represents the state of charge of the ith EV during time step t. Constraint (8) enforces the lower limit of the SOC of the ith EV at time step t, reflecting the range anxiety of EV owners. Constraint (9) enforces the upper limit of the SOC of the ith EV at time step t, preventing safety hazards due to overcharging.
To ensure the normal operation of the EVs in the next dispatch period, the SOC of the EVs at the end of the cycle must equal the initial SOC:

SOC_{t_{max}+1,i} = SOC^{init}_i    (10)

where SOC^{init}_i represents the SOC of the ith EV at the initial time step.
b) Traffic information network
The traffic information network provides the spatial transfer time of all roads in the TN for the EVs. The traffic constraints are as follows.
The network topology of the TN can be effectively modeled by graph theory. Let the connected directed graph G = (N, RO, W) represent the traffic information network, where N and RO represent the node set and the arc set of the graph, respectively. The arc connecting node n = 1, 2, ..., N and node m = 1, 2, ..., N is denoted as r_{nm}. Let W = \{T_{r_{nm},t}, r_{nm} \in RO\} be the set of road weights, reflecting the transfer time from CS n to CS m at time step t. T_{r_{nm},t} denotes the time required for EVs to travel on road r_{nm} at time step t.
To accurately calculate the spatial transfer cost, the Bureau of Public Roads (BPR) function is used to compute the spatial transfer time, thoroughly accounting for real-time traffic congestion in the network. The travel time on road r_{nm} at time step t is expressed as

T_{r_{nm},t} = tr^0_{nm}\left[1 + \alpha_{tra}\left(\frac{x_{r_{nm},t}}{Cap_{r_{nm}}}\right)^{\beta_{tra}}\right]    (11)

where tr^0_{nm} is the free-flow travel time of road r_{nm}, which depends on the length of the road; Cap_{r_{nm}} and x_{r_{nm},t} are the capacity and the real-time traffic volume of road r_{nm} at time step t, respectively; and \alpha_{tra}, \beta_{tra} are the retardation coefficients. The spatial transfer time of the ith EV over a dispatch period is

t^{tra}_i = \sum_{t=0}^{T} \sum_{nm} \gamma_{nm,i,t}\, T_{r_{nm},t}    (12)

where \gamma_{nm,i,t} \in \{0,1\} indicates whether the ith EV travels from CS n to CS m at time step t.
The spatiotemporal dispatch rules for EVs follow the spatial transfer constraints of energy storage in reference [32]. Additionally, to ensure the proper dispatch of the EVs in the next dispatch period, EVs must return to their initial CS at the end of the dispatch period. The ith EV must satisfy constraint (13):

\lambda_{CS^{start}_i,1,i} = \lambda_{CS^{start}_i,T,i} = 1    (13)

where CS^{start}_i represents the initial CS where the ith EV was located.
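A minimal sketch of the congestion-aware travel-time calculation in Eqs. (11)–(12) is given below. The default retardation coefficients 0.15 and 4.0 are the commonly used BPR values and are only assumptions here; the values actually used in the paper are given in Appendix B.

```python
import numpy as np

def bpr_travel_time(t0_free, volume, capacity, alpha_tra=0.15, beta_tra=4.0):
    """Travel time on road r_nm at time step t, Eq. (11)."""
    return t0_free * (1.0 + alpha_tra * (volume / capacity) ** beta_tra)

def transfer_time(gamma, travel_time):
    """Total transfer time of one EV over the dispatch period, Eq. (12).

    gamma[n, m, t]       : 1 if the EV travels from CS n to CS m at step t, else 0
    travel_time[n, m, t] : BPR travel time of road r_nm at step t
    """
    return float(np.sum(gamma * travel_time))
```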

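The semi-empirical capacity-fade model of Eq. (7) above can be evaluated per period as sketched below. The branch selection and the argument names follow the reconstruction of Eq. (7) and are intended only as an illustration, not as the authors' code.

```python
import math

def capacity_loss(e_ini, e_loss_prev, f_d_sum, alpha_sei, beta_sei):
    """Capacity loss E^loss after the current period, Eq. (7).

    e_ini       : initial battery capacity E^ini (kWh)
    e_loss_prev : accumulated loss E^loss' before the current period (kWh)
    f_d_sum     : sum of the comprehensive stress factors f_d over the counted cycles
    """
    if e_loss_prev == 0.0:
        # Fresh battery: solid-electrolyte-interface film formation dominates.
        return e_ini * (1.0 - alpha_sei * math.exp(-beta_sei * f_d_sum)
                        - (1.0 - alpha_sei) * math.exp(-f_d_sum))
    # Aged battery: fading continues from the remaining capacity.
    return e_ini - (e_ini - e_loss_prev) * math.exp(-f_d_sum)
```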

c) Carbon emission information network
The carbon emission information network provides the LME of all CSs in the PDN for the EVs. The carbon emission constraints are as follows.
The EV aggregator owns the total carbon quotas of the EVs and participates in the carbon trading market. The carbon trading benefit of the ith EV at time step t is

R^E_{i,t} = c^t \left( M^{ev1}_{i,t} - M^{ev2}_{i,t} \right)    (14)

where c^t represents the carbon quota price; M^{ev1}_{i,t} represents the carbon emission quota of the ith EV at time step t, calculated with Eq. (15); and M^{ev2}_{i,t} represents the carbon emission generated by the charging behavior of the ith EV at time step t, calculated with Eq. (16).

M^{ev1}_{i,t} = P_{n,i,t}\, L_{ev}\, E_r    (15)

M^{ev2}_{i,t} =
\begin{cases}
e_{n,t}\, P_{n,i,t}\, \Delta t, & P_{n,i,t} < 0 \ \text{(charging)} \\
0, & \text{else}
\end{cases}    (16)

where L_{ev} represents the maximum mileage of an EV per unit of electric energy, E_r represents the maximum carbon emission of an ordinary fuel vehicle per kilometer, and e_{n,t} represents the LME of CS n at time step t.

III. Two-layer Multi-agent Deep Reinforcement Learning Model of EVs

In order to solve the spatiotemporal dispatch problem of EVs considering carbon trading with a MADRL algorithm, it is necessary to express the problem in Section II as a Markov game (MG) tuple \{I, S_i, A_i, P, R, \gamma\}, where I represents the number of agents, S_i represents the joint states of the ith agent, A_i represents the joint actions of the ith agent, P represents the state transition probability of the agents, R represents the global cumulative reward function, and \gamma represents the reward discount factor [33].

A. Algorithm Model
In the spatiotemporal dispatch problem of EVs, each EV performs two types of decisions: discrete spatial transfer decisions and continuous charging/discharging decisions. To coordinate the discrete and continuous actions of the EVs, a two-layer decision-making framework based on the MADRL algorithm is established in this paper. The MG tuple for the EV agents is defined as follows.
1) State space
a) Spatial transfer decision layer
The state of the ith EV in the spatial transfer decision layer at decision step j is expressed as

s^m_{i,j} = \{CS_{i,j}, t_j, SOC_{i,j}, T^r_{i,j}, L^{arr}_{i,j}, C^{arr}_{i,j}\} \in S^m_i    (17)

where CS_{i,j} represents the current CS where the ith EV is located at decision step j, t_j represents the current time step, and SOC_{i,j} represents the current SOC of the ith EV at decision step j. The vector T^r_{i,j} represents the time required for the ith EV to move to each of the remaining CSs at decision step j. The vectors L^{arr}_{i,j} and C^{arr}_{i,j} represent the LMP and LME at all CSs upon the ith EV's arrival, respectively.
b) Charging/discharging decision layer
The state of the ith EV in the charging/discharging decision layer at decision step j is expressed as

s^{ch}_{i,j} = \{CS_{i,j}, t_j, SOC_{i,j}, \lambda^s_{i,j}, e^s_{i,j}\} \in S^{ch}_i    (18)

where \lambda^s_{i,j} and e^s_{i,j} represent the LMP and LME of the CS where the ith EV is located at decision step j, respectively.
2) Action space
a) Spatial transfer decision layer
The action of the ith EV in the spatial transfer decision layer at decision step j is to select the CS for the next decision step:

a^m_{i,j} = \{CS^{next}_{i,j}\} \in A^m_i    (19)

b) Charging/discharging decision layer
The action of the ith EV in the charging/discharging decision layer at decision step j is the charging/discharging power at the current CS:

a^{ch}_{i,j} = \{P_{i,j}\} \in A^{ch}_i    (20)

3) Transition function
The ith EV performs actions according to the state s_{i,j} at decision step j and interacts with the environment to transition to the next state s_{i,j+1}. The traffic information, LMP, and LME are updated from the input dataset, while the CS where the agent is located, the current time step, and the SOC are updated as follows:

CS_{i,j+1} = a^m_{i,j}    (21)

t_{j+1} =
\begin{cases}
t_j + 1, & CS_{i,j} = a^m_{i,j} \\
t_j + T^{tra}_{i,j}(a^m_{i,j}), & \text{else}
\end{cases}    (22)

where T^{tra}_{i,j}(a^m_{i,j}) represents the time required for the ith EV to move from the current station CS_{i,j} to a^m_{i,j}.
If the agent chooses to make a spatial transfer decision, the SOC is updated based on the energy consumed by the EV during the movement:

SOC_{i,j+1} = SOC_{i,j} - \frac{L^{SOC}_{i,j}(a^m_{i,j})}{E^{ini}}    (23)

where L^{SOC}_{i,j}(a^m_{i,j}) represents the energy lost by the ith EV when moving from the current station CS_{i,j} to a^m_{i,j}.
If the agent chooses to make a charging/discharging decision, the SOC is updated based on the amount of energy charged or discharged by the EV:

SOC_{i,j+1} =
\begin{cases}
SOC_{i,j} - \dfrac{a^{ch}_{i,j}\,\Delta t\,\eta^{ch}_{i,j}}{E^{ini}}, & a^{ch}_{i,j} < 0 \\
SOC_{i,j} - \dfrac{a^{ch}_{i,j}\,\Delta t}{\eta^{dis}_{i,j}\,E^{ini}}, & a^{ch}_{i,j} \ge 0
\end{cases}    (24)
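The transition logic of Eqs. (21)–(24) for a single EV agent can be written compactly as below. The sign convention (positive power means discharging) and the container names are assumptions made for this illustration.

```python
def transition(cs, t, soc, action_cs, action_p, travel_time, travel_energy,
               dt, e_ini, eta_ch, eta_dis):
    """One-step state update of an EV agent, Eqs. (21)-(24).

    travel_time[cs][dest]   : transfer time T^tra from cs to dest (time steps)
    travel_energy[cs][dest] : energy L^SOC consumed by the transfer (kWh)
    """
    if action_cs == cs:                                   # stay and charge/discharge
        t_next = t + 1                                    # Eq. (22), first case
        if action_p < 0:                                  # charging raises the SOC
            soc_next = soc - action_p * dt * eta_ch / e_ini
        else:                                             # discharging lowers the SOC
            soc_next = soc - action_p * dt / (eta_dis * e_ini)
    else:                                                 # spatial transfer
        t_next = t + travel_time[cs][action_cs]           # Eq. (22), second case
        soc_next = soc - travel_energy[cs][action_cs] / e_ini   # Eq. (23)
    return action_cs, t_next, soc_next                    # Eq. (21): next CS
```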

4) Reward function
a) Spatial transfer decision layer
The reward of the spatial transfer decision layer is the sum of the reward values of the spatial transfer decisions of all EVs. The reward of the ith EV in the spatial transfer decision layer at decision step j is

r^m_{i,j} = r^{ch}_{i,j} + r^{tra}_{i,j} + r^{oc}_{i,j}    (25)

r^{tra}_{i,j} = -c^{tra}\, T^{tra}_{i,j}(CS^{next}_{i,j})    (26)

r^{oc}_{i,j} =
\begin{cases}
-c^{tra}\, T^{tra}_{i,j}(CS_{i,0}), & CS_{i,T} \neq CS_{i,0} \\
0, & \text{else}
\end{cases}    (27)

where r^{ch}_{i,j} represents the reward of the ith EV in the charging/discharging decision layer at decision step j, obtained by Eq. (28); r^{tra}_{i,j} represents the spatial transfer cost of the ith EV at decision step j; and r^{oc}_{i,j} represents the cost for the ith EV to return to the initial CS at the end of the dispatch period under constraint (13).
b) Charging/discharging decision layer
The reward of the charging/discharging decision layer is the total reward value of the charging/discharging decisions of all EVs. The reward of the ith EV in the charging/discharging decision layer at decision step j is

r^{ch}_{i,j} = r^a_{i,j} + r^E_{i,j} + r^{oc'}_{i,j} + r^{battery}_{i,j}    (28)

r^a_{i,j} = \lambda^s_{i,j}\, P_{i,j}\, \Delta t    (29)

r^E_{i,j} =
\begin{cases}
c^{t_j} P_{i,j}\left(L_{ev} E_r - e^s_{i,j}\,\Delta t\right), & P_{i,j} < 0 \\
c^{t_j} P_{i,j}\, L_{ev} E_r, & P_{i,j} \ge 0
\end{cases}    (30)

r^{oc'}_{i,j} =
\begin{cases}
-c^{SOC}\left| SOC_{i,j+1} - SOC_{i,0} \right|, & j = J \\
0, & \text{else}
\end{cases}    (31)

where r^a_{i,j} and r^E_{i,j} represent the arbitrage benefit and the carbon trading benefit from charging/discharging for the ith EV at decision step j, respectively; r^{oc'}_{i,j} is the penalty term arising from constraint (10), representing the cost for the ith EV of adjusting the SOC back to the initial SOC at the end of the dispatch period; r^{battery}_{i,j} represents the battery capacity degradation cost of the ith EV at decision step j, calculated with Eq. (32); c^{SOC} represents the cost coefficient for SOC variation; and J represents the maximum decision step.
To calculate the capacity degradation cost at each decision step, the degradation cost is computed from the charging/discharging power at each decision step, inspired by the recent work in [32]:

r^{battery}_{i,j} = \alpha^{battery}_i\, P_{i,j}    (32)

\alpha^{battery}_i = c^{battery}\, \frac{E^{start}_d - E^{end}_d}{\sum_{j=1}^{T_d} P_{i,j}}    (33)

where \alpha^{battery}_i represents the cost coefficient related to the charging/discharging power; E^{start}_d and E^{end}_d are the remaining capacities before and after the dth cycle, respectively; and T_d indicates that \alpha^{battery}_i is updated every T_d steps.

B. Algorithm implementation
1) Spatial transfer decision layer
The spatial transfer decision layer adopts the MAPPO algorithm to govern the agents' spatial transfer behavior. MAPPO is an extension of the PPO algorithm designed to address policy optimization in multi-agent systems. It is based on an Actor-Critic architecture and is known for its fast convergence and relatively straightforward parameter tuning.
To improve sample efficiency, the MAPPO algorithm employs importance sampling. For each agent, it utilizes two Actor networks: Actor(new), the network being optimized, and Actor(old), a fixed network used to collect data and estimate the new policy. After a certain batch of updates, Actor(old) is synchronized with Actor(new), so that the same batch of training data can be reused. The optimization objective is

J^{old}(\theta) = \mathbb{E}_{(s,a)\sim\pi_{\theta_{old}}}\left[\frac{\pi_\theta}{\pi_{\theta_{old}}}\, A^{old}(s, a)\right]    (34)

To ensure training stability, the network parameters \theta and \theta_{old} should produce similar action probability distributions for the same input state. The MAPPO algorithm uses a clipping function to constrain the update speed of the policy network, ensuring that the new policy remains close to the old policy:

J^{PPO}(\theta) = \mathbb{E}_{(s,a)\sim\pi_{\theta_{old}}}\left[\min\!\left(\frac{\pi_\theta}{\pi_{\theta_{old}}} A^{old},\; \mathrm{clip}\!\left(\frac{\pi_\theta}{\pi_{\theta_{old}}}, 1-\varepsilon, 1+\varepsilon\right) A^{old}\right)\right]    (35)

where clip(·) is the clipping function used to constrain \pi_\theta/\pi_{\theta_{old}} between 1-\varepsilon and 1+\varepsilon when \theta and \theta_{old} differ significantly, and \varepsilon is the parameter controlling the clipping amplitude. The network structure of the MAPPO algorithm is shown in Fig. 2.

Fig. 2. Structure of MAPPO-Network.
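A minimal PyTorch-style sketch of the clipped surrogate objective in Eq. (35) is shown below, negated so it can be minimized by gradient descent. The clipping amplitude of 0.2 is a common default and is assumed here, not taken from the paper.

```python
import torch

def mappo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped policy loss corresponding to Eq. (35)."""
    ratio = torch.exp(logp_new - logp_old)                  # pi_theta / pi_theta_old
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(surr1, surr2).mean()                  # maximize J^PPO(theta)
```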

2) Charging/discharging decision layer
The charging/discharging decision layer adopts the Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm to decide the agents' charging/discharging actions. MATD3 is also based on the Actor-Critic framework, in which two Critic networks are used to estimate the value of actions. It improves algorithm stability by delaying the updates of the Actor network during the learning process.
In the MATD3 algorithm, each agent uses two Critic networks to estimate the value of actions and selects the minimum of the two. The error is computed as

x_{loss} = \frac{1}{N}\sum_{i=1}^{N}\left(\min_{j=1,2} Q(s_i, a_i \mid \theta_j) - y\right)^2    (36)

where y represents the estimate of the target value. To prevent instability in learning and locally optimal policies caused by the dependency between the Critic networks and the target values, the MATD3 algorithm calculates the target values using target networks that are periodically updated by copying the Actor and Critic networks. The target value is calculated as

y = r + \gamma \min_{j=1,2} Q'_j\!\left(s_{i+1}, \mu'(s_{i+1} \mid \theta_{\mu'}) + \sigma \mid \theta_{q'_j}\right)    (37)

where Q'_1 and Q'_2 represent the target Critic networks, \mu' represents the target Actor network, \theta_{\mu'}, \theta_{q'_1}, and \theta_{q'_2} represent the target network parameters, and \sigma represents the noise used for smoothing. The network structure of the MATD3 algorithm is shown in Fig. 3.

Fig. 3. Structure of MATD3-Network.
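The twin-critic target of Eq. (37) and the error of Eq. (36) can be sketched in PyTorch as follows. The discount factor and the smoothing-noise settings are illustrative assumptions, and the network objects are placeholders for the agents' actor/critic modules rather than the paper's code.

```python
import torch

def matd3_target(reward, next_state, target_actor, target_critic1, target_critic2,
                 gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Target value y of Eq. (37) with clipped target-policy smoothing noise."""
    with torch.no_grad():
        next_action = target_actor(next_state)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        q1 = target_critic1(next_state, next_action + noise)
        q2 = target_critic2(next_state, next_action + noise)
        return reward + gamma * torch.min(q1, q2)    # min over the two target critics

def critic_error(critic1, critic2, state, action, y):
    """Squared error of Eq. (36) between the smaller critic estimate and y."""
    q_min = torch.min(critic1(state, action), critic2(state, action))
    return ((q_min - y) ** 2).mean()
```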
C. Algorithm implementation
The spatial transfer decision layer adopts the MAPPO algorithm to govern the agents' spatial transfer actions, while the charging/discharging decision layer adopts the MATD3 algorithm to decide the agents' charging/discharging actions. To improve the training efficiency of the proposed MATD3-MAPPO framework, a sequential training method is proposed to train the two groups of agents. The training and execution phases of the MATD3-MAPPO framework are shown in Fig. 4. Firstly, the MATD3 model is trained: the charging/discharging decision agents interact with the environment to update the network parameters, eventually obtaining and saving the trained charging/discharging decision model. Secondly, the MAPPO model is trained. During training, each agent randomly selects an initial CS and interacts with the designed environment. The agent first determines whether to stay at the current CS. If it stays, the agent uses the trained charging/discharging decision model to make the charging/discharging decision; otherwise, the agent makes a spatial transfer decision, and the MAPPO network parameters are updated to maximize the advantage function to train the agent. Finally, the aggregator can use the trained charging/discharging decision model and spatial transfer decision model to make decisions for the agents, which effectively reduces the communication and computational burdens during execution.

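The sequential training procedure described above (and in Fig. 4) can be summarized by the following structural sketch. The two callables stand in for the paper's unpublished environment and agent code and are assumptions made only for illustration.

```python
def train_sequentially(train_charging_episode, train_transfer_episode,
                       n_charging_episodes=5000, n_transfer_episodes=5000):
    """Two-stage MATD3-MAPPO training, following Fig. 4.

    train_charging_episode(): runs one MATD3 episode (interact, store transitions,
        update critics/actors) and returns the current charging/discharging policy.
    train_transfer_episode(charge_policy): runs one MAPPO episode in which a
        stationary agent defers to the frozen charge_policy for its power decision.
    """
    charge_policy = None
    for _ in range(n_charging_episodes):        # stage 1: charging/discharging layer
        charge_policy = train_charging_episode()
    for _ in range(n_transfer_episodes):        # stage 2: spatial transfer layer
        train_transfer_episode(charge_policy)   # the MATD3 policy is reused, not updated
    return charge_policy
```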

Fig. 4. Training and execution flowchart.

IV. Case studies

A. Experimental setting
During the training phase, LMP data, LME data, and traffic information for 30 CSs in the San Diego area of California from 2020 to 2022 are taken as the simulated example in this paper. The LME data are estimated from the LMP at the corresponding times, and the geographical locations of the CSs are shown in Fig. 5. The LMP data can be downloaded from the California Independent System Operator (CAISO) website [34], and the free-flow travel time information can be obtained from the Google Maps Developer Platform API [35]. The parameters of the estimation method for the LME are provided in Appendix A. During the execution phase, this paper selects LMP data, LME data, and traffic information from typical days at the 30 CSs to verify the reliability of the trained model. This paper assumes that the aggregator has 3 EV fleets, each consisting of 10 EVs. According to the description in Section II-A, EVs in the same fleet execute the same policy; therefore, there are only 3 agents in the spatiotemporal dispatch problem of EVs. The basic parameter settings of this paper are provided in Appendix B. The MADRL algorithm is implemented in Python 3.7. The MATD3 network is built with TensorFlow 2.6.0, and the MAPPO network is built with PyTorch 1.13.0 + CUDA 11.7.0. The computational platform's hardware consists of an Intel Core [email protected] and 16 GB RAM.

Fig. 5. Locations of 30 CSs.

B. Analysis of algorithm performance
1) Comparisons with DRL algorithm
Based on the process illustrated in Fig. 4, this paper first employed MATD3 to train the charging/discharging decision agents, followed by MAPPO to train the spatial transfer agents. The training results of the charging/discharging decision layer and the spatial transfer decision layer were compared with typical continuous-action-space and typical discrete-action-space decision algorithms, respectively. The resulting reward curves are shown in Fig. 6 and Fig. 7, where the light-colored lines are the true reward after smoothing and the dark-colored lines are the average reward. The reward curves go through three stages. The first stage, before 1000 episodes, is a random decision-making stage: the replay buffer does not yet meet the minimum sampling requirements, the agents generate decisions randomly, and the reward curve fluctuates greatly. The second stage is the learning stage: once the replay buffer is full, the agents learn and update their strategies, and the reward gradually improves. The third stage is the convergence stage, in which the agents gradually find the optimal strategy while updating the network parameters, and the reward reaches a stable state, finally settling at a relatively fixed level.


In the comparison of the continuous-action-space algorithms' training processes, the reward curve of the MATD3 algorithm used in this paper partially overlaps with that of the MADDPG algorithm during the learning stage. In the end, the reward of the MATD3 algorithm converges to a higher level. This is because the MATD3 algorithm introduces an additional Critic network on top of the MADDPG algorithm and delays the updates of the Actor network, thereby ensuring the stability of the training process and enabling the agents to learn better strategies through interaction with the environment. The reward of the MASAC algorithm is improved compared to the MADDPG algorithm, but its convergence is slower than that of the MATD3 and MADDPG algorithms. This is because the MASAC algorithm is more sensitive to the dynamics of the environment during training: when the environment changes or is highly unstable, the algorithm requires a longer training time to ensure performance.



In the comparison of the discrete-action-space algorithms' training processes, the reward curve of the MAPPO algorithm used in this paper converges to a higher level and shows a faster convergence speed. In the convergence stage, the reward-curve fluctuations of the MADQN and MA-Rainbow algorithms are significantly greater than those of the MAPPO algorithm. In particular, the MADQN algorithm shows large fluctuations around 2700 and 3200 episodes. This is because the state dimension of the studied environment is high, and the Q-learning-based methods need to store and update a large number of Q values, which increases the computational pressure of the MADQN and MA-Rainbow algorithms and results in an unstable training process. In contrast, the gradient calculation of the MAPPO loss function used in this paper is relatively simple and efficient, which is conducive to the training and convergence of the algorithm and yields better training results.
Therefore, the MATD3-MAPPO framework, with its higher reward and faster convergence speed, is a reasonable choice for this problem.

Fig. 6. Training process of continuous action algorithms.

Fig. 7. Training process of discrete action algorithms.

2) Comparisons with model-based algorithm
To verify the validity of the proposed method, a comparison was made between the algorithm proposed in this paper and a model-based algorithm. For the model-based method, the optimized dispatch of electric vehicles is cast as a MILP problem. In the electricity-carbon joint market, the method proposed in this paper achieves a one-day benefit of $2,048.5, while the model-based optimization algorithm achieves a one-day benefit of $2,125.5, which is 3.7% higher. The difference between the two is not significant, which also proves the validity of the algorithm. In terms of solving speed, the model-based algorithm takes 23,523.23 s, while the algorithm proposed in this paper takes only 0.1 s. The comparison shows that, although the model-based algorithm yields slightly better results than the DRL-based algorithm, it faces two bottlenecks in practical applications. Firstly, the model-based algorithm requires a long time to obtain dispatch results, while the DRL-based algorithm can obtain them in just 0.1 s. This is because the DRL-based algorithm completes a lengthy training phase in advance, allowing it to output results directly during the execution phase, whereas the model-based algorithm needs to solve the scheduling strategy for an entire day from scratch. Secondly, model-based optimization algorithms often require information for the entire day, whereas in practical scenarios data such as electricity prices and traffic information are often difficult to obtain in advance. In contrast, DRL-based optimization algorithms only require current information and forecasts for the upcoming periods. Therefore, in this case, the DRL-based algorithm is more suitable.

C. Analysis of dispatch results
In order to cope with the complex changes in the real environment, the three EV fleets are numbered EVF1, EVF2, and EVF3, and different starting positions are selected for them in the execution phase. The starting position of EVF1 is CS16, which is convenient for all-round routing coverage. The starting position of EVF2 is CS30, which is far away from the other CSs and is suitable for processing edge tasks and supporting routing dispatch needs in the surrounding areas. The starting position of EVF3 is CS1, which is remote but has slightly more nearby CSs than CS30, so it is suitable for handling mission requirements in the southern region.
1) Dispatch results analysis for one day (only considering the electricity market)
To facilitate comparative analysis, this section conducts the dispatch of EVs within a typical day considering only the electricity market. The charging/discharging conditions of the EVs and the LMP of the corresponding CSs are shown in Fig. 8. In this scenario, the EVs' spatiotemporal decisions comprehensively consider battery degradation costs and spatial transfer costs. Upon observing profitable opportunities at other CSs, they move to discharge at CSs with higher LMP or charge at CSs with lower LMP to maximize the arbitrage benefit, and they return to the starting CS by the end of the last time step. The EVs' charging decisions are mainly concentrated in time steps 25-55 and 75-85. Specifically, the EVs always choose to charge at CS14 during the 75th to 85th time steps, because the LMP at CS14 is relatively low, allowing the arbitrage benefit from moving to CS14 to cover the spatial transfer costs. The EVs' discharging decisions are mainly concentrated in time steps 5-15 and 60-70. Referring to the corresponding LMP curve, it can be seen that the LMP is relatively high during these periods; after weighing the spatial transfer costs against the arbitrage benefit, the EVs choose to discharge. In addition, to ensure the normal operation of the next dispatch period, the EVs may make charging and discharging decisions at the end of the period so that the SOC returns to the initial level by the end of the dispatch period. In some periods, the EVs make neither spatial transfer decisions nor charging/discharging decisions. The reason is that the designed model comprehensively considers the battery degradation costs and spatial transfer costs of the EVs, leading it to forgo the arbitrage benefit when the LMP changes are minor, thus preserving the health of the battery.

The reason is that the designed model comprehensively considers the battery degradation costs and spatial transfer costs of the EVs; when the LMP changes are minor, the arbitrage benefit is deliberately neglected, thus ensuring the health of the battery.
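The trade-off just described can be summarized as a simple net-benefit test. The following sketch is our illustrative reading of it; the function name, the linear per-kilometre transfer-cost model, and the per-kWh degradation proxy are assumptions rather than the paper's exact cost model.

def relocation_gain(lmp_here, lmp_there, energy_kwh,
                    distance_km, transfer_cost_per_km, deg_cost_per_kwh):
    """Net gain ($) of moving to another CS to discharge energy_kwh there
    instead of at the current CS; positive values justify the transfer."""
    arbitrage = (lmp_there - lmp_here) * energy_kwh      # $/kWh price spread x energy
    transfer_cost = transfer_cost_per_km * distance_km   # travel cost proxy
    degradation = deg_cost_per_kwh * energy_kwh          # extra cycling cost
    return arbitrage - transfer_cost - degradation

# Example with assumed numbers: a 0.02 $/kWh spread on 60 kWh barely covers a 5 km detour.
print(relocation_gain(0.10, 0.12, 60, 5, 0.15, 0.005))   # -> 0.15 ($)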
2) Dispatch results analysis for one day (electricity-carbon joint market)
To analyze the impact of the carbon market on the spatiotemporal allocation of EVs, this section conducts the dispatch of EVs within a typical day considering the electricity-carbon joint market. The charging/discharging of the EVs and the LMP of the corresponding CSs are shown in Fig. 9, and the LME of the corresponding CSs is shown in Fig. 10. In this scenario, the carbon trading benefits of the EVs are jointly affected by the charging and discharging decisions and by the LME of the corresponding CSs. The EVs are more inclined to make charging and discharging decisions to obtain carbon emission quotas, thereby obtaining more carbon trading benefits. The EVs charged and discharged a total of 13,260 kWh, an increase of 24.7% compared with the scenario in which only the electricity market is considered. In particular, the charging behavior of the EVs is mostly concentrated in time steps 45-55. Combined with the LME curve in Fig. 10, the LME level of the relevant CSs during these time steps is low, so the carbon emissions generated by the EVs' charging behavior are small and higher carbon trading profits can be obtained. The results in the electricity-carbon joint market show that the proposed method is effective for the spatiotemporal dispatch of EVs. On the one hand, the EVs can accurately measure the relationship between benefits and costs and make more frequent charging and discharging decisions to increase the carbon trading revenue. On the other hand, the EVs can also accurately sense LME changes and make prudent decisions to avoid charging during high-LME periods, thereby reducing the carbon emissions generated during operation and ensuring economic benefits.
3) Dispatch results analysis for one day under fluctuating prices (electricity-carbon joint market)
To verify the stability of the proposed MATD3-MAPPO method when electricity prices fluctuate, this section selects a typical day with large LMP fluctuations to analyze the dispatch results under the electricity price fluctuation scenario. The charging/discharging of the EVs and the LMP of the corresponding CSs are shown in Fig. 11. When electricity prices fluctuate greatly, the EVs' locations change frequently. With sufficient arbitrage benefit under the large LMP differences to cover the spatial transfer costs, the EVs tend to move to CSs with a higher LMP to discharge or to CSs with a lower LMP to charge in order to obtain higher benefits. At the same time, the EVs tend to make charging and discharging decisions more frequently: EVF1, EVF2, and EVF3 make charging and discharging decisions 39, 36, and 39 times, respectively, with a total charging and discharging capacity of 16,560 kWh, an increase of 55.7% compared with scenario b). Therefore, the results prove that the designed method can still accurately perceive the environment when electricity prices fluctuate, guiding the EVs to make reasonable decisions.
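For clarity, our reading of the carbon-revenue mechanism described above can be sketched as follows: discharging while the LME is high displaces emission-intensive marginal generation and earns quota, while charging while the LME is high incurs emissions. The settlement rule, the variable names, and the sign convention below are simplified assumptions made for illustration, not the paper's exact market model.

def carbon_trading_revenue(lme, p_dis, p_ch, quota_price, dt=0.25):
    """Illustrative one-day carbon settlement.

    lme         : locational marginal emissions at the occupied CS per step (kgCO2/kWh)
    p_dis, p_ch : discharging / charging power per step (kW)
    quota_price : carbon allowance price ($/kgCO2), e.g. 0.014 as in Table II
    """
    net_avoided_kg = sum(e * (pd - pc) * dt for e, pd, pc in zip(lme, p_dis, p_ch))
    return quota_price * net_avoided_kg

# Toy example: discharging 60 kW for four steps at LME 0.8 and charging 60 kW
# for four steps at LME 0.3 nets 30 kgCO2 avoided, i.e. $0.42 at 0.014 $/kg.
lme = [0.8] * 4 + [0.3] * 4
pdis = [60] * 4 + [0] * 4
pch = [0] * 4 + [60] * 4
print(carbon_trading_revenue(lme, pdis, pch, 0.014))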
Fig. 8. Dispatch results of EVs in the electricity market (charging/discharging power in kW and LMP in $/MWh at CS1, CS4, CS15, CS16, and CS30 over one day at 15-min resolution; panels: a) EVF1, b) EVF2, c) EVF3).
Fig. 9. Dispatch results of EVs in the electricity-carbon joint market (charging/discharging power in kW and LMP in $/MWh at CS1, CS14, CS15, CS16, CS23, CS29, and CS30 over one day at 15-min resolution; panels: a) EVF1, b) EVF2, c) EVF3).
Fig. 10. LME of the corresponding CSs (CO2 intensity in kg/kWh at CS1, CS14, CS15, CS16, CS23, CS29, and CS30 over one day at 15-min resolution).

4) Dispatch results analysis for one day with traffic jam (electricity-carbon joint market)
To verify the stability of the MATD3-MAPPO method in guiding spatial transfer decisions, this section analyzes the EV dispatch results under traffic congestion. In this scenario, the spatial transfer cost increases to three times its normal value. Fig. 12 shows the charging and discharging power of the EVs and the CSs where they are located. The spatial transfer of EVF1 is CS16→CS30→CS14→CS16, the spatial transfer of EVF2 is CS30→CS14→CS30, and the spatial transfer of EVF3 is CS1→CS15→CS14→CS1. Combined with the geographic location information in Fig. 5, and compared with the scenario without congestion, it can be seen that when the spatial transfer cost increases, the EVs must weigh the relationship between spatial transfer cost and arbitrage benefit more carefully. In this scenario, the EVs tend to choose shorter routes and to charge and discharge at CSs that are closer to each other, so as to minimize the impact of transfer costs. Therefore, the results show that the designed method can still guide the EVs to make reasonable decisions when the traffic network model changes.

D. Analysis of dispatch results in one year
To analyze the scalability and practicability of the proposed method for V2G in the electricity-carbon joint market, the dispatch period is set to one year in this section. The EVs charged a total of 2,401,355.2 kWh and discharged 2,067,360.9 kWh. The charge and discharge energy of the EVs at each CS over one year are shown in Fig. 13 and Fig. 14, respectively. The results show that the EVs tend to charge at CS14 because its LMP is always low, allowing the EVs to charge at a very low price and thereby reducing the cost of charging. The EVs tend to discharge at CS15 and CS30 because their LMPs are higher, which enables the EVs to obtain higher arbitrage benefits. In addition, owing to the different initial CSs selected, EVF1, EVF2, and EVF3 have more power interactions at their respective initial CSs.
In the electricity-carbon joint market, the aggregator of the EVs earns a considerable benefit after one year of operation. Specifically, the benefit from the carbon market is $607,115.2, the benefit from the electricity market is $305,153.1, and the final total benefit of the EVs is $718,592.1, with an average daily revenue of $1,968.7. Referring to the average price ($55,353) of electric vehicles in the United States published by Kelley Blue Book, under the dispatch method proposed in this paper the purchase cost can be recovered in about 2.31 years. If policy subsidies for the purchase of EVs are taken into account, the payback period may be even shorter. In addition, the price of EVs may decline further in the future with technological advancement and the expansion of production scale, which will further shorten the cost recovery period. Therefore, the method proposed in this paper can effectively guide the EVs to operate economically in the electricity-carbon joint market and has promising application prospects.
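As a quick consistency check on the payback figure quoted above, and assuming purely for illustration a fleet of 30 vehicles (a size implied by the reported totals but not restated in this section):

\[
\frac{30 \times \$55{,}353}{\$718{,}592.1\,/\,\text{year}} \approx \frac{\$1{,}660{,}590}{\$718{,}592.1\,/\,\text{year}} \approx 2.31\ \text{years}.
\]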
Fig. 11. Dispatch results of EVs under fluctuating prices (charging/discharging power in kW and LMP in $/MWh at CS1, CS16, CS18, CS19, CS28, and CS30 over one day at 15-min resolution; panels: a) EVF1, b) EVF2, c) EVF3).
E. Sensitivity analysis
To study the impact of carbon trading benefits on the spatiotemporal dispatch of EVs, this section conducts a sensitivity analysis based on different carbon quota price levels, considering the changes in EV charging/discharging behavior as well as carbon emissions.
Referring to the carbon allowance prices from the CEA, the annual carbon allowance price ranges from 0.01 to 0.014 $/kg, with an average value of 0.012 $/kg. Therefore, 0.010, 0.012, and 0.014 $/kg are selected for analysis. The EV charging/discharging volumes and daily revenues under these three scenarios are shown in Table II. As observed from Table II, the EVs' charging/discharging power and daily benefits are positively correlated with the carbon allowance price. This is because, as the carbon allowance price decreases, the benefit from EVs participating in carbon market trading also decreases, leading to a reduced willingness to charge and discharge. Although less charging and discharging behavior reduces the cost of battery capacity degradation, both the arbitrage revenue from charging/discharging and the carbon trading revenue decrease, resulting in a decline in daily revenue.
Therefore, it can be seen that the modeling of carbon market trading in this paper is relatively reasonable.

TABLE II
SENSITIVITY ANALYSIS OF CARBON ALLOWANCE PRICES
Carbon allowance price ($/kgCO2) | Charging/discharging power (kWh) | Benefits ($)
0.014 | 13,260 | 2,048.5
0.012 | 13,010 | 1,705.3
0.010 | 12,440 | 1,294.6
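The same table can be read in relative terms with a few lines of plain Python; the numbers are copied directly from Table II and the script itself is ours.

rows = {0.014: (13260, 2048.5), 0.012: (13010, 1705.3), 0.010: (12440, 1294.6)}
base_kwh, base_benefit = rows[0.014]
for price, (kwh, benefit) in rows.items():
    print(f"price {price:.3f} $/kg: energy {100 * (kwh / base_kwh - 1):+.1f}%, "
          f"benefit {100 * (benefit / base_benefit - 1):+.1f}% vs. 0.014 $/kg")
# A 28.6% drop in the allowance price (0.014 -> 0.010) cuts the cycled energy by
# about 6.2% but the daily benefit by about 36.8%, i.e. the revenue is far more
# sensitive to the carbon price than the dispatched energy is.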
Fig. 12. Dispatch results of EVs with jam (charging/discharging power in kW and the CS occupied by EVF1, EVF2, and EVF3 over one day at 15-min resolution, with and without congestion).

Fig. 13. Charge energy of EVs at each CS in one year (energy in kWh, axis scale 10^5, for EVF1, EVF2, and EVF3 at CS1-CS30).

Fig. 14. Discharge energy of EVs at each CS in one year (energy in kWh, axis scale 10^5, for EVF1, EVF2, and EVF3 at CS1-CS30).

V. CONCLUSION
To realize the low-carbon and economic scheduling of EVs in the electricity-carbon joint market, a spatiotemporal dispatch method for EV fleets in the PTN based on MATD3-MAPPO is proposed in this paper, which provides a practical strategy for EVs in the PTN. The main conclusions are as follows:
1) The algorithm performance results show that the MATD3 and MAPPO algorithms used in this paper exhibit faster convergence and higher rewards. Their high adaptability to the modeling environment provides an effective solution to the spatiotemporal dispatch problem of EVs under the electricity-carbon joint market.
2) The dispatch results within a typical day show that the EVs accurately weigh benefit against cost and make reasonable decisions under the scenarios of market structure changes, electricity price fluctuations, and traffic congestion. Compared with the scenario that considers only the electricity market, the total amount of charge and discharge of the EVs increases by 24.7%. When the electricity price fluctuates greatly, the EVs accurately perceive the price information and make reasonable decisions, and the total amount of charge and discharge increases by 20.2% in the electricity-carbon joint market. When traffic congestion occurs, the EVs choose stations adjacent to their initial CS to save spatial transfer cost.
3) The long-term dispatch results show that the dispatch method for EVs under the electricity-carbon joint market proposed in this paper achieves both scalability and feasibility. The proposed dispatch method can guide the EVs to obtain an average daily income of approximately $1,968.7, which enables the EV dispatch to recover its cost in 2.31 years or even less.
In future work, we plan to adopt more precise modeling of the carbon emissions of EVs. In this paper, we use the LMP to estimate the LME in order to quantify the carbon emissions of EV charging behavior, which is only a preliminary idea. Additionally, to enhance the economic benefits of EVs, we will analyze a larger number of EVs to explore large-scale EV dispatch strategies.

Acknowledgment
This work is supported by the National Natural Science Foundation of China (12171145).
Appendix
A. Parameter of estimation method for LME based on LMP

Table A1 Parameter of estimation method for LME
Parameters | Coal | Natural gas | Oil
μ | 21.86 | 70.69 | 112.39
σ | 6.56 | 21.21 | 33.72
Carbon emission factor (kgCO2/kWh) | 0.517 | 0.361 | 0.867
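One way such parameters can be used to map an observed LMP to an LME is sketched below. This is our illustrative reconstruction, treating μ and σ as the mean and standard deviation of each fuel's marginal cost in $/MWh and weighting the emission factors by the Gaussian likelihood that the corresponding fuel is price-setting; it is not necessarily the exact estimator defined in the body of the paper.

import math

# Table A1: (mu, sigma) of marginal cost in $/MWh and emission factor in kgCO2/kWh.
FUELS = {
    "coal": (21.86, 6.56, 0.517),
    "natural_gas": (70.69, 21.21, 0.361),
    "oil": (112.39, 33.72, 0.867),
}

def estimate_lme(lmp):
    """Estimate locational marginal emissions (kgCO2/kWh) from an LMP ($/MWh)."""
    weights = {}
    for fuel, (mu, sigma, _) in FUELS.items():
        # Gaussian likelihood that this fuel type is the marginal (price-setting) unit.
        weights[fuel] = math.exp(-0.5 * ((lmp - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    total = sum(weights.values())
    return sum(w / total * FUELS[f][2] for f, w in weights.items())

print(round(estimate_lme(65.0), 3))   # about 0.46 kgCO2/kWh for 65 $/MWh under this toy weighting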
B. Basic Parameter Setting

Table B1 Basic Parameter Setting
Parameters | Value | Parameters | Value
P (kW) | 60 | E (kWh) | 150
α | 0.575 | β | 121
SOC (upper limit) | 0.9 | SOC (lower limit) | 0.1
α | 0.15 | β | 4
Carbon allowance price ($/kgCO2) | 0.014 | L (km/kWh) | 6.3
E (kgCO2/km) | 1.6 | SOC | 0.8
Transactions on Power Systems, vol. 37, no. 4, pp. 2961-2975, 2022, doi:
10.1109/TPWRS.2021.3123351.
[24]G. Liu, Y. Tao, Z. Ge, J. Qiu, F. Wen and S. Lai, "Data-Driven Carbon
Footprint Management of Electric Vehicles and Emission Abatement in
Electricity Networks," IEEE Transactions on Sustainable Energy, vol. 15,
no. 1, pp. 95-108, 2024, doi: 10.1109/TSTE.2023.3274813.
[25]T. Wu, Z. Li, G. Wang, X. Zhang and J. Qiu, "Low-Carbon Charging
Facilities Planning for Electric Vehicles Based on a Novel Travel Route
Choice Model," IEEE Transactions on Intelligent Transportation Systems,
vol. 24, no. 6, pp. 5908-5922, 2023, doi: 10.1109/TITS.2023.3248087.
[26]H. Jahangir, S. S. Gougheri, B. Vatandoust, M. A. Golkar, A. Ahmadian
and A. Hajizadeh, "Plug-in Electric Vehicle Behavior Modeling in Energy
Market: A Novel Deep Learning-Based Approach With Clustering
Technique," in IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 4738-
4748, 2020, doi: 10.1109/TSG.2020.2998072.
[27]M. Zhang, H. Yang, Y. Xu and H. Sun, "Learning-Based Real-Time
Aggregate Flexibility Provision and Scheduling of Electric Vehicles,"
IEEE Transactions on Smart Grid, vol. 15, no. 6, pp. 5840-5852, 2024, doi:
10.1109/TSG.2024.3400968.
[28]F. Tuchnitz, N. Ebell, J. Schlund and M. Pruckner, " Development and
Evaluation of a Smart Charging Strategy for an Electric Vehicle Fleet
Based on Reinforcement Learning," Applied Energy, vol. 285, pp. 116382,
2023, doi: 10.1016/j.apenergy.2020.116382.
[29]Y. Liang, Z. Ding, T. Ding and W. -J. Lee, "Mobility-Aware Charging
Scheduling for Shared On-Demand Electric Vehicle Fleet Using Deep

horized licensed use limited to: Ballari Institute of Technology & Management (formerly Bellary Eng College). Downloaded on March 17,2025 at 09:04:18 UTC from IEEE Xplore. Restrictions app
© 2025 IEEE. All rights reserved, including rights for text and data mining and training of artificial intelligence and similar technologies. Personal use is permitted,
but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

You might also like