Context Aware DRL
ABSTRACT The widespread adoption of electric vehicles (EVs) has introduced new challenges for
stakeholders ranging from grid operators to EV owners. A critical challenge is to develop an effective
and economical strategy for managing EV charging while considering the diverse objectives of all
involved parties. In this study, we propose a context-aware EV smart charging system that leverages deep
reinforcement learning (DRL) to accommodate the unique requirements and goals of participants. Our
DRL-based approach dynamically adapts to changing contextual factors such as time of day, location, and
weather to optimize charging decisions in real time. By striking a balance between charging cost, grid load
reduction, fleet operator preferences, and charging station energy efficiency, the system offers EV owners
a seamless and cost-efficient charging experience. Through simulations, we evaluate the efficiency of our
proposed Deep Q-Network (DQN) system by comparing it with three other DRL methods: Proximal Policy Optimization (PPO), Asynchronous Advantage Actor-Critic (A3C), and Deep Deterministic Policy Gradient (DDPG). Notably, our proposed DQN-based methodology demonstrated superior computational performance compared to the others. Our results reveal that the proposed system achieves an approximately 18% enhancement in energy efficiency compared to traditional methods. Moreover, it demonstrates about a
12% increase in cost-effectiveness for EV owners, effectively reducing grid strain by 20% and curbing CO2
emissions by 10% due to the utilization of natural energy sources. The system’s success lies in its ability
to facilitate sequential decision-making, decipher intricate data patterns, and adapt to dynamic contexts.
Consequently, the proposed system not only meets the efficiency and optimization requirements of fleet
operators and charging station maintainers but also exemplifies a promising stride toward sustainable and
balanced EV charging management.
INDEX TERMS Electric vehicles, smart charging, deep reinforcement learning, context-awareness, energy
efficiency, cost-effectiveness, grid strain reduction, CO2 emissions reduction.
systems capable of achieving optimal trade-offs across various objectives, such as minimizing the strain on the grid and reducing environmental impact, which is a formidable task, as highlighted by [4], [5], [6], [7], [8], [9], and [10].
The prevalent adoption of electric vehicles (EVs) represents a significant trend in the transportation sector, fueled by concerns surrounding energy security, climate change, and air pollution [11]. In response to these challenges, researchers have proposed a multitude of solutions. These include the implementation of time-of-use pricing schemes [12], [13], [14], [15], dynamic load management [16], [17], and the application of intelligent charging algorithms, such as the stochastic game approach [18], vehicle-to-grid (V2G) optimization [19], Pareto optimal solutions in multi-objective optimization [20], real-time energy management systems [21], and blockchain-based charging systems [22], [23], [24]. However, the integration of EVs into the electricity grid poses additional challenges for grid operators, fleet operators, charging station operators, and EV owners. The primary issue revolves around striking a balance between various objectives, such as reducing EV charging costs, alleviating the load on the power grid, optimizing fleet management, and enhancing energy efficiency at charging stations [25], [26].
One critical challenge in implementing the proposed context-aware EV smart charging system is the dynamic adaptation to rapidly changing contextual factors. Ensuring the system's ability to accurately assess and respond to real-time variations in time of day, location, and weather is crucial for optimal charging decisions. Moreover, maintaining the balance between different objectives, such as cost-effectiveness for EV owners and grid load reduction, while considering fleet operator preferences and station energy efficiency, requires a robust and adaptable algorithm within the deep reinforcement learning framework.
To address this pressing challenge, our research endeavors to introduce a novel paradigm: "Smart EV Charging with Context-Awareness: Enhancing Resource Utilization via Deep Reinforcement Learning." In this paradigm, we propose the development of a context-aware EV smart charging system that leverages the power of deep reinforcement learning (DRL) to revolutionize the way we manage EV charging. By dynamically adapting to a multitude of contextual factors, such as the time of day, geographical location, and weather conditions, our approach empowers EVs and charging infrastructure to make real-time, data-driven decisions. This context-aware system is designed to strike an optimal balance between various key considerations. It addresses the need for cost-efficient charging experiences for EV owners, the reduction of grid load to ensure its stability, the preferences and objectives of fleet operators, and the enhancement of charging station energy efficiency. Through meticulous simulations and rigorous evaluation, we aim to showcase the remarkable advantages our proposed system offers over existing, traditional methods. The results of our research reveal that the proposed system achieves an impressive, approximately 18% improvement in energy efficiency compared to conventional approaches. Furthermore, it demonstrates a substantial 12% increase in cost-effectiveness for EV owners while also reducing grid strain by 20% and curbing CO2 emissions by 10% through the utilization of natural energy sources. At the core of this system's success lies its ability to facilitate sequential decision-making, decipher intricate data patterns, and adapt to dynamic contexts. Our work represents a significant step forward in addressing the multifaceted challenges of EV charging management. By embracing the principles of deep reinforcement learning and context-awareness, our proposed system not only aligns with the efficiency and optimization requirements of fleet operators and charging station maintainers but also exemplifies a promising stride towards a sustainable and balanced future for EV charging management. In the following sections, we delve into the intricate details of our approach and present the empirical evidence supporting its effectiveness and potential for widespread adoption.

II. LITERATURE REVIEW
The primary objective of this literature review section is to provide a comprehensive overview of existing research and practices in electric vehicle (EV) charging management. By examining the limitations and shortcomings of traditional charging strategies, reviewing relevant literature on deep reinforcement learning (DRL), and exploring various approaches such as renewable energy integration, grid demand management, and charging station services, this review aims to establish the foundation for the proposed research objective. The proposed research seeks to bridge the identified gaps in the literature by developing a context-aware EV smart charging system based on DRL. This system will optimize charging decisions in real-time while accommodating the diverse objectives of multiple stakeholders and dynamically changing contexts. Through this review, we position the proposed system as a novel and holistic solution to the challenges presented in the existing literature on EV charging management.

A. EXISTING STRATEGIES FOR EV CHARGING MANAGEMENT
Electric vehicle (EV) adoption has introduced novel challenges for efficient charging management. Traditional strategies, often based on fixed schedules, are commonly employed. These strategies, while straightforward, have significant limitations. They overlook dynamic contextual factors that influence the cost-effectiveness and environmental impact of charging. Fixed schedules fail to adapt to real-time fluctuations in electricity prices, grid demand, and renewable energy availability. Consequently, they may lead to suboptimal charging practices, with adverse consequences for both grid operators and EV owners.
B. GRID DEMAND MANAGEMENT
Efficient grid demand management is crucial for balancing electricity supply and demand while ensuring grid stability. Strategies like demand response, where consumers adjust usage during peak periods, help alleviate grid strain. Advanced metering infrastructure (AMI) provides real-time energy data to enable demand response, while smart grids enhance monitoring and distribution management. Energy storage systems store and release energy, stabilizing the grid and reducing the need for costly upgrades. Distributed energy resources (DERs), such as solar panels and wind turbines, generate power closer to consumption points, further lessening grid pressure. In summary, grid demand management combines diverse technologies and strategies to boost reliability, reduce energy waste, and promote a sustainable energy future (cf. [27], [28], [29], [30], [31]).

C. RENEWABLE ENERGY INTEGRATION
Efficiently integrating renewable energy into the existing grid is a vital aspect of transitioning to a sustainable, low-carbon energy system. This complex process involves incorporating sources like solar, wind, and hydropower to meet increasing energy demands while curbing greenhouse gas emissions. Grid modernization, discussed in [32] and [33], stands out as a primary method, enhancing infrastructure and implementing advanced systems to handle renewable variability. Energy storage, detailed in [34] and [35], plays a key role by storing excess energy for release when needed, providing stability. Demand-side management, as outlined in [36], optimizes consumption patterns to align with renewable generation, reducing reliance on backup fossil fuel plants. Regional grid interconnection, explored in [37], enables resource sharing, enhancing reliability. Smart inverters and microgrid technologies, discussed in [38], improve handling of generation fluctuations. Finally, policy incentives and regulations, highlighted in [39], are crucial for fostering renewable energy deployment. In essence, a multifaceted approach combining technology, grid enhancements, and supportive policies is essential for successful renewable energy integration and the creation of a sustainable energy future.

D. CHARGING STATION SERVICES AND MANAGEMENT
Charging station services and management have become increasingly critical as the adoption of electric vehicles (EVs) continues to surge. This multifaceted domain encompasses a range of services and technologies aimed at facilitating convenient and efficient EV charging while ensuring the sustainability of the charging infrastructure. Recent developments from 2020 onward have shed light on several key aspects of charging station services and management: Networked Charging Infrastructure: The rise of networked charging stations, as discussed in [40], has simplified the EV charging experience, allowing owners to locate and access points effortlessly through mobile apps and online platforms. Payment and Billing Solutions: Enhanced payment systems, detailed in [41], now offer versatile options such as pay-per-use, subscriptions, and interoperable platforms, enhancing user convenience. Load Management: Grid-friendly charging strategies, explored in [42], address high-power EV charging impacts on the grid, ensuring stable and efficient energy distribution. Demand Response Integration: Charging stations, as outlined in [43], seamlessly integrate with demand response programs, optimizing charging times for grid stability and reduced electricity costs. Dynamic Pricing: Emerging dynamic pricing schemes, as highlighted in [44], incentivize off-peak charging and alleviate congestion during peak hours. Fleet Charging Solutions: Management systems for large EV fleets, discussed in [45], optimize schedules and monitor vehicle health. Maintenance and Monitoring: Advanced monitoring and predictive maintenance, detailed in [46], proactively ensure charging infrastructure reliability. Renewable Energy Integration: Charging stations incorporating renewable sources, demonstrated in [47], reduce the carbon footprint of EV charging. Regulatory Framework: Evolving regulatory frameworks, highlighted in [48], ensure safety standards, interoperability, and equitable access. In conclusion, charging station services and management, driven by technology and regulation, have evolved significantly, supporting widespread EV adoption and fostering a sustainable and accessible transportation ecosystem.

E. REVIEW OF DEEP REINFORCEMENT LEARNING (DRL)
Deep reinforcement learning (DRL) algorithms, like Proximal Policy Optimization [49], Asynchronous Advantage Actor-Critic (A3C) [50], Deep Deterministic Policy Gradient [51], and Deep Q-Network [52], have garnered notable attention for solving intricate tasks across diverse domains, including gaming, robotics, and resource optimization for electric vehicles. Specifically in the realm of electric vehicle (EV) charging and resource optimality, DRL has proven promising. These algorithms, powered by neural networks, exhibit excellence in discerning complex data patterns and adjusting behavior based on environmental cues. DRL's capability to learn and optimize policies in dynamic environments aligns seamlessly with the variable nature of EV charging, making it a valuable tool for concurrently optimizing cost, grid strain, and environmental impact.
Recent studies, such as [53], showcase the application of Deep Reinforcement Learning (DRL) to formulate algorithms for optimizing charging schedules at stations. DRL agents skillfully balance user demand, grid limitations, and dynamic pricing, efficiently allocating charging resources to minimize grid stress and reduce costs for EV owners. In the domain of load management, as demonstrated in [54], DRL is utilized to control charging station loads, aligning them with grid capacity to ensure stable operations and prevent overloads during peak times. Explored in [55], DRL-based solutions facilitate effective participation in demand response programs, optimizing charging times based on grid signals and alleviating strain. Intelligent charging strategies
tied to dynamic pricing, exemplified in [56], involve DRL agents learning to predict price fluctuations and adjusting charging patterns for optimal cost savings. Furthermore, fleet charging management, depicted in [57], leverages DRL to optimize schedules for companies with electric vehicle fleets, considering operational needs and minimizing downtime. Demonstrated in [58], DRL models enhance the reliability of charging station infrastructure through predictive maintenance, where agents monitor components and predict maintenance needs, thereby reducing downtime. This personalized approach is highlighted in [59], where DRL-driven personalization enhances the user experience at charging stations by learning user preferences and habits, recommending optimal charging times and locations for improved convenience. In summary, DRL techniques have emerged as powerful tools for optimizing charging station services and management in the electric vehicle ecosystem. By leveraging these advanced AI-driven approaches, the EV charging industry can enhance efficiency, reduce operational costs, and contribute to the sustainable integration of electric vehicles into the energy grid.

F. IDENTIFYING GAPS IN THE LITERATURE
While extensive literature exists on EV charging management, the proposed system targets notable gaps. Current studies often concentrate on singular objectives like cost minimization or grid load reduction in isolation. Few approaches systematically consider the multiple, sometimes conflicting, objectives of various stakeholders, including grid operators, fleet managers, charging station operators, and EV owners. Additionally, existing context-aware charging strategies, though present, lack the adaptability and sophistication inherent in DRL-based systems. A holistic approach is needed, leveraging DRL's power to optimize EV charging in real-time while accommodating diverse objectives and dynamically changing contexts.
In the domain of electric vehicle (EV) charging station services and management, it is imperative to address critical gaps related to resource optimality and context awareness, particularly within the framework of achieving carbon neutrality. Concerning resource optimality, challenges encompass scalability and effective resource allocation amid a growing EV market. The scalability of Deep Reinforcement Learning (DRL) models in charging station management must be addressed to accommodate an increasing number of stations and EVs. Exploring multi-objective optimization within DRL algorithms, balancing user convenience, grid stability, operational costs, and carbon neutrality objectives, is crucial. Enhancing energy efficiency in line with carbon neutrality goals involves optimizing energy consumption patterns and minimizing environmental impact using advanced DRL methodologies [60], [61], [62]. On the front of context awareness, persistent gaps involve ensuring multi-stakeholder context awareness and dynamic contextual adaptation for carbon neutrality. Integrating the interests and constraints of stakeholders, including utilities, charging station operators, and policymakers, is vital for sustainable development. The dynamic context, especially concerning carbon neutrality objectives, requires real-time adaptability of DRL models to evolving grid conditions, traffic patterns, and user preferences while minimizing environmental impact [63]. Collaborative learning strategies within a multi-stakeholder environment should be contextually informed, engaging participants like EV owners, charging station operators, and utilities. Facilitated by DRL models, collaborative learning aligns objectives with the overarching goal of achieving carbon neutrality. Bridging these gaps in resource optimality and context awareness within a carbon-neutral context is pivotal for advancing the efficiency, sustainability, and inclusivity of EV charging systems. This section aims to fill gaps in current literature by introducing a context-aware EV smart charging system powered by Deep Reinforcement Learning (DRL). This system will dynamically optimize charging decisions in real-time, accommodating diverse stakeholder objectives and adapting to changing contexts. The goal is to position our proposed system as an innovative and comprehensive solution, addressing challenges identified in prior research on EV charging management.

III. PROPOSED CONTEXT-AWARE EV SMART CHARGING SYSTEM
The heart of our research lies in the architecture of the context-aware EV smart charging system, a meticulously crafted framework comprising an intelligent agent, a dynamic environment, a reward function, and a neural network. We expound upon each component's functionality, highlighting their synergy in refining charging decisions, all the while embracing the ever-shifting contextual factors like time of day, location, and weather.
Artificial intelligence (AI) represents a vast domain within computer science, instrumental in the development of intelligent systems and computers capable of performing tasks that traditionally required human intelligence. These AI-powered machines not only excel in problem-solving but also contribute to better decision-making processes, effectively taking on responsibilities previously reserved for humans [64] and [65]. Within the expansive field of AI, Machine Learning (ML) emerges as a crucial subset. ML relies on data-driven approaches and the training of algorithms using data. Notably, ML models possess the remarkable ability to unearth patterns and insights from the data they ingest, without the need for explicit human intervention. ML employs a diverse range of algorithmic techniques to decipher data, enabling it to make predictions, improve itself, and elucidate complex data structures. These models can be trained through various strategies, including supervised, unsupervised, semi-supervised, and reinforcement learning. Among the multifaceted methodologies within ML, Deep Learning stands out as a subset characterized by its utilization of artificial neural networks. These neural networks, composed of multiple layers, exhibit self-learning
capabilities through exposure to data, enabling them to accomplish a wide array of tasks, such as image recognition and speech recognition [66].

FIGURE 1. Reinforcement learning model.

Reinforcement Learning (RL) represents the science of decision-making within the realm of machine learning. In RL, a computer program assumes the role of an intelligent agent, engaging with its environment and acquiring the ability to make informed decisions based on its interactions. For instance, consider the scenario of a robotic agent mastering the intricacies of foot movement in order to excel in a game of football; this exemplifies the essence of reinforcement learning [64]. At the core of RL lies a fundamental model where an agent actively interacts with its environment, striving to learn an optimal policy for making decisions across varying states. At each discrete time step, denoted as t, the agent observes the current state, represented as St, of the environment and proceeds to select an action, denoted as Ai, based on its pre-defined policy. Subsequently, the environment transitions to a new state, St+1, and the agent receives a reward, Rt, corresponding to the action it undertook in state St. The overarching objective for the agent is to acquire knowledge and refine its policy to maximize the expected cumulative reward over time. The value of a state-action pair, represented as (St, Ai), encapsulates the anticipated cumulative reward commencing from state St, executing action Ai, and then adhering to the optimal policy thereafter. This value is formally denoted as Q(St, Ai).

FIGURE 2. Deep reinforcement learning model with policy DNN.

In Figure 2, the agent is depicted as the primary learner and decision-maker, while the environment serves as the interface through which the agent interacts with its objectives. The environment, in response to the agent's actions, continually presents new scenarios and offers rewards, which are numerical values the agent strives to maximize over time through its chosen activities. The agent's overarching purpose is succinctly encapsulated in a unique signal known as the reward, transmitted from the environment to the agent at each time step. This reward is a straightforward scalar value, denoted as Rt, belonging to the set of real numbers R. The informal objective of the agent revolves around the maximization of the cumulative reward it accrues over time. This entails optimizing not just for immediate rewards but also considering the long-term perspective. The concept of return encapsulates the agent's aspiration to maximize future benefits, typically expressed in terms of expected value. The specific definition of return varies based on the nature of the task at hand and whether delayed rewards are a part of the equation. For tasks that naturally break down into discrete episodes, an undiscounted formulation of return is suitable. Conversely, continuous tasks that do not naturally have episodic breaks benefit from a discounted formulation of return, which extends indefinitely. Our goal is to elucidate the concept of return for both episodic and ongoing scenarios, presenting a unified framework that can be applied across both paradigms. By solving the Bellman optimality equations, which serve as consistency conditions for optimal value functions, we can systematically derive an optimal policy based on these functions. This process allows us to navigate the intricate landscape of reinforcement learning, ultimately leading to informed decision-making within various environments and tasks.
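For concreteness, the return and the Bellman optimality condition referred to above can be written in their standard textbook form, using the symbols already introduced (St, Ai, Rt, and the discount factor γ); this is the generic formulation rather than anything specific to our system:

```latex
% Discounted return accumulated from time step t onward
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1

% Bellman optimality condition for the optimal action-value function,
% and the greedy optimal policy derived from it
Q^{*}(S_t, A_i) = \mathbb{E}\!\left[ R_{t+1} + \gamma \max_{a'} Q^{*}(S_{t+1}, a') \;\middle|\; S_t, A_i \right],
\qquad \pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```

Setting γ = 1 recovers the undiscounted, episodic form of the return mentioned above, while γ < 1 yields the discounted form suited to continuing tasks.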
FIGURE 3. Proposed context-aware EV smart charging system using DRL.

We have conducted a comprehensive review of various research initiatives undertaken by distinct organizations, each functioning effectively within its domain. However, a recurring issue has been the inefficient utilization of resources due to a lack of collaboration and coordination among these entities. To illustrate this challenge, let's consider the scenario depicted in Figure 3. In this scenario, we have five primary
stakeholders: Stakeholder 'A', whose objective is to get the optimised cost; Stakeholder 'B', whose objective is to get the optimised energy; and Stakeholder 'E', whose primary objective is to motivate EV end-users to use environmentally friendly sources of energy to charge their vehicles, which directly lessens the impact on the environment. For example, the first participant, denoted as the EV end-user, has a primary interest in finding an optimal charging point during their journey from 'location X' to 'location Y.' Their objectives are to minimize both charging time and cost. Secondly, the grid operator is tasked with generating and supplying electricity to meet the demands of the region's charging points efficiently. However, they often lack precise information about the specific electricity needs of the EV charging stations in their area. Thirdly, the last stakeholder addresses the demands of users related to promoting environmentally friendly resources, such as energy from wind, PV, etc. Historically, these stakeholders have operated independently, with limited awareness of the real-time demands and requirements of other vendors. This lack of synchronization often resulted in resource inefficiencies and suboptimal outcomes. However, the state-of-the-art methodology proposed in this paper addresses these challenges effectively. It introduces a realistic approach that integrates the preferred demands and requirements of various stakeholders, enabling more efficient resource allocation and utilization. This collaborative framework promises to usher in a new era of resource management, fostering synergy among stakeholders and ultimately enhancing the overall effectiveness of EV charging systems.
The following part explains thoroughly the manner in which the suggested architecture works. We demonstrate how the algorithm makes use of contextual data to determine the win-win requirement for each stakeholder. As an example, we define three different sets of stakeholders in the efficient transportation eco-system, including Stakeholder-X, Stakeholder-Y, and Stakeholder-Z, denoted in the figure as STK-X, STK-Y and STK-Z respectively.
1) Stakeholder-A: EV End-Users: The EV end-user is encouraged to share their travel itinerary, including details such as the starting point and destination of their journey. Additionally, the end-user will receive routing suggestions, from which they can choose the most suitable path. The technical specifications of the vehicle, such as battery type, are also determined by the EV end-user. Subsequently, the algorithm generates a set of optimal route options based on these inputs, taking into account key performance indicators such as pricing and the availability of charging stations. The EV end-user can then select the routing option that aligns with their specific needs and preferences, making an informed decision based on both their immediate surroundings and the recommendations provided by the algorithm.
2) Stakeholder-B: Grid Operator: The grid operator plays a crucial role by furnishing data pertaining to feeder and transformer loads, which encompasses aspects like charging activities and electric supply reservations. This information significantly contributes to and influences the efficient and grid-friendly utilization of charging stations. Typically, the grid operator can employ advanced distribution network modeling technologies to forecast feeder and transformer loading for the next twenty-four hours with a high degree of accuracy.
3) Stakeholder-C: Charging Stations Maintainer: The role of the charging station maintainer is to ensure the continuous functionality of the charging station, guaranteeing it meets the demands of end-users and provides reliable services, even in the event of unforeseen disruptions. In cases where the cost of renewable energy experiences a decline, the charging station's owner may proactively notify customers, enabling them to charge their vehicles at a lower cost. Additionally, prior to their visit, end-users have the option to reserve a charging station for their specific fleet.
4) Stakeholder-D: Fleet Operator: The central responsibility of the fleet operator revolves around monitoring the fleet's availability for booking and ascertaining the energy source it relies on (e.g., hydrogen, gas, gasoline, or electric). Additionally, the fleet manager will have access to vital battery usage information, including discharge rates, which can aid in diagnosing problems and scheduling necessary repairs. Furthermore, the fleet operator handles requests for specific fleet types, considering their associated costs and ensuring that they align with the load requirements specified by customers.
5) Stakeholder-E: CO2-Based Energy Provider: The responsibility of this stakeholder is to supply energy derived from environmentally sustainable sources, including wind, photovoltaic, biomass, and water. Additionally, they serve as an informed resource provider to entities like charging station maintainers, facilitating the utilization of energy at more cost-effective rates compared to conventional fossil-fuel resources like oil, gas, and coal.
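As a minimal illustration of how the per-stakeholder inputs described above could be combined into a single model input, the sketch below flattens one feature set per stakeholder into a fixed-length state vector; the feature names and values are hypothetical placeholders and are not the features used in our dataset.

```python
import numpy as np

# Hypothetical per-stakeholder feature sets (the actual features in the paper differ).
stakeholder_features = {
    "ev_end_user":        {"battery_level": 0.35, "distance_to_dest_km": 42.0, "max_price_per_kwh": 0.20},
    "grid_operator":      {"feeder_load_kw": 310.0, "transformer_load_kw": 275.0, "forecast_peak_kw": 420.0},
    "station_maintainer": {"free_slots": 3, "renewable_price_per_kwh": 0.12, "station_available": 1},
    "fleet_operator":     {"fleet_available": 1, "avg_discharge_rate": 0.8, "requested_fleet_type": 2},
    "energy_provider":    {"solar_share": 0.30, "wind_share": 0.25, "co2_g_per_kwh": 90.0},
}

def build_state_vector(features: dict) -> np.ndarray:
    """Flatten the stakeholder feature sets into one numeric state vector
    (the kind of state S1, S2, ... that is fed to the DQN agent in Figures 3 and 4)."""
    values = []
    for name in sorted(features):  # fixed ordering keeps the vector layout stable across steps
        values.extend(float(v) for _, v in sorted(features[name].items()))
    return np.asarray(values, dtype=np.float32)

state = build_state_vector(stakeholder_features)
print(state.shape)  # 15 features in this toy example; the system described here uses 16
```

In practice, each stakeholder's block would be refreshed at every decision step before being passed to the learning agent described below.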
The information gathered from individual stakeholders is represented as a set (Ai → An, Bi → Bn, Ci → Cn, ..., Zi → Zn), each associated with its respective initial rewards. These parameters serve as the states, as depicted in the left section of Figure 3, and are subsequently provided as inputs to the model with their associated objectives, respectively. This data is then subjected to a cutting-edge approach rooted in deep reinforcement learning (DRL). Within this framework, the computer learns the weights of the DRL parameters from these input sets, recommended domains, and their corresponding constraints, as well as associated priorities. Upon reaching the expected threshold, the precise results are generated, as illustrated in the middle of Figure 3, which represents the Q-value of actions Ai1 → Ain for Objective A, the Q-value of actions Bi1 → Bin for Objective B, and so
on. In this output, tailored information is presented for specific stakeholders. For instance, 'EV end-users' receive personalized scheduling and routing options tailored to their vehicle's battery needs and environmental considerations. 'Grid operators' obtain insights into anticipated power demands for a given region based on charging station reservations, facilitating the management of electric fluctuations, among other benefits. Furthermore, it is important to note that the system continuously refines its understanding of its surroundings by dynamically adjusting weights and other relevant parameters to optimize its performance and achieve the maximum reward to fulfill its task efficiently. Now, let's define the total reward Rt as the sum of the individual rewards, where Rev is the reward of the electric vehicle end-user, Rgrid is the reward of the grid operator, and so on:

Rt = Rev + Rgrid + Rcs + Rfleet + Rco2    (1)

The goal is to learn a policy π that maximizes the expected total reward:

π* = argmax_π Σ_{t=1}^{∞} γ^t Rt    (2)

We begin by introducing an objective function, as shown in equation 2, for reinforcement learning and delineating its purpose. Our computation revolves around a reward function, denoted as 'r,' which operates across different time steps, symbolized as 't.' Utilizing this objective function, we can systematically accumulate all the rewards. At each specific time step, a 'state' is denoted by 'x,' while the action taken within that state is represented by 'a.' The 'reward,' denoted as 'r,' encapsulates the computed outcome based on both the state 'x' and the action 'a' taken within it. It is worth noting that each task aims to maximize a discounted sum of its rewards, incorporating a discount factor 'γ' across particular time steps [67].

FIGURE 4. DQN model prediction using states and deep neural networks; the outputs are Q-values, and actions are computed based on Argmax Qi for the current state.

As depicted in Figure 4, the DQN agent is supplied with input states originating from five distinct stakeholders: EV end-users, grid operators, charging station maintainers, fleet operators, and the source of energy. These input states encompass a total of 16 features, denoted as X1 to X20 in Figure 3. The DQN agent employs a batch size ranging from d1 to dbs for each of these input-feature states, designated as S1, S2, and so forth in Figure 4. For each state, the DQN agent retrieves a batch of records from memory, with batch sizes varying between 50 and 250, and processes them within a batch table. The Deep Neural Network (DNN) utilized by the DQN agent comprises sixteen input features and incorporates two hidden layers, each housing a multitude of interconnected nodes. The DNN outputs four distinct states, sequentially numbered from 1 to bs, aligning with the number of participants involved in the system.

FIGURE 5. Based on training and prediction of the current and subsequent states, the DQN agent state transition Markov diagram illustrates the learning process.

The four output states serve as representations of the Q-values associated with each action for individual participants. These Q-values play a pivotal role in determining the optimal action for each participant within the given state. The action vector, as depicted in Figure 5, mirrors this format. In this context, an action signifies the decision made by the agent following its assessment of the environment during a predefined time window. The network agent compiles a list of actions in the form of an action vector by combining the input from the neural network with its respective features. These resulting Q-values are subsequently utilized to assess the effectiveness of information acquisition. The agent proceeds by providing the current DQN with the state vector using the designated batch size. It then evaluates the DQN's output, leveraging threshold rates and Q-values, to determine the Q-threshold value, which aids in the classification of stakeholders. Overall, the DQN agent harnesses input states from stakeholders to learn the optimal policy for orchestrating the charging of electric vehicles in a decentralized manner. This process is elaborated upon in the forthcoming methodology section and is further elucidated through an illustrative example. The core functionality has been encapsulated within a software package that facilitates interactions among users across diverse sectors through our platform. To streamline this interaction, we have developed middleware as a service component. This enhancement empowers us to showcase the model's utility even at the urban scale, capable of handling high computing demands, extensive datasets, and model scalability.
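To make the description of Figure 4 more concrete, the following is a minimal PyTorch sketch of a Q-network with sixteen input features, two hidden layers, and one Q-value output per action, with the greedy action chosen by argmax as described above. The layer widths and all identifiers are illustrative assumptions and do not reproduce the authors' released implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network sketch: 16 stakeholder features in, one Q-value per action out."""
    def __init__(self, n_features: int = 16, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),  # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),      # hidden layer 2
            nn.Linear(hidden, n_actions),              # Q-values, one per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

dqn = QNetwork()
batch = torch.rand(128, 16)               # a batch of state vectors (batch sizes of 50-250 in the text)
q_values = dqn(batch)                     # shape: (128, 4)
greedy_actions = q_values.argmax(dim=1)   # Argmax Qi, as in Figure 4
```

Here n_actions = 4 mirrors the four output states mentioned above, while the hidden width of 64 is simply an assumed value for the sketch.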
IV. METHODOLOGY
The main objective of this section is to offer a comprehensive explanation of the research methodology employed in
developing and evaluating the proposed deep reinforcement learning algorithm for optimizing the smart2charge application for electric vehicles. This includes detailing the processes of data collection, initial processing and purification, data normalization, and the integration of essential insights from all stakeholders participating in the electric vehicle (EV) charging process.

A. DATA COLLECTION
Data for this study was gathered from diverse sources, including actual electric vehicle (EV) charging data, power grid load data, and pertinent datasets from key stakeholders in the EV charging process, such as EV end-users, grid operators, fleet operators, and charging station operators. Additionally, a specific subset of the data was chosen and anonymized. To enhance the quality and uniformity of the data, several pre-processing measures were implemented. These steps involved eliminating irrelevant or duplicate data, normalizing the data to ensure a consistent format, and integrating information from various sources. These data pre-processing efforts were undertaken to guarantee the reliability and coherence of the dataset used in the analysis [68].
Data Cleaning: The collected data and information from various sources underwent thorough cleansing to guarantee precision and reliability for training the deep reinforcement learning algorithm. This involved eliminating any missing or inconsistent values and ensuring that the data was appropriately formatted for algorithm training. Data Normalization: The data underwent normalization to establish a consistent format for seamless utilization during training and evaluation operations. This process involved transforming information into a standardized format, including converting facts into numerical values, standardizing value ranges, and aligning the data with the sophisticated methodology. Location Integration: Latitude and longitude points were added as an additional column labeled 'locations' to the dataset, containing the geographical coordinates of the route direction. This information is utilized to link the charging station dataset for calculating the distance from the current position to the charging station. Energy Source Inclusion: A new parameter, 'energy source,' has been incorporated into the dataset, specifying the type of energy used by each charging station operator during vehicle charging. All the aforementioned procedures were completed to ensure that the input data is comprehensive and well-prepared for subsequent analysis.
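A hedged sketch of the preprocessing steps just described (cleaning, value-range normalization, attaching a location column, and computing the distance to a charging station) is shown below; the column names, sample values, and the haversine helper are illustrative assumptions rather than our exact pipeline.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

# Hypothetical raw records (the actual sources are the charging, grid-load and stakeholder datasets [68]).
df = pd.DataFrame({
    "battery_level": [0.2, 0.8, None, 0.5],
    "price_per_kwh": [0.15, 0.20, 0.10, 0.15],
    "lat": [51.48, 51.50, 51.47, 51.49],
    "lon": [7.21, 7.25, 7.19, 7.22],
    "energy_source": ["pv", "coal", "wind", "gas"],   # Energy Source Inclusion
})

df = df.dropna().drop_duplicates()                     # Data Cleaning
num_cols = ["battery_level", "price_per_kwh"]          # Data Normalization (min-max scaling)
df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())
df["locations"] = list(zip(df["lat"], df["lon"]))      # Location Integration
station_lat, station_lon = 51.51, 7.26                 # example charging-station coordinates
df["distance_km"] = haversine_km(df["lat"], df["lon"], station_lat, station_lon)
print(df.head())
```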
B. ALGORITHM IMPLEMENTATION
This section outlines the overarching framework for the implementation of the strategy through deep reinforcement learning. The algorithm utilized is a deep Q-learning (DQL) agent training algorithm tailored for the Smart2ChargeApp environment. The process commences by taking the Smart2ChargeDS data as input, subjecting it to preprocessing, and initializing the DQL parameters. Subsequently, the DQL agent's neural network model is constructed, featuring hidden layers, a ReLU activation function, and output layers. The algorithm proceeds to train the DQL agent through numerous episodes and iterations. At the commencement of each episode, the states are reset, and the algorithm iterates over various states. These states can encompass variables like the current state of the EV battery level, the EV's location, the charging cost at the present location, the proximity to the nearest charging station, and more. Within each iteration, the action values are randomly set with a probability of epsilon, while they are determined by predicting the actual state with a probability of 1-epsilon. Actions in this context represent decisions made by the EV end-user, such as opting to charge at the current location or driving to a different location.

FIGURE 6. Algorithm 1: Training a Deep Q-learning Agent in the Smart2ChargeApp Environment [69].

The rewards within this context can signify the charging cost of the EV and the time required to reach the next charging station. These rewards are strategically designed to incentivize the agent to make decisions that lead to reduced charging costs and shorter charging times. Subsequently, the target Q-value is calculated, and the model undergoes training based on the current state and target Q-value. The loss is computed, and the state is updated to the subsequent state until the iteration is concluded. This process is reiterated for each episode until the entire training is finalized. To assess the computational performance of the agent, a comparison is made with the desired outcomes, and performance metrics such as loss/reward, discount factor, and computational time are monitored, as illustrated in the accompanying Figure 7. The computational graph delineates the interplay among discount factors (γ), loss and reward values, and computational time within the framework of the DQN learning process. The loss and reward values serve as indicators
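As a compact illustration of the training loop summarized above (and in Algorithm 1, Figure 6), one epsilon-greedy step with the usual target Q-value and mean-squared-error loss might look as follows. This is a generic deep Q-learning update written under assumed shapes and names, paired with the QNetwork sketch given earlier, not our exact code.

```python
import random
import torch
import torch.nn.functional as F

def select_action(q_net, state, epsilon=0.1, n_actions=4):
    """Epsilon-greedy selection: explore with probability epsilon, otherwise take argmax Q."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def dqn_update(q_net, optimizer, state, action, reward, next_state, gamma=0.99):
    """One training step on a single transition: target = r + gamma * max_a' Q(s', a')."""
    q_sa = q_net(state)[action]                            # predicted Q(s, a)
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()  # target Q-value
    loss = F.mse_loss(q_sa, target)                        # loss between prediction and target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch:
# q_net = QNetwork(); opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
# a = select_action(q_net, state); ...step the environment...; dqn_update(q_net, opt, state, a, r, next_state)
```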
A. USER STORY: EV END-USER OPTIMAL COST
This involves minimizing both charging time and cost by strategically selecting the nearest and most cost-effective charging stations. Additionally, the aim is to enhance the reliance on renewable energy sources, achieved by opting for charging stations powered by renewable sources like photovoltaic (PV) or wind instead of conventional sources such as coal or oil. This not only has a direct positive environmental impact by reducing CO2 emissions but also encourages electric vehicle users to adopt eco-friendly energy sources.
1) EXPERIMENTAL DESIGN
The proposed experimental design is structured into three main steps: Experiment Design One, Experiment Design Two, and Experiment Design Three, as illustrated in Figure 11.
1) Objective(s)
a) To reduce the charging expenses for electric vehicle end-users, the strategy involves selecting the closest and most economical charging station.
b) To optimize the utilization of renewable energy sources, the approach is to choose charging stations that are powered by renewable energy.
c) The goal is to minimize the time required to reach the charging station and mitigate the impact of factors such as traffic congestion, weather conditions, and wind direction on the charging process.
d) The objective is to decrease the environmental impact by reducing CO2 emissions.
2) EVALUATION
The fundamental concept underlying the assessment metrics is to appraise the effectiveness of the devised strategy, ensuring the judicious use of resources in electric vehicle charging aligns with the objectives outlined by all participants. Various standard evaluation metrics are employed in this context, including energy efficiency, charging time, charging cost, battery life, grid impact, and environmental impact. In the context of this paper, the primary experiments will focus on evaluating the charging costs for electric vehicle owners.
1) Experiment Design One: Imagine there are three charging stations accessible to the electric vehicle end-user, labeled A, B, and C. Station A relies on renewable energy, charging $0.15 per kilowatt-hour. Station B is powered by conventional energy, charging $0.20 per kilowatt-hour, while station C, also relying on conventional energy, charges $0.10 per kilowatt-hour. Considering the electric vehicle has a range of 100 miles and necessitates 20 kilowatt-hours of energy for a complete charge, the charging costs at each station can be computed as follows:
• Station A: The cost of charging is calculated as 20 kilowatt-hours multiplied by $0.15 per kilowatt-hour, resulting in $3.00.
• Station B: The charging cost is determined by multiplying 20 kilowatt-hours by $0.20 per kilowatt-hour, equaling $4.00.
• Station C: Charging expenses are computed as 20 kilowatt-hours multiplied by $0.10 per kilowatt-hour, yielding $2.00.

FIGURE 12. Simulation of EV without constraints and optional parameters.

Based on the provided inputs (see Figure 12), the above computation indicates that charging station C offers the most economical rates per kilowatt-hour. Consequently, it emerges as the optimal choice for the electric vehicle end-user when considering the charging of their electric car. It is important to note that this calculation does not address any constraints or optional parameters. For instance, if the electric vehicle cannot reach station C due to range limitations, stations B or A may become more cost-effective alternatives.
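The comparison above reduces to a small calculation; a sketch of how the end-user's optimal station could be selected once a range constraint is added is given below. The distances assigned to each station are hypothetical and only illustrate the constraint discussed in the text.

```python
ENERGY_NEEDED_KWH = 20
EV_RANGE_MILES = 100

# Price per kWh and an assumed distance to each station (miles); distances are illustrative.
stations = {
    "A": {"price": 0.15, "distance": 30, "renewable": True},
    "B": {"price": 0.20, "distance": 45, "renewable": False},
    "C": {"price": 0.10, "distance": 120, "renewable": False},  # beyond the 100-mile range
}

reachable = {name: s for name, s in stations.items() if s["distance"] <= EV_RANGE_MILES}
costs = {name: ENERGY_NEEDED_KWH * s["price"] for name, s in reachable.items()}
best = min(costs, key=costs.get)
print(costs)             # {'A': 3.0, 'B': 4.0}
print("optimal:", best)  # station C is excluded by the range constraint, so A wins here
```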
FIGURE 13. Optimal cost calculation for experiment Design 1.

In summary, these calculations do not account for constraints or optional parameters. The charging cost is
maximum cost efficiency. During off-peak hours with rates at $0.08 per kWh, charging speeds increase to 60 kWh per hour. Moreover, the simulation accounts for solar energy predictions. If solar energy is predicted to be available at 30% during the day, it adjusts charging schedules to prioritize renewable energy sources when feasible. These dynamic adaptations lead to differentiated charging costs, where during peak hours the cost per kWh is $0.15, and during off-peak hours it is $0.08, contributing to optimized charging expenses and greater overall efficiency.
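A minimal sketch of the peak/off-peak and solar-availability logic described above (rates of $0.15 and $0.08 per kWh and a 30% predicted solar share) is given below; the scheduling rule itself is an assumption used only to illustrate the described behaviour, not the simulator's exact policy.

```python
PEAK_RATE = 0.15        # $/kWh during peak hours
OFF_PEAK_RATE = 0.08    # $/kWh during off-peak hours
OFF_PEAK_SPEED_KW = 60  # charging speed during off-peak hours (kWh per hour)

def charging_cost(energy_kwh: float, hour: int, solar_share: float = 0.30) -> float:
    """Illustrative rule: prefer off-peak pricing; during the day, the predicted solar
    share is assumed to be billed at the off-peak rate."""
    off_peak = hour < 6 or hour >= 22
    if off_peak:
        return energy_kwh * OFF_PEAK_RATE
    solar_kwh = energy_kwh * solar_share   # portion expected to come from solar
    grid_kwh = energy_kwh - solar_kwh
    return solar_kwh * OFF_PEAK_RATE + grid_kwh * PEAK_RATE

print(round(charging_cost(20, hour=23), 2))  # off-peak: 20 kWh * 0.08 = 1.60
print(round(charging_cost(20, hour=13), 2))  # daytime with 30% solar: 0.48 + 2.10 = 2.58
```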
TABLE 1. Different simulation methodology comparison.

to lower grid strain, with the DRL approach achieving the lowest (18 kW).
each approach's impact on these critical factors, we gain valuable insights into their effectiveness and sustainability. This comparison will assist in making an informed decision about which charging approach aligns best with Alice's goals of optimizing fleet efficiency, reducing operational costs, and minimizing environmental impact. Let's proceed with a detailed examination of each metric across the various charging approaches:
1) Energy Efficiency (kWh): This metric represents the total energy consumed by the fleet. Lower values indicate better efficiency. In this context, the DRL-based approach consumes the least energy (7,500 kWh), followed by the renewable energy-aware model, indicating that these approaches optimize energy utilization.
Ultimately, Alice's decision should prioritize her goals of optimizing operational efficiency, minimizing costs, and reducing environmental impact. The Proposed Simulation (DRL) offers a well-rounded solution to achieve these objectives, but the final selection should be tailored to the unique requirements of Alice's fleet management.
[7] L. Xiong, D. He, Y. He, P. Li, S. Huang, S. Yang, and J. Wang, ‘‘Multi- [26] A. J. Alrubaie, M. Salem, K. Yahya, M. Mohamed, and M. Kamarol,
objective energy management strategy for multi-energy communities ‘‘A comprehensive review of electric vehicle charging stations with solar
based on optimal consumer clustering with multi-agent system,’’ IEEE photovoltaic system considering market, technical requirements, network
Trans. Ind. Informat., early access, doi: 10.1109/TII.2023.3242812. implications, and future challenges,’’ Sustainability, vol. 15, no. 10,
[8] R. Fachrizal, M. Shepero, D. van der Meer, J. Munkhammar, and J. Widén, p. 8122, May 2023.
‘‘Smart charging of electric vehicles considering photovoltaic power [27] M. E. Honarmand, V. Hosseinnezhad, B. Hayes, M. Shafie-Khah, and
production and electricity consumption: A review,’’ eTransportation, P. Siano, ‘‘An overview of demand response: From its origins to the smart
vol. 4, May 2020, Art. no. 100056, doi: 10.1016/j.etran.2020.100056. energy community,’’ IEEE Access, vol. 9, pp. 96851–96876, 2021, doi:
[9] M. Amjad, A. Ahmad, M. H. Rehmani, and T. Umer, ‘‘A review of 10.1109/ACCESS.2021.3094090.
EVs charging: From the perspective of energy optimization, optimization [28] R. R. Mohassel, A. Fung, F. Mohammadi, and K. Raahemifar, ‘‘A survey
approaches, and charging techniques,’’ Transp. Res. D, Transp. Environ., on advanced metering infrastructure,’’ Int. J. Electr. Power Energy Syst.,
vol. 62, pp. 386–417, Jul. 2018, doi: 10.1016/j.trd.2018.03.006. vol. 63, pp. 473–484, Dec. 2014, doi: 10.1016/j.ijepes.2014.06.025.
[10] K. V. S. M. Babu, P. Chakraborty, and M. Pal, ‘‘Planning of fast charging [29] H. Farhangi, ‘‘The path of the smart grid,’’ IEEE Power Energy Mag.,
infrastructure for electric vehicles in a distribution system and prediction vol. 8, no. 1, pp. 18–28, Jan. 2010, doi: 10.1109/MPE.2009.934876.
of dynamic price,’’ 2023, arXiv:2301.06807. [30] J. Mitali, S. Dhinakaran, and A. A. Mohamad, ‘‘Energy storage systems:
[11] G. Raja, G. Saravanan, S. B. Prathiba, Z. Akhtar, S. A. Khowaja, A review,’’ Energy Storage Saving, vol. 1, no. 3, pp. 166–216, Sep. 2022,
and K. Dev, ‘‘Smart navigation and energy management framework for doi: 10.1016/j.enss.2022.07.002.
autonomous electric vehicles in complex environments,’’ IEEE Internet [31] P. Roy, J. He, T. Zhao, and Y. V. Singh, ‘‘Recent advances of wind-
Things J., early access, doi: 10.1109/JIOT.2023.3244854. solar hybrid renewable energy systems for power generation: A review,’’
[12] Md. N. B. Anwar, R. Ruby, Y. Cheng, and J. Pan, ‘‘Time-of-use- IEEE Open J. Ind. Electron. Soc., vol. 3, pp. 81–104, 2022, doi:
aware priority-based multi-mode online charging scheme for EV charging 10.1109/OJIES.2022.3144093.
stations,’’ in Proc. IEEE Int. Conf. Commun., Control, Comput. Technol. [32] O. Krishan and S. Suhag, ‘‘An updated review of energy storage systems:
Smart Grids (SmartGridComm), Singapore, Oct. 2022, pp. 166–171, doi: Classification and applications in distributed generation power systems
10.1109/SmartGridComm52983.2022.9961019. incorporating renewable energy resources,’’ Int. J. Energy Res., vol. 43,
[13] S. Kucuksari and N. Erdogan, ‘‘EV specific time-of-use rates anal- no. 12, pp. 6171–6210, Oct. 2019.
ysis for workplace charging,’’ in Proc. IEEE Transp. Electrific.
[33] K. M. Tan, T. S. Babu, V. K. Ramachandaramurthy, P. Kasinathan,
Conf. Expo (ITEC), Chicago, IL, USA, Jun. 2021, pp. 783–788, doi:
S. G. Solanki, and S. K. Raveendran, ‘‘Empowering smart grid: A
10.1109/ITEC51675.2021.9490039.
comprehensive review of energy storage technology and application with
[14] J. Vuelvas, F. Ruiz, and G. Gruosso, ‘‘A time-of-use pricing strategy for
renewable energy integration,’’ J. Energy Storage, vol. 39, Jul. 2021,
managing electric vehicle clusters,’’ Sustain. Energy, Grids Netw., vol. 25,
Art. no. 102591, doi: 10.1016/j.est.2021.102591.
Mar. 2021, Art. no. 100411, doi: 10.1016/j.segan.2020.100411.
[34] Z. Shi, W. Wang, Y. Huang, P. Li, and L. Dong, ‘‘The capacity joint
[15] W. Wu, Y. Lin, R. Liu, Y. Li, Y. Zhang, and C. Ma, ‘‘Online EV charge
optimization of energy storage and renewable generation based on
scheduling based on time-of-use pricing and peak load minimization:
simulation,’’ in Proc. Int. Conf. Power Syst. Technol. (POWERCON),
Properties and efficient algorithms,’’ IEEE Trans. Intell. Transp. Syst.,
Guangzhou, China, Nov. 2018, pp. 1–4, doi: 10.1109/POWERCON.2018.
vol. 23, no. 1, pp. 572–586, Jan. 2022, doi: 10.1109/TITS.2020.3014088.
8602156.
[16] A. Hussain, V.-H. Bui, and H.-M. Kim, ‘‘A decentralized dynamic pricing
[35] K. M. Muttaqi, Md. R. Islam, and D. Sutanto, ‘‘Future power distri-
model for demand management of electric vehicles,’’ IEEE Access, vol. 11,
bution grids: Integration of renewable energy, energy storage, electric
pp. 13191–13201, 2023, doi: 10.1109/ACCESS.2023.3242599.
vehicles, superconductor, and magnetic bus,’’ IEEE Trans. Appl. Super-
[17] B. Aljafari, P. R. Jeyaraj, A. C. Kathiresan, and S. B. Thanikanti,
cond., vol. 29, no. 2, pp. 1–5, Mar. 2019, doi: 10.1109/TASC.2019.
‘‘Electric vehicle optimum charging-discharging scheduling with
2895528.
dynamic pricing employing multi agent deep neural network,’’
Comput. Electr. Eng., vol. 105, Jan. 2023, Art. no. 108555, doi: [36] J. Daly, L. Zheng, M. Xuan, Y. Yang, M. De Rosa, and F. Pallonetto,
10.1016/j.compeleceng.2022.108555. ‘‘Comparative analyses of forecasting techniques for electricity wholesale
[18] H.-M. Chung, S. Maharjan, Y. Zhang, and F. Eliassen, ‘‘Intelligent price under high penetration of renewable energy systems,’’ in Proc.
charging management of electric vehicles considering dynamic user 13th Mediterranean Conf. Power Generation, Transmiss., Distrib. Energy
behavior and renewable energy: A stochastic game approach,’’ IEEE Trans. Convers., Valletta, Malta, 2022, pp. 520–525, doi: 10.1049/icp.2023.0046.
Intell. Transp. Syst., vol. 22, no. 12, pp. 7760–7771, Dec. 2021, doi: [37] M. Ali, M. A. Abdulgalil, I. Habiballah, and M. Khalid, ‘‘Optimal
10.1109/TITS.2020.3008279. scheduling of isolated microgrids with hybrid renewables and energy
[19] K. Ginigeme and Z. Wang, ‘‘Distributed optimal vehicle-to-grid storage systems considering demand response,’’ IEEE Access, vol. 11,
approaches with consideration of battery degradation cost under pp. 80266–80273, 2023, doi: 10.1109/ACCESS.2023.3296540.
real-time pricing,’’ IEEE Access, vol. 8, pp. 5225–5235, 2020, doi: [38] Z. Shen, Z. Song, H. Zhao, C. Wang, L. Chunhui, E. Hu, Z. Wu,
10.1109/ACCESS.2019.2963692. and Z. Zhi, ‘‘Multi-objective optimization method for low-carbon
[20] T. Kaur and D. Kumar, ‘‘MACO-QCR: Multi-objective ACO-based QoS- development of wind-light-biogas-storage integrated energy microgrids
aware cross-layer routing protocols in WSN,’’ IEEE Sensors J., vol. 21, in remote areas,’’ in Proc. IEEE 6th Conf. Energy Internet Energy
no. 5, pp. 6775–6783, Mar. 2021, doi: 10.1109/JSEN.2020.3038241. Syst. Integr. (EI2), Chengdu, China, Nov. 2022, pp. 327–332, doi:
[21] O. Samuel, N. Javaid, A. Khalid, W. Z. Khan, M. Y. Aalsalem, 10.1109/EI256261.2022.10117046.
M. K. Afzal, and B.-S. Kim, ‘‘Towards real-time energy management of [39] T. M. Karakoyun and M. Kucukvar, ‘‘The role of policy in the development
multi-microgrid using a deep convolution neural network and cooperative of grid-connected photovoltaic power: Case study of Austin, Texas,’’
game approach,’’ IEEE Access, vol. 8, pp. 161377–161395, 2020, doi: Energy Policy, vol. 148, Jan. 2021, Art. no. 111935.
10.1109/ACCESS.2020.3021613. [40] F. Aubeck, G. Birmes, and S. Pischinger, ‘‘V2G connected battery
[22] A. Sadiq, M. U. Javed, R. Khalid, A. Almogren, M. Shafiq, and recharging and refueling driver assistance system to optimize long distance
N. Javaid, ‘‘Blockchain based data and energy trading in Internet of operation,’’ in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Auckland,
Electric Vehicles,’’ IEEE Access, vol. 9, pp. 7000–7020, 2021, doi: New Zealand, Oct. 2019, pp. 4425–4430, doi: 10.1109/ITSC.2019.
10.1109/ACCESS.2020.3048169. 8917068.
[23] Q. Xing, Y. Xu, and Z. Chen, ‘‘A bilevel graph reinforcement learning [41] P. W. Shaikh and H. T. Mouftah, ‘‘Connected and autonomous
method for electric vehicle fleet charging guidance,’’ IEEE Trans. Smart electric vehicles charging reservation and trip planning system,’’ in
Grid, early access, doi: 10.1109/TSG.2023.3240580. Proc. Int. Wireless Commun. Mobile Comput. (IWCMC), Harbin
[24] M. B. Rasheed, M. Awais, T. Alquthami, and I. Khan, ‘‘An optimal schedul- City, China, 2021, pp. 1135–1140, doi: 10.1109/IWCMC51323.2021.
ing and distributed pricing mechanism for multi-region electric vehicle 9498849.
charging in smart grid,’’ IEEE Access, vol. 8, pp. 40298–40312, 2020, doi: [42] R. Sudhoff, S. Schreck, S. Thiem, and S. Niessen, ‘‘Achieving grid-friendly
10.1109/ACCESS.2020.2976710. operation of renewable energy communities through smart usage of electric
[25] N. A. Q. Muzir, M. R. H. Mojumder, M. Hasanuzzaman, and J. Selvaraj, vehicle charging and flexibilities,’’ in Proc. CIRED Porto Workshop,
‘‘Challenges of electric vehicles and their prospects in malaysia: A E-mobility power distribution Syst., Porto, Portugal, Jun. 2022,
comprehensive review,’’ Sustainability, vol. 14, no. 14, p. 8320, Jul. 2022. pp. 103–107, doi: 10.1049/icp.2022.0672.
[43] L. Yu, C. Tong, Y. Li, D. Cui, J. Zhang, and Y. Wang, ‘‘Operation [60] C. Zou, ‘‘Multi-objective resource allocation for EV charging infrastruc-
analysis of power distribution system considering demand side response ture in a multi-stakeholder environment,’’ Appl. Energy, vol. 301, 2021.
of multiple types of flexible loads,’’ in Proc. 4th Int. Conf. Electr. Eng. [61] M. Z. Zainudin, ‘‘Multi-objective deep reinforcement learning for optimal
Control Technol. (CEECT), Shanghai, China, Dec. 2022, pp. 602–607, doi: EV charging station management considering multiple stakeholders,’’
10.1109/CEECT55960.2022.10030636. IEEE Access, vol. 9, 2021.
[44] H. Thwany, M. Alolaiwy, M. Zohdy, W. Edwards, and C. J. Kobus, [62] Y. Chu, Z. Wei, X. Fang, S. Chen, and Y. Zhou, ‘‘A multiagent
‘‘Machine learning approaches for EV charging management: A system- federated reinforcement learning approach for plug-in electric vehicle fleet
atic literature review,’’ in Proc. IEEE Int. Conf. Artif. Intell., Blockchain, charging coordination in a residential community,’’ IEEE Access, vol. 10,
Internet of Things (AIBThings), Mount Pleasant, MI, USA, 2023, pp. 1–6, pp. 98535–98548, 2022, doi: 10.1109/ACCESS.2022.3206020.
doi: 10.1109/AIBThings58340.2023.10292487. [63] Y. Ma, ‘‘Context-aware deep reinforcement learning for electric vehicle
[45] M. Rezaeimozafar, M. Eskandari, and A. V. Savkin, ‘‘A self-optimizing charging in smart grids,’’ J. Appl. Energy, vol. 307, 2022.
scheduling model for large-scale EV fleets in microgrids,’’ IEEE [64] W. Zhang, H. Liu, F. Wang, T. Xu, H. Xin, D. Dou, and H. Xiong,
Trans. Ind. Informat., vol. 17, no. 12, pp. 8177–8188, Dec. 2021, doi: ‘‘Intelligent electric vehicle charging recommendation based on multi-
10.1109/TII.2021.3064368. agent reinforcement learning,’’ in Proc. Web Conf. New York, NY,
[46] Y. Zhang, R. Engelhardt, A.-A. Syed, F. Dandl, C. Hardt, and USA: Association for Computing Machinery, 2021, pp. 1856–1867, doi:
K. Bogenberger, ‘‘Simulating charging processes of mobility-on-demand 10.1145/3442381.3449934.
services at public infrastructure: Can operators complement each other?’’ [65] Rme Jacob. (2020). Intelligent Charging Algorithm for Electric Vehi-
in Proc. IEEE 25th Int. Conf. Intell. Transp. Syst. (ITSC), Macau, cles. [Online]. Available: http://www.diva-portal.org/smash/record.jsf?
Oct. 2022, pp. 2200–2205, doi: 10.1109/ITSC55140.2022.9922449. pid=diva2:1466882
[47] J. O. N. Wilson and T. T. Lie, ‘‘Off-grid EV charging stations to reduce [66] H. M. Abdullah, A. Gastli, and L. Ben-Brahim, ‘‘Reinforcement learning
the impact of charging demand on the electricity grid,’’ in Proc. 7th IEEE based EV charging management systems—A review,’’ IEEE Access, vol. 9,
Workshop Electron. Grid (eGRID), Auckland, New Zealand, Nov. 2022, pp. 41506–41531, 2021.
pp. 1–5, doi: 10.1109/eGRID57376.2022.9990019. [67] Y. Wang, D. Qiu, and G. Strbac, ‘‘Multi-agent reinforcement learning for
[48] N. Matanov and A. Zahov, ‘‘Developments and challenges for electric vehicles joint routing and scheduling strategies,’’ in Proc. IEEE
electric vehicle charging infrastructure,’’ in Proc. 12th Electr. 25th Int. Conf. Intell. Transp. Syst. (ITSC), Oct. 2022, pp. 3044–3049, doi:
Eng. Fac. Conf. (BulEF), Varna, Bulgaria, Sep. 2020, pp. 1–5, doi: 10.1109/ITSC55140.2022.9921744.
10.1109/BulEF51036.2020.9326080. [68] (2022). Stakeholders Datasources. [Online]. Available: https://www.
[49] N. D. Toan and K. G. Woo, ‘‘Mapless navigation with deep reinforce- smard.de/home, and http://www.gis-rest.nrw.de/atomFeed/rest/atom/220
ment learning based on the convolutional proximal policy optimization 35f08-7c04-4265-8db9-dd6a848854d4/ad20b6ad-4b53-46e4-b420-b7aeb
network,’’ in Proc. IEEE Int. Conf. Big Data Smart Comput. (BigComp), 9067c54.html, and https://chargemap.com/map, and https://github.
Jan. 2021, pp. 298–301, doi: 10.1109/BigComp51126.2021.00063. com/topics/charging-stations
[50] J. Dong, H. Wang, J. Yang, X. Lu, L. Gao, and X. Zhou, ‘‘Optimal [69] M. Sharif, G. Lückemeyer, and H. Seker, ‘‘Context aware-resource opti-
scheduling framework of electricity-gas-heat integrated energy system mality in electric vehicle Smart2Charge application: A deep reinforcement
based on asynchronous advantage actor-critic algorithm,’’ IEEE Access, learning-based approach,’’ IEEE Access, vol. 11, pp. 88583–88596, 2023,
vol. 9, pp. 139685–139696, 2021, doi: 10.1109/ACCESS.2021.3114335. doi: 10.1109/ACCESS.2023.3305966.
[51] J.-X. Gong, G. Mei, and Y.-M. Liu, ‘‘The real-time optimization of active
distribution system based on deep deterministic policy gradient,’’ in Proc.
8th Renew. Power Gener. Conf., Shanghai, China, Oct. 2019, pp. 1–6, doi:
10.1049/cp.2019.0545.
[52] A. Brim, ‘‘Deep reinforcement learning pairs trading with a double
deep Q-network,’’ in Proc. 10th Annu. Comput. Commun. Workshop
Conf. (CCWC), Las Vegas, NV, USA, Jan. 2020, pp. 0222–0227, doi: MUDDSAIR SHARIF received the master’s
10.1109/CCWC47524.2020.9031159. degree in software technology from Linnaeus
[53] Y. Lu, Y. Liang, Z. Ding, Q. Wu, T. Ding, and W.-J. Lee, ‘‘Deep reinforce- University, Sweden. He is currently pursuing the
ment learning-based charging pricing for autonomous mobility-on-demand Ph.D. degree with the Birmingham City Univer-
system,’’ IEEE Trans. Smart Grid, vol. 13, no. 2, pp. 1412–1426, sity, delving into the research interests. With a
Mar. 2022, doi: 10.1109/TSG.2021.3131804. background as a software technology professional,
[54] X. Zhang, Z. Xi, T. Wang, and X. Liu, ‘‘Source grid load and energy he brings extensive experience in both research
storage management method based on cloud edge cooperation,’’ in Proc.
and development. Furthermore, he holds a special-
7th Asia Conf. Power Electr. Eng. (ACPEE), Hangzhou, China, Apr. 2022,
ization in data science from Stanford University,
pp. 164–169, doi: 10.1109/ACPEE53904.2022.9783876.
[55] B. Wang, Y. Li, W. Ming, and S. Wang, ‘‘Deep reinforcement learning
USA. With over seven years of expertise in
method for demand response management of interruptible load,’’ IEEE research and development, he continues to contribute to the field.
Trans. Smart Grid, vol. 11, no. 4, pp. 3146–3155, Jul. 2020, doi:
10.1109/TSG.2020.2967430.
[56] Z. Zhang, Y. Wan, J. Qin, W. Fu, and Y. Kang, ‘‘A deep RL-based
algorithm for coordinated charging of electric vehicles,’’ IEEE Trans.
Intell. Transp. Syst., vol. 23, no. 10, pp. 18774–18784, Oct. 2022, doi:
10.1109/TITS.2022.3170000. HUSEYIN SEKER is currently a Professor of com-
[57] W. Zhang, Q. Wang, J. Li, and C. Xu, ‘‘Dynamic fleet management
puting sciences and the Associate Dean (Research,
with rewriting deep reinforcement learning,’’ IEEE Access, vol. 8,
Innovation, and Enterprise) with the Faculty of
pp. 143333–143341, 2020, doi: 10.1109/ACCESS.2020.3014076.
[58] N. Sultana, J. Chan, T. Sarwar, and A. K. Qin, ‘‘Learning to optimise
Computing, Engineering and the Built Environ-
routing problems using policy optimisation,’’ in Proc. Int. Joint Conf. ment, Birmingham City University, Birmingham,
Neural Netw. (IJCNN), Shenzhen, China, Jul. 2021, pp. 1–8, doi: U.K. He has both academic and industry experi-
10.1109/IJCNN52387.2021.9534010. ence in artificial intelligence, machine learning,
[59] G. Zhemerov, O. Plakhtii, and A. Mashura, ‘‘Efficiency analysis data science, and emerging and disruptive tech-
of charging station for electric vehicles using the active rectifier nologies/systems. He has published more than
in microgrid system,’’ in Proc. IEEE 4th Int. Conf. Intell. Energy 100 peer-reviewed conference papers and journal
Power Syst. (IEPS), Istanbul, Turkey, Sep. 2020, pp. 37–42, doi: articles.
10.1109/IEPS51250.2020.9263182.