Adaptive Traffic Light Control Using Deep Reinforcement Learning Technique
https://doi.org/10.1007/s11042-023-16112-3
Abstract
Smart city growth needs information and communication technology to increase urban sustainability, but it faces critical traffic congestion and vehicle classification issues. It is crucial to dynamically change the traffic lights on the road network to reduce vehicle delay and avoid congestion in the smart city. Modifying the traffic light should be adaptive, considering the number of vehicles on the road and the options available to route the vehicles toward their destinations. Our scheme is the first proposed model based on deep learning to solve the problem of traffic congestion in the urban environment. The model classifies the vehicle types on the road and assigns different vehicle weights: 0.0 for no vehicle, and 1.0, 2.0, and 3.0 for light-weight, moderate-weight, and heavy-weight vehicles, respectively. The proposed model has been trained using experience replay and a target network based on a deep double-Q learning mechanism. The resulting model is applied in a real-time traffic network that uses the Dedicated Short Range Communication (DSRC) protocol for wireless communication. The work is simulated in SUMO (Simulation of Urban MObility), with the traffic data generated on SUMO using a random function. The results show that the traffic light of a given intersection becomes adaptive, aligning with the goals mentioned above, and that the proposed model efficiently reduces the average waiting time at the intersection by up to 91.7%, as shown in the graphs in the result section.
1 Introduction
The growth of vehicles worldwide is making the roads congested, resulting in a signifi-
cant delay for an average vehicle to commute from one place to another. The road traffic
system is still handled and managed in a static round-robin manner despite machine
learning and artificial intelligence techniques. Existing traditional methods such as round-robin and longest-queue-first do not take the classification of different types of vehicles into account, which increases the traffic and makes these algorithms inefficient. Traffic lights are typically used to regulate intersections along busy roads or major highways; however, their ineffective regulation results in a number of issues, including significant energy waste as well as traveler delays. Even worse, inefficient light control can result in auto accidents [17, 23]. Current traffic signal control either deploys fixed programs without taking real-time traffic into account or takes traffic into account only minimally [2]. The fixed programs set the traffic signals to have variable cycle times based on historical data rather than equal cycle times. To determine whether there are any vehicles in front of traffic signals, some control algorithms use input from sensors such as subterranean inductive-loop detectors, but the inputs are processed only very coarsely to estimate the length of the green/red lights [34]. SUMO is a tool for simulating real-time traffic and analyzing the results of the decisions made by the machine learning model. TraCI stands for "Traffic Control Interface"; it gives access to a running road traffic simulation and thereby allows the values of simulated objects to be retrieved and their behavior to be modified "online". Existing traffic signal control systems do function in ordinary conditions, albeit inefficiently; in several other situations, however, such as a sports event or the more typical peak hour of traffic, they become paralyzed. Instead, we frequently see a skilled police officer directly controlling the crossroads by waving signs: in high-traffic situations, such a controller observes the actual traffic on the crossing roads and, utilizing his or her extensive knowledge of the crossing, cleverly decides how long each direction is permitted to pass (Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
This observation prompts us to propose an intelligent intersection traffic signal management system that can learn to operate the intersection like a human operator by taking real-time traffic conditions as input.
Many technologies can help a vehicle communicate with every other vehicle and with the infrastructure nodes placed on the roadside. However, automating traffic flow so that it is adaptive, dynamic, and able to reduce traffic congestion is still a challenging research problem. Machine learning approaches can be used to solve such problems, and several families of machine learning models can address the traffic problems of road networks: (1) supervised learning, (2) semi-supervised learning, (3) unsupervised learning, and (4) reinforcement learning [26].
Reinforcement learning is a type of machine learning in which a model learns from a sequence of actions and then determines which action to take depending on the current state so as to maximize the reward [15, 31]. In simpler terms, reinforcement learning is a goal-oriented algorithm that learns how to attain a complicated final objective (goal) or to maximize a target along a particular dimension. Deep learning algorithms are also available for developing adaptive traffic light control systems, but many existing solutions have not considered the randomness of the traffic and the impact that different kinds of vehicles have on traffic congestion. A DRL method is used in [33] for an adaptive traffic signal control framework that explicitly considers realistic traffic scenarios, sensors, and physical constraints, but it does not consider intersections with left and right turns. A four-phase scheme is therefore considered in [19], which uses PPO (Proximal Policy Optimization) to improve the convergence of the model but fails under multiple-intersection scenarios. Further, a deep actor-critic method is designed in [20] to provide efficient traffic signal plans using a series of temporally sequential images, but it too does not consider the implementation of multiple intersections. Moreover, continuous-valued states cannot be handled by tabular reinforcement learning; in that case, the states have to be passed as input to a function approximator derived from neural networks such as Convolutional Neural Networks (CNN), Artificial Neural Networks (ANN), etc. These machine learning models, in particular, are called Deep Reinforcement Learning.

Fig. 3 The proposed workflow diagram, in which the agent architecture is shown in Fig. 8

Fig. 4 Vehicle state (position) representation: bike (light-weight, 1.0), car (moderate-weight, 2.0), and truck (heavy-weight, 3.0) vehicles are represented in the form of a grid matrix
With the continuous and rapid growth of the automotive industry, tools to automate traffic junctions are still missing. Traffic congestion is a real problem because time has become more critical than ever, and since humans manage the traffic in the present situation, it is painstaking work. Therefore, an automatic and adaptive traffic light control system is required to reduce traffic congestion and vehicle waiting times at traffic crossings.

With the recent developments in artificial intelligence and machine learning, many automation applications are emerging and being utilized in automatic traffic light control. The DSRC protocol provides the security that is crucial for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure communication, and when DSRC is combined with an RL algorithm it can efficiently reduce the average waiting time of vehicles at an intersection even with a low detection rate, thus reducing the travel time of vehicles [41]. The drawback of this method, however, is that it is less efficient with multiple intersections. Further,
other deep learning algorithms such as MADQN (multi-agent deep Q-network) work better and have been investigated to address the curse of dimensionality under traffic network scenarios with high traffic volume and disturbances [28], but they are not as good under abrupt traffic-collision disturbances. Hence, a deep Q-learning model with experience replay and a target network is used in [29] to combat the above problems and also tries to optimize the traffic light cycle using a Markov decision process, but the main drawback of this method is that it fails for complex road network scenarios. None of the above methods has taken advantage of the different weights of different vehicles, which is done in this paper.
The main contributions of our paper are as follows. 1) To the best of our knowledge, this paper is the first to implement an algorithm that extracts the different vehicles in the traffic of a specific intersection and classifies them into categories by considering their weights and their impact on traffic delays; different weights are assigned to the different vehicle types (light-weight = 1, moderate-weight = 2, heavy-weight = 3, and no vehicle = 0) for controlling the traffic lights. 2) The deep reinforcement learning algorithm, together with a target network and experience replay, improves efficiency and makes the simulation more realistic. 3) Further, this paper explores and solves the automation of dynamic and adaptive traffic lights for a traffic intersection using reinforcement learning, and the complete model is deployed into a simulation using SUMO and the Traffic Control Interface (TraCI). 4) A Markov decision process is used to make the traffic light adaptive and dynamic; the agent decides on the basis of the rewards it obtains through previous steps and of learning from the environment. 5) The last important contribution of this paper is that DSRC vehicle-to-infrastructure (V2I) communication is used to know the exact location of each vehicle, and that location data is then used for a substantial reduction in traffic congestion.

Fig. 9 The left part shows the deep reinforcement learning model for traffic light control; the right part shows the phase transition diagram. Taken from the multi-agent work [37]

Fig. 10 Average waiting time for the different methods used in the base paper [18] during training, where the x-axis represents the number of episodes and the y-axis represents the average waiting time (vehicles' staying time at the intersection) in seconds
The remainder of the paper is organized as follows. Section 2 describes related work and Section 3 the problem definition. The background and the proposed methodology are described in Sections 4 and 5, respectively. Section 6 describes the implementation and results, and Section 7 presents the conclusion. The declarations and acknowledgements are given in Sections 8 and 9, respectively.
2 Related work
With the recent developments in artificial intelligence and machine learning, many automation applications are emerging, and traffic light control is among them. Security concerns, a crucial blocker for V2V and V2I communication, have also been addressed through DSRC. Thus the data and the technical tools are available to automate adaptive traffic light management, through which less traffic congestion and a smaller time delay for vehicles waiting at a particular traffic intersection can be achieved.

Deep learning algorithms for adaptive traffic light control systems are available today. However, the randomness of the traffic and the presence of different kinds of vehicles still impact traffic congestion, and the methods and algorithms proposed until now have not considered these aspects; that is why they lag behind real-world traffic. This section discusses the problems that have been solved by deploying deep reinforcement learning, in which an adaptive agent is built that diminishes the traffic congestion of a particular traffic junction.
In [8], a state called DTSE (Discrete Traffic State Encoding) has been proposed: a vector holding, for each cell, whether the cell contains a vehicle, the speed of the vehicle if it does, and the current traffic light. Experience replay has been used in the deep Q-learning agent, with a CNN containing one hidden layer as the function approximator. It has been found that the agent can reduce the average cumulative delay by 82%, the average queue length by 66%, and the travel time by 20% [8, 9]. "Further, a deep Q-learning
algorithm is proposed to learn the Q-function of the traffic system state inputs and the corresponding traffic system outputs in" [14]. The benefits provided by the system over regular traffic system control are also analyzed. The queue length is used as the state and stacked autoencoders as the function approximator. The suggested reward is the difference between the flows in two directions, which is minimized over the sampled training data.
In [35], the position is also proposed as the state for the deep Q-learning algorithm, which can then be used to lessen traffic congestion and subsequently help the traffic lights to be adaptive and intelligent. The approach is to collect the data, divide the entire intersection into smaller grids, and quantify complex traffic scenarios into states. "A traffic light's timing adjustments are the acts that are modeled as a high-dimension Markov decision process" [16]. "Experience replay and a target network are used in the deep reinforcement learning algorithm in" [5, 6], in which the velocity and position of the vehicles in the traffic network are used as the input states. The reward used there is the change in the cumulative delay time of vehicles in the traffic network, and convolutional neural networks are used as function approximators. The result is compared with two other signal-control algorithms, the longest-queue-first algorithm and the fixed-time rotation algorithm.
In [41], the effect of partial detection of the vehicles present at the traffic junction is explored. The idea is that one cannot have every vehicle united under the same communication source, and hence only those vehicles technically equipped with vehicle-to-infrastructure communication are used. The system can reduce the vehicles' accumulated delay time at traffic intersections even if the detection rate is low.
A summary comparing the proposed work with previous related work, showing the improvement obtained by the proposed algorithm, is given in Tables 1, 2 and 3.
3 Problem definition
In recent years, there have been many advancements in the Q-learning approach, such as experience replay and the target network. This paper addresses the following problem: the vehicles on the road are first classified by type and assigned corresponding weights; the agent is then trained, based on a deep double-Q learning mechanism with experience replay, so that it can be deployed in a real-time traffic network that uses DSRC for communication between vehicles and infrastructure, making the traffic light of a given intersection adaptive in line with the agent's goal.
4 Background
In this section, the background of the proposed solution is discussed, along with the action, state, and reward function.
Table 1 Comparison of related work (author and reference; objective; technique; simulation tool; drawback)

Seyed Sajad Mousavi (2017) [23]. Objective: traffic light control using DRL. Technique: policy-gradient and value-function based. Simulation tool: SUMO. Drawback: not applicable for multi-agent cases.

Rusheng Zhang (2020) [41]. Objective: traffic light control using partial detection. Technique: DSRC. Simulation tool: SUMO. Drawback: not applicable for more than 5 intersections.

Li, Li (2016) [14]. Objective: traffic light control within appropriate time. Technique: deep reinforcement learning algorithm. Simulation tool: SUMO. Drawback: does not work in an unstructured traffic format.

Xiaorong Hu (2020) [10]. Objective: dynamic traffic light control using GNN. Technique: GPlight technique. Simulation tool: SUMO. Drawback: not effective in long and heavy traffic.

Hua Wei (2018) [36]. Objective: traffic light control on real-time data. Technique: deep reinforcement learning tested with real-time data. Simulation tool: SUMO. Drawback: not suitable for multi-phase traffic lights.

Bingquan Yu (2020) [38]. Objective: traffic light control. Technique: DDPG-based DRL technique. Simulation tool: SUMO. Drawback: not suitable for large-scale road networks.

Deepeka Garg (2018) [7]. Objective: autonomous traffic light control. Technique: policy-based gradient. Simulation tool: SUMO. Drawback: does not work in dynamic and diverse traffic.

Dongfang Ma (2021) [20]. Objective: to develop a deep actor-critic method that can provide efficient traffic signal plans. Technique: deep reinforcement learning with a series of temporally sequential images. Simulation tool: DRL. Drawback: does not explicitly consider the implementation on multiple intersections.

Kai Liang Tan (2019) [33]. Objective: to propose a DRL-based adaptive traffic signal control framework that explicitly considers realistic traffic scenarios, sensors, and physical constraints. Technique: deep reinforcement learning in low- and high-traffic scenarios. Simulation tool: DRL. Drawback: extending the DRL framework towards intersections with left and right turns and arterial corridors is still needed.

Zibo Ma (2021) [19]. Objective: urban intersection traffic light timing optimization. Technique: Proximal Policy Optimization (PPO) is used to improve the convergence speed of the model. Simulation tool: SUMO. Drawback: the designed traffic light scheme uses the classic four-phase scheme and does not design multiple-phase schemes for tidal traffic flow.

Muhammad Saleem (2022) [30]. Objective: to provide innovative services to drivers that enable a remote view of the traffic flow and the volume of vehicles available on the road, with the intention of avoiding traffic jams. Technique: a fusion-based intelligent traffic congestion control system for VNs (FITCCS-VN) using ML techniques that collects traffic data and routes traffic on available routes to alleviate traffic congestion. Simulation tool: FITCCS-VN using ML techniques. Drawback: the system accuracy may be improved further by using federated learning and AlexNet.
Table 2 Comparison of related work (author and reference; objective; technique; simulation tool; drawback)

Saeed Maadi (2022) [21]. Objective: to develop a real-time RL (reinforcement learning)-based adaptive traffic signal control that optimizes a signal plan to minimize the total queue length. Technique: an RL technique for CAVs (Connected and Automated Vehicles). Simulation tool: PTV VISSIM microsimulation platform. Drawback: offset optimization could be added to the signal timing optimization to reduce the computational time of the RL training.

Zahra Zeinaly (2023) [40]. Objective: to develop a reliable controller for such an environment and investigate the resilience of these controllers to a variety of environmental disruptions, such as accidents, weather conditions, or special events. Technique: deep Q-learning with experience replay. Simulation tool: deep Q-learning and SUMO. Drawback: not suitable for a complex network, nor for automated vehicles.

Alfonso Navarro-Espinoza (2022) [24]. Objective: to predict the traffic flow at an intersection. Technique: machine-learning (ML) and deep-learning (DL) algorithms are used for predicting traffic flow at an intersection, thus laying the groundwork for adaptive traffic control, either by remote control of the traffic lights or by applying an algorithm that adjusts the timing according to the predicted flow. Simulation tool: Multilayer Perceptron Neural Network (MLP-NN). Drawback: does not give much better results on a four-lane cross-section road.
4.1 Markov decision process

A Markov decision process (MDP) is a mathematical model that helps make better decisions based on the environmental situation; it models the environment through which an agent is driven toward a desired state. An MDP is built on a set of environment states, a set of actions among which the agent has to choose, a reward function that determines a reward for a given state and action, and a transition function that determines how the environment changes when a specified action is taken in a specified state. The Markov property is satisfied only if the transition function T depends solely on the current state s and the action a taken in it; in simple terms, the probability of a transition from a state s to s′ must depend only on s and a. This can be written mathematically as [32]:

P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots) = P(s_{t+1} \mid s_t, a_t) \quad (1)

Mathematically, a Markov decision process can be formulated as a four-tuple <S, A, R, T>, where S is the set of states, A the set of actions, R the reward function, and T the transition function.
"The goal of the agent is to prioritize the short-term goal at first, compared to the long-
term goal. While time progresses, it has discounted the reward for every next step by a fac-
tor of 𝛾 . This can be represented by the following mathematical expression" [32].
∞
∑
Rt = 𝛾 k rt + k + 1 (2)
k=0
where 𝛾 is a discount factor such that 0 < 𝛾 ≤ 1, meaning that future rewards are discounted exponentially. "To align with the plan, the agent finds a policy 𝜋, which is a strategy to select an action a with the state s as input. Now there are two types of policies" [35], deterministic and stochastic.
"Now a V function, is defined as the expected return of reaching a state s’ from a state s
following a policy 𝜋 , mathematically V function can be expressed as" [32].
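A standard form of this definition, consistent with Eq. (2) and following [32], is:

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s \right] = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\middle|\; s_t = s \right]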
Here the prospect of dynamic programming comes into the picture, which helps achieve the optimal policy 𝜋*, given that the transition function T and the reward function R are known. Value iteration is one of the planning algorithms that finds the optimal policy.
4.1.1 Partial observability
In some cases, the agent cannot determine s′ given a state s and an action a, but can only observe a proxy of the state, called an observation o. This is called partial observability [11].
4.2 Tabular Q‑learning
When the states and actions are discrete, every state-action pair can be mapped to an entry of a table. Every state-action pair has a so-called Q-value, which the agent/model consults whenever it needs to act according to the policy 𝜋. In traditional tabular Q-learning [3], the model uses this look-up table to maximize the reward function, keeping in mind the long-term reward aligned with the policy 𝜋. Since the Q-values in the look-up table are not available upfront, the agent iteratively updates the Q-value estimates in the look-up table, and these estimates converge after enough samples.
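A standard form of this tabular update, following [3] and with learning rate 𝛼, is:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]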
As the state and action domains grow, there are scenarios where state-action pairs cannot be enumerated as in tabular Q-learning, because the states are not discrete everywhere. Hence the problem arises of how the model should choose the exact action for a given input state; in this case, the Q-value cannot be determined through a look-up table.
Here a function approximator helps to determine the Q-function with the help of learned weights 𝜃. The weights of the function approximator can be updated so that they converge to a specific value while following a policy 𝜋. Generally, the mean squared error between the current estimate and the target, which is defined from the true Q-value of the state-action pair under policy 𝜋, Q^𝜋, is minimized.
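A common form of this objective, in which the target r + 𝛾 max_{a′} Q(s′, a′; 𝜃) is held fixed during each update, is:

L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \right)^{2} \right]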
There are some issues regarding the convergence of the function approximator in this use case, which are [35]:

Correlation between consecutive samples: while sampling in this kind of uncertain environment, a lot of correlation occurs because successive data inputs are sampled, which prevents the distribution of samples from being mutually independent.

Sampling data distribution: since Qt is continuously changing, the sampling distribution changes as well, which leads to a sampling bias that needs to be examined and solved. Because the sampled data are biased and not independent and identically distributed, the training of the agent is hampered and the samples remain correlated.

Batch gradient descent takes the whole training data at once at every step and then iteratively optimizes the weights according to the error, which makes training the model very slow.

Stochastic gradient descent addresses this by calculating the gradient using a random training-data instance at each step, making it significantly quicker than batch gradient descent. Since training the model is still tedious and time-consuming, a further optimization method is used:
4.3.4 RMSProp
"The problem solved by Adagrad as adaptively tuning the learning rate per parameter. But
this diminishes the learning rate with time because of the growing sum with time" [35].
4.3.5 Back propagation
After using gradient descent, the weights need to be updated according to the errors. Back-propagation is a method to send the error back from the output layer; the chain rule calculates the derivative of the error function with respect to the neural network weights.
Deep reinforcement learning is the branch of machine learning in which a deep neural network coupled with reinforcement learning is used as the function approximator. In this paper, a CNN is used as the function approximator for the Q-learning model, making it a deep Q-learning model [22]. The convergence issues that arise while training a neural network also come into play when a CNN is coupled as the function approximator.
The solutions that help address these convergence issues are discussed below:
4.5.1 Experience replay
Experience replay stores each experience as a tuple (s, a, r, s′) in a replay memory 𝔻. There are two versions: one stores all experience tuples, and the other stores the last N transitions in a sliding window. After each iteration, the agent samples a small batch from the replay memory D and uses this mini-batch to update the weights of the value network Qt [35]. Since the samples are drawn randomly from the memory, the correlation between samples is broken and the sampling-bias problem is also mitigated; because a whole batch is used at once to update Qt, the sampling distribution becomes more uniform too. The drawback of experience replay is that if the data change following a particular pattern over time, the agent keeps updating Qt with outdated experiences, which can lead it to wrong interpretations.
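As an illustration, a minimal Python sketch of such a sliding-window replay memory is given below; the class name and the capacity value are illustrative rather than taken from the paper.

```python
import random
from collections import deque

class ReplayMemory:
    """Sliding-window replay memory storing (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once the capacity is reached,
        # which corresponds to the sliding-window variant described above.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: push a transition after every simulation step, then draw a mini-batch for training.
memory = ReplayMemory(capacity=10000)  # the capacity value here is illustrative
# memory.push(s, a, r, s_next)
# batch = memory.sample(32)            # mini-batch of 32, as in Section 5.5
```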
4.5.2 Target network
The problem of moving targets, i.e., the rapidly changing target values discussed above, is solved by using a target network. Two different networks are kept: one is trained after every step, and the other is updated slowly from the former and is used to compute the targets.
5 Proposed methodology
The main aim of this work is to classify the vehicles in order to reduce traffic congestion; traffic light management is a key challenge on the way to urbanization in smart cities. The problems were discussed in Section 1, and our proposed methodology is divided into five phases to address them.
5.1 States
The model takes two input states of the current intersection: a <position> matrix and a <velocity> matrix. Three classes of vehicles are considered: light-weight, moderate-weight, and heavy-weight. Each vehicle is assigned to a cell of the position grid; if a cell of the position matrix does not contain any vehicle, it is given a weight of 0, while weights of 1, 2, and 3 are assigned for a light-weight, moderate-weight, and heavy-weight vehicle, respectively. The vehicle types and their corresponding assigned weights are listed in Table ??. The velocity matrix holds the normalized vehicle velocity at the intersection, V_current/10.
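As an illustration, a minimal Python/TraCI sketch of how such a position/velocity state could be assembled is shown below; the cell-index computation and the mapping from SUMO vehicle type IDs to weights are assumptions made for this example, not details reported in the paper.

```python
import numpy as np
import traci  # SUMO Traffic Control Interface; requires a running SUMO simulation

GRID = 60  # the agent input is a 60 x 60 grid (see Section 5.4)

# Assumed mapping from vehicle type IDs (as defined in the SUMO route file) to weights.
TYPE_WEIGHT = {"bike": 1.0, "car": 2.0, "truck": 3.0}

def build_state(x_min, y_min, cell_length=8.0):
    """Return a GRID x GRID x 2 array: channel 0 holds the vehicle weight, channel 1 holds v/10."""
    state = np.zeros((GRID, GRID, 2), dtype=np.float32)
    for veh_id in traci.vehicle.getIDList():
        x, y = traci.vehicle.getPosition(veh_id)
        col = int((x - x_min) // cell_length)
        row = int((y - y_min) // cell_length)
        if 0 <= row < GRID and 0 <= col < GRID:
            # Empty cells keep weight 0.0; occupied cells get the class weight.
            state[row, col, 0] = TYPE_WEIGHT.get(traci.vehicle.getTypeID(veh_id), 0.0)
            state[row, col, 1] = traci.vehicle.getSpeed(veh_id) / 10.0  # normalized velocity
    return state
```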
5.2 Action
The agent takes actions according to the Q-learning algorithm. An action allows a green light that enables one particular lane to pass and blocks the three remaining lanes, making the vehicles in those lanes stop propagating. At a given time, the agent allows only one lane to pass, and a vehicle standing in front of the green light has three ways to pass through the intersection. This gives the agent four options to choose from.

A two-bit representation of this action is considered: 00, 01, 10, and 11 represent a green light for the north lane, the west lane, the south lane, and the east lane, respectively, with all other lanes having a red light in every possible configuration.
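A minimal sketch of how such a two-bit action could be applied through TraCI is shown below; the traffic light ID and the green-phase state strings are hypothetical and depend on how the intersection is defined in the SUMO network file.

```python
import traci

TLS_ID = "center"  # hypothetical traffic light ID taken from the SUMO network file

# Two-bit action -> signal state string: green for one approach, red for the rest.
# The length and ordering of the state string must match the tlLogic of the actual network,
# so these strings are placeholders rather than values from the paper.
ACTION_TO_STATE = {
    0b00: "GGGrrrrrrrrr",  # north lane green
    0b01: "rrrGGGrrrrrr",  # west lane green
    0b10: "rrrrrrGGGrrr",  # south lane green
    0b11: "rrrrrrrrrGGG",  # east lane green
}

def apply_action(action):
    # Override the current signal plan with the state string chosen by the agent.
    traci.trafficlight.setRedYellowGreenState(TLS_ID, ACTION_TO_STATE[action])
```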
5.3 Reward function
The reward function for the deep Q-learning algorithm is discussed in this section. The agent's goal is to reduce the time that vehicles spend travelling through the intersection. The traffic delay of a vehicle waiting at the intersection can be calculated as t_s − t_min, where t_s is the time the vehicle takes to complete its journey from its starting point to its destination and t_min is the time the same trip would take at the maximum speed V_max. Mathematically, the reward function is represented as

r_t = t_s - t_{\min} \quad (8)
5.4 Q‑learning
The signal control problem is solved using Q-learning. Because the state is not discrete, a function approximator is necessarily utilized, which makes this deep reinforcement learning; the function approximator used is a convolutional neural network [25]. If the agent knew the optimal Q-values for all state-action pairs, its only task would be to choose the optimal action according to the optimal policy 𝜋* for any state of the traffic intersection. To obtain the optimal value Q*, the following dynamic-programming-based recursive expression is used:
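A standard Bellman optimality form of this recursion, following [32], is:

Q^{*}(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\, a_t = a \right]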
A grid of 60 × 60 is used. The first convolutional layer of the CNN has 32 filters, each of size 4 × 4, applied with a stride of 2 × 2 over the input and followed by a ReLU (Rectified Linear Unit) activation. Because both the position and the velocity of the vehicles are taken as input states, the agent's input is a 60 × 60 × 2 grid. The second layer has 64 filters, each of size 2 × 2 with a stride of 2 × 2. The output of the third convolutional layer is a 15 × 15 × 128 tensor, which is flattened into a fully connected layer of 128 units followed by a ReLU activation. This is then separated into two components, one used to compute the value and the other to compute the advantage; here the advantage expresses how much better an action is than the other options available [15]. The convolutional network parameters (number of filters, grid sizes, and layers) are listed in Table 4 [15].
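A minimal Keras sketch consistent with Table 4 is shown below; the padding choice and the kernel size and stride of the third convolutional layer are assumptions needed to reproduce the stated 15 × 15 × 128 output, not values reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_dueling_cnn(num_actions=4):
    # Input: 60 x 60 x 2 grid (position-weight channel and normalized-velocity channel).
    inputs = layers.Input(shape=(60, 60, 2))
    # First layer: 32 filters of 4 x 4 with stride 2 x 2 and ReLU (Table 4).
    x = layers.Conv2D(32, 4, strides=2, padding="same", activation="relu")(inputs)
    # Second layer: 64 filters of 2 x 2 with stride 2 x 2.
    x = layers.Conv2D(64, 2, strides=2, padding="same", activation="relu")(x)
    # Third layer: 128 filters producing the stated 15 x 15 x 128 tensor
    # (kernel size and stride here are assumptions).
    x = layers.Conv2D(128, 2, strides=1, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    # Dueling split: state value V(s) and per-action advantage A(s, a).
    value = layers.Dense(1)(x)
    advantage = layers.Dense(num_actions)(x)
    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    q_values = layers.Lambda(
        lambda va: va[0] + va[1] - tf.reduce_mean(va[1], axis=1, keepdims=True)
    )([value, advantage])
    return Model(inputs, q_values)
```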
5.5 Architecture
The proposed deep Q-learning uses two additional components: experience replay and the target network. Here M is the replay memory, which stores each observed experience E_t = {S_t, A_t, R_t, S_{t+1}} into M = {E_1, E_2, ..., E_n}; this replay memory is implemented as a queue data structure and is randomly sampled during training. Exactly 32 samples are taken at random and used as a mini-batch for training the Q-network. To make the agent learn the deep neural network (DNN) parameters 𝜃 such that the optimal Q*(s, a) is approximated by the output Q(s, a; 𝜃), the agent needs training data. The input data (S_t, A_t) is retrieved from the aforementioned replay memory M; because Q*(S_t, A_t) is not known, it is estimated with the help of the target network, whose parameters 𝜃′ are updated as

\theta' = \beta\,\theta + (1 - \beta)\,\theta' \quad (12)

where the update rate 𝛽 is always << 1.
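To make the training loop concrete, a sketch of one update step combining the 32-sample mini-batch, a double-Q target, and the soft update of Eq. (12) is given below; TensorFlow is used purely for illustration, and q_net, target_net, memory, and optimizer are assumed to be created elsewhere (for example, with the sketches shown earlier).

```python
import numpy as np
import tensorflow as tf

GAMMA = 0.90   # discount factor (Section 6.1)
BETA = 0.001   # soft-update rate of the target network (Eq. 12)

def train_step(q_net, target_net, memory, optimizer, batch_size=32):
    """One deep double-Q update on a random mini-batch drawn from the replay memory."""
    states, actions, rewards, next_states = map(np.array, zip(*memory.sample(batch_size)))

    # Double Q-learning target: the online network selects the next action,
    # while the target network evaluates it.
    next_actions = np.argmax(q_net.predict(next_states, verbose=0), axis=1)
    next_q = target_net.predict(next_states, verbose=0)
    targets = (rewards + GAMMA * next_q[np.arange(batch_size), next_actions]).astype(np.float32)

    with tf.GradientTape() as tape:
        q_values = q_net(states)                                   # Q(s, a; theta)
        one_hot = tf.one_hot(actions, q_values.shape[1])
        chosen_q = tf.reduce_sum(q_values * one_hot, axis=1)       # Q of the taken actions
        loss = tf.reduce_mean(tf.square(chosen_q - targets))
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))

    # Soft target-network update: theta' = beta * theta + (1 - beta) * theta'  (Eq. 12)
    target_net.set_weights([BETA * w + (1.0 - BETA) * w_t
                            for w, w_t in zip(q_net.get_weights(), target_net.get_weights())])
    return float(loss)
```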
6 Implementation and results

In this section, we discuss the implementation details and the results of the experiments with the proposed solution.
Table 4 CNN parameters and corresponding layers

I/O            No. of filters   Grid size        Layer
Input state    32               60 × 60 × 2      First layer
Input state    64               60 × 60 × 2      Second layer
Output state   128              15 × 15 × 128    Third layer
6.1 Implementation details
The traffic is simulated using SUMO, and the details of the simulation are as follows. A four-way intersection is simulated, with every road 500 m long. Every cell has a length of 8 m, the velocity limit is 40 km/h for each vehicle at the traffic intersection, each vehicle is 5 m long, and every vehicle is separated by at least 2 m.

Every vehicle has three routes to choose from at the intersection. Once a lane gets a green light, the vehicles of that lane can opt for any of the three options open to them, while all the vehicles in the other lanes have to stop and wait for the traffic light of their lane to turn from red to green.

For traffic generation, SUMO has a tool called random.py, which randomizes the traffic as much as possible to match real-time traffic. Some additional parameters are set for the traffic generation while using this tool: the same Poisson process is followed but tweaked, and different rates are set for the different kinds of vehicles to match real-time traffic, e.g., P_light_weight = 1/2, P_moderate_weight = 4/5, and P_heavy_weight = 1/5.
The agent has been trained for N = 2000 episodes, each equating to 0.25 h of traffic. The 𝜀-greedy method is discussed below; the value used for 𝜀 is 0.2 for all episodes. The discount factor 𝛾 is 0.90, the value of 𝛽 used to update the target network is 0.001, and the capacity of the replay memory is set to 100 episodes.

The agent has four options to choose from and must decide which action to take as input to attain an optimal result in the long run. It can either choose the best action available at the moment, called exploitation, or try something that is not the best action available now but could give more optimized results in the long run, called exploration. In the initial stages, the agent does not care about the optimal action and explores more; as training progresses, the agent increasingly chooses the exploitative action. The exploration schedule is as follows:

\varepsilon = 1 - \frac{h}{H} \quad (13)

where 𝜀 is the tendency of the agent to choose an explorative action, h is the current episode number, and H is the total number of episodes [39]. Table 5 shows the parameters used during the experiment and their corresponding values.
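A minimal sketch of this exploration schedule and the resulting action choice is shown below; the function and variable names are illustrative.

```python
import random
import numpy as np

def select_action(q_values, episode, total_episodes):
    """Epsilon-greedy selection with the linearly decaying epsilon of Eq. (13)."""
    epsilon = 1.0 - episode / total_episodes    # eps = 1 - h/H
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: pick a random phase
    return int(np.argmax(q_values))             # exploit: best current Q-estimate
```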
6.2 Results
In this section, the results collected after generating the different classes of vehicles and then applying the deep Q-learning algorithm are shown in Fig. 11 below. The average waiting time is calculated for the Double Dueling Deep Q-Network, the Deep Q-Network, adaptive traffic signal control (ATSC), and the fixed-time approach in Fig. 10. The result is compared with a traffic simulation in which the same kind of traffic is generated but the lights follow the traditional round-robin method and stay green for 10 s for every lane; the traffic rules are the same for that simulation too.
The resulting model is trained through deep double Q-learning using a target network and experience replay, and the average cumulative waiting time of every vehicle is reduced substantially; hence traffic congestion is minimized, which is the primary goal of this work. A green light enables one particular lane to pass while the three remaining lanes are blocked, stopping the vehicles in those lanes. At a given time, the agent allows only one lane to pass, and a vehicle standing in front of the green light has three ways to pass through the intersection. This provides the agent with four options to choose from, i.e., a green light for the north, west, south, or east lane. After learning, the agent evaluates the reward function by calculating the vehicles' waiting time at the intersection, its goal being to reduce the time vehicles spend in the intersection. The work is simulated in SUMO (Simulation of Urban MObility), with the traffic data generated on SUMO using a random function. The results show that the traffic light of the studied intersection becomes adaptive, aligning with the goals mentioned above, and that the proposed model efficiently reduces the average waiting time at the intersection by up to 91.7%, as shown in the graphs in this section.
This methodology can also be scaled to a larger extent by using multiple agents that synchronize among themselves and minimize the traffic of a whole area through vector-minimization techniques. Even though there is a drastic drop in the cumulative waiting time at the traffic intersection, DSRC units are still not deployed in every vehicle in the traffic network. With the rapid development of communication protocols and technology, a mechanism must be designed that is adaptive, readily available, and cheap, making it accessible to the general population's vehicles so that it can be deployed in more and more vehicles. The effects of the disruptions and congestion caused by unforeseeable accidents, and how the trained model deals with them, still need to be studied, and more research is required on how much the reinforcement learning algorithm depends on time. A communication network that is more reliable than DSRC also has to be achieved.
Acknowledgements I am highly thankful to my co-author Mr. Nistala Venkata Kameshwer Sharma, for his
important contribution. After that, I thank my supervisor, Dr. Vijay K Chaurasiya, for guiding me in this
research work. At last, I am also thankful to Dr. Shishupal Kumar for his direction from time to time.
Declarations
Conflicts of interest/Competing interests No conflict.
References
1. Bellman R, Kalaba R (1957) Dynamic programming and statistical communication theory. Proc Natl
Acad Sci USA 43(8):749
2. Casas N (2017) Deep deterministic policy gradient for urban traffic light control. arXiv preprint arXiv:1703.09035
3. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292
4. Coşkun M, Baggag A, Chawla S (2018) Deep reinforcement learning for traffic light optimization. In:
2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 564–571
5. Farazi NP, Ahamed T, Barua L, Zou B (2020) Deep reinforcement learning and transportation
research: A comprehensive review. https://doi.org/10.48550/arXiv.2010.06187
6. Gao J, Shen Y, Liu J, Ito M, Shiratori N (2017) Adaptive traffic signal control: Deep reinforcement
learning algorithm with experience replay and target network. arXiv preprint arXiv:1705.02755
7. Garg D, Chli M, Vogiatzis G (2018) Deep reinforcement learning for autonomous traffic light con-
trol. In: 2018 3rd IEEE international conference on intelligent transportation engineering (ICITE),
IEEE, Singapore, pp 214–218. https://doi.org/10.1109/ICITE.2018.8492537
8. Genders W, Razavi S (2016) Using a deep reinforcement learning agent for traffic signal con-
trol. https://doi.org/10.48550/arXiv.1611.01142
9. Gong Y, Abdel-Aty M, Cai Q, Rahman MS (2019) Decentralized network level adaptive signal control
by multi-agent deep reinforcement learning. Transp Res Interdiscip Perspect 1:100020
10. Hu X, Zhao C, Wang G (2020) A traffic light dynamic control algorithm with deep reinforcement
learning based on GNN Prediction. https://doi.org/10.48550/arXiv.2009.14627
11. Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochas-
tic domains. Artif Intell 101(1–2):99–134
12. Kumar N, Rahman SS, Dhakad N (2020) Fuzzy inference enabled deep reinforcement learning-based traffic light control for intelligent transportation system. IEEE Trans Intell Transp Syst
13. Li C, Ma X, Xia L, Zhao Q, Yang J (2020) Fairness control of traffic light via deep reinforcement
learning. In: 2020 IEEE 16th International Conference on Automation Science and Engineering
(CASE). IEEE, Hong Kong, China, pp 652–658. https://doi.org/10.1109/CASE48305.2020.9216899
14. Li L, Lv Y, Wang F-Y (2016) Traffic signal timing via deep reinforcement learning. IEEE/CAA J
Automat Sin 3(3):247–254
15. Liang X (2019) Applied deep learning in intelligent transportation systems and embedding explora-
tion, Ph.D. thesis, New Jersey Institute of Technology
16. Liang X, Du X, Wang G, Han Z (2018) Deep reinforcement learning for traffic light control in vehicu-
lar networks. arXiv preprint arXiv:1803.11115
17. Liang X, Yan T, Lee J, Wang G (2018) A distributed intersection management protocol for safety, effi-
ciency, and driver’s comfort. IEEE Internet Things J 5(3):1924–1935
18. Liang X, Du X, Wang G, Han Z (2019) A deep reinforcement learning network for traffic light cycle
control. IEEE Trans Veh Technol 68(2):1243–1253
19. Ma Z, Cui T, Deng W, Jiang F, Zhang L (2021) Adaptive optimization of traffic signal timing via deep
reinforcement learning. J Adv Transp 2021
20. Ma D, Zhou B, Song X, Dai H (2021) A deep reinforcement learning approach to traffic signal control
with temporal traffic pattern mining. IEEE Trans Intell Transp Syst
21. Maadi S, Stein S, Hong J, Murray-Smith R (2022) Real-time adaptive traffic signal control in a con-
nected and automated vehicle environment: optimisation of signal planning with reinforcement learn-
ing under vehicle speed guidance. Sensors 22(19):7501
22. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fid-
jeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature
518(7540):529–533
23. Mousavi SS, Schukat M, Howley E (2017) Traffic light control using deep policy‐gradient and value‐
function‐based reinforcement learning. IET Intel Transport Syst 11(7):417–423
24. Navarro-Espinoza A, López-Bonilla OR, García-Guerrero EE, Tlelo-Cuautle E, López-Mancilla D,
Hernández-Mejía C, Inzunza-González E (2022) Traffic flow prediction for smart traffic lights using
machine learning algorithms. Technologies 10(1):5
25. Pang H, Gao W (2019) Deep Deterministic policy gradient for traffic signal control of single inter-
section. In: 2019 Chinese Control And Decision Conference (CCDC), Nanchang, China, pp 5861–
5866. https://doi.org/10.1109/CCDC.2019.8832406
26. Prosper HB (2017) Deep learning and Bayesian methods. EPJ Web Conf 137:9. https://doi.org/10.1051/epjconf/201713711007
27. Raeisi M, Mahboob AS (2021) Intelligent control of urban intersection traffic light based on reinforce-
ment learning algorithm. In: 2021 26th International Computer Conference, Computer Society of Iran
(CSICC). IEEE, 1–5
28. Rasheed F, Yau K-LA, Low Y-C (2020) Deep reinforcement learning for traffic signal control under
disturbances: A case study on Sunway city, Malaysia. Futur Gener Comput Syst 109:431–445
29. Sahu SP, Dewangan DK, Agrawal A, Priyanka TS (2021) Traffic light cycle control using deep rein-
forcement technique. In: 2021 International Conference on Artificial Intelligence and Smart Systems
(ICAIS). IEEE, Coimbatore, India, pp 697–702. https://doi.org/10.1109/ICAIS50930.2021.9395880
30. Saleem M, Abbas S, Ghazal TM, Khan MA, Sahawneh N, Ahmad M (2022) Smart cities: Fusion-
based intelligent traffic congestion control system for vehicular networks using machine learning tech-
niques. Egypt Inf J 23(3):417–426
31. Schneider C (2020) Intelligent signalized intersection management for mixed traffic using Deep Q-Learning
32. Sutton RS, Barto AG et al (1998) Introduction to reinforcement learning, volume 135. MIT press
Cambridge
33. Tan KL, Poddar S, Sarkar S, Sharma A (2019) Deep reinforcement learning for adaptive traffic signal
control. In: Dynamic Systems and Control Conference, volume 59162. American Society of Mechani-
cal Engineers, V003T18A006
34. Tong W, Hussain A, Bo WX, Maharjan S (2019) Artificial Intelligence for Vehicle-to-Everything: A
Survey. IEEE Access 7:10823–10843. https://doi.org/10.1109/ACCESS.2019.2891073
35. van der Pol E (2016) Deep reinforcement learning for coordination in traffic light control. Master’s
thesis, University of Amsterdam
36. Wei H, Zheng G, Yao H, Li Z (2018) Intellilight: A reinforcement learning approach for intelligent
traffic light control. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowl-
edge Discovery & Data Mining, 2496–2505
37. Wu T, Zhou P, Liu K, Yuan Y, Wang X, Huang H, Wu DO (2020) Multi-agent deep reinforcement
learning for urban traffic light control in vehicular networks. IEEE Trans Veh Technol 69(8):8243–8256
38. Yu B, Guo J, Zhao Q, Li J, Rao W (2020) Smarter and safer traffic signal controlling via deep rein-
forcement learning. In: Proceedings of the 29th ACM International Conference on Information &
Knowledge Management, 3345–3348
39. Yuan X (2021) Faster Finding of Optimal Path in Robotics Playground Using Q-Learning with
“Exploitation-Exploration Trade-Off”. J Phys Conf Ser 1748(2):022008
40. Zeinaly Z, Sojoodi M, Bolouki S (2023) A resilient intelligent traffic signal control scheme for acci-
dent scenario at intersections via deep reinforcement learning. Sustainability 15(2):1329
41. Zhang R, Ishikawa A, Wang W, Striner B, Tonguz OK (2020) Using reinforcement learning with par-
tial vehicle detection for intelligent traffic signal control. IEEE Trans Intell Transp Syst 22(1):404–415