Deep Reinforcement Learning for Traffic Signal Control with Consistent
State and Reward Design Approach
1. How they managed the traffic system
They proposed a Deep Reinforcement Learning (DRL) framework using a Double Deep Q-Network (DDQN) with Prioritized Experience Replay (PER) to manage traffic signals. The core idea is to design consistent and simple definitions for both state and reward, making the agent learn optimal policies quickly and effectively.
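As a rough illustration of this setup, the sketch below computes the Double-DQN target and PER-style priorities for one minibatch; it assumes a PyTorch Q-network, and the function name, loss choice, and constants are illustrative rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def ddqn_update_terms(online_net, target_net, batch, gamma=0.99):
    """Double DQN: the online net selects the next action, the target net
    evaluates it. PER priorities are derived from the absolute TD error."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Decouple action selection (online) from evaluation (target)
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q
    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_error = targets - q
    priorities = td_error.abs().detach() + 1e-6  # new PER priorities
    loss = F.smooth_l1_loss(q, targets)
    return loss, priorities
```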
2. Gap analysis
• Existing Issues: Prior works often used hand-crafted or inconsistent state and reward designs, which harmed convergence and real-world applicability.
• Gap Filled: This paper proposes three consistent state-reward pairs, designed to directly reflect and optimize traffic metrics like vehicle count, queue length, and waiting time.
3. Methodology used
• Reinforcement Learning (RL) using DDQN with PER.
• Three state-reward design approaches:
1. Number of vehicles (State) ↔ Vehicle count (Reward)
2. Queue length (State) ↔ Queue length (Reward)
3. Waiting time (State) ↔ Waiting time (Reward)
• Penalty mechanism for suboptimal actions to improve learning (a hedged sketch of these pairings follows this list).
• Ablation studies to evaluate the role of components like PER and DDQN.
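The pairing can be read as "the reward optimizes exactly what the state measures". A minimal sketch of what such paired rewards might look like, including a penalty term, is shown below; the function names and the penalty form are assumptions, not the paper's exact formulas.

```python
def nsr_reward(vehicles_before, vehicles_after):
    """Number-of-vehicles pair: reward is the drop in vehicle count."""
    return vehicles_before - vehicles_after

def qsr_reward(queue_before, queue_after):
    """Queue-length pair: reward is the drop in total queue length."""
    return queue_before - queue_after

def wsr_reward(wait_before, wait_after, suboptimal_action=False, penalty=1.0):
    """Waiting-time pair, with an extra penalty when the chosen action is
    judged suboptimal (the penalty form here is an assumption)."""
    reward = wait_before - wait_after
    return reward - penalty if suboptimal_action else reward
```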
4. How they collected the dataset
• Synthetic traffic flow data was generated using SUMO (Simulation of Urban MObility).
• Flows were based on Weibull and Normal distributions to simulate high and low traffic densities (a sampling sketch follows this list).
• Each simulation had origin-destination routing with realistic parameters (e.g., speed, turn ratios).
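A minimal sketch of drawing vehicle departure times from Weibull and Normal distributions for a SUMO route file; all parameter values (shape, episode length, vehicle counts) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
SIM_DURATION = 3600  # seconds per episode (illustrative)

# Weibull-shaped demand: arrivals ramp up, peak, then taper off
weibull_t = np.sort(rng.weibull(a=2.0, size=1000))
weibull_t = weibull_t / weibull_t.max() * SIM_DURATION

# Normal-shaped demand: arrivals concentrated around mid-episode
normal_t = np.sort(np.clip(rng.normal(SIM_DURATION / 2, 600, size=300),
                           0, SIM_DURATION))

# Each timestamp would become a <vehicle depart="..."/> entry in a .rou.xml file.
```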
5. Research area
• Urban traffic signal control under Intelligent Transportation Systems (ITS).
• Focused on dynamic and adaptive optimization using AI/ML techniques.
6. Did they use simulation to generate data?
Yes, the SUMO simulator was used to:
• Create a four-way intersection environment.
• Generate and manage vehicular flows.
• Evaluate agent performance in various scenarios.
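For illustration, queue length and waiting time can be read from a running SUMO instance via its TraCI Python API, as sketched below; the config file name and lane IDs are hypothetical.

```python
import traci

traci.start(["sumo", "-c", "intersection.sumocfg"])  # hypothetical config
approach_lanes = ["north_in_0", "south_in_0", "east_in_0", "west_in_0"]  # hypothetical IDs

while traci.simulation.getMinExpectedNumber() > 0:
    traci.simulationStep()
    queue = sum(traci.lane.getLastStepHaltingNumber(l) for l in approach_lanes)
    waiting = sum(traci.lane.getWaitingTime(l) for l in approach_lanes)
    # queue and waiting would feed the agent's state and reward computation
traci.close()
```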
7. Results (Final outputs from research)
• The proposed CSRD agents (QSR, NSR, WSR) outperformed:
o Traditional Fixed-Time controls
o Benchmarks like DTSE, PressLight, and LIT
• NSR (Number of vehicles) achieved the best performance across all
metrics.
• Metrics improved:
o Average Travel Time (ATT)
o Queue Length (QL)
o Waiting Time (WT)
8. Limitations (from conclusion)
• Results are based on synthetic simulations, not tested on real-world traffic systems.
• Only considered single intersections, not multi-intersection networks.
• State and reward design assumes accurate sensor data, which may not
always be available.
9. Future implementation plan (from conclusion)
• Extend the framework to multi-intersection traffic networks.
• Integrate real-world traffic data for training and validation.
• Explore hardware deployment using edge computing and real-time sensor integration.
Deep Reinforcement Q-Learning for Intelligent Traffic Signal Control with Partial Detection
1. How they managed the traffic system
• They developed a Deep Q-Learning (DQN) agent for traffic signal control at a single intersection.
• The agent makes decisions based on partially observed data from connected vehicles (CVs) using image-like state representations (partial DTSE); a state-encoding sketch follows this list.
• The goal is to minimize total squared vehicle delay by selecting traffic signal phases that adapt to real-time traffic conditions.
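A minimal sketch of a partial DTSE for a single approach lane: the lane is discretized into cells, and only CVs populate the occupancy and speed channels, so undetected vehicles simply leave gaps. Grid size and lane length are illustrative assumptions.

```python
import numpy as np

def partial_dtse(cv_positions, cv_speeds, lane_length=150.0, n_cells=30):
    """Image-like encoding of one lane; non-CV traffic is invisible."""
    occupancy = np.zeros(n_cells)
    speed = np.zeros(n_cells)
    cell_len = lane_length / n_cells
    for pos, v in zip(cv_positions, cv_speeds):
        idx = min(int(pos // cell_len), n_cells - 1)
        occupancy[idx] = 1.0  # a connected vehicle occupies this cell
        speed[idx] = v        # its reported speed
    return np.stack([occupancy, speed])  # shape (2, n_cells)
```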
2. Gap Analysis
• Existing works assume full vehicle detection, which is unrealistic.
• Few models are designed for low CV penetration rates.
• No standard exists for state representations or reward functions in DQN for TSC.
• Lack of real-world viability due to expensive infrastructure and
reproducibility issues in RL.
3. Which methodology they used
• Dueling Double Deep Q-Network (3DQN).
• The model was tested using SUMO (Simulation of Urban MObility) on three scenarios with increasing complexity.
• The agent selects signal phases based on microscopic data from CVs only.
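The dueling architecture splits the Q-function into a state value and per-action advantages. A minimal PyTorch sketch is below; layer sizes are illustrative, not the paper's exact network.

```python
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)              # state value V(s)
        self.advantage = nn.Linear(128, n_actions)  # advantages A(s, .)

    def forward(self, x):
        h = self.features(x)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```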
4. How they collected the dataset
• Data was synthetically generated using SUMO:
o Random traffic flows (Poisson distribution).
o Varied CV penetration rates (0–100%).
o Simulated across 3600-second episodes with different intersection designs (a flow-generation sketch follows this list).
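A minimal sketch of such a generator: Poisson arrivals via exponential inter-arrival times, with each vehicle independently flagged as a CV at the desired penetration rate. The arrival rate is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
EPISODE_LEN = 3600    # seconds, matching the paper's episode length
ARRIVAL_RATE = 0.2    # vehicles per second (illustrative)
CV_PENETRATION = 0.4  # fraction of connected vehicles (swept 0-1 in the paper)

# Poisson process: exponential gaps between consecutive arrivals
gaps = rng.exponential(1.0 / ARRIVAL_RATE, size=2000)
departures = np.cumsum(gaps)
departures = departures[departures < EPISODE_LEN]

# Each vehicle is a CV (visible to the agent) with probability CV_PENETRATION
is_cv = rng.random(departures.size) < CV_PENETRATION
```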
5. Research Area
• Falls under Intelligent Transportation Systems (ITS).
• Subfield: Adaptive Traffic Signal Control using Deep Reinforcement Learning (DRL).
• Emphasis on partially observable environments and low-cost implementation with CVs.
6. Did they use simulation to generate data?
• Yes. All data was generated in SUMO, a widely used microscopic traffic simulation tool.
7. Results / Final Outputs
• Outperformed traditional algorithms (Max Pressure, SOTL) in scenarios with 4-phase programs.
• Showed robustness to diverse traffic conditions.
• Effective at a 20% CV penetration rate (acceptable), with optimal performance at ≥40%.
• Introduced a fairness-aware reward to avoid favoring only heavily used lanes (a hedged sketch follows this list).
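One plausible form of such a reward, consistent with the squared-delay objective mentioned earlier, is sketched below; squaring each vehicle's delay penalizes letting any single vehicle wait very long, so the agent cannot simply serve busy lanes while starving minor approaches. This is an assumption, not the paper's exact expression.

```python
def fairness_aware_reward(delays_before, delays_after):
    """Reward is the reduction in total squared delay across all vehicles."""
    total_sq = lambda delays: sum(d * d for d in delays)
    # Positive when the chosen phase reduced the squared-delay total
    return total_sq(delays_before) - total_sq(delays_after)
```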
8. Limitations (From Conclusion)
• Assumes perfect data from CVs, which isn't realistic.
• Trained separately for each scenario; lacks generalizability.
• Not optimized for low CV rates (<20%); model performance drops significantly.
• Only tested on synthetic data; no real-world validation yet.
9. Future Implementation Plan
• Improve robustness under imperfect CV data using probabilistic methods.
• Generalize the model across various intersection types using techniques like zero-padding.
• Redesign the reward to enable real-world learning after deployment using CV-only data.
• Explore multi-agent coordination for city-wide traffic networks.
• Validate using realistic datasets, e.g., Luxembourg SUMO Traffic (LuST).
10. Comparison Between Methodologies (Current vs. Previous)
• Detection Requirement: previous methods (Max Pressure, SOTL) need full detection (expensive); this research (3DQN with partial detection) uses partial detection from CVs (low-cost, practical).
• Adaptivity: previous methods are rule-based (static logic); this research learns from experience (adaptive and optimal).
• Performance: previous methods are good in simple scenarios; this research is superior in complex, multi-phase intersections.
• Fairness Consideration: none in previous methods; this research's designed reward promotes fairness among all directions.
• Deployment Readiness: previous methods are already in use; this research is promising but requires real-world testing and tuning.
11. Main Objective: New Idea Generation
You can explore the following new ideas based on this work:
• Hybrid reward design: Combine a CV-only data reward with estimations of non-CV behavior.
• Federated learning for intersections: Enable distributed learning without centralized data sharing.
• Transfer learning: Train in simulation, adapt the model to real-world intersections with minimal tuning.
• Incorporating weather/pedestrian data: Add more environmental factors for improved realism.
• Integration with vehicle routing apps: Use real-time routing info to predict near-future inflows.
A Reinforcement Learning Approach for Reducing Traffic Congestion Using Deep Q Learning
1. How they managed the traffic system
They used a Deep Q-Learning (DQL) model to control traffic signals at intersections dynamically. Their system:
• Focused on optimizing queue length and rewards.
• Trained an RL agent in a simulated environment to select the best traffic signal action based on real-time traffic data.
• Integrated state, action, and reward logic to adjust signals adaptively depending on vehicle flow and congestion (a minimal action-selection sketch follows this list).
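A minimal sketch of that selection step, using an epsilon-greedy policy over a two-phase action set; the action names and the q_net interface are hypothetical stand-ins.

```python
import numpy as np

ACTIONS = ["NS_GREEN", "EW_GREEN"]  # illustrative two-phase action set

def select_action(q_net, state, epsilon):
    """Explore a random phase with probability epsilon; otherwise pick the
    phase with the highest estimated Q-value."""
    if np.random.random() < epsilon:
        return np.random.randint(len(ACTIONS))
    q_values = q_net(state)  # assumed to return one value per action
    return int(np.argmax(q_values))
```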
2. Gap analysis
• Traditional systems (fixed-time or static signal controls) fail to adapt in real time.
• Earlier reinforcement learning approaches suffered from:
o Large state-space issues
o Limited adaptability to dynamic environments
o Inadequate handling of vehicle behaviors
• This paper addresses these by combining Deep Q-Networks (DQN) with hyperparameter tuning, improving efficiency and learning from sparse, dynamic inputs.
3. Methodology used
• Deep Reinforcement Learning (DRL) approach using a Deep Q-Network (DQN)
• Components:
o State: Vehicle positions, velocities, distances.
o Actions: Signal control (e.g., North-South or East-West green lights).
o Rewards: Feedback based on queue-length reduction and traffic-flow improvement.
• Simulation over 30 episodes, 240 steps per episode, 1000 vehicles, and training over 800 epochs (a training-loop skeleton using these settings is sketched after this list).
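The skeleton below arranges those settings into a standard DQN training loop; `env` and `agent` are hypothetical stand-ins with a Gym-style interface, not the paper's actual classes.

```python
def train(env, agent, n_episodes=30, steps_per_episode=240):
    """Run the reported schedule: 30 episodes of 240 steps each."""
    for episode in range(n_episodes):
        state = env.reset()  # simulated junction with ~1000 vehicles
        for _ in range(steps_per_episode):
            action = agent.act(state)                    # choose a green phase
            next_state, reward, done = env.step(action)  # advance simulation
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
        agent.replay()  # fit the Q-network on sampled experience
```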
4. How they collected the dataset
They used two XML datasets:
• Dataset 1: Environmental data (vehicle ID, route, speed, etc.)
• Dataset 2: Route data (edge ID, lane ID, shape, etc.)
These datasets were merged to simulate traffic at a junction and train the RL agent (a parsing sketch follows).
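Assuming the route data follows SUMO-style XML, it could be loaded with the standard library as sketched below; the file name and attribute names are assumptions based on SUMO conventions.

```python
import xml.etree.ElementTree as ET

def load_routes(route_file="routes.xml"):
    """Collect per-vehicle id, departure time, and edge sequence."""
    vehicles = []
    for veh in ET.parse(route_file).getroot().iter("vehicle"):
        route = veh.find("route")
        vehicles.append({
            "id": veh.get("id"),
            "depart": float(veh.get("depart", 0.0)),
            "edges": route.get("edges", "").split() if route is not None else [],
        })
    return vehicles
```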
5. Research area
Focused on intersection-based urban traffic management within the domain of:
• Smart cities
• Urban sustainability
• Intelligent Traffic Systems (ITS)
• Adaptive Traffic Signal Control (ATSC)
6. Did they use simulation to generate data?
Yes.
• They simulated the environment using SUMO-like structured intersections with vehicle flow and signal dynamics.
• Testing involved artificially generated vehicle traffic (1000 cars) and simulated actions over 240 time steps.
7. Final results (outputs of the research)
• Queue length reduction: 49%
• Rewards (incentives) increased: 9%
• Training queue length: 852 → Testing queue length: 418
• Training reward: −944992 → Testing reward: −8520
These figures show a significant performance improvement during testing after training.
8. Limitations (from conclusion/discussion)
• High computational cost due to the large state-action space in DQL.
• Limited to simulated environments, not yet applied in real-world systems.
• Still relies on static vehicle-flow assumptions; real traffic is more chaotic.
9. Future implementation plans
• Integrate real-time traffic data via internet connectivity and sensors.
• Implement dimensionality reduction to decrease computational complexity.
• Adopt distributed computing and parallel processing to scale the model for large cities.
• Use experience replay and efficient architectures to stabilize learning in complex environments.
10. Comparison between current and previous methods
• Signal Control: previous methods were fixed, heuristic, or traditional; the proposed DQL method is adaptive and real-time via RL.
• Queue Reduction: limited previously (14–30%); significant with the proposed method (49%).
• Reward Optimization: rarely focused previously; explicitly optimized here.
• Scalability: challenging for large states previously; managed here using hyperparameter tuning.
• Learning Capability: static or semi-dynamic previously; fully dynamic with continuous updates here.
• Techniques Used: Genetic, Fuzzy, MARL previously; Deep Q-Learning with tuning here.
Main Objective: New Idea Generation
This research lays the groundwork for further innovations like:
• Real-time adaptive signal control using cloud-connected sensors.
• Multi-agent systems for city-wide traffic optimization.
• Hybrid approaches combining DQL with graph neural networks or computer vision (camera feeds).
• Personalized traffic management for emergency vehicles or public transit.