Report Updt
A PROJECT REPORT
Submitted by
BACHELOR OF TECHNOLOGY
in
of
May 2025
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University U/S 3 of UGC Act, 1956)
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
RAMAPURAM, CHENNAI - 89
DECLARATION
We hereby declare that the entire work contained in this project report titled
“SPATIOTEMPORAL PREDICTION OF TRAFFIC ACCIDENT
HOTSPOT USING AI-POWERED ANALYTICS” has been carried out
by ANSHUMAN JANA [REG NO: RA2111003020104], EESWARA
KOSIREDDI [REG NO: RA2111003020094], S P SHARVESH [REG
NO: RA2111003020080] at SRM Institute of Science and Technology,
Ramapuram, Chennai - 600089, under the guidance of Mr. Santhosh
Kumar C, Assistant Professor, Department of Computer Science and
Engineering.
Place: Chennai
Date:
ANSHUMAN JANA
EESWARA KOSIREDDI
S P SHARVESH
Own Work Declaration
DECLARATION:
We are aware of and understand the University’s policy on academic misconduct and plagiarism,
and we certify that this assessment is our own work, except where indicated by referencing, and that
we have followed the good academic practices noted above.
ACKNOWLEDGEMENT
ABSTRACT
Traffic accidents pose a critical challenge to urban mobility and safety. This research develops a
take proactive measures. The study employs classification models such as Random Forest, KNN,
ANN, and Naïve Bayes, selecting the best-performing model through evaluation metrics.
ADASYN is used for data balancing, while PCA reduces feature dimensionality, improving accuracy
and computational efficiency. The system provides real-time accident risk assessment, aiding
urban planners, emergency responders, and traffic authorities in decision-making. With AI-
powered analytics, this research enhances predictive accuracy, contributing to safer roads and
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
1.1 Problem Statement
1.2 Aim of the Project
1.3 Project Domain
1.4 Scope of the Project
1.5 Methodology
1.6 Organization of the Report
2 LITERATURE REVIEW
3 PROJECT DESCRIPTION
3.1 Existing System
3.2 Proposed System
3.2.1 Advantages
3.3 Feasibility Study
3.3.1 Economic Feasibility
3.3.2 Operational Feasibility
3.3.3 Technical Feasibility
3.3.4 Social Feasibility
3.4 System Specification
3.4.1 Hardware Specification
3.4.2 Software Specification
4 PROPOSED WORK
4.1 General Architecture
4.2 Design Phase
4.2.1 Data Flow Diagram
4.2.2 UML Diagram
4.2.3 Use Case Diagram
4.2.4 Sequence Diagram
4.3 Module Description
4.3.1 Module 1: Data Collection Module
4.3.2 Module 2: Data Preprocessing Module
4.3.3 Module 3: Machine Learning Model Training Module
4.3.4 Step 2: Processing of Data
4.3.5 Step 3: Split the Data
4.3.6 Dataset Sample
4.3.7 Step 4: Building the Model
4.3.8 Step 5: Compiling and Training the Model
5.2 Testing
5.2.1 Types of Testing
5.2.2 Unit testing
5.2.3 Integration testing
5.2.4 Functional testing
5.2.5 Test Result
References
A. Sample screenshots
LIST OF FIGURES
8 Sample Code 42
LIST OF TABLES
LIST OF ACRONYMS AND ABBREVIATIONS
CHAPTER 1
INTRODUCTION
As cities continue to grow and roads become busier, the risk of traffic accidents rises sharply. These
accidents don’t just lead to tragic loss of life or injury—they also create traffic jams, increase economic
burdens, and make everyday commuting more stressful. In today’s fast-paced urban environments,
ensuring road safety has become more important than ever.
For years, authorities have relied on traditional methods to analyze accident data, often using static
models and past records. While helpful, these approaches can fall short when it comes to adapting to the
fast-changing and complex nature of urban traffic. That’s where machine learning comes in: it can
process large amounts of data and recognize patterns we might miss, and it can do so in real time.
This study introduces a machine learning-based system designed to spot accident-prone areas in cities
before incidents occur. By working with a mix of classification models—including Random Forest, K-
Nearest Neighbors (KNN), Artificial Neural Networks (ANN), and Naïve Bayes—we’re able to compare
their performance and choose the one that does the job best. To ensure fairness in prediction, we use a
data-balancing technique called ADASYN.
The goal is to build a system that doesn’t just react to accidents—but helps prevent them. By offering
real-time risk insights, this system can support urban planners, emergency responders, and traffic
authorities in making smarter, safer decisions.
1.1 PROBLEM STATEMENT
With the rapid growth of urban populations and vehicle usage, cities are facing an increasing number
of traffic accidents. These incidents not only cause injuries and fatalities but also lead to traffic
congestion, economic losses, and delays in emergency response. Traditional methods of identifying
accident-prone areas rely heavily on past data and static analysis, which often fail to adapt to real-time
changes in traffic patterns and road conditions. As a result, authorities are left reacting to accidents rather
than preventing them. There is a growing need for an intelligent, data-driven system that can accurately
predict where accidents are more likely to happen.
1.2 AIM OF THE PROJECT
● Identify accident-prone areas in urban environments using machine learning, so authorities can act
before accidents happen.
● Leverage smart algorithms like Random Forest, KNN, ANN, and Naïve Bayes to accurately predict
high-risk zones.
● Handle unbalanced data using ADASYN, ensuring the model gives fair attention to both frequent
and rare accident cases.
● Support the creation of smarter, safer cities by using AI to turn raw traffic data into meaningful,
actionable safety strategies.
● Provide real-time accident risk insights to support better decisions for traffic control, emergency
response, and city planning.
1.3 PROJECT DOMAIN
This project belongs to the field of Artificial Intelligence, specifically applied within Intelligent
Transportation Systems. It uses machine learning to help make our roads safer and to support smarter,
more efficient traffic management in modern cities.
1.4 SCOPE OF THE PROJECT
This project aims to build a smart system that can help predict where traffic accidents are more likely
to happen in a city. By using different machine learning models, the goal is to figure out which one gives
the most accurate results. To make sure the system works well with real-world data—where accident
cases are often uneven or imbalanced—we use techniques like ADASYN to balance things out. The
system is designed to give city planners, traffic authorities, and emergency services real-time insights so
they can take action before accidents occur. Although the project is built using a specific dataset, it’s
flexible enough to be adapted to other cities or regions. In the bigger picture, this system contributes to
creating safer roads and supports the development of smarter, more responsive urban traffic management.
1.5 METHODOLOGY
The project starts by collecting data about past traffic accidents, including information like where and
when they happened, and any factors that might have contributed to them. Once the data was cleaned and
organized, we noticed that some types of accidents or locations appeared much more often than others,
which could lead to biased or inaccurate predictions. To fix this, we balanced the data with ADASYN so
the system could learn more evenly from all kinds of cases. Then, we built and tested different machine
learning models to find out which one could best predict where accidents are most likely to happen. We
evaluated each model using metrics such as accuracy, precision, recall, and F1-score.
After choosing the best-performing model, we used it to create a system that can provide real-time
alerts for high-risk areas. The goal is to give traffic authorities and city planners helpful insights so they
can take action before accidents occur, making city roads safer for everyone.
1.6 ORGANIZATION OF THE REPORT
Chapter 2 presents the literature review of existing research on traffic accident prediction,
spatiotemporal analysis, and AI-driven analytics. It covers key methodologies and technologies such as
time-series modeling, geospatial heatmaps, ensemble learning, and the use of ADAS (Advanced Driver
Assistance Systems) data. The strengths and limitations of existing models are highlighted, providing the
motivation for the proposed approach.
Chapter 3 explores the challenges faced in predicting traffic accident hotspots, especially with sparse
or noisy data. It addresses issues like data imbalance, location variability, and the integration of temporal
dynamics. The chapter discusses data cleaning strategies, feature engineering from raw ADAS data, and
methods to handle missing or corrupted entries. The importance of spatiotemporal context in accurately
predicting accident zones is emphasized.
Chapter 4 describes the proposed methodology for spatiotemporal accident hotspot prediction. The
chapter begins with the system architecture (Figure 4.1), which integrates data preprocessing, feature
extraction, and machine learning classifiers such as Gradient Boosting, XGBoost, and Decision Trees.
The data flow diagram (Figure 4.2) illustrates the pipeline from ADAS data ingestion to hotspot
prediction. UML diagrams (Figure 4.3) are used to depict the modular structure of the application, which
includes Data Collection, Preprocessing, Model Training, and Visualization components.
Chapter 5 focuses on the implementation and testing of the proposed system. A random data sample
is taken from the dataset and passed through the model to identify accident-prone areas. Input-output
evaluations, illustrated in Figures 5.1 and 5.2, show the processed data and its predicted class. Various
testing types are performed:
● Unit Testing checks individual preprocessing and feature engineering modules (Figure 5.3).
● Integration Testing evaluates the coordination of preprocessing, model training, and evaluation
components (Figure 5.4).
● Functional Testing validates that the final predictions match the expected outputs (Figures 5.6
and 5.7).
This comprehensive testing ensures that the model is both robust and reliable.
Chapter 6 presents the results and discusses the model's performance across various metrics such as
accuracy, precision, recall, and F1-score. Comparisons are made between different classifiers, as
visualized in Figures 6.1 and 6.2 (e.g., confusion matrices for Gradient Boosting and Decision Tree
classifiers). The chapter highlights the superiority of ensemble methods in handling high-dimensional,
imbalanced data.
Chapter 7 concludes the project and outlines future improvements. While the model demonstrates high
accuracy in predicting accident hotspots, future work could include real-time data integration, deployment
on edge devices, and expanding the dataset to multiple cities for broader applicability. Additionally,
combining AI predictions with live traffic feeds and IoT devices could enhance early-warning systems.
Chapter 8 includes the source code, implementation details, and supporting materials such as a project
poster and sample code snippets. It serves as a reference for replication and further development of the
system.
CHAPTER 2
LITERATURE REVIEW
As cities continue to grow and roads become busier, ensuring traffic safety has become more important
than ever. Researchers around the world have been exploring smarter ways to reduce accidents using data
and technology. One major area of focus has been the use of machine learning to spot patterns in traffic
behavior and predict where accidents are more likely to happen. These studies show that the quality of
data and how it’s prepared play a big role in making accurate predictions. There's also growing interest
in using real-time traffic and location-based data to make road safety systems more responsive and
effective. Over time, the focus has shifted from simply reacting to accidents to preventing them before
they occur. Many researchers agree that intelligent systems should be flexible enough to adapt to different
cities and traffic conditions. This growing body of work supports the idea that combining AI with traffic
management can make roads safer. It also forms the foundation for building smarter systems like the one
proposed in this project.
Kumar and Singh (2020) [2] conducted a study integrating Geographic Information Systems (GIS) with
machine learning models to detect accident-prone areas in urban regions. Their research relied heavily on
spatial data such as road curvature, intersection density, and proximity to traffic signals. When combined
with historical accident data, the system generated highly accurate risk maps, which local governments
could use for strategic planning. The model’s success lay in its spatial analysis capabilities, offering a
visual representation of hazardous zones. However, they identified a major challenge in the inconsistency
of publicly available traffic data and the irregularity of data collection intervals. Their model was tested
in a mid-sized Indian city and validated using ground truth accident reports. Results showed that spatially
aware models are more effective than those using statistical averages alone. The study advocates for
regular updates to GIS layers to maintain prediction accuracy over time. This integration of geospatial
technology marks a step forward in AI-based urban safety systems.
Ali and Fatima [4] approached accident prediction from the perspective of developing and low-resource
regions. Their goal was to build a model that could still produce accurate results without access to high-
end infrastructure or extensive datasets. Using basic information such as historical accident locations, road
types, and weather reports, they built a lightweight predictive tool. Their system showed strong
performance in semi-urban and rural areas where data availability is limited. The study stressed the need
for adaptable models that can work even when real-time or large-scale data is unavailable. One key insight
from their research is that with the right techniques, even minimal input can yield useful predictions. Their
findings have significant implications for policymakers in under-resourced areas. Moreover, their tool was
developed using open-source platforms, making it easier for regional governments to adopt and customize
it. The study also pointed out the challenges in standardizing accident report formats, which affected
model training.
Rana et al. (2021) [6] focused on the temporal component of traffic accident data, using time-series
analysis to detect recurring trends. Their work revealed patterns such as higher accident rates during
weekends, public holidays, and adverse weather conditions. They analyzed accident frequency over
multiple years and used statistical smoothing techniques to highlight seasonal peaks and troughs. This
analysis helped identify times of the year when accident likelihood was highest, providing valuable
insights for scheduling traffic safety campaigns or increasing patrol presence. The model was also capable
of generating short-term forecasts, which could aid emergency services in planning resources. Their study
highlighted the importance of time-based planning in traffic safety management. Furthermore, they
emphasized the need for integrating external temporal factors like events and festivals into prediction
models. Rana et al. concluded that understanding time-dependent behavior enhances both preventive and
reactive measures in urban traffic systems.
Chen and Li (2020) [7] explored the fusion of satellite imagery with structured accident data to develop
a visual risk detection platform. Their system used high-resolution satellite images to detect features such
as road types, pedestrian crossings, and intersections. Combined with accident records, the model
produced detailed maps identifying high-risk zones. The main advantage was the visual output that non-
technical stakeholders could interpret easily. City planners and traffic authorities could use the interactive
maps for urban safety assessments and infrastructure planning.
Zhang et al. (2021) [1] introduced a spatiotemporal deep learning framework specifically designed to
predict urban traffic accident hotspots. The model captures both spatial configurations and temporal
changes, considering factors such as the time of day, day of the week, and traffic flow levels. Their study
demonstrated that incorporating these temporal variables significantly enhances prediction accuracy when
compared to traditional statistical models. The deep learning model also proved scalable, enabling its use
in large metropolitan areas. They applied their framework to datasets from multiple cities, observing a
notable improvement in hotspot detection. In particular, high-risk intersections and peak-hour accident
spikes were effectively forecasted. Their work underscores the importance of modeling both space and
time simultaneously for reliable predictions. By providing a real-time risk heatmap, the system allows city
planners to make more informed decisions about infrastructure and law enforcement deployment. The
authors also noted challenges in managing large datasets and the computational load required for deep
learning training.
Lee et al. (2022) [3] proposed a hybrid AI-based approach combining decision tree classifiers with
artificial neural networks (ANNs) to achieve real-time accident prediction. Their model is adaptive,
meaning it continuously learns from new traffic data and updates its prediction logic accordingly. The
goal was to create a system capable of responding to changing conditions on the road, such as construction
zones or weather changes. They deployed this system in a city with high traffic density and tested it using
live feeds from traffic cameras and speed sensors. Their results indicated a 20% increase in early accident
risk detection compared to static models. One of the most valuable contributions of this study was the
emphasis on real-time response and flexibility. The researchers also discussed the technical requirement
of robust data infrastructure and how delays or errors in data transmission can reduce the system’s
effectiveness. This hybrid approach paves the way for the implementation of real-time traffic risk alert
systems in urban management frameworks.
Gomez et al. (2023) [5] leveraged ensemble learning methods to enhance traffic accident prediction
accuracy. By combining the outputs of multiple models—such as random forests, gradient boosting, and
logistic regression—they created a more reliable predictive system. Their findings showed that ensemble
models reduce variance and improve generalization, particularly in heterogeneous urban traffic settings.
The study used a large dataset from three metropolitan regions and tested various configurations of
ensemble algorithms. Results indicated that ensemble learning could outperform individual models by a
margin of 10-15% in terms of prediction accuracy.
The study also provided guidelines on selecting base learners and tuning hyperparameters for optimal
performance. Their approach proved particularly effective in capturing rare but severe accident cases.
Additionally, the model offered transparency in terms of feature importance, aiding decision-makers in
understanding which variables influenced the results. Gomez et al. highlighted that model robustness and
interpretability are key for practical deployment. Their contribution reinforces the value of collaborative
models in traffic risk analysis.
Ahmed et al. (2018) [10] explored the impact of environmental variables such as fog, rain, and daylight
on traffic accident occurrences in rural areas. Their machine learning model incorporated weather
forecasts, light conditions, and road surface types to improve prediction reliability. Their results showed
that adding environmental context to accident data significantly enhanced model accuracy. This study is
especially important for areas with less traffic monitoring infrastructure, where environmental data can
act as proxies for real-time sensors. Ahmed et al. advocated for incorporating weather APIs and
environmental sensors into future traffic prediction systems. They also pointed out challenges in aligning
datasets from different sources. This study makes a strong case for a holistic approach that goes beyond
vehicle and road data.
CHAPTER 3
PROJECT DESCRIPTION
3.1 EXISTING SYSTEM
Across the world, many systems have been developed to make roads safer and help prevent traffic
accidents using the power of data and technology. For instance, in the U.S., there's a tool called Safety
Analyst that helps traffic authorities find dangerous spots on the road and figure out whether safety
improvements are working, all based on past accident data. Cities like Toronto and London have adopted
Vision Zero road-safety programs, which use crash and location data to study accident patterns and make roads
safer before crashes happen. In India, some smart cities have started using Intelligent Traffic Management
Systems (ITMS) that bring together cameras, sensors, and AI to monitor traffic in real time and spot
potential risks. Tools like INRIX Roadway Analytics also help city planners by combining real-time traffic
flow with past data to predict where issues might arise. All of these systems aim to do what this project
is trying to achieve—use technology to understand traffic risks better and take action before accidents
happen, ultimately leading to safer and smarter cities.
3.2 PROPOSED SYSTEM
This project aims to build a smart system that can help predict where traffic accidents are most likely
to happen in a city. It starts by collecting real accident data, like where the crashes happened, when they
occurred, and what might have caused them. Once the data is cleaned and organized, we make sure the
system learns fairly by balancing out the number of accident and non-accident cases.
Then, we try out different machine learning models to see which one can make the most accurate
predictions. The best model is chosen and used to power a system that can give real-time warnings about
accident-prone areas. This way, traffic officials and city planners can take action—like improving road
signs, adjusting traffic rules, or placing emergency services nearby—before accidents happen.
The goal is to create a tool that doesn't just react to accidents but helps prevent them. By using data
and smart technology, the system supports safer roads and better traffic management in growing cities.
3.2.1 ADVANTAGES
● Predicts accident-prone areas before incidents occur.
● Helps authorities make informed, data-driven decisions.
● Enhances road safety for all users.
● Can be adapted to any city or region with traffic data.
● Offers real-time risk assessment for quick action.
● Supports smart city and intelligent traffic goals.
3.3 FEASIBILITY STUDY
This project is practically and technically achievable using available data and tools. It offers a valuable
solution for improving road safety through predictive insights.
• Economic Feasibility
• Operational Feasibility
• Technical Feasibility
• Scalability Feasibility
3.3.1 ECONOMIC FEASIBILITY
The system can be built using tools and technologies that are already widely available, like
Python and machine learning libraries. It doesn’t require any high-end or expensive equipment, which
makes it easy to develop and test even on regular computers.
3.3.2 OPERATIONAL FEASIBILITY
This system can be smoothly integrated into how traffic authorities currently work. It’s designed
to be simple and useful, giving real-time insights that help planners and emergency teams act quickly
and efficiently.
3.3.3 TECHNICAL FEASIBILITY
The system can be built using tools and technologies that are already widely available, like Python
and machine learning libraries. It doesn’t require any high-end or expensive equipment, which makes
it easy to develop and test even on regular computers.
Once the system is up and running, it can easily be applied to other cities or expanded with more data.
It’s flexible enough to grow without having to rebuild everything from scratch.
3.4 SYSTEM SPECIFICATION
An effective system is crucial for any computational task. It's important to have the correct hardware
and software components to ensure everything runs smoothly. From strong processors to essential
software packages, each part helps create an efficient environment for data analysis and machine learning
tasks.
● Visualization Tools: Matplotlib, Seaborn, Power BI for mapping accident hotspots
● Deployment & API Services: Flask/Django for web-based model deployment
CHAPTER 4
PROPOSED WORK
4.1 GENERAL ARCHITECTURE
Figure 4.1: The workflow of a traffic accident prediction system, from data collection through
preprocessing, model training, and risk prediction to deployment for decision-making.
4.2 DESIGN PHASE
During the design phase, diverse diagrams and models are crafted to depict various elements of the
system, such as its components, interactions, and data flow. These diagrams, including UML, sequence,
use case, and data flow diagrams, aid in conveying the system's design and functionality to stakeholders
and development teams. In essence, the design phase is pivotal for ensuring that the software solution
achieves its objectives in a proficient and effective manner.
4.2.1 DATA FLOW DIAGRAM
Figure 4.2 illustrates the complete machine learning pipeline for traffic accident prediction,
from data collection, cleaning, and preprocessing to training various models, evaluating them with
performance metrics, and selecting the best model for deployment. It highlights key techniques like
normalization, feature selection, and evaluation metrics such as accuracy and recall.
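The evaluation metrics named above can be computed directly from the counts of correct and incorrect predictions. The following is a minimal sketch in plain Python, assuming binary labels where 1 marks an accident-prone case; the function name `classification_metrics` and the sample labels are hypothetical and are not taken from the project code.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = accident-prone)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight locations: 1 = accident-prone, 0 = safe.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because accident cases are rare, recall and F1 are usually more informative than accuracy alone: a model that always predicts "safe" can score high accuracy while missing every hotspot.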
4.2.2 UML DIAGRAM
Figure 4.3 is a UML diagram that outlines the structure of the system for traffic accident hotspot
prediction using AI-powered analytics. It includes classes like DataCollector, Preprocessor,
ModelTrainer, Predictor, and Visualizer, each responsible for specific tasks such as collecting data,
cleaning it, training machine learning models, making predictions, and displaying results. The
relationships between these classes ensure smooth data flow and functionality, helping the system operate
in a modular and efficient way.
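The class structure described above could be laid out roughly as follows. This is only an illustrative skeleton: the class names come from the UML description, but every method name, the toy records, and the count-based stand-in for a trained model are assumptions made to keep the example self-contained.

```python
from collections import Counter


class DataCollector:
    """Returns raw records; a real collector would read files or sensor feeds."""

    def collect(self):
        # Hypothetical records: (latitude, longitude, hour, accident flag)
        return [
            (13.03, 80.17, 8, 1),
            (13.03, 80.17, 8, 1),
            (13.05, 80.21, 14, 0),
            (13.03, 80.17, 9, None),  # incomplete record
        ]


class Preprocessor:
    """Drops records that contain missing values."""

    def clean(self, records):
        return [r for r in records if None not in r]


class ModelTrainer:
    """Stand-in 'model': accident counts per (lat, long, hour) cell."""

    def train(self, records):
        return Counter((lat, lon, hour) for lat, lon, hour, flag in records if flag == 1)


class Predictor:
    """Flags a cell as a hotspot once its count reaches a threshold."""

    def __init__(self, model, threshold=2):
        self.model = model
        self.threshold = threshold

    def is_hotspot(self, lat, lon, hour):
        return self.model[(lat, lon, hour)] >= self.threshold


class Visualizer:
    """Renders the model as text lines; a real one would draw a map."""

    def render(self, model):
        return [f"{cell}: {count} accidents" for cell, count in sorted(model.items())]


records = Preprocessor().clean(DataCollector().collect())
model = ModelTrainer().train(records)
predictor = Predictor(model)
report = Visualizer().render(model)
print(report)
```

Keeping each responsibility in its own class means a component, such as the stand-in counting model here, can be swapped for a real classifier without touching the others.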
4.2.3 USE CASE DIAGRAM
The use case diagram in Figure 4.4 shows how different users, such as traffic authorities and urban
planners, interact with the traffic accident prediction system to perform tasks like uploading data,
running predictions, and viewing accident-prone zones. It captures the system’s core functionalities and
the roles of its primary users.
4.3 MODULE DESCRIPTION
This project is designed with a modular approach to accurately predict traffic accident hotspots using
AI-driven analytics. Each module plays a specific role in transforming raw accident data into meaningful,
actionable insights. From collecting and preprocessing data to training intelligent models and visualizing
predictions, every part of the system works together to help authorities identify high-risk areas across
both space and time.
4.3.1 MODULE 1: DATA COLLECTION MODULE
The Data Collection Module plays a crucial role in building a reliable foundation for the traffic
accident prediction system. It gathers information from a variety of sources to ensure a comprehensive
understanding of driving conditions and accident patterns. Advanced Driver Assistance Systems
(ADAS) sensors provide real-time insights into driver behavior, including speed, braking habits, and
sudden maneuvers. Traffic cameras contribute by monitoring ongoing road conditions and detecting
incidents as they happen. Historical accident reports offer valuable details about past accidents,
including their locations and severity, which help in identifying high-risk zones. Additionally, weather
data such as rain, fog, and temperature is incorporated to understand how environmental conditions
impact road safety. By integrating diverse data types, this module ensures a rich dataset that reflects
both human and environmental factors. The collected data serves as the foundation for training and
refining the machine learning models.
4.3.2 MODULE 2: DATA PREPROCESSING MODULE
The Data Preprocessing Module is essential for preparing raw traffic data before it's used to train any
prediction model. Real-world data often comes with issues like missing entries, duplicates, and
inconsistencies, which can affect accuracy. To tackle this, the module first cleans the data by filling in
missing values using averages or common values and removing anything irrelevant. Next, it focuses
on feature engineering—pulling out important information like the time of day, road type, and traffic
volume that can help the model make better predictions. Since traffic accidents are relatively rare
events, the dataset can be skewed. To balance it, a technique called ADASYN is used to generate
additional data for accident cases, so the model gets to learn from both common and rare events equally.
After this, numerical values like speed and severity are scaled for uniformity using tools like
StandardScaler. Finally, to avoid overwhelming the system with too much data, techniques like PCA
may be used to reduce complexity while keeping the data meaningful. This entire process ensures that
the model is trained on clean, balanced, and useful data for better prediction results.
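The balancing step can be sketched as follows. This is a simplified, NumPy-only illustration of the ADASYN idea for a binary problem, not the project's code: the function name `adasyn`, its parameters, and the toy data are assumptions, and it assumes the minority (accident) class is labelled 1 and contains no duplicate points. In practice the `ADASYN` class from the imbalanced-learn library would normally be used instead.

```python
import numpy as np


def adasyn(X, y, k=5, beta=1.0, seed=0):
    """Simplified ADASYN for binary labels where class 1 is the minority."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == 1]
    n_min, n_maj = len(X_min), int(np.sum(y == 0))
    total_new = int(round((n_maj - n_min) * beta))  # synthetic samples needed
    if total_new <= 0 or n_min < 2:
        return X, y

    # Step 1: for each minority point, the share of majority points among its
    # k nearest neighbours measures how hard that point is to learn.
    dist_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    nn_all = np.argsort(dist_all, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    ratio = (y[nn_all] == 0).mean(axis=1)
    if ratio.sum() == 0:
        ratio = np.ones(n_min)  # no hard points: fall back to uniform weights
    weights = ratio / ratio.sum()

    # Step 2: give each minority point a quota proportional to its weight.
    counts = np.floor(weights * total_new).astype(int)
    remainder = total_new - counts.sum()
    counts[np.argsort(-weights)[:remainder]] += 1  # leftovers go to the hardest points

    # Step 3: interpolate between each point and a random minority neighbour.
    dist_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    k_min = min(k, n_min - 1)
    nn_min = np.argsort(dist_min, axis=1)[:, 1:k_min + 1]
    synthetic = []
    for i, quota in enumerate(counts):
        for _ in range(quota):
            j = rng.choice(nn_min[i])
            lam = rng.random()
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))

    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.ones(len(synthetic), dtype=y.dtype)])
    return X_new, y_new


# Toy data: 40 "safe" points and 8 "accident" points in two features.
data_rng = np.random.default_rng(42)
X_safe = data_rng.normal(0.0, 1.0, size=(40, 2))
X_acc = data_rng.normal(2.0, 1.0, size=(8, 2))
X = np.vstack([X_safe, X_acc])
y = np.array([0] * 40 + [1] * 8)

X_bal, y_bal = adasyn(X, y)
print(X_bal.shape, int((y_bal == 1).sum()), int((y_bal == 0).sum()))
```

Oversampling of this kind is usually applied to the training split only, so that the test set keeps the original class ratio.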
4.3.3 MODULE 3: MACHINE LEARNING MODEL TRAINING MODULE
The neural network comprises two primary components. First, the feature extractor uses
convolutional and pooling layers to learn key attributes directly from raw data. Second, the
fully connected layer uses the learned features to perform classification. The input layer
ingests individual data values, while the output layer yields one result per category to be
classified. Within the convolutional layer, localized regions of the data are examined to extract
relevant features, while pooling layers reduce computational cost by shrinking the feature maps,
which lowers the number of parameters in later layers.
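To make the roles of the convolutional and pooling layers concrete, here is a small NumPy sketch on a one-dimensional signal. The speed readings and the hand-set kernel are hypothetical; in a real network the kernel weights are learned during training rather than fixed by hand.

```python
import numpy as np


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as used in CNN layers)."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    usable = len(values) - len(values) % size
    return values[:usable].reshape(-1, size).max(axis=1)


# Hypothetical speed trace in km/h with one sharp braking event and one sharp acceleration.
speeds = np.array([40.0, 42.0, 41.0, 20.0, 18.0, 19.0, 45.0])
kernel = np.array([-1.0, 1.0])          # responds to changes between neighbouring readings

features = conv1d(speeds, kernel)       # 7 readings -> 6 local differences
pooled = max_pool1d(np.abs(features))   # 6 values -> 3 values, keeping the strongest response
print(features, pooled)
```

The convolution looks only at neighbouring readings (a local region), and pooling halves the length of the feature map while keeping the largest response in each window.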
• The dataset is grouped by ["Lat", "Long", "Hour"] to count the number of accidents at each location
per hour.
• .size() calculates the number of entries per group.
• .reset_index(name="Alert_Count") resets the group index and assigns the count to a new column
named "Alert_Count".
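The three steps listed above correspond to a single chained pandas expression. The snippet below is a sketch on a small hypothetical DataFrame; only the column names "Lat", "Long", "Hour" and "Alert_Count" come from the description, while the variable names and values are illustrative.

```python
import pandas as pd

# Hypothetical ADAS alert records; one row per alert.
df = pd.DataFrame({
    "Lat":  [13.03, 13.03, 13.03, 13.05, 13.05],
    "Long": [80.17, 80.17, 80.17, 80.21, 80.21],
    "Hour": [8, 8, 9, 8, 8],
})

hotspots = (
    df.groupby(["Lat", "Long", "Hour"])   # one group per location and hour
      .size()                             # number of rows in each group
      .reset_index(name="Alert_Count")    # back to a flat table with a named count column
)
print(hotspots)
```

Each row of the result is one location-hour cell, which is the unit on which hotspot labels can then be defined.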
4.3.7 STEP 4: BUILDING THE MODEL
Our model of choice is based on an encoder-decoder architecture.
INPUT LAYER:
The input layer takes in structured data derived from the ADAS system, including spatiotemporal features
(like time, latitude, longitude), vehicle-specific metrics (such as speed and alert severity), and
contextual parameters (such as road type, weather conditions, and traffic density). This input layer
acts as the foundation for feeding relevant and preprocessed data into the learning architecture.
ENCODER SECTION:
This section is designed to learn compressed representations of the input features. It typically
includes multiple dense layers that gradually reduce the dimensionality of the data, enabling the
model to capture essential patterns while ignoring redundant information.
DECODER SECTION:
The decoder takes the compressed representation from the encoder and attempts to reconstruct the
original input or predict a specific outcome, such as accident risk classification. The decoder can
consist of dense layers that mirror the encoder’s structure in reverse, allowing the model to learn
how changes in latent features reflect in the output. This setup is especially helpful in architectures
like autoencoders or when applying a reconstruction-based learning method.
MODEL CREATION:
Finally, the encoder input and decoder output are used to create a Pipeline model.
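The shape of such an encoder-decoder can be illustrated with a forward pass written in NumPy. This is a sketch under stated assumptions, not the project's model: the layer sizes, the random untrained weights, and the choice of a single sigmoid risk output (the prediction variant rather than input reconstruction) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense(x, w, b, activation):
    """One fully connected layer: affine transform followed by an activation."""
    return activation(x @ w + b)


def init(n_in, n_out):
    """Small random weights and zero biases for an untrained layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


n_features = 8                   # hypothetical number of preprocessed input features
enc_sizes = [n_features, 6, 3]   # encoder narrows 8 -> 6 -> 3
dec_sizes = [3, 6, 1]            # decoder widens again and ends in one risk score

encoder = [init(a, b) for a, b in zip(enc_sizes[:-1], enc_sizes[1:])]
decoder = [init(a, b) for a, b in zip(dec_sizes[:-1], dec_sizes[1:])]


def forward(x):
    for w, b in encoder:
        x = dense(x, w, b, relu)
    latent = x                            # compressed representation
    for w, b in decoder[:-1]:
        x = dense(x, w, b, relu)
    w, b = decoder[-1]
    risk = dense(x, w, b, sigmoid)        # value in (0, 1), read as accident risk
    return latent, risk


batch = rng.normal(size=(4, n_features))  # four hypothetical input rows
latent, risk = forward(batch)
print(latent.shape, risk.shape)
```

With untrained weights the risk scores are meaningless; the point is only how the encoder compresses eight inputs to three latent values and the decoder maps them to an output.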
4.3.8 STEP 5: COMPILING AND TRAINING THE MODEL
Once the model architecture is finalized, it is compiled by specifying the loss function,
optimizer, and evaluation metrics. For binary classification tasks such as accident hotspot
prediction, commonly used configurations include the binary cross-entropy loss, Adam
optimizer, and metrics such as accuracy, precision, recall, and F1-score. After compilation, the
model is trained using the preprocessed training dataset. During training, the model iteratively
adjusts its internal parameters (weights and biases) to minimize the loss function and improve
prediction accuracy. Techniques such as early stopping, cross-validation, and hyperparameter
tuning (via GridSearchCV or RandomizedSearchCV) are employed to enhance model
generalization and prevent overfitting. Once trained, the model is evaluated using a separate
test dataset to ensure its performance holds on unseen data.
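Two of the ideas mentioned above, the binary cross-entropy loss and early stopping, can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions rather than the training code used in the project: the function names, the `patience` parameter, and the sample numbers are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

loss = binary_cross_entropy([1, 0], [0.9, 0.1])
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.5], patience=2)
print(round(loss, 4), stop_epoch)
```

In the second call the validation loss stops improving after the third epoch, so training halts two epochs later and the later value of 0.5 is never reached; this is the trade-off that the patience setting controls.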
CHAPTER 5
A specific region within Chennai has been selected from the dataset collected by the
ADAS system installed on public transport buses.
The input includes GPS coordinates, timestamps, speed, alert severity, and
environmental parameters such as weather and road type.
This data is fed into the AI model, which has been trained to detect spatiotemporal
patterns of accident occurrence.
Figure 5.1 below shows the confusion matrix and performance metrics of the Decision Tree
Classifier.
Figure 5.1: Decision Tree Classifier - Confusion Matrix with Performance Metrics
After processing the input through the trained model, the system predicts high-risk zones for
potential accidents based on historical patterns and environmental context.
Figure 5.2 below shows the confusion matrix and performance metrics of the Gradient Boosting
Classifier.
Figure 5.2: Gradient Boosting Classifier - Confusion Matrix with Performance Metrics
5.2 TESTING
In the context of spatiotemporal prediction of traffic accident hotspots, testing plays a vital role in
verifying the performance and reliability of the predictive model. The objective is to ensure that the model
correctly identifies accident-prone zones based on historical and real-time data, such as location, time,
alert types, vehicle speed, traffic density, and weather conditions. Testing confirms that the AI-powered
system can accurately predict potential accident zones and assists authorities in proactive decision-making.
Unit testing involves testing individual components such as preprocessing steps, data transformation, and
INPUT:
TEST RESULT
• Categorical data (e.g., weather, road type) was encoded using label encoding and one-hot encoding.
• Numerical features like speed and number of alerts were normalized using MinMaxScaler.
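The preprocessing checks listed above can be illustrated with a small plain-Python sketch. It re-implements the min-max formula and the two encodings by hand for clarity and is not the scikit-learn API; the function names and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(categories):
    """Map each category to an integer index (classes sorted alphabetically)."""
    classes = sorted(set(categories))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in categories], classes

def one_hot_encode(categories):
    """Map each category to a 0/1 vector with a single 1 at its class index."""
    labels, classes = label_encode(categories)
    vectors = [[1 if i == label else 0 for i in range(len(classes))]
               for label in labels]
    return vectors, classes

speeds = min_max_scale([20, 40, 60])
weather_labels, weather_classes = label_encode(["rain", "clear", "rain"])
weather_vectors, _ = one_hot_encode(["rain", "clear", "rain"])
print(speeds, weather_labels, weather_vectors)
```

A unit test for the preprocessing step can then assert that scaled values stay within [0, 1] and that every one-hot vector contains exactly one 1.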
5.2.3 INTEGRATION TESTING
Integration testing checks whether different components—like the preprocessing pipeline, model
training module, and evaluation functions—work together seamlessly.
INPUT:
TEST RESULT
• Preprocessed data was correctly fed into multiple machine learning models (e.g., XGBoost,
Random Forest).
• Output metrics such as accuracy, precision, recall, and F1-score were successfully computed.
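The output metrics named above follow directly from the confusion-matrix counts. The sketch below shows the arithmetic in plain Python; the function name and the sample labels are hypothetical and are not results from the project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented labels: 1 marks a hotspot, 0 a non-hotspot.
metrics = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
print(metrics)
```

An integration test can feed the predictions of each trained model through the same function, so that all models are compared on identical definitions of the metrics.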
5.2.4 FUNCTIONAL TESTING
Functional testing validates the complete functionality of the accident hotspot prediction model—
from data loading to final visualization—ensuring the model serves its intended purpose.
INPUT
TEST RESULT
• The dataset was successfully divided using a stratified train-test split to maintain class balance.
• The best-performing model (XGBoost Classifier) achieved high accuracy and reliability.
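A stratified split keeps the proportion of each class the same in the training and test sets. The sketch below shows the idea in plain Python; it is an illustration only, and the function name, the 25% test fraction, the seed, and the sample labels are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=42):
    """Split sample indices so each class keeps the same share in both sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Invented imbalanced labels: 8 non-hotspots (0) and 4 hotspots (1).
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
print(len(train_idx), len(test_idx), test_labels.count(1))
```

With these labels the test set receives two samples of class 0 and one of class 1, preserving the 2:1 ratio of the full dataset.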
CHAPTER 6
RESULTS AND DISCUSSIONS
The proposed system is designed to be both time-efficient and resource-friendly. It processes large
volumes of traffic data quickly and accurately, making real-time accident risk predictions possible
without heavy computational needs. By using optimized machine learning techniques and balanced
datasets, the system minimizes errors and improves the reliability of results. It also reduces manual work
for traffic authorities by automatically identifying high-risk areas, helping them respond faster and plan
smarter. Overall, the system strikes a balance between speed, accuracy, and practicality—making it a
valuable tool for improving road safety with minimal delay and effort.
Existing traffic safety systems often rely on historical accident reports and static analysis, which makes
them more reactive than proactive. These systems typically identify accident-prone zones only after
multiple incidents have occurred, limiting their ability to prevent future crashes. Additionally, some rely
heavily on manual data analysis, which can be time-consuming and may not scale well for large urban
areas.
In contrast, the proposed system uses real-time data analysis powered by machine learning to predict
high-risk areas before accidents happen. It automatically learns from patterns in the data, continuously
improving its accuracy over time. The system also provides faster and more dynamic insights, helping
traffic authorities and planners make quick, informed decisions. It reduces human effort, adapts to
changing traffic conditions, and fits well into the vision of smart, safe cities.
CHAPTER 7
7.1 CONCLUSION
In today’s fast-growing cities, ensuring road safety has become more important than ever. This
project takes a proactive approach by using machine learning to predict accident-prone areas before
incidents occur. By analyzing real-world traffic data and identifying hidden patterns, the system helps
authorities make smarter, faster decisions to prevent accidents. It not only improves public safety but
also supports better planning and resource management. With its ability to adapt to different locations
and growing data, the system shows great potential for building safer, more intelligent transportation
systems. Overall, this project highlights how technology can be used for a meaningful cause—saving
lives and making our cities smarter and safer.
While the current system offers accurate predictions and valuable insights, there are several ways it
can be improved in the future. One key enhancement is the integration of live traffic data, weather
conditions, and road construction updates to make predictions even more precise and real-time. The
system can also be extended to cover rural and highway areas, not just urban zones. Adding a user-
friendly dashboard for traffic authorities could make the information easier to understand and act upon.
Additionally, using deep learning techniques and more advanced models could further boost prediction
accuracy. As more data becomes available over time, the system can continue to learn and improve,
making it an even more powerful tool for ensuring road safety and supporting smarter cities.
7.3 RESULTS:
Figure 7.1: Confusion Matrix of XGBoost Classifier for Accident Hotspot Prediction
Figure 7.2: Confusion Matrix of Gradient Boosting Classifier for Accident Hotspot
Prediction
CHAPTER 8
REFERENCES
[1] Lu, T., Dunyao, Z., Lixin, Y., & Pan, Z. (2015). The traffic accident hotspot prediction is based on
the logistic regression method. The 3rd International Conference on Transportation Information
and Safety (ICTIS), Wuhan, China .
[2] Agoylo, J. C. (2024). GIS-based traffic accident hotspot prediction using machine learning .
[3] Al-Omari, F., & Tarawneh, T. (2020). Prediction of traffic accident hotspots using fuzzy logic and
GIS. Applied Geomatics, 12(3), 229-240. https://doi.org/10.1007/s1.2518-019-00290-7
[4] Khan, A., & Hussain, Z. (2024). Utilizing GIS and machine learning for traffic accident prediction
in urban environments. Civil Engineering Journal, 10(5), 1-12.
https://doi.org/10.28999/cej2024v10i5
[5] Mourad, M., Ali, H., & Khaled, M. (2020). Impact of road geometry on accident risk using GIS and
ML models. Journal of Urban Planning, 45(5), 722734. https://doi.org/10.1016/j.jup2020.12.006
[6] Liu X.M., Ren F.T., Duan H.L., (1995). “Road accident forecasting in time series” China journal of
Highway and Transport. 8(S1), 125-130 .
[7] Li X.Y., Zhang N., and Jiang G.F., (2003). “Grey-Markov Model for Forecasting Road Accidents.”
Journal of highway and transportation research and development, 20(4), 98-100
[8] . Fang Y.R., and Shen F.M., (2012). “Development trend analysis and prediction of traffic accident.”
Journal of Safety Science and Technology. 8(3), 141-146.
[9] Qin X.H., Liu L., and Zhang Y., (2005). “A traffic accident prediction method based on Bayesian
network model.” Computer Simulation. 22(11), 230-232 .
34
[10] Pan, L., Shen, J., & Guo, J. (2021). A comparative study of machine learning models for traffic
accident prediction. Expert Systems with Applications, 163, 113818.
https://doi.org/10.1016/j.eswa.2020.113818
[11] J. Zhang, Y. Sun, and W. Liu, “Spatiotemporal deep learning for urban accident hotspot
prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 3, pp. 1345–1356,
Mar. 2021.
[12] R. Kumar and A. Singh, “GIS and machine learning integration for road accident hotspot
detection,” International Journal of Geographical Information Science, vol. 34, no. 8, pp. 1604–
1620, 2020.
[13] H. Lee, M. Kim, and S. Choi, “A hybrid AI model for dynamic traffic accident prediction,”
Sensors, vol. 22, no. 5, pp. 2144–2155, 2022.
[14] S. Ali and T. Fatima, “Low-cost traffic accident prediction in developing regions using
simplified data,” Journal of Safety Research, vol. 70, pp. 45–52, 2019.
[15] L. Gomez, R. Chen, and K. Wang, “Enhancing traffic accident prediction using ensemble
learning,” Transportation Research Part C: Emerging Technologies, vol. 134, pp. 103478, 2023.
[16] M. Rana, A. Javed, and N. Ashraf, “Time-series analysis for spatiotemporal traffic accident
prediction,” Procedia Computer Science, vol. 181, pp. 907–914, 2021.
[17] Y. Chen and B. Li, “Integrating satellite imagery in AI-based accident risk models,” ISPRS
Journal of Photogrammetry and Remote Sensing, vol. 168, pp. 292–303, 2020.
[18] H. Bashir, M. Khalid, and L. Hu, “Real-time accident risk prediction in smart cities using
35
traffic sensor data,” Smart Cities, vol. 5, no. 4, pp. 1123–1137, 2022.
[19] M. Johnson and A. Taylor, “Comparative evaluation of machine learning models for traffic
accident prediction,” IEEE Access, vol. 9, pp. 34567–34578, 2021.
[20] A. Ahmed, F. Hussain, and S. Bano, “Accident prediction in rural highways using
environmental data and machine learning,” Accident Analysis & Prevention, vol. 117, pp. 1–8, 2018.
[21] Li, S.; Li, G.; Cheng, Y.; Ran, B. Urban arterial traffic status detection using cellular data
without cellphone GPS information.Transp. Res. Part C Emerg. Technol. 2020, 114, 446–462.
[22] European Commission. Handbook on the External Costs of Transport; European Commission:
Brussels, Belgium, 2020.
[23] Calatayud, A.; Sánchez González, S.; Bedoya Maya, F.; Giraldez Zúñiga, F.; Márquez, J.M.
Congestión Urbana en América Latina y el Caribe: Características, Costos y Mitigación; Inter-
American Development Bank: Washington, DC, USA, 2021.
[24] Chatterjee, K.; Chng, S.; Clark, B.; Davis, A.; De Vos, J.; Ettema, D.; Handy, S.; Martin, A.;
Reardon, L. Commuting and wellbeing:A critical overview of the literature with implications for
policy and future research. Transp. Rev. 2020
[25] Wang, X.; Rodríguez, D.A.; Sarmiento, O.L.; Guaje, O. Commute patterns and depression:
Evidence from eleven Latin American cities. J. Transp. Health 2019, 14, 100607.
[26] Retallack, A.E.; Ostendorf, B. Current understanding of the effects of congestion on traffic
accidents. Int. J. Environ. Res. Public Health 2019, 16, 3400.
[27] Cavallo, E.A.; Powell, A.; Serebrisky, T. From Structures to Services: The Path to Better
Infrastructure in Latin America and the Caribbean; Inter-American Development Bank:
36
Washington, DC, USA, 2020.
[28] Basu, R.; Ferreira, J. Sustainable mobility in auto-dominated Metro Boston: Challenges and
opportunities post-COVID-19. Transp. Policy 2021, 103, 197–210.
[29] Waze. Waze 130 Million Reasons to Say Thanks toWazers. Available online:
https://medium.com/waze/130-million-reasonsto-say-thanks-to-wazers-bcc9f9521378 (accessed on
6 November 2020).
[30] Goodall, N.; Lee, E. Comparison ofWaze crash and disabled vehicle records with video
ground truth. Transp. Res. Interdiscip. Perspect. 2019, 1, 100019.
[31] Hoseinzadeh, N.; Liu, Y.; Han, L.D.; Brakewood, C.; Mohammadnazar, A. Quality of
location-based crowdsourced speed data on surface streets: A case study of Waze and Bluetooth
speed data in Sevierville, TN. Comput. Environ. Urban. Syst. 2020, 83, 101518.
[32] Y. Tian, C. Wei, D. Xu, Traffic flow prediction based on stack autoencoder and long short-
term memory network, in: 2020 IEEE 3rd International Conference on Automation, Electronics and
Electrical Engineering (AUTEEE), 2020.
[33] X. Luo, D. Li, Y. Yang, S. Zhang, Spatiotemporal traffic flow prediction with knn and lstm,
J. Adv. Transp. (2019).
[34] D. mei Zhai, C. hui Shi, H. Zhao, Short-term traffic flow prediction based on deep learning,
DEStech Trans. Eng. Technol. Res. (2020).
[35] W. Wei, H. Wu, H. Ma, An autoencoder and LSTM-based traffic flow prediction method,
Sensors 19 (13) (2019) 2946.
[36] A.A. Mansour, A. Tilioua, M. Touzani, Bi-lstm, gru and 1d-cnn models for short-term
37
photovoltaic panel efficiency forecasting case amorphous silicon grid-connected pv system, Results
Eng. (2024) 101886.
[37] P. Redhu, K. Kumar, et al., Short-term traffic flow prediction based on optimized deep
learning neural network: Pso-bi-lstm, Phys. A, Stat. Mech. Appl. 625 (2023) 129001.
[38] M. Zhang, H. Shi, Y. Zhang, Y. Yu, M. Zhou, Deep learning-based damage detection of
mining conveyor belt, Measurement (2021) 109130.
[39] E. Azimirad, N. Pariz, M.B.N. Sistani, A novel fuzzy model and control of single intersection
at urban traffic network, IEEE Syst. J. 4 (1) (2010) 107–111.
[40] J. Lin, Study on the prediction of urban traffic flow based onarima model, in: Third
International Conference on Engineering Technology & Application (ICETA 2016), 2016, pp. 418–
422.
38
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University u / s 3 of UGC Act, 1956)
6 Faculty Engineering and Technology
8 Whether the above project / dissertation is done by:
a) If the project / dissertation is done in group, then how many students together completed the project: 03
b) Mention the Name and Register number of other candidates:
JACK ANDRE J [RA2011026020147], ABISHEK RAJ M [RA2011026020131], CHARUDEVE KS [RA2011026020139]
ADDRESS OF GUIDE
9 Name and address of the Supervisor / Guide
Name: Santhosh Kumar C
Mail ID: [email protected]
NA
10 Name and address of the Co-Supervisor / Guide
Mail ID: NA Mobile Number: NA
13 Plagiarism Details: (to attach the final report from the software)
Chapter | Title of the Report | Percentage of similarity index (including self citation) | Percentage of similarity index (excluding self citation) | % of plagiarism after excluding Quotes, Bibliography, etc.
1 | TITLE | NA | NA | 10%
Appendices | | NA | NA | NA
We declare that the above information has been verified and found true to the best of our knowledge.
Name and Signature of the Supervisor / Guide
Name and Signature of the Co-Supervisor / Co-Guide
Dr. J. SUTHA
Name and Signature of the HOD