
Prediction of Road Accidents in The Different States of India Using Machine Learning Algorithms

This document discusses predicting road accidents in different states of India using machine learning algorithms. The authors developed Random Forest Regressor and Decision Tree Regressor models to predict collisions based on attributes like collision type, road type, location, and weather. The models were trained on collision records from different Indian states. The accuracy of the Random Forest model was higher than the Decision Tree model for attributes like head-on collisions, foggy weather, single-lane roads, and pedestrian crossings. The paper aims to help plan solutions for preventing road accidents.


2023 IEEE International Conference on Integrated Circuits and Communication Systems (ICICACS)

Prediction of Road Accidents in the Different States of India using Machine Learning Algorithms
1st Mahendra G 2nd Roopashree H R
Department of CSE Department of CSE
GSSSIETW, Mysuru GSSSIETW, Mysuru
VTU, Belagavi VTU, Belagavi
[email protected]
2023 IEEE International Conference on Integrated Circuits and Communication Systems (ICICACS) | 979-8-3503-9846-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICICACS57338.2023.10099519

Abstract—The analysis of road accidents plays a significant role in the road transport system. This article predicts road accidents based on four attributes: collision type, road type, location, and weather. A machine learning model with a Random Forest Regressor and a Decision Tree Regressor is developed to predict collisions based on collision records from the different states of India. Hit and run, head-on collision, hit pedestrian, fog, cloudy, rainy, single-lane, two-lane, four-lane, school, pedestrian crossing, market, and other parameters are considered for analyzing and visualizing accidents in different states of India. The correlation between the mortality rate and other features, including road conditions, weather conditions, location, nature of collision, time of occurrence, and kinds of motor vehicles involved in accidents, was analyzed. The Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) metrics are considered for predicting road accidents based on a greater number of parameters. The accuracy of the Random Forest Regressor model and the Decision Tree Regressor model is 94.2% and 84.8%, respectively, for head-on collisions, 92.8% and 96.9% for fog weather conditions, 80% and 81.5% for single-lane accidents, and 90.3% and 86.1% for the pedestrian crossing attribute. The comparative analysis showed that the Random Forest Regressor (RFR) model performs better than the Decision Tree Regressor (DTR) model.

Keywords—Accident, collision, decision tree regressor, machine learning model, prediction, random forest regressor, road, visualization, weather.

I. INTRODUCTION
Many vehicles travel on the roads each day, and road collisions can occur anytime and anywhere. Some accidents are fatal, meaning that a person dies as a result of the accident; in others, people are seriously injured. Machine learning can be applied to a dataset of accidents to find important facts and thereby provide advice on safe driving. Machine learning employs a variety of methods and procedures to identify relationships in large amounts of data. Weather conditions such as heavy rain and smog play a significant role in increasing the chance of an accident. A complete assessment of collisions, together with knowledge of their critical points and features, will help reduce them. Delivering timely emergency assistance in the event of an injury is very important, and it requires an in-depth study of accidents. Models created from crash data profiles make it possible to better understand the effects of many attributes, such as driver behavior, road conditions, lighting conditions, and weather conditions. This can help users work out useful safety measures to avoid accidents.
The main purpose of a traffic accident forecasting system is to analyze incidents that have occurred in a given area. This helps to identify the areas most prone to accidents, to set up the immediate assistance needed, and to make predictions based on constraints such as climate, smog, and road construction.
In current practice, there are several problems in the prevention of accidents on the ground. Suitable algorithms and assessments are useful for analyzing and determining the damage and causes of an accident. It would also be helpful to provide background information on roads and bridges to avoid similar problems encountered earlier. The predictions made will be very helpful in planning a solution to these problems.

II. RELATED WORKS
Tessa K. Anderson et al. [1] introduced a strategy for identifying high-density accident areas (hotspots) that combines kernel density estimation with K-means clustering, so that hotspots with similar characteristics can be grouped and compared. Sachin Kumar et al. [2] used data mining methods to identify the locations where accidents occur with high frequency and then analyzed them to determine the factors that affect road accidents at those locations. Association rule mining and different prediction algorithms were applied, and the best accuracy was achieved by Gradient Boosted Classification and Random Forest. Association rule mining is applied to discover different variables of accidents [3]. The relationship between fatal accidents and accident severity has been analyzed. The result of the investigation shows the cause of the accident, the environmental issues, resident obligation, vehicle type, and accident time [4]. A log-linear model, driver attributes, pedestrian characteristics, road traffic, and vehicle typologies have been considered in a study that gives a clear idea of what influences accidents in school zones [5].
Classification methods help to predict the severity of road accidents. The Naïve Bayes classifier, Decision Tree classifier, AdaBoostM1 meta classifier, PART rule classifier, and Random Forest were compared for characterizing the severity level of road accidents. The final result reveals that the Random Forest method outperforms the other four models [6]. The investigation of accident severity helps in examining the relationship between severity and a set of parameters that includes driver details, vehicle details, road pattern, and cause of the accident. The effect of various variables on injury severity, such as day, time, speed limit, traffic details, weather conditions, and driver details, was also used in the analysis [7–9]. De Ona et al. [10] used latent class clustering and Bayesian networks in the analysis of traffic accidents to identify the main factors of accident severity. The combined use of the two methods is particularly interesting because it reveals additional information. Mahendra G et al. [11] developed an instrument to detect rash driving on roadways and alert the traffic authorities in case of any speed violation.


Authorized licensed use limited to: Indian Inst of Inform Technology Guwahati. Downloaded on November 22,2023 at 07:34:46 UTC from IEEE Xplore. Restrictions apply.
Jamal Raiyn et al. [12] describe a model that detects traffic incidents by considering the speed variation of vehicles located before and after (upstream and downstream of) a specific point on the highway. Alkheder et al. [13] introduced Neural Network (NN) techniques to predict the severity (minor, moderate, severe, death) of traffic accidents in Abu Dhabi. The investigation was based on 5973 accident records that had occurred over 6 years. The overall accuracy of the model is 81.6%. Mahendra G et al. [14] analyzed different transportation data, the machine learning methods used for analyzing such data, and the challenges and big data applications in intelligent transportation systems. X. Gao et al. [15] proposed a Weighted Random Forest (WQRF) algorithm to predict employee turnover in industries. Employee turnover is predicted based on overtime, age, monthly income, distance from home, and years at the company. M Chen et al. [16] presented Random Forest (RF), Classification And Regression Tree (CART), and Logistic Regression (LR) models and compared them for predicting accidents based on different levels of input variables. The random forest model outperforms the other models with an accuracy of 73.38% for 15 original variables. Rabia Emhamed et al. [17] applied AdaBoost, Random Forest, Logistic Regression, and Naïve Bayes to traffic accident severity prediction, using traffic accident data collected from Michigan Traffic Agencies. The Random Forest algorithm outperforms the other methods with an accuracy of 75.5%. J H Kim et al. [18] describe machine learning models to predict accidents at a container port, comparing the accuracy of the Deep Neural Network, Random Forest, and Gradient Boosting models for time series data sets with various time intervals. The Gradient Boosting and Deep Neural Network models are the best at predicting accidents at the container port.

III. METHODS
A. System Architecture
The approach of the study is shown in Figure 1. The system architecture is built on incident datasets that help in understanding the characteristics of many attributes such as road conditions, weather conditions, and locations. It can help the user work out useful safety measures to avoid accidents.

Fig. 1. The flow of the Proposed Model.

B. Data Preparation
Data preparation is performed before each model is developed. All records with missing values in the selected features were removed. Data for this study were obtained from the Department of Transportation, Government of India [19]. The datasets for the year 2016 across different locations of India were collected.

C. Modeling
The data collected for this study comprise around 300 records of road accident data in Comma Separated Value (CSV) format. The raw data are preprocessed to prepare them for developing the proposed model, and the data set is split into 70% for training and 30% for testing. The proposed Random Forest Regressor model is developed using the Random Forest algorithm, and the Decision Tree Regressor model using Decision Tree Regression, to predict fatal accidents from the number of accidents that happened in different locations of India, based on road accident attributes such as head-on collision, fog weather condition, single-lane, and pedestrian crossing. A few insights from the dataset were visualized to show the essential characteristics of fatal accidents. Then the random forest regressor and decision tree regressor were applied to predict the number of fatal accidents among the different transportation parameters.

1) Random Forest Algorithm
Random forest is a machine learning method (machine learning being a subset of artificial intelligence) that consists of a collection of numerous decision trees. The random forest approach creates a set of decision trees, each using a randomly chosen subset of the training data, and collects a prediction from each tree. The best option is then chosen through voting. Because the random forest technique uses several decision trees, it reduces the impact of noisy observations, whereas a single decision tree's prediction can be affected by noise. Both classification and regression models can be built using random forest methods.
In a random forest classification model, each decision tree votes, and the most common predicted class is chosen as the outcome. In a random forest regression model, the mean of all decision tree outputs is used as the final result.
The steps for creating a random forest regressor model are as follows:
Step 1: Import the dataset and packages.
Step 2: Define the features and the target.
Step 3: Split the original dataset into train and test sets.
Step 4: Create a random forest regression model with the random forest regressor function.
Step 5: Validate the random forest regression model.

2) Decision Tree Regression
Decision tree regression examines an object's features and trains a model in the form of a tree to predict future data and produce meaningful continuous output. The output is not discrete; that is, it is not represented solely by a known, discrete set of numbers or values.
The step-by-step implementation is as follows:
Step 1: Import the libraries.
Step 2: Create the dataset and load it.

Step 3: Split the original dataset into train and test sets.
Step 4: Fit the decision tree regressor to the dataset.
Step 5: Predict a new value.
Step 6: Visualize the outcome.

IV. RESULTS
The results analysis uses a random forest regressor and a decision tree regressor to forecast the number of fatal accidents across India.
The following data types were evaluated for the study:
i) Figure 2(a) depicts the number of accidents that occurred as a result of various collision types. Surprisingly, the majority of incidents in transportation do not involve motor vehicles. The number of people and fatalities involved in front-to-front (head-on) crashes is substantially higher than in other motor vehicle collisions.
ii) Figure 2(b) depicts the number of accidents that occurred under various weather conditions. The majority of the collisions occurred in other (clear) meteorological conditions. This is expected because the most common weather condition is other (clear). The majority of the accidents occurred in fog weather, as opposed to clear weather.
iii) Figure 2(c) depicts the number of accidents that occurred on various types of roads. The majority of collisions occurred on single-lane roads. This is expected because the most common type of road is a single-lane road.
iv) Figure 2(d) depicts the number of accidents that occurred in various places. The majority of the accidents occurred in marketplaces and at schools and colleges. This is understandable because the most common types of location are schools and markets.

V. DISCUSSION
A. Statistics
Figure 3 depicts the timing of accidents. The greatest number of accidents occurred between 9 a.m. and 12 p.m., while the fewest occurred between 6 p.m. and 9 p.m. Figure 4 depicts the types of vehicles involved in collisions: two-wheelers, three-wheelers, four-wheelers, heavy vehicles, and other vehicles.

(a) Accidents based on different collisions. (b) Accidents are based on different weather conditions.

(c) Accidents that occurred on various types of roads. (d) Accidents are based on different locations.
Fig. 2. Visualization of accidents based on four attributes.

Fig. 3. Timing of accident.

Fig. 4. Types of vehicles engaged in collisions.

B. Accident Prediction
Given data

$$D = \{(x_i, y_i),\ i = 1, \dots, n\} \tag{1}$$

where $x_i = (x_{i1}, \dots, x_{ip})$, build a model $\hat{f}$ so that $\hat{Y} = \hat{f}(X)$ for random variables $X = (X_1, \dots, X_p)$ and $Y$. Then $\hat{f}$ is used to predict the value of the response from the predictors: $\hat{y}_0 = \hat{f}(x_0)$, where $x_0 = (x_{01}, \dots, x_{0p})$.

C. Predictive accuracy
To estimate the effectiveness of the proposed model, three performance indicators are used: the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE). They are defined as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| A_i - F_i \right| \tag{2}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{A_i - F_i}{A_i} \right| \tag{3}$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{4}$$

where

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( A_i - F_i \right)^2 \tag{5}$$

N is the number of terms in the summation (that is, the number of observations), $A_i$ is an actual value, and $F_i$ is a forecast value.

For performance evaluation, the two developed models, the Random Forest Regressor (RFR) and the Decision Tree Regressor (DTR), are compared. 70 percent of the data from various Indian states is used for training, while 30 percent is used for model evaluation. The predicted and original values are then compared to calculate the MAE, MAPE, and RMSE metrics.
The results in Table I show that the random forest regressor model outperforms the decision tree regressor model in the task of predicting head-on collision and pedestrian crossing accidents in different states of India. The random forest regressor has a training score of 91.8 percent and a testing score of 94.2 percent for head-on collision, and a training score of 90.4 percent and a testing score of 90.3 percent for pedestrian crossing. The decision tree regressor's training and testing scores are 99.9 percent and 84.8 percent for head-on collision, and 100 percent and 86.1 percent for pedestrian crossing, respectively.

TABLE I. COMPARISON OF THE DIFFERENT PREDICTIVE MODELS.

| Model | Feature | MAE | MAPE | RMSE |
|---|---|---|---|---|
| Random Forest Regressor (Proposed Model) | Head-On Collision | 110 | 29.19 | 163 |
| Random Forest Regressor (Proposed Model) | Single-Lane | 440 | 44.95 | 756 |
| Random Forest Regressor (Proposed Model) | Fog Weather | 70 | 19.48 | 149 |
| Random Forest Regressor (Proposed Model) | Pedestrian Crossing | 123 | 24.24 | 265 |
| Decision Tree Regressor (Proposed Model) | Head-On Collision | 174 | 34.95 | 264 |
| Decision Tree Regressor (Proposed Model) | Single-Lane | 407 | 42.18 | 727 |
| Decision Tree Regressor (Proposed Model) | Fog Weather | 56 | 18.11 | 98 |
| Decision Tree Regressor (Proposed Model) | Pedestrian Crossing | 156 | 38.33 | 318 |
| Yisheng Lv et al. [20] (2015) | - | 122.8 | - | 183.9 |
| Wei Li et al. [21] (2020) | - | 286.19 | 5.76 | 322.09 |

Notes. Bold text indicates the performance of the best performing model/s.

The decision tree regressor model outperforms the random forest regressor model in the task of predicting fog weather conditions and single-lane accidents in different states of India. The random forest regressor has a training score of 98.3 percent and a testing score of 92.8 percent for fog weather conditions, and a training score of 97.6 percent and a testing score of 80 percent for single-lane accidents. The decision tree regressor's training and testing scores are 100 percent and 96.9 percent for fog weather conditions, and 100 percent and 81.5 percent for single-lane accidents, respectively.
The MAE, MAPE, and RMSE values are normal for the weather data. For the nature of the collision, types of roads, and pedestrian crossing, these values are higher than normal but still within an acceptable range. The higher values are due to large variations in the number of accidents and the number of fatal accidents for the nature of the collision, types of roads, and pedestrian crossing.
In terms of training and testing accuracies, the comparative analysis revealed that the random forest model outperformed the decision tree model. On the nature of collision and nature of the location, the random forest regressor model outperformed the decision tree regressor (Figure 5(a) and 5(d)). On the types of road and weather conditions, the decision tree regressor model outperformed the random forest regressor model (Figure 5(b) and 5(c)). Figure 5 compares the visualization and performance of the random forest regressor model and the decision tree regressor model.
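Equations (2) to (5) translate directly into a few lines of code. The following is a minimal sketch in plain Python, not the authors' implementation: the function names and the sample values are assumptions made for illustration. MAPE is multiplied by 100 here on the assumption that Table I reports it as a percentage, which Eq. (3) does not state explicitly.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error, Eq. (2)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, Eq. (3), scaled to percent (assumption)."""
    if any(a == 0 for a in actual):
        # Eq. (3) divides by A_i, so it is undefined for zero actual values.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean Square Error, Eq. (5)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Square Error, Eq. (4)."""
    return math.sqrt(mse(actual, forecast))


# Made-up values for illustration only; they are not taken from the paper's dataset.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

example_mae = mae(actual, forecast)
example_mape = mape(actual, forecast)
example_rmse = rmse(actual, forecast)
```

Because RMSE squares each error before averaging, a few states with very large accident counts can inflate it far more than MAE, which is consistent with the explanation given above for the higher values on some attributes.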

(a) Comparative analysis of nature of the collision. (b) Comparative analysis of weather conditions.

(c) Comparative analysis of types of roads. (d) Comparative analysis of nature of the location.
Fig. 5. Comparative analysis of accidents based on four attributes.
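As a minimal illustration of the contrast drawn in Fig. 5, the sketch below shows how a regression forest averages the outputs of many trees fitted on bootstrap samples, whereas a decision tree regressor is a single such tree. It is not the authors' implementation: the depth-1, one-feature trees, the function names (`fit_stump`, `fit_forest`, `predict_forest`), and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one numeric feature (illustrative only)."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to a constant prediction.
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """Regression forest output: the mean of the individual tree outputs."""
    return mean(tree(x) for tree in trees)


# Toy data (hypothetical): one feature, step-shaped target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 10, 10, 10, 50, 50, 50, 50]

single_tree = fit_stump(xs, ys)   # one tree fitted on all the data
forest = fit_forest(xs, ys)       # many trees fitted on resampled data
low = predict_forest(forest, 1)
high = predict_forest(forest, 8)
```

A single tree fitted on all the data reproduces its training targets very closely, which is consistent with the near-100 percent training scores reported for the decision tree regressor; averaging over resampled trees trades some of that training fit for lower sensitivity to noise.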

VI. CONCLUSION
This paper examined accidents in various Indian states. Accidents on the road are caused by a variety of factors. After reviewing the research papers, it is possible to conclude that factors such as vehicle types, weather conditions, and road structure have a significant impact on road accident cases. The analysis of the results revealed that a greater number of accidents occurred on single-lane and two-lane roads, in clear weather conditions, and in areas such as schools/colleges and marketplaces; that a greater number of accident cases involve two-wheelers; and that a greater number of accidents occurred between 9 a.m. and 12 p.m. The metrics used for predicting road accidents are Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE). A random forest regressor and a decision tree regressor were used to predict the number of fatal accidents. For this dataset, the random forest regressor is more accurate than the decision tree regressor. The limitation of this study is that, because of the variations in the number of accidents and the number of fatal accidents, the performance indicator values are higher than normal, although within an acceptable range. In future research, a greater number of parameters, such as age of the vehicle, age of the driver, type of traffic control, and type of junction, could be considered for predicting road accidents in different states of India, and the prediction accuracy could be analyzed for this larger set of parameters.

REFERENCES
[1] Tessa KA. Kernel density estimation and K-means clustering to profile road accident hotspots. Elsevier. Accident Analysis and Prevention. 2009; 41:359–364.
[2] Sachin Kumar, Prayag Tiwari, Kalitin VD. Augmenting classifiers performance through clustering: A comparative study on road accident data. IGI Global. International Journal of Information Retrieval Research. 2018.
[3] K Jayasudha and C Chandrasekar. An overview of data mining in road traffic and accident analysis. Journal of Computer Applications. 2009; 2(4):32–37.
[4] X. Xu, S. Yan, W. Yixuan, and M. Lin. Researching on traffic accident based on relevance analysis. IEEE International Conference on Power Intelligent Computing and Systems (ICPICS). 2019; 629–632.
[5] A. Briz-Redon, F. Martinez-Ruiz, and F. Montes. Estimating the occurrence of traffic accidents near school locations: A case study from Valencia (Spain) including several approaches. Elsevier. Accident Analysis & Prevention. 2019;132.
[6] S. Krishnaveni and M. Hemalatha. A perspective analysis of traffic accident using data mining techniques. International Journal of Computer Applications. 2011;23(7):40–48.
[7] D. Delen, R. Sharda, and M. Bessonov. Identifying significant predictors of injury severity in traffic accidents using a series of artificial neural networks. Elsevier. Accident Analysis & Prevention. 2006;38(3):434–444.
[8] H. Manner and L. Winsch-Ziegler. Analyzing the severity of accidents on the German Autobahn. Elsevier. Accident Analysis & Prevention. 2013;57:40–48.
[9] P.T. Savolainen, F.L. Mannering, D. Lord, and M.A. Quddus. The statistical analysis of highway crash-injury severities: A review and assessment of methodological alternatives. Elsevier. Accident Analysis & Prevention. 2011;43(5):1666–1676.
[10] De Ona. J, Lopez. G, Mujalli. R, Calvo.F. J. Analysis of traffic accidents on rural highways using latent class clustering and bayesian networks. Elsevier. Accident Analysis and Prevention. 2013;51:1-10.
[11] Mahendra G, Dayananda R B. Vehicle rash drive control system. International Journal for Research in Engineering Application and Management. 2018;04(03):676-681.
[12] Jamal Raiyn, Tomer Toledo. Real-time road traffic anomaly detection. Journal of Transportation Technologies. 2014; 4(3):256-266.
[13] Alkheder.S, Taamneh.M, Taamneh.S. Severity prediction of traffic accident using an artificial neural network. Journal of Forecasting. 2017;36(1):100–108.

[14] Mahendra G, Roopashree H R, Yogeesh A C. Analysis of the big data
methods, challenges, and applications in intelligent transportation
systems. International Journal of Advanced Trends in Computer
Science and Engineering. 2020; 9(5):7478-7486.
[15] X. Gao, J. Wen, C Zhang. An improved random forest algorithm for
predicting employee turnover. Hindawi. 2019; 4140707:12 p.
[16] Mu-Ming Chen and Mu-Chen Chen. Modeling road accident severity
with comparisons of logistic regression, decision tree and random
forest. MDPI. Information 2020; 11(270):23p.
[17] Rabia Emhamed A, Keneth Morgan K wayu, Maha Reda Alkasisbeh,
Abdulbaset Ali Frefer. Comparison of machine learning algorithms for
predicting traffic accident severity. IEEE. Jordan International Joint
Conference on Electrical Engineering and Information Technology.
2019:272-276.
[18] Jae Hun Kim, Juyeon Kim, Gunwoo Lee, Juneyoung Park. Machine
learning-based models for accident prediction at a korean container
port. MDPI. Sustainability 2021;13(9137):14p.
[19] Department of transportation. Government of India. India.
[20] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue
Wang. Traffic flow prediction with big data: A deep learning approach.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION
SYSTEMS. 2015; 16(2):865-873.
[21] Wei Li, Xujian Zhao, and Shiyu Liu. Traffic accident prediction based
on multivariable grey model. Information. 2020; 11(184):12p.
