Prediction of Road Accidents in The Different States of India Using Machine Learning Algorithms
Abstract—Analysis of road accidents plays a significant role in the road transport system. This article predicts road accidents based on four attributes: collision type, road type, location, and weather. A machine learning model with a Random Forest Regressor and a Decision Tree Regressor is developed and used to predict collisions based on records of collisions that have taken place in the different states of India. Hit and run, head-on collision, hit pedestrian, fog, cloudy, rainy, single lane, two-lane, four-lane, school, pedestrian crossing, market, and other parameters are considered for the analysis and visualization of accidents in the different states of India. The correlation between the mortality rate and other features, including road conditions, weather conditions, location, nature of collision, time of occurrence, and kinds of motor vehicles involved in accidents, was analyzed. The Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) metrics are used to assess the prediction of road accidents over a larger number of parameters. The accuracy of the Random Forest Regressor model and the Decision Tree Regressor model is 94.2% and 84.8% based on head-on collision, 92.8% and 96.9% for fog weather conditions, 80% and 81.5% based on single-lane accidents, and 90.3% and 86.1% based on the pedestrian crossing attribute, respectively. The outcomes of the comparative analysis showed that the Random Forest Regressor (RFR) model does better than the Decision Tree Regressor (DTR) model.
Keywords—Accident, collision, decision tree regressor, machine learning model, prediction, random forest regressor, road, visualization, weather.
I. INTRODUCTION
Many vehicles travel on the roads each day, and road collisions can occur anytime, anywhere. Some accidents are fatal, meaning that a person dies as a result of the accident; in others, people are seriously injured. To learn how to drive safely, machine learning can be applied to a dataset of accidents to find important facts and thereby provide driving advice. Machine learning employs a variety of methods and procedures to identify relationships in large amounts of data. Weather conditions such as heavy rain and smog play a significant role in increasing the chance of an accident. A complete assessment of collisions, together with knowledge of their critical points and features, will help reduce them. Delivering timely emergency assistance in the event of an injury is very important, and it requires an in-depth study of accidents. Models created from crash data profiles make it easier to understand the features of many attributes such as driver behavior, road conditions, lighting conditions, weather conditions, and more. This can help users work out useful safety measures to avoid accidents.
The main purpose of a traffic accident forecasting system is to analyze incidents that have occurred in a given area, which helps to identify the areas most prone to accidents, to set up the immediate assistance needed there, and to make predictions based on constraints such as climate, smog, road construction, etc.
In current practice, there are several problems in the prevention of accidents on the ground. A suitable algorithm and assessment are useful for analyzing and determining the damage and causes of an accident. It would also be helpful to provide background information on roads and bridges so that problems encountered earlier can be avoided. The predictions made will be very helpful in planning a solution to these problems.
II. RELATED WORKS
Tessa K. Anderson et al. [1] introduced a strategy for identifying high-density accident areas, which builds a clustering procedure establishing that stochastic files are bound to exist in certain clusters and can in this manner be compared. Sachin Kumar et al. [2] used data mining methods to identify the areas where accidents occur with high frequency and then examined them to determine the factors that affect road accidents at those locations. Association rule mining and different prediction algorithms were applied, and the best accuracy was achieved by Gradient Boosted Classification and Random Forest. Association rule mining is applied to discover different factors of accidents [3]. The connection between fatal accidents and accident severity has been analyzed. The result of the investigation shows the cause of the accident, the environmental issues, resident obligation, vehicle type, and accident time [4]. A log-linear model, driver attributes, pedestrian characteristics, street traffic, and vehicle typologies have been considered in the study, which gives a clear idea of what influences accidents in school zones [5].
Classification methods help to predict the severity of road accidents. The Naïve Bayes classifier, Decision Tree classifier, AdaBoostM1 Meta classifier, PART Rule classifier, and Random Forest are compared for characterizing the severity level of road accidents. The final result reveals that the Random Forest method beats the other four models [6]. The investigation of accident severity helps in examining the connection between severity and a set of parameters that includes driver details, vehicle details, road pattern, and cause of the accident. The effect of various variables on the severity of the injury, such as day, time, speed limit, traffic details, climatic conditions, and driver details, was also used in the analysis [7–9]. De Ona et al. [10] used latent class clustering and Bayesian networks in the analysis of traffic accidents to identify the main factors of accident severity. The combined use of the two methods is very interesting because it reveals additional information. Mahendra G et al. [11] developed a tool to detect rash driving on roadways and alert the traffic authorities in case of any speed violation.
Authorized licensed use limited to: Indian Inst of Inform Technology Guwahati. Downloaded on November 22,2023 at 07:34:46 UTC from IEEE Xplore. Restrictions apply.
Jamal Raiyn et al. [12] describe a model that detects traffic incidents by considering the speed variation of the vehicles located before and after (upstream and downstream of) a specific point on the highway. Alkheder et al. [13] introduced Neural Network (NN) techniques to predict the severity (minor, moderate, severe, death) of traffic accidents in Abu Dhabi. The investigation was based on 5973 accident records collected over 6 years. The overall accuracy of the model is 81.6%. Mahendra G et al. [14] analyzed different transportation data, the machine learning methods used for analyzing such data, and the challenges and big data applications in the intelligent transportation system. X. Gao et al. [15] proposed a Weighted Quadratic Random Forest (WQRF) algorithm to predict employee turnover in industries. Employee turnover is predicted based on overtime, age, monthly income, distance from home, and years at the company. M Chen et al. [16] presented Random Forest (RF), Classification And Regression Tree (CART), and Logistic Regression (LR) models and analyzed and compared them for predicting accidents based on different levels of input variables. The random forest model outperforms the other models with an accuracy of 73.38% for 15 original variables. Rabia Emhamed et al. [17] presented AdaBoost, Random Forest, Logistic Regression, and Naïve Bayes for predicting the severity of traffic accidents, using traffic accident data collected from Michigan Traffic Agencies. The Random Forest algorithm outperforms the other methods with an accuracy of 75.5%. J H Kim et al. [18] describe machine learning models to predict accidents at a container port, comparing the accuracy of the Deep Neural Network Model, Random Forest Model, and Gradient Boosting Model for time series data sets with various time intervals. The Gradient Boosting and Deep Neural Network models are the best at predicting accidents at the container port.
III. METHODS
A. System Architecture
The approach for the study is shown in Figure 1. The architecture of the system is built using incident datasets that can help in understanding the characteristics of many attributes such as road conditions, weather conditions, locations, etc. It can help the user work out useful safety measures to avoid accidents.
Fig. 1. The flow of the Proposed Model.
B. Data Preparation
Data preparation is performed before each model is developed. All records with missing values in the selected features were removed. Data for this study was obtained from the Department of Transportation, Government of India [19]. The datasets for the year 2016 across different locations of India were collected.
C. Modeling
The data collected for this study comprises around 300 records of road accident data in Comma Separated Value (CSV) format. The raw data is preprocessed to prepare it for developing the proposed model. The data set is split into 70% for training and 30% for testing. The proposed Random Forest Regressor model is developed using the Random Forest algorithm, and the Decision Tree Regressor model using the Decision Tree Regressor, to predict fatal accidents from the number of accidents that happened in different locations of India for different road accident attributes such as head-on collision, fog weather condition, single-lane, and pedestrian crossing. A few insights from the dataset were visualized to show the essential characteristics of fatal accidents. The random forest regressor and decision tree regressor were then applied to predict the number of fatal accidents across the different transportation parameters.
1) Random Forest Algorithm
The random forest is a machine learning method, machine learning being a subset of artificial intelligence, built from a collection of numerous decision trees. The random forest approach creates a set of decision trees, each using a randomly chosen subset of the training data, and then collects a prediction from each tree. The best option is then chosen through voting by the random forest algorithm. Because the random forest technique uses several decision trees, it decreases the impact of noisy findings, whereas a single decision tree's prediction result can be affected by noise. Models for classification and regression can be built using random forest methods.
Each decision tree in a random forest classification model will vote, and the most common prediction class will be chosen as the outcome. The mean of all decision tree outcomes is used as the final result in the random forest regression model.
The steps for creating a random forest regressor model are as follows:
Step 1: Import the dataset and packages.
Step 2: Define the features and the target.
Step 3: Split the original dataset into train and test sets.
Step 4: Create a random forest regression model with the random forest regressor function.
Step 5: Validate the random forest regression model.
2) Decision Tree Regression
Decision tree regression examines an object's characteristics and trains a model in the form of a tree structure to forecast future data and provide meaningful continuous output. The output is not discrete; that is, it is not represented solely by a known, discrete set of numbers or values.
The implementation proceeds in the following steps:
Step 1: Import the libraries.
Step 2: Create the dataset and load it.
Step 3: Split the original dataset into train and test sets.
Step 4: Fit the decision tree regressor to the dataset.
Step 5: Predict a new value.
Step 6: Visualize the outcome.
IV. RESULTS
The results analysis uses a random forest regressor and a decision tree regressor to forecast the number of fatal accidents across India.
The following data types were evaluated for the study:
i) Figure 2(a) depicts the number of accidents that occurred as a result of various collision types. Surprisingly, the majority of incidents in transportation do not involve motor vehicles. The number of people and fatalities involved in front-to-front or head-on crashes is substantially higher than in other motor vehicle collisions in transportation.
ii) Figure 2(b) depicts the number of accidents that occurred under various weather conditions. The majority of the collisions occurred in other (clear) meteorological conditions. This is to be expected because other (clear) is the most common weather condition. Among the non-clear conditions, the majority of the accidents occurred in fog weather.
iii) Figure 2(c) depicts the number of accidents that occurred on various types of roads. The majority of collisions occurred in single lanes. This is to be expected because the single-lane road is the most common type of road.
iv) Figure 2(d) depicts the number of accidents that occurred in various places. The majority of the accidents occurred in the marketplace and at schools and colleges. This is understandable because schools and markets are the most common kinds of location.
Fig. 2. Visualization of accidents based on four attributes: (a) accidents based on different collisions, (b) accidents based on different weather conditions, (c) accidents that occurred on various types of roads, (d) accidents based on different locations.
V. DISCUSSION
A. Statistics
Figure 3 depicts the timing of the accidents. The greatest number of accidents occurred between 9 a.m. and 12 p.m., while the fewest occurred between 6 p.m. and 9 p.m. Figure 4 depicts the types of vehicles involved in collisions: two-wheelers, three-wheelers, four-wheelers, heavy vehicles, and other vehicles.
For performance evaluation, the two developed models, the Random Forest Regressor (RFR) and the Decision Tree Regressor (DTR), are compared. Seventy percent of the data from the various Indian states is used for training, while 30 percent is used for model evaluation. The predicted values are then compared with the original values to calculate the MAE, MAPE, and RMSE metrics.
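The three error metrics can be written out directly from their definitions. The following is a minimal sketch in Python with NumPy, not the authors' code; the function names and the three sample values are illustrative assumptions and are not taken from the study's dataset.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any true value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more than MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up values for illustration only (not data from the paper).
y_true = [100, 200, 400]
y_pred = [110, 180, 400]

print("MAE :", mae(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
```

Because MAPE divides by the true value, it cannot be computed for records that report zero accidents; such rows would need separate handling.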
Fig. 3. Timing of accident.
The results in Table 1 show that the random forest regressor model outperforms the decision tree regressor model in the task of predicting head-on collision and pedestrian crossing accidents in different states of India. The random forest regressor has a training score of 91.8 percent and a testing score of 94.2 percent for head-on collision, and a training score of 90.4 percent and a testing score of 90.3 percent for pedestrian crossing. Similarly, the decision tree regressor's training and testing scores are 99.9 percent and 84.8 percent for head-on collision, and 100 percent and 86.1 percent for pedestrian crossing, respectively.
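A near-perfect training score paired with a lower testing score, as reported for the decision tree, is the usual sign of a single tree fitting noise in its training data. The sketch below illustrates that comparison with scikit-learn on synthetic data; it is not the authors' code. The synthetic features, the noise level, `n_estimators=100`, and the random seeds are all assumptions, and the `score` method returns the coefficient of determination (R^2), which is assumed here to be what the reported training and testing scores correspond to. The last lines also check that the forest's regression output is the mean of its trees' outputs, as described in Section III.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for roughly 300 records (assumption, not the paper's data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 4))
y = (50 * X[:, 0] + 30 * X[:, 1] + 20 * X[:, 2] + 10 * X[:, 3]
     + rng.normal(0.0, 10.0, size=300))

# 70% training / 30% testing split, as stated in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "DTR": DecisionTreeRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": model.score(X_train, y_train),  # R^2 on the training set
        "test": model.score(X_test, y_test),     # R^2 on the held-out set
    }
    print(f"{name}: train R^2 = {scores[name]['train']:.3f}, "
          f"test R^2 = {scores[name]['test']:.3f}")

# A random forest regressor predicts the mean of its individual trees.
rfr = models["RFR"]
tree_mean = np.mean([tree.predict(X_test) for tree in rfr.estimators_], axis=0)
forest_is_tree_mean = bool(np.allclose(rfr.predict(X_test), tree_mean))
print("Forest prediction equals mean of tree predictions:", forest_is_tree_mean)
```

An unpruned decision tree can memorize its training set, so a training score near 100 percent says little about how well it generalizes; the testing score is the more informative number. Averaging many trees trained on bootstrap samples typically narrows this gap.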
and 5(c)). Figure 5 compares the visualization and
performance of the random forest regressor model and the
decision tree regressor model.
Fig. 5. Comparative analysis of accidents based on four attributes: (a) nature of the collision, (b) weather conditions, (c) types of roads, (d) nature of the location.
[14] Mahendra G, Roopashree H R, Yogeesh A C. Analysis of the big data
methods, challenges, and applications in intelligent transportation
systems. International Journal of Advanced Trends in Computer
Science and Engineering. 2020; 9(5):7478-7486.
[15] X. Gao, J. Wen, C Zhang. An improved random forest algorithm for
predicting employee turnover. Hindawi. 2019; 4140707:12 p.
[16] Mu-Ming Chen and Mu-Chen Chen. Modeling road accident severity
with comparisons of logistic regression, decision tree and random
forest. MDPI. Information 2020; 11(270):23p.
[17] Rabia Emhamed A, Keneth Morgan Kwayu, Maha Reda Alkasisbeh,
Abdulbaset Ali Frefer. Comparison of machine learning algorithms for
predicting traffic accident severity. IEEE. Jordan International Joint
Conference on Electrical Engineering and Information Technology.
2019:272-276.
[18] Jae Hun Kim, Juyeon Kim, Gunwoo Lee, Juneyoung Park. Machine
learning-based models for accident prediction at a Korean container
port. MDPI. Sustainability 2021; 13(9137):14p.
[19] Department of transportation. Government of India. India.
[20] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue
Wang. Traffic flow prediction with big data: A deep learning approach.
IEEE Transactions on Intelligent Transportation Systems.
2015; 16(2):865-873.
[21] Wei Li, Xujian Zhao, and Shiyu Liu. Traffic accident prediction based
on multivariable grey model. Information. 2020; 11(184):12p.