Leveraging Computer Vision For Emergency Vehicle Detection-Implementation and Analysis

This paper explores the implementation of computer vision techniques, specifically object detection and instance segmentation, for emergency vehicle detection in Intelligent Transportation Systems. The study evaluates the effectiveness of Faster RCNN and Mask RCNN algorithms in recognizing emergency vehicles amidst traffic congestion, highlighting their potential applications in smart traffic signals and autonomous vehicles. The research emphasizes the need for tailored algorithms to address the unique challenges of emergency vehicle detection in the Indian context.

Uploaded by

ABHINAV SAHU

IEEE - 49239

LEVERAGING COMPUTER VISION FOR EMERGENCY VEHICLE DETECTION- IMPLEMENTATION AND ANALYSIS

Kaushik S
Student, Dept. of Industrial Engineering and Management, RV College of Engineering, Bangalore, INDIA
[email protected]

Abhishek Raman
Student, Dept. of Industrial Engineering and Management, RV College of Engineering, Bangalore, INDIA
[email protected]

Dr. Rajeswara Rao K.V.S
Associate Professor, Industrial Engineering and Management, RV College of Engineering, Bangalore, INDIA
[email protected]

Abstract- Recent advances in Computer Vision technology have revolutionized the field of Intelligent Transportation Systems. The applications are far-reaching, ranging from traffic monitoring systems to self-driving cars. Most applications entail at least simple, if not advanced, image or video analytics at a fundamental level. This paper examines the use of object detection and instance segmentation for emergency vehicle detection, which is indispensable to any Intelligent Transportation System. More particularly, emergency vehicle detection can be programmed into autonomous vehicles as well as traffic signal controllers for preferential signal switching upon encountering emergency vehicles. The architectures implemented are Faster RCNN for object detection and Mask RCNN for instance segmentation. The computational results of these implementations, their accuracies and, most importantly, their suitability for emergency vehicle detection in disordered traffic conditions are discussed. Additionally, the object detection model is contrasted with instance segmentation and the merits and demerits of each are identified, again in the context of emergency vehicle detection.

Keywords- Emergency Vehicle Detection, Computer Vision, Object Detection, Instance Segmentation, Convolutional Neural Network, Faster RCNN, Mask RCNN

I. INTRODUCTION

To realize unimpeded movement of emergency vehicles, two synchronous mechanisms must be constructed, viz. a mechanism to detect the presence of an emergency vehicle in traffic congestion and a post-detection mechanism that clears the traffic for the emergency vehicle. An array of diverse frameworks has been proposed envisaging these mechanisms, involving various levels of automation, i.e. manual, partially automated and completely automated. The primary drawbacks of manual and partially automated systems are lapses in detection and poor responsiveness (higher response time post detection of emergency vehicle). However, an autonomous or completely automated system is more robust, with higher detection accuracies as well as swift response times, the proviso being that it has to be programmed in a comprehensive manner to accommodate the possible scenarios of detection (under high traffic, interference, noise etc.). This paper explores state-of-the-art computer vision algorithms as tools to automate the detection of emergency vehicles, i.e. the first of the two mechanisms described above. The efficacy of these algorithms is assessed vis-à-vis two use cases: intelligent traffic signals and autonomous vehicles.

II. RELATED WORK AND MOTIVATION

Rajeshwari Sundar et al. [1] have implemented a Radio Frequency Identification (RFID) based detection system in which emergency vehicles equipped with RFID tags are recognized by an RFID reader installed at a signal or a junction. Another RFID based emergency signal detection system was devised by Shajnush Amir et al. [2] using Programmable Logic Controllers (PLCs). An alternate technology is to deploy sophisticated microphones to detect the audio signal or emergency siren. Bruno Fazenda et al. [3] investigated a cross microphone array system for acoustic signal detection.
With the advent of computer vision, image processing has found applications in emergency vehicle detection as well. Shuvendu Roy and Md. Sakif Rahman [4] employed YOLO object detection for isolating an emergency vehicle in an image. Raman et al. [5] demonstrate the superiority of computer vision over RFID. However, computer vision in the domain of emergency vehicle detection is still at a nascent stage, especially in a developing economy like India. India, in fact, presents a unique setting for computer vision, as eccentricities like heavy traffic, lane indiscipline, and haphazard, non-homogeneous movement must be factored into any algorithm. An algorithm modelled on traffic conditions elsewhere is therefore unlikely to remain effective in India. There is very limited literature on computer vision techniques for emergency vehicle detection, especially factoring in the eccentricities of the Indian context, and hence the goal of this research is to execute

11th ICCCNT 2020


Authorized licensed use limited to: MIT-World Peace University. Downloaded on March 24,2025 at 09:52:36 UTC from IEEE Xplore. Restrictions apply.
July 1-3, 2020 - IIT - Kharagpur
Kharagpur, India
and contrast a couple of image processing algorithms, i.e. object detection and instance segmentation, in an exclusively Indian context and further the cause of applying computer vision for emergency vehicle detection.

III. METHODOLOGY
The two selected computer vision techniques for emergency vehicle recognition are object detection and instance segmentation. Specifically designed Convolutional Neural Networks (CNNs) called Faster RCNN and Mask RCNN are employed for object detection and instance segmentation respectively. These CNNs were initially trained, iteration wise, to recognize features of emergency vehicles from their images. This was followed by comprehensive testing of the trained models for detection accuracy on an alternate unseen dataset. The results were categorized as true positives, false positives, true negatives and false negatives.

IV. THE OBJECT DETECTION MODEL
A. Principle and Architecture
Object detection is the localization of a particular object in an image by means of generating a bounding box around the object. Conventional convolutional neural networks that perform image classification (ResNet, VGG Net, Inception Net etc.) serve as the backbone for object detection. The principle of transfer learning is employed, i.e. these base networks are pre-trained on large pre-existing datasets to minimize the number of computations as well as the number of images required for the custom dataset. They are also made fully convolutional to accommodate inputs of arbitrary dimensions. Additionally, these base networks are supplemented by object detection networks like Faster RCNN, Single Shot Detectors (SSD) and Region based Fully Convolutional Networks (R-FCN). Figure 1 depicts a Faster RCNN with a VGG Net base network.

Figure 1- Representation of Faster RCNN

The CNN architecture that has been chosen for implementation is Faster RCNN with a ResNet backbone [6]. The Faster RCNN is itself a combination of two networks, the Fast RCNN and the Region Proposal Network (RPN). The backbone network of the RPN generates an output feature map that is examined by the RPN for the possible presence of the object under detection. There is extensive parameter sharing between the backbone networks of both the RPN and the Fast RCNN to facilitate computational efficiency.
B. Nature of Dataset and Training
A custom dataset consisting of 400 downloaded images of emergency vehicles was assembled and divided in an 80:20 ratio for training and validation respectively. The nature of the images ensured that traffic congestion, disorderly movement and non-homogeneity in the shapes and sizes of the emergency vehicles were factored into the model. The object detection model was trained in the TensorFlow deep learning platform for 10200 iterations to decrease the values of training loss. The subsequent values of training and validation loss obtained were 0.0086 and 0.0029 respectively (figure 2).
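The 80:20 division of the 400-image dataset described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how the split was drawn, so the shuffled random split, the function name `split_dataset`, the seed and the file names are all hypothetical.

```python
import random


def split_dataset(paths, train_fraction=0.8, seed=0):
    """Shuffle a list of image paths and split it into train and validation lists."""
    shuffled = list(paths)
    # A fixed seed keeps the (assumed) random split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 400 collected images.
image_paths = [f"emergency_{i:03d}.jpg" for i in range(400)]
train_paths, val_paths = split_dataset(image_paths)
print(len(train_paths), len(val_paths))  # 320 80
```

Keeping the two lists disjoint matters more than the exact ratio, since any image shared between them would make the validation loss look better than it is.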

Figure 2(a)- Training loss vs number of iterations
Figure 2(b)- Validation loss vs number of iterations
C. Results
The object detection algorithm recognizes the ambulance even when it is amid traffic congestion (figure 3). The output is in the form of a bounding box labelled with the detection accuracy as a percentage.

Figure 3- Recognition of ambulance in traffic congestion
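A bounding box output such as the one described above is commonly compared with a hand-labelled box using Intersection over Union (IoU), the measure that Section V also refers to. The sketch below is a minimal illustration, assuming boxes are given as (x_min, y_min, x_max, y_max) in pixels. The function name and the sample boxes are hypothetical and are not taken from the paper's implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth boxes.
predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
print(iou(predicted, ground_truth))  # 2500 / 17500, about 0.143
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap. The threshold at which a detection counts as correct is a design choice that the paper does not state.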

V. THE INSTANCE SEGMENTATION MODEL

Figure 4- Mask RCNN architecture [6]


A. Principle and Architecture
Instance segmentation is a computer vision technique that detects and characterizes boundaries of a particular object of interest at the pixel level. Instance segmentation techniques primarily employ a flexible framework called Mask Region Based Convolutional Neural Network (Mask RCNN) [7]. Mask RCNN is a malleable and state-of-the-art deep neural network which accurately identifies objects in an image, video or a real time feed by enclosing them in a bounding box and concurrently creates a segmentation mask for specific instances detected in the feed. It is a multitasking network and improves on earlier models by performing both bounding box object detection (classification and location) and instance segmentation synchronously.
The Mask R-CNN network (figure 4 [7]) has two principal stages. The first stage proposes regions of the input image that may contain the object, known as Regions of Interest (RoI). The second stage predicts the class probability, the bounding box (assessed by Intersection over Union, IoU) and the binary mask around the object, based on the results of the first stage. Both stages are fused in the backbone.
The network has three components, namely the FPN, the RPN and the backbone network architecture. The FPN or Feature Pyramid Network is a top-down or bottom-up architecture and is used as a universal feature extractor. Here the bottom-up architecture is implemented for feature extraction from the feed.
The RPN or Region Proposal Network is a light network that scans the FPN bottom-up and proposes probable regions in the image where the object is likely to be present. It then recognizes various regions by fitting multiple bounding boxes according to certain IoU values. The backbone is a multilayered neural network which obtains feature maps of the input feed. Here ResNet50 is employed as it is not a very deep architecture. Fine tuning helps the model attain higher accuracy with less training time.
The ResNet backbone has a rectified linear unit (ReLU) activation, mathematically represented by

$f(x) = \max(0, x)$ (1)

The Stochastic Gradient Descent algorithm with a mini batch is used to update the weights and momentum for minimizing loss and converging at the most accurate value.

$\theta = \theta - \eta \cdot \nabla J(\theta; x^{(i)}; y^{(i)})$ (2)

The above equation represents the update rule for stochastic gradient descent, where $\theta$ represents the parameter matrix of the neural network, $\eta$ represents the learning rate of the algorithm, and $\nabla$ is the gradient calculated for the loss function $J$ over the elements of the feature space $x^{(i)}$ and $y^{(i)}$. As parameter updates are more frequent, the rate of convergence of stochastic gradient descent is quicker than normal batch training.
B. Nature of Dataset and Training
A custom dataset of 400 images of emergency vehicles was assembled and annotated using the labelme tool. Masks of respective images were generated post annotation. The model was trained using TensorFlow [9] on 320 images and validated on 80 images, resulting in an 80:20 split. This ensured that there was a balance between learning the training data and validation. The model was first trained for 10 epochs, 1000 steps per epoch, for the last layer or heads. The model was then fine-tuned for stage 4 and above (using ResNet50) for 10 more epochs and then it was trained for all stages for 40 more epochs. This resulted in decreasing training loss and validation loss of the model for the specified configurations (figure 6).
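The ReLU activation of equation (1) and the update rule of equation (2) earlier in this section can be illustrated with a small, self-contained sketch. It is a toy under stated assumptions, not the Mask RCNN training loop: the loss J is taken to be a squared error on a single example, momentum is omitted, and the function names and numbers are hypothetical.

```python
def relu(x):
    """Equation (1): f(x) = max(0, x)."""
    return max(0.0, x)


def sgd_step(theta, grad, learning_rate):
    """Equation (2): theta <- theta - eta * grad, applied element-wise."""
    return [t - learning_rate * g for t, g in zip(theta, grad)]


def squared_error_grad(theta, x, y):
    """Gradient of the assumed toy loss J = 0.5 * (theta . x - y)^2 for one example."""
    error = sum(t * xi for t, xi in zip(theta, x)) - y
    return [error * xi for xi in x]


# Hypothetical single training example (x_i, y_i) and initial parameters.
theta = [0.0, 0.0]
x_i, y_i = [1.0, 2.0], 3.0
for _ in range(50):
    theta = sgd_step(theta, squared_error_grad(theta, x_i, y_i), learning_rate=0.1)

prediction = sum(t * xi for t, xi in zip(theta, x_i))
print(relu(-2.0), relu(1.5), prediction)
```

With these numbers each step halves the remaining error, so the prediction approaches the target 3.0. A learning rate that is too large for the data would make the same loop diverge.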
Figure 6- Specified configurations

C. Results
The algorithm was tested on a dataset of images of emergency vehicles stuck in traffic on Indian roads. The outputs are masks generated on the object, which is enclosed in a bounding box with the class name and accuracy (figure 7).

Figure 7- Identification of ambulance in traffic
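Section III groups test results into true positives, false positives, true negatives and false negatives. A minimal sketch of how such counts translate into an overall accuracy is given below. The counts are hypothetical placeholders chosen for illustration and are not the paper's results.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of test images classified correctly: (TP + TN) / total."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no test samples")
    return (tp + tn) / total


# Hypothetical counts for a 100-image test set (not taken from the paper).
counts = {"tp": 40, "fp": 5, "tn": 45, "fn": 10}
print(accuracy(**counts))  # 0.85
```

When the test set has exactly 100 images, the true positive and true negative counts can be read directly as percentages, so their sum gives the accuracy in percent.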

VI. ANALYSIS OF USE CASES

The embedded systems in a traffic signal can be programmed to accept an input from the detection unit whenever an emergency vehicle is detected and subsequently switch the signal from red to green. A reliable and robust system that can accurately detect an emergency vehicle and fast track its flow through heavy city traffic is an asset to any Intelligent Transportation System or Smart City venture. Autonomous vehicles can also have built-in emergency vehicle detection capabilities to allow priority movement of ambulances, fire engines etc. In both use cases, it is essential to ensure that there are sufficient computational resources for the execution of the computer vision models.

Both object detection and instance segmentation differ from conventional image classification in that they identify the location/coordinates of the object under detection. An image classifier, on the other hand, would simply assign a particular label to an image when the object it is trained to detect is found in the image.
For the intelligent traffic signal application, a conventional image classifier would be ineffective as it is necessary to identify the lane in which the emergency vehicle is present, so as to switch the signal for that particular lane. An object detection model would be an ideal fit for this application.
In the case of an autonomous vehicle, greater precision is required as the vehicle will have to maneuver itself based on the spatial extent of the emergency vehicle. In this case, an instance segmentation model, which traces the emergency vehicle by performing pixel wise classification, would be the best fit. Although the object detection model will generate a bounding box, it will be unable to provide the exact pixel-level extent of the emergency vehicle.

VII. TESTING THE ALGORITHMS

The following table (figure 8) summarizes the accuracy of the models, tested on a dataset of 100 images.

Figure 8- Tabular representation of accuracy

The cumulative accuracy is the sum of the true positive and true negative percentages, i.e. 81 % for object detection and 92 % for instance segmentation.

VIII. CONCLUSION

Two different algorithms in computer vision, object detection and instance segmentation, have been described for emergency vehicle detection and localization. Both algorithms are effective in identifying an emergency vehicle from a cluster of vehicles, and hence can be leveraged for applications in intelligent transportation systems such as smart traffic signals and autonomous vehicles.

REFERENCES
[1] R. Sundar, S. Hebbar and V. Golla, "Implementing Intelligent Traffic Control System for Congestion Control, Ambulance Clearance, and Stolen Vehicle Detection," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1109-1113, Feb. 2015.
[2] S. Amir, M. S. Kamal, S. S. Khan and K. M. A. Salam, "PLC based traffic control system with emergency vehicle detection and management," 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, 2017, pp. 1467-1472.
[3] B. Fazenda, Hidajat Atmoko, Fengshou Gu, Luyang Guan and A. Ball, "Acoustic based safety emergency vehicle detection for intelligent transport systems," 2009 ICCAS-SICE, Fukuoka, 2009, pp. 4250-4255.
[4] S. Roy and M. S. Rahman, "Emergency Vehicle Detection on Heavy Traffic Road from CCTV Footage Using Deep Convolutional Neural Network," 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 2019, pp. 1-6.
[5] A. Raman, S. Kaushik, K. V. S. R. Rao and M. Moharir, "A Hybrid Framework for Expediting Emergency Vehicle Movement on Indian Roads," 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 2020, pp. 459-464, doi: 10.1109/ICIMIA48430.2020.9074933.
[6] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 1 June 2017, doi: 10.1109/TPAMI.2016.2577031.
[7] L. Cai, T. Long, Y. Dai and Y. Huang, "Mask R-CNN-Based Detection and Segmentation for Pulmonary Nodule 3D Visualization Diagnosis," in IEEE Access, vol. 8, pp. 44400-44409, 2020.
[8] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778.
[9] Waleed Abdulla, "Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow." https://github.com/matterport/Mask_RCNN. (2017).
