DenseNet121 for Emergency Vehicle Classification
Corresponding Author:
Amine Kherraki
Image Laboratory, School of Technology, Moulay Ismail University of Meknes
Marjane, Meknes 50050, Morocco
Email: [email protected]
1. INTRODUCTION
In recent years, intelligent transportation systems (ITS) have gained much importance due to the vast
increase in the number of cars and other types of vehicles on the road [1]. Smart systems contain a large amount
of high-quality information for smart and secure use [2]. For example, ITS provides multiple services
such as transport and traffic monitoring. The aim of this monitoring is to acquire and analyze vehicle
movements, provide accurate data and important statistics about the type and shape of vehicles, and
evaluate road traffic and safety [1], [3]. Transport and traffic monitoring thus reduces human labor by
delegating vision-based tasks to a computer or automated system [4]. Such a system is
essential for effective real-time traffic monitoring, as it can detect changes in traffic characteristics in a
timely manner, allowing regulatory agencies and authorities to respond quickly to traffic situations.
In the literature, many traffic video surveillance applications perform vehicle re-identification in a multi-
camera environment [5]. These applications can report important information, such as traffic flow, traffic
conditions, and travel time, in a distributed traffic control system. With regard to object localization, the
authors of [6] proposed an approach that finds an instance of a vehicle by estimating its position, ratio, and
size, which is widely used for vehicle tracking. In [7], the authors used a fixed camera-based
application for traffic video analysis and a central processing unit (CPU) based system to count the number of
passing vehicles along with their speed on a highway. The analysis of the results shows that the application
not only counts the vehicles but also estimates their speeds, thereby providing richer traffic
flow information. Deep learning approaches and algorithms such as convolutional neural networks (CNN) are
widely used in many areas, applications, and problems of computer vision, such as image recognition, segmentation,
detection, and classification [8], [9]. In practice, vehicle detection methods must be fast enough to operate in real
time, be robust to changes in lighting and weather conditions, and be able to separate
vehicles from image sequences accurately and efficiently [10]. Image-based vehicle classification is one of the
most promising techniques for large-scale traffic data collection and analysis [11], and deep learning algorithms
are widely used for this task. For example, the authors of [12] used CNN architectures to classify road
vehicle images into six categories: large buses, cars, motorcycles, minibuses, trucks, and vans. They
show that CNN-based vehicle type classification achieves state-of-the-art results on previously cropped
images containing only vehicles [12].
In this paper, we address an important topic that researchers have so far given little attention,
even though emergency vehicles have the highest priority on the road. Our contribution is a comparison of the
classification results obtained for emergency vehicles with several CNN architectures.
Our paper should therefore help researchers and developers implement efficient real-time
emergency vehicle classification applications. The rest of the paper is organized as follows: section 2 presents related
work on the classification of vehicles in general, as well as the classification of emergency vehicles. Section 3
defines CNN and details each architecture. Section 4 describes the research
method, in particular the preprocessing of the dataset images and the implementation of the CNNs. Section 5 presents the
experimental results and discusses the performance analysis. Finally, section 6 gives concluding observations
and future work, with guidance for this area.
2. RELATED WORKS
2.1. Vehicle classification
Vehicle classification is a promising research task in ITS, and deep learning algorithms are the most
widely used techniques for it. Indeed, CNN has performed well in this area in terms of real-time speed and
accuracy compared to other machine learning algorithms, such as support vector machines (SVM) and decision trees
(DT). In 2015, the authors of [13] proposed a semi-supervised CNN for vehicle detection from
frontal view images. They used Laplacian filter learning to obtain the network filters from a large
amount of unlabeled data, employed a Softmax classifier in the output layer, and trained their
model on a small amount of labeled data. A year earlier, the authors of [14] introduced an unsupervised
CNN for vehicle classification. They used the CNN to learn vehicle characteristics and then classified them
using Softmax regression; the proposed network filters are learned with a sparse filtering method. In [15], the
authors used a CNN with low-resolution video images to detect and classify vehicles. Their preprocessing
consists of resizing each image and adjusting its contrast with histogram equalization. The proposed
CNN architecture detects higher-level features such as edges and corners. In addition, the authors vary the
number of filters and their sizes, as well as the number of hidden layers [15].
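As an illustration of this preprocessing step, a minimal OpenCV sketch in Python is given below; the file name, target size, and luminance-channel handling are our assumptions rather than details reported in [15].

import cv2

def preprocess(path, size=(224, 224)):
    """Resize a frame and equalize its contrast, as described for [15]."""
    img = cv2.imread(path)                             # BGR uint8 frame
    img = cv2.resize(img, size)                        # unify spatial size
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])  # equalize luminance only
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

frame = preprocess("vehicle_frame.jpg")                # hypothetical file name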
Later, the authors of [16] showed that a simple CNN surpasses scale-invariant feature transform (SIFT) and SVM
models applied to vehicle classification. Finally, in [17], [18], the authors used you only look once (YOLO)
for vehicle detection and the AlexNet model for vehicle classification. Further, they used AlexNet as a
feature extractor and classified the extracted features with a linear SVM.
diagnosed some features of the widespread language, while the implemented approach is based on detecting
the characters “AMBULANCE” or “108” on the vehicle. In our view, this idea is not very
practical, since ambulances sometimes carry other characters, symbols such as a crescent or a cross, or text
in a different language entirely.
The strong point of DenseNet121 is that it outperforms the majority of CNN architectures in terms of accuracy and
does not require a large amount of memory [35], [36].
4. RESEARCH METHOD
In this section, we provide information about the general setup and the dataset used in our
experiments. Next, we present detailed information about building and training the CNN networks.
Figure 2 shows our workflow for classifying emergency vehicles.
Figure 3. Examples of emergency and normal vehicle images from the Analytics Vidhya Emergency Vehicle
dataset [35]
the previous section. We have trained each architecture for 500 epochs, and we have used the accuracy, F1 score, and
loss metrics to evaluate the performance of our architectures.
Figure 5. Summary of the DenseNet121 model with the three added layers
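As an illustration of the structure summarized in Figure 5, the following is a minimal Keras sketch of DenseNet121 with the three added layers described in the text; the input size, dropout rate, optimizer, loss, and the commented training call are our assumptions rather than reported settings.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

base = DenseNet121(weights=None, include_top=False,
                   input_shape=(224, 224, 3))       # trained from scratch

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),   # shrink feature maps, speed up training
    layers.Dropout(0.5),               # regularization against overfitting [38]
    layers.Dense(2, activation="softmax"),  # two output classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# history = model.fit(train_ds, validation_data=val_ds, epochs=500)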
which makes it inappropriate for real-time applications. The remaining models, namely ResNet-50,
Xception, VGG19, and Inception-V3, achieved accuracies of 91.9%, 90.68%, 91.49%, and 88.25%, respectively;
however, their storage and processing requirements are high.
With regard to the loss metric, we notice that ResNet-50 had the minimum loss score of 17.20%, closely
followed by VGG16 with 17.86%. However, VGG19, Inception-V3, and Inception-ResNet-V2 had high loss
scores compared to the other models. In addition, they require a large amount of memory, which does not always
make them suitable for real-time use. In terms of parameters, MobileNetV2 had the best
result, with the smallest number of required parameters, and the other models kept the same order as in the
memory comparison, because the number of parameters is directly related to the size of the required
memory: the more parameters a model has, the more memory it needs. This is the strength of the MobileNetV2
model, which performs very well with a small number of parameters and low memory.
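A short sketch, assuming Keras models and float32 weights (4 bytes per parameter), shows how this relationship between parameter count and memory can be checked:

from tensorflow.keras.applications import DenseNet121, MobileNetV2

for ctor in (MobileNetV2, DenseNet121):
    model = ctor(weights=None, include_top=False, input_shape=(224, 224, 3))
    params = model.count_params()
    # One float32 parameter occupies 4 bytes
    print(f"{ctor.__name__}: {params:,} parameters, "
          f"about {params * 4 / 2**20:.1f} MB of weights")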
Figure 7. Accuracy of implemented CNN models; (a) Accuracy of Xception, ResNetV2, VGG16, and
VGG19 CNN models; (b) Accuracy of InceptionV3, InceptionResNetV2, MobileNetV2, and DenseNet CNN
models
Figure 9 shows the confusion matrices of the implemented CNN models. The confusion matrix supports
some conclusions about the difficulty of the classification, and it also highlights some potential research
opportunities. In the following, we discuss the confusion matrix of the DenseNet121 model. In total,
we used 247 images for validation: 146 normal vehicles and 101 emergency vehicles. In the “Normal
Vehicle” class, we observe that DenseNet121 recognizes 143 images out of 146, in other words, it produced
143 correct predictions out of 146, i.e., 97.94% of all normal vehicles. The analysis of the misclassified
images shows that the missed vehicles carry advertisements or have an abnormal shape compared to usual
normal vehicles, as shown in Figure 10. In the “Emergency Vehicle” class, the number of correct predictions
is lower than in the “Normal Vehicle” class, as illustrated in Figure 9: the DenseNet121 model
produced 92 correct predictions out of 101, i.e., 91.08%. This is because some of the emergency vehicle images
were taken from different distances, as well as different and difficult viewing angles. To handle such cases, it would be
necessary to add several classes and other types of vehicles to the dataset.
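For reference, a matrix of this kind can be computed with scikit-learn along the following lines; the names model and val_ds (the trained DenseNet121 and the batched 247-image validation split) are assumed:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true, y_pred = [], []
for images, labels in val_ds:                  # batched validation data
    probs = model.predict(images, verbose=0)
    y_pred.extend(np.argmax(probs, axis=1))    # predicted class indices
    y_true.extend(labels.numpy())

# Rows are true classes, columns predicted classes; for DenseNet121 the text
# reports 143/146 normal and 92/101 emergency vehicles correctly classified
print(confusion_matrix(y_true, y_pred))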
Figure 8. Loss of implemented CNN models; (a) Loss of Xception, ResNetV2, VGG16, and VGG19 CNN
models; (b) Loss of InceptionV3, InceptionResNetV2, MobileNetV2, and DenseNet CNN models
the generated and trained DenseNet121 model, which obtained the best scores in terms of accuracy and F1
score. As shown in Figure 11, we present some examples of image classification into the “Emergency Vehicle” and
“Normal Vehicle” classes. As we have already stated, this is the first paper to classify emergency
vehicles using many CNN architectures.
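A minimal sketch of such single-image inference, in which the model file name, test image name, and class order are hypothetical:

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image

CLASSES = ["Emergency Vehicle", "Normal Vehicle"]  # assumed label order

# Hypothetical path to the trained DenseNet121 from the sketch above
model = tf.keras.models.load_model("densenet121_emergency.h5")

img = image.load_img("test_vehicle.jpg", target_size=(224, 224))
x = image.img_to_array(img)[np.newaxis] / 255.0    # add batch dim, scale to [0, 1]
probs = model.predict(x, verbose=0)[0]
print(CLASSES[int(np.argmax(probs))], f"({probs.max():.2%})")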
There was a competition on the “Analytics Vidhya” platform on this topic, in which a competitor
used ResNet-18 and ResNet-34 to classify emergency vehicles on the Analytics Vidhya Emergency Vehicle
dataset [37]. The average accuracy of the two models is 94.22% [40]. Details about the architectures used
are not given; the only available information is the names of the two architectures and the use of pre-
trained weight models from Keras. We note that our models are trained from scratch, which requires considerable
training time. In addition, we emphasize that the performance of our best model surpasses that of this previous work.
Figure 10. Examples of normal vehicle images with advertisements and abnormal shapes [35]
6. CONCLUSION
In this paper, we have carried out a comparative study of eight well-known CNN architectures for classifying
emergency vehicle images. We designed and implemented an efficient deep learning system to automatically
classify emergency and normal vehicles in traffic scenes. First, we preprocessed the Analytics Vidhya
Emergency Vehicle dataset to unify the image sizes. We added three layers to
each CNN architecture: a “GlobalAveragePooling2D” layer to reduce the dimensions of the feature maps
and accelerate training, a “Dropout” layer to avoid overfitting, and finally a “Dense” layer sized to
the number of output classes. We then ran simulations of each architecture and found that
DenseNet121 is the most appropriate model for real-time emergency vehicle classification, with an accuracy of
95.14%, an F1 score of 93.87%, and a memory footprint of around 27.5 MB. The results achieved are therefore very
promising and should bring significant added value to applications that adopt our best architecture
for the classification of emergency vehicles. The experiments also allow us to rank the classification architectures
by different criteria, such as accuracy and memory, so that researchers and developers can choose the
architecture best suited to their applications. As future work, we plan to improve the accuracy
scores and processing time, and to explore a hybrid approach that classifies emergency vehicles based on
both images and siren sound.
REFERENCES
[1] N. Buch, S. A. Velastin, and J. Orwell, “A review of computer vision techniques for the analysis of urban traffic,” IEEE Trans.
Intell. Transp. Syst., vol. 12, no. 3, pp. 920–939, Sep. 2011, doi: 10.1109/TITS.2011.2119372.
[2] “Directive 2010/40/EU of the European Parliament and of the Council of 7 July 2010 on the framework for the deployment of
intelligent transport systems in the field of road transport and for interfaces with other modes of transport (Text with EEA
relevance),” 2010. Accessed: Feb. 19, 2021. [Online]. Available: https://eur-lex.europa.eu/legal-
content/EN/TXT/PDF/?uri=CELEX:32010L0040&qid=1613756510712&from=EN.
[3] M. M. Hasan, G. Saha, A. Hoque, and M. B. Majumder, “Smart traffic control system with application of image processing
techniques,” 2014, doi: 10.1109/ICIEV.2014.6850751.
[4] A. Arinaldi, J. A. Pradana, and A. A. Gurusinga, “Detection and classification of vehicles for traffic video analytics,” in Procedia
Computer Science, Jan. 2018, vol. 144, pp. 259–268, doi: 10.1016/j.procs.2018.10.527.
[5] D. Zapletal and A. Herout, “Vehicle re-identification for automatic video traffic surveillance,” in IEEE Computer Society
Conference on Computer Vision and Pattern Recognition Workshops, Dec. 2016, pp. 1568–1574, doi: 10.1109/CVPRW.2016.195.
[6] M. Ozuysal, V. Lepetit, and P. Fua, “Pose estimation for category specific multiview object localization,” Mar. 2010, pp. 778–785,
doi: 10.1109/cvpr.2009.5206633.
[7] S. H. Kim, J. Shi, A. Alfarrarjeh, D. Xu, Y. Tan, and C. Shahabi, “Real-time traffic video analysis using intel viewmont
coprocessor,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes
in Bioinformatics), 2013, vol. 7813 LNCS, pp. 150–160, doi: 10.1007/978-3-642-37134-9_12.
[8] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Convolutional neural networks for large-scale remote-sensing image
classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 645–657, Feb. 2017, doi: 10.1109/TGRS.2016.2612821.
[9] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When deep learning meets metric learning: remote sensing image scene
classification via learning discriminative CNNs,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 5, pp. 2811–2821, May 2018, doi:
10.1109/TGRS.2017.2783902.
[10] S. Gupte, O. Masoud, R. F. K. Martin, and N. P. Papanikolopoulos, “Detection and classification of vehicles,” IEEE Trans. Intell.
Transp. Syst., vol. 3, no. 1, pp. 37–47, Mar. 2002, doi: 10.1109/6979.994794.
[11] L. W. Tsai, J. W. Hsieh, and K. C. Fan, “Vehicle detection using normalized color and edge map,” IEEE Trans. Image Process.,
vol. 16, no. 3, pp. 850–864, Mar. 2007, doi: 10.1109/TIP.2007.891147.
[12] L. Zhuo, L. Jiang, Z. Zhu, J. Li, J. Zhang, and H. Long, “Vehicle classification for large-scale traffic surveillance videos using
Convolutional Neural Networks,” Mach. Vis. Appl., vol. 28, no. 7, pp. 793–802, Oct. 2017, doi: 10.1007/s00138-017-0846-2.
[13] Z. Dong, Y. Wu, M. Pei, and Y. Jia, “Vehicle Type Classification using a semisupervised convolutional neural network,” IEEE
Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2247–2256, Aug. 2015, doi: 10.1109/TITS.2015.2402438.
[14] Z. Dong, M. Pei, Y. He, T. Liu, Y. Dong, and Y. Jia, “Vehicle type classification using unsupervised convolutional neural network,”
in Proceedings - International Conference on Pattern Recognition, Dec. 2014, pp. 172–177, doi: 10.1109/ICPR.2014.39.
[15] C. M. Bautista, C. A. Dy, M. I. Mañalac, R. A. Orbe, and M. Cordel, “Convolutional neural network for vehicle detection in low
resolution traffic videos,” in Proceedings - 2016 IEEE Region 10 Symposium, TENSYMP 2016, Jul. 2016, pp. 277–281, doi:
10.1109/TENCONSpring.2016.7519418.
[16] H. Huttunen, F. S. Yancheshmeh, and C. Ke, “Car type recognition with deep neural networks,” in IEEE Intelligent Vehicles
Symposium, Proceedings, Aug. 2016, vol. 2016-August, pp. 1115–1120, doi: 10.1109/IVS.2016.7535529.
[17] Y. Zhou, H. Nejati, T. T. Do, N. M. Cheung, and L. Cheah, “Image-based vehicle analysis using deep neural network: A systematic
study,” in International Conference on Digital Signal Processing, DSP, Jul. 2016, vol. 0, pp. 276–280, doi:
10.1109/ICDSP.2016.7868561.
[18] X. Li and X. Guo, “A HOG feature and SVM based method for forward vehicle detection with single camera,” in Proceedings -
2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2013, 2013, vol. 1, pp. 263–
266, doi: 10.1109/IHMSC.2013.69.
[19] H. Razalli, R. Ramli, and M. H. Alkawaz, “Emergency vehicle recognition and classification method using HSV color
segmentation,” in Proceedings - 2020 16th IEEE International Colloquium on Signal Processing and its Applications, CSPA 2020,
Feb. 2020, pp. 284–289, doi: 10.1109/CSPA48992.2020.9068695.
[20] P. Gowtham, P. Eswari, and V. P. Arunachalam, “An Investigation approach used for pattern classification and recognition of an
emergency vehicle,” Dec. 2018, doi: 10.1109/ICSNS.2018.8573610.
[21] “Convolutional neural networks (LeNet) - DeepLearning 0.1 documentation.”
https://www.scribd.com/document/333051443/Convolutional-Neural-Networks-LeNet-DeepLearning-0-1-Documentation
(accessed Feb. 19, 2021).
[22] M. Boukabous and M. Azizi, “Review of learning-based techniques of sentiment analysis for security purposes,” Springer, Cham,
2021, pp. 96–109.
[23] A. Moubayed, M. Injadat, A. B. Nassif, H. Lutfiyya, and A. Shami, “E-learning: challenges and research opportunities using
machine learning data analytics,” IEEE Access, vol. 6, pp. 39117–39138, Jul. 2018, doi: 10.1109/ACCESS.2018.2851790.
[24] S. Doshi, “Traffic sign detection using convolutional neural network,” Towards Data Science.
https://towardsdatascience.com/traffic-sign-detection-using-convolutional-neural-network-660fb32fe90e (accessed Feb. 19, 2021).
[25] J. Gu et al., “Recent advances in convolutional neural networks,” Pattern Recognit., vol. 77, pp. 354–377, May 2018, doi:
10.1016/j.patcog.2017.10.013.
[26] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015, Accessed: Feb. 19,
2021. [Online]. Available: http://www.robots.ox.ac.uk/.
[27] I. Idrissi, M. Boukabous, M. Azizi, O. Moussaoui, and H. El Fadili, “Toward a deep learning-based intrusion detection system for
iot against botnet attacks,” IAES Int. J. Artif. Intell., vol. 10, no. 1, pp. 110–120, 2021, doi: 10.11591/ijai.v10.i1.pp110-120.
[28] C. Kyrkou and T. Theocharides, “EmergencyNet: Efficient aerial image classification for drone-based emergency monitoring using
atrous convolutional feature fusion,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 13, pp. 1687–1699, 2020, doi:
10.1109/JSTARS.2020.2969809.
[29] D. Garcia-Gasulla et al., “On the behavior of convolutional nets for feature extraction,” J. Artif. Intell. Res., vol. 61, pp. 563–592, 2018,
doi: 10.1613/jair.5756.
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, Dec. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[31] C. Szegedy et al., “Going deeper with convolutions,” in Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, Oct. 2015, vol. 07-12-June-2015, pp. 1–9, doi: 10.1109/CVPR.2015.7298594.
[32] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings - 30th IEEE Conference on Computer
Vision and Pattern Recognition, CVPR 2017, Nov. 2017, vol. 2017-January, pp. 1800–1807, doi: 10.1109/CVPR.2017.195.
[33] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv, 2017.
[34] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on
learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, Feb. 2017, pp. 4278–4284, [Online]. Available: http://arxiv.org/abs/1602.07261.
[35] I. Allaouzi, M. Ben Ahmed, and B. Benamrou, “An Encoder-Decoder model for visual question answering in the medical domain,”
CEUR Workshop Proc., vol. 2380, no. September 2019, pp. 9–12, 2019.
[36] C. Ye, C. Devaraj, M. Maynord, C. Fermüller, and Y. Aloimonos, “Evenly cascaded convolutional networks,” Proc. - 2018 IEEE
Int. Conf. Big Data, Big Data 2018, pp. 4640–4647, 2019, doi: 10.1109/BigData.2018.8622196.
[37] “JanataHack_AV_ComputerVision | Kaggle.” https://www.kaggle.com/shravankoninti/janatahack-av-computervision (accessed
Feb. 19, 2021).
[38] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks
from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[39] Q. Ji, J. Huang, W. He, and Y. Sun, “Optimized deep convolutional neural networks for identification of macular diseases from
optical coherence tomography images,” Algorithms, vol. 12, no. 3, pp. 1–12, 2019, doi: 10.3390/a12030051.
[40] “Emergency Vehicle Classification - PyTorch, ResNet | Kaggle.” https://www.kaggle.com/shravankoninti/emergency-vehicle-
classification-pytorch-resnet (accessed Feb. 19, 2021).
BIOGRAPHIES OF AUTHORS
Amine Kherraki was born in Meknes, Morocco, in 1994. He received the B.S. degree
from the School of Technology, Hassan First University, Berrechid, Morocco, in 2017, and
the M.S. degree in Computer Science from the National School of Applied Sciences, Sidi
Mohamed Ben Abdellah University, Fez, Morocco. He is currently a Ph.D. candidate at
Moulay Ismail University, Meknes, Morocco. His research interests include deep learning,
computer vision, business intelligence, and big data. He can be contacted at email:
[email protected].