2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA)
YOLO Based Recognition Method for Automatic
License Plate Recognition
Waqar Riaz, Abdullah Azeem, Gao Chenqiang, Zhou Yuxi, Saifullah, Waqas Khalid
School of Communication and Information Engineering
Chongqing University of Posts and Telecommunications
[email protected]
Abstract— Over the past two decades, the number of vehicles has increased dramatically. With this increase, it has become progressively more difficult to track each vehicle for law enforcement and traffic management purposes. Automatic License Plate Recognition (ALPR) is a popular surveillance technology that captures vehicle images and identifies their license plate numbers, and it has become an important research topic. In our study, the AOLP dataset is used for license plate recognition. Following the multi-task learning strategy for character string recognition, our proposed method employs YOLOv3 for detection and a CRNN for classification. For evaluation, we allocated 40% of the images to the training set, 20% to the validation set, and 40% to the test set. For the test set evaluation we chose a low confidence threshold, i.e., 0.125, achieving 99.82% recall. Without temporal redundancy, our proposed method achieves an 86% recognition rate, recognizing 88% of three-letter and 99% of four-letter plates. When temporal redundancy is exploited, the final recognition rate improves significantly to 96%. Our proposed method improves recognition rates from 93.58% to 96.1%, outperforming Sighthound and OpenALPR by 9% and 4.9%, respectively.

Keywords—YOLOv3, CRNN, OpenALPR, Object Detection

I. INTRODUCTION

Automatic License Plate Recognition (ALPR) technology is constantly gaining popularity, particularly in safety and traffic management systems [1]. Different license plate detection applications require different levels of scene analysis, including identifying the types of objects in the scene, locating these objects, and determining the exact boundaries of each object. These scene analysis functions correspond to three basic research tasks in the field of computer vision, commonly known as image classification, object detection, and semantic (instance) segmentation. License plate recognition systems are used regularly for access control in buildings and parking areas, law enforcement, stolen vehicle detection, traffic management, automated toll collection, and targeted advertising. There are many successful commercial systems [2], yet there is still little documentation or public information regarding ALPR systems that use deep learning algorithms for plate localization and recognition. Moreover, there are certain constraints to deal with, such as particular cameras or viewing angles, appropriate lighting requirements, capture at a predetermined region, and particular kinds of vehicles (e.g., systems that would not find LPs on motorcycles, trucks, or buses), as shown in Figure 1. In this situation, Deep Learning (DL) methods have emerged as an efficient approach in the field.

Figure 1. LPs of different layouts; note the extensive variety of formats across the different LP layouts.

Vehicle license plate detection algorithms, which are widely used and proposed by several international and domestic researchers, are usually classified into three categories, based on template matching, features, and motion information. Since the need for license plate detection and recognition arose not recently but many years ago, a number of studies have already been carried out in this field [3][4].

Visual object detection is probably the most common approach adopted by researchers, and it is used as a basic functional module for scene analysis in ALPR applications, hence the growing interest of researchers in this area. Due to the diversity of open deployment environments, automatic scene analysis running on an ALPR platform becomes very demanding, which introduces many new difficulties for object detection tasks and algorithms. These challenges mainly include: (1) how to manage the various changes commonly encountered in the visual appearance of objects in captured images (for example, lighting, viewpoint, small size, and aspect ratio); (2) how to run detection algorithms on ALPR platforms with limited memory and computing power; and (3) how to balance real-time requirements against detection accuracy.

The AOLP dataset [5] contains vehicle images at various positions and distances from the camera. Moreover, in some instances the car is not completely visible in the picture. To the best of our knowledge, there are no public datasets for ALPR with annotations of cars, motorcycles, LPs, and characters. Thus, we can point out two dominant challenges in our dataset. First, car and motorcycle LPs generally have different aspect ratios, which prevents ALPR methods from using this constraint to filter false positives. Second, car and motorcycle LPs have various designs and positions. As significant
978-1-7281-6521-9/20/$31.00 ©2020 IEEE 87 Dalian, China•August 25-27, 2020
improvements in object detection have been attained via YOLO-inspired models [6], we chose to fine-tune such a model for ALPR. Fast-YOLO is considerably quicker but less precise than YOLOv3 [7], which we adopt in our work. Since we process video frames, we also employ temporal redundancy: we process every frame individually and then combine the results to obtain a more robust prediction for each vehicle. Based on the YOLOv3 deep learning algorithm, the proposed method achieves superior results on the public AOLP vehicle image dataset for an Automatic License Plate Recognition system.

II. OBJECT DETECTION ALGORITHM BY YOLOV3

Driven by progress in computing power (i.e., GPUs and deep learning chips) and the availability of large-scale datasets (e.g., COCO [8]), and because deep neural networks provide a fast, scalable, end-to-end learning framework, they have been practiced extensively. In particular, compared with commonly used shallow methods, CNN models have significantly improved image classification (e.g., ResNet [9]), object detection (e.g., Faster R-CNN [10]), and semantic segmentation (e.g., Mask R-CNN [11]). This detection framework has provoked great interest, and in recent years many advanced CNN-based object detectors have been proposed.

Moreover, the YOLO [12] series is a family of real-time object detection systems based on Convolutional Neural Networks (CNNs). Keeping the network architecture of GoogLeNet, the YOLO network differs mainly in its use of 1x1 layers for cross-channel information integration together with 3x3 convolution layers. YOLO consists of 24 convolutional layers followed by 2 fully connected layers. Redmon and Farhadi [13] published YOLOv2, bringing intensive enhancements in the speed and accuracy of the algorithm. They then proposed YOLOv3, which further improves object detection performance [7]: it detects small objects and dense or overlapping compact objects, making it perhaps the most popular deep object detector in practical applications, with a well-balanced trade-off between accuracy and speed. Further key improvements include:

• Loss: the softmax loss of YOLOv2 is replaced by a logistic loss in YOLOv3. When the predicted object classes are more complicated, especially when there are many overlapping labels in the dataset, logistic regression is more effective.

• Anchor boxes: YOLOv3 uses 9 anchor boxes instead of the 5 of YOLOv2, thus improving Intersection over Union (IoU).

• Detection scales: YOLOv2 detects at only 1 scale, while YOLOv3 detects at 3 scales, which markedly improves the detection of small objects.

• Backbone: in YOLOv3, the Darknet-19 network of YOLOv2 is replaced with the Darknet-53 network, improving detection accuracy by deepening the network. Our paper uses the latest YOLOv3 model to perform detection on the AOLP dataset.

A. Multitask Learning

Multi-task learning [14] is another strategy for character string recognition designed for license plates. This process skips the character segmentation step and directly recognizes the character string of an image (here, the cropped LP characters). As there can be numerous characters, each character is treated as a task on the network.

B. Convolutional Recurrent Neural Network

The CRNN is a model built for scene text recognition [15]. It is made up of convolutional layers followed by recurrent layers, along with a custom transcription layer that converts the per-frame predictions into a label sequence. This layer casts the input as a sequence labeling problem, predicting a label distribution x = x1, x2, . . . , xn for each feature vector y = y1, y2, . . . , yn in the feature map.

In the subsequent sections, we explain the CNN models utilized for license plate character detection and recognition. It is worth noting that all parameters (e.g., CNN inputs, number of epochs, among others) given here are determined on the validation set and presented in a later section, where the experiments are reported.

III. METHODOLOGY

Here we present a framework that detects and recognizes LPs in a given image. Our approach is designed to find and read LPs in tough environments. The method can be divided into three main parts:

• LP Detection

• Character Segmentation

• Character Recognition

Figure 2. Flow diagram of the proposed ALPR system.

In our proposed method, we trained YOLOv3 for vehicle and LP detection, as shown in Figure 2. In a simple scenario, YOLOv3 should be able to detect the LP directly, but in more complex scenes it might not be deep enough to achieve this task. Therefore, in order to use YOLOv3, we change the number of filters in the last layers according to the number of classes. YOLO predicts A = 9 bounding boxes together with K class probabilities per box, so the number of filters in the final layer follows Eq. (1):

filters = (K + 5) × A    (1)

In order to detect LPs, the vehicle and LP coordinates are used to train the CNN for vehicle and LP detection. On the validation set, we evaluated the confidence threshold that detects all vehicles with the lowest false positive rate.

After LP detection, the character segmentation algorithm (architecture shown in TABLE I) is trained using LPs with margins and the coordinates of the characters. As described in the
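The transcription layer described in Section II-B collapses per-frame CRNN predictions into a character string. The paper does not spell out the decoding rule, so the sketch below shows the standard CTC-style greedy (best-path) decoding commonly paired with CRNNs, as an assumed illustration: take the argmax label per frame, merge consecutive repeats, then drop blanks.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Collapse per-frame label distributions into a string:
    argmax per frame, merge consecutive repeats, remove blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx - 1])  # index 0 is reserved for blank
        prev = idx
    return "".join(decoded)

# Per-frame distributions over (blank, 'A', 'B'); frames read A, A, blank, B, B.
probs = [[0.1, 0.8, 0.1],
         [0.1, 0.7, 0.2],
         [0.9, 0.05, 0.05],
         [0.2, 0.1, 0.7],
         [0.1, 0.1, 0.8]]
print(ctc_greedy_decode(probs, "AB"))  # -> "AB"
```

The blank symbol is what lets the decoder distinguish a genuinely repeated character (e.g., "AA") from one character spread over several frames.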
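The temporal redundancy used in this work (recognizing each frame individually, then combining the per-vehicle results) can be sketched as a per-position majority vote over the string predictions accumulated for one vehicle. The voting rule below is our assumption for illustration; the paper does not specify the exact combination scheme.

```python
from collections import Counter

def combine_predictions(frame_strings):
    """Majority-vote each character position across per-frame LP readings.
    Readings that disagree with the most common plate length are discarded;
    ties are broken by first occurrence."""
    lengths = Counter(len(s) for s in frame_strings)
    n = lengths.most_common(1)[0][0]
    candidates = [s for s in frame_strings if len(s) == n]
    return "".join(
        Counter(s[i] for s in candidates).most_common(1)[0][0]
        for i in range(n)
    )

# One frame misreads 'B' as '8', another drops a character; voting recovers.
readings = ["AB123", "AB123", "A8123", "AB12"]
print(combine_predictions(readings))  # -> "AB123"
```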
previous stage, this margin is tuned on the validation set to make sure that all characters lie within the predicted LP region. In order to enlarge the training set, we also generate a negative image of every LP.

TABLE I. CHARACTER SEGMENTATION CNN ARCHITECTURE

Layer      Filters  Size
Conv       32       3x3
Max
Conv       64       3x3
Max
Conv       128      3x3/2
Conv       64       1x1
Conv       128      3x3
Max
Conv       256      3x3/2
Conv       128      1x1
Conv       256      3x3
Conv       512      3x3/2
Conv       256      1x1
Conv       512      3x3
Conv       35       1x1
Detection

After LP detection and segmentation, we first introduce some padding (1-2 pixels) in order to improve prediction, because some characters might not be well segmented. In order to train the networks, the segmented characters with their labels are passed as input. For recognition, we use the CRNN algorithm [15], which is designed for text recognition. Given the LP containing the characters, features extracted by the CNN are transformed into feature vectors and then used as input to the LSTM layer, addressing the sequence labeling problem and predicting a label distribution.

TABLE II. CRNN LAYERS

Layer      Filters  Size
Conv       64       3x3/1
Max                 2x2/2
Conv       128      3x3/1
Max                 2x2/2
Conv       256      3x3/1
Conv       256      3x3/1
Max                 2x2/2
Conv       512      3x3/1
BatchNorm
Conv       512      3x3/1
BatchNorm
Max                 2x2/2x1
Conv       512

Layer      Input     Hidden
LSTM       512x1x40  256

IV. EXPERIMENTAL RESULTS

In this section, we conduct experiments to verify the effectiveness of the proposed algorithm. We use an NVIDIA 1080 Ti GPU for all experiments. We use the following training parameters: 50k iterations, with learning rates lr = [10^-2, 10^-3, 10^-4, 10^-5] stepped at 10k, 20k, and 25k iterations.

A. Application-Oriented LP (AOLP) Evaluation

The AOLP dataset [5] consists of 2049 images, divided into three categories: 611 road patrol (RP), 757 law enforcement (LE), and 681 access control (AC) images. These images were captured under different weather conditions, in different locations, and under different illumination conditions. For evaluation, we allocated 40% of the images to the training set, 20% to the validation set, and 40% to the test set.

Figure 3. Detection results of LPs detected by our method. Accurate detection of LPs shows the robustness of the algorithm despite differing camera distances, illumination, angles, and several ambiguities.

• Vehicle Detection: In order to perform vehicle detection, we first evaluate confidence thresholds. Using a confidence of 0.5, we were unable to detect all vehicles. With a confidence of 0.25, all vehicles were successfully detected in the validation set. Based on this evaluation, we set the confidence to 0.125 for the test set. Using this threshold, we accomplished 100% recall with 99% precision, giving 5 false positives.

• LP Detection: In the validation set, every vehicle's LP was predicted within the bounding box, as shown in Figure 3. The LP detection network was trained on vehicle patches. As expected, we achieved 100% recall and precision on both the validation and test sets, which is itself an efficient result of our proposed approach.

• Character Segmentation: For the validation set evaluation, we tried confidence thresholds of 0.5, 0.25, and 0.125, achieving 99.92% recall regardless. So, in order to miss as few characters as possible, we chose the lower threshold, i.e., 0.125, achieving 99.82% recall on the test set.

• Character Recognition: We introduced padding of 1 pixel for digits and 2 pixels for letters on the validation set. In order to achieve even better results, we used data augmentation with flipped characters. During the evaluation, we concluded that letter recognition can be improved with augmentation and padding.

While ignoring temporal redundancy, our proposed method achieved an 86% recognition rate, recognizing 88% of three-letter and 99% of four-letter plates. Results improved significantly when taking advantage of temporal redundancy. Using temporal redundancy, the final
recognition rate was significantly improved, i.e., to 96%. Our proposed method improved recognition rates from 93.58% to 96.1%, outperforming OpenALPR and Sighthound by 4.9% and 9%, respectively; the results shown in Figure 4 illustrate the high accuracy of the proposed YOLOv3-based model.

TABLE III. RECOGNITION RATES (%) WITH REDUNDANCY ACHIEVED BY THE PROPOSED SYSTEM AND PREVIOUS WORK ON THE AOLP DATASET

ALPR                           All Correct
Sighthound with redundancy     87.1
OpenALPR with redundancy       91.2
Proposed with redundancy       96.1

TABLE IV. TIME REQUIRED FOR AN NVIDIA 1080 TI TO PROCESS EACH STAGE

ALPR Stage                           Time
Vehicle Detection                    10.211 ms
LP Segmentation and Classification   2.31 ms
Character Recognition                1.590 ms
Total                                14.111 ms

Figure 4. Examples of LPs accurately recognized by our algorithm.

V. CONCLUSION

In this paper, we presented a real-time Automatic License Plate Recognition system built with YOLOv3 and a CRNN. At every stage, the accuracy-speed trade-off proved efficient in our modified network. A unified methodology for license plate detection and classification was derived in the proposed work, which improves the results using post-processing rules (temporal redundancy). This approach was essential for obtaining appropriate results: based on the LP layout classes, we corrected errors in misclassified characters as well as in the number of predicted characters. The average recognition rate is 96.1% on the AOLP dataset used in the experiments, outperforming OpenALPR and Sighthound by 4.9% and 9%, respectively, showing the effectiveness of our proposed system.

ACKNOWLEDGMENT

This work was supported by the Chongqing Research Program of Basic Research and Frontier Technology under Grant cstc2018jcyjAX0227.

REFERENCES

[1] C. Gou, K. Wang, Y. Yao, and Z. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096–1107, April 2016.
[2] R. Laroca, E. Severo, L. A. Zanlorensi et al., "A robust real-time automatic license plate recognition based on the YOLO detector," in Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–10, Rio de Janeiro, Brazil, July 2018.
[3] S. Al-Shami, A. El-Zaart, R. Zantout, A. Zekri, and K. Almustafa, "A new feature extraction method for license plate recognition," in 2015 Fifth International Conference on Digital Information and Communication Technology and Its Applications (DICTAP), pp. 64–69, IEEE, 2015.
[4] N. Thome, A. Vacavant, L. Robinault, and S. Miguet, "A cognitive and video-based approach for multinational license plate recognition," Machine Vision and Applications, Springer-Verlag, pp. 389–407, 2011.
[5] G.-S. Hsu, J.-C. Chen, and Y.-Z. Chung, "Application-oriented license plate recognition," IEEE Transactions on Vehicular Technology, vol. 62, no. 2, pp. 552–561, Feb. 2013.
[6] B. Wu, F. Iandola, P. H. Jin, and K. Keutzer, "SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 446–454, July 2017.
[7] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018.
[8] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in Computer Vision – ECCV 2014, pp. 740–755, Springer International Publishing, 2014.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, IEEE, 2016.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1137–1149, 2017.
[11] K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, 2017.
[12] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27–30 June 2016, pp. 779–788.
[13] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 21–26 July 2017, pp. 6517–6525.
[14] G. R. Gonçalves, M. A. Diniz, R. Laroca, et al., "Real-time automatic license plate recognition through deep multi-task networks," in 31st Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 110–117, 2018.
[15] B. Shi, X. Bai, and C. Yao, "An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 2298–2304, 2017.