EILPR Toward End-To-End Irregular License Plate Recognition Based On Automatic P

This paper presents a novel end-to-end irregular license plate recognition (EILPR) system that addresses challenges in automatic license plate recognition (ALPR) due to multi-line text and perspective distortions. The EILPR utilizes a coarse-to-fine strategy for feature extraction, incorporating an automatic perspective alignment network (APAN) to improve recognition accuracy for various license plate types. Experimental results demonstrate that EILPR achieves state-of-the-art performance on multiple license plate benchmarks, highlighting its effectiveness in real-world applications.


2586 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 23, NO. 3, MARCH 2022

EILPR: Toward End-to-End Irregular License Plate Recognition Based on Automatic Perspective Alignment

Hui Xu, Xiang-Dong Zhou, Zhenghao Li, Liangchen Liu, Chaojie Li, and Yu Shi

Manuscript received February 1, 2021; revised May 26, 2021 and November 10, 2021; accepted November 17, 2021. Date of publication December 3, 2021; date of current version March 9, 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61773325, Grant 61876154, and Grant 62106247; in part by the Natural Science Foundation of Fujian Province under Grant 2018J01574; and in part by the Chongqing Research Program of Technological Innovation and Application Demonstration under Grant cstc2019jscx-msxmX0424. The Associate Editor for this article was T.-H. Kim. (Corresponding author: Xiang-Dong Zhou.)
Hui Xu is with the Chongqing Institute of Green and Intelligent Technology (CIGIT), Chinese Academy of Sciences (CAS), Chongqing 400714, China, and also with the Chongqing School, University of Chinese Academy of Sciences, Chongqing 400714, China (e-mail: [email protected]).
Xiang-Dong Zhou, Zhenghao Li, and Yu Shi are with the Chongqing Institute of Green and Intelligent Technology (CIGIT), Chinese Academy of Sciences (CAS), Chongqing 400714, China (e-mail: [email protected]; [email protected]; [email protected]).
Liangchen Liu is with the School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia (e-mail: [email protected]).
Chaojie Li is with the School of Electrical Engineering and Telecommunications, UNSW Sydney, Kingsford, NSW 2032, Australia (e-mail: [email protected]).
Digital Object Identifier 10.1109/TITS.2021.3130898
1558-0016 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Abstract— Automatic license plate recognition (ALPR) remains a challenging task in the face of difficulties such as multi-line character distribution and license plate (LP) deformation due to camera angles. Most existing ALPR methods either focus on single-line LPs or perform horizontal multi-line LP detection and recognition with character-level annotations. In this paper, we propose a novel end-to-end irregular license plate recognition (EILPR) method to detect and recognize LPs with multi-line text or arbitrary shooting angles, using only plate-level annotations for training. In EILPR, a coarse-to-fine strategy is adopted to extract the LP features accurately for sequence recognition. First, a coarse rectangular box of the LP is located, along with the predicted LP class, which is single-line or double-line. Then, considering that an LP is rigid and therefore mainly exhibits perspective distortion in the image, we propose a new automatic perspective alignment network (APAN) to extract the fine LP features, connecting detection and recognition. For recognition, a location-aware 2D attention based recognition network recognizes multi-line and multinational LPs from the extracted features. Experiments on several datasets show that EILPR achieves state-of-the-art performance, demonstrating the effectiveness of the proposed method.

Index Terms— License plate detection and recognition, automatic perspective alignment, end-to-end training.

I. INTRODUCTION

ALPR is of great significance to the modern Intelligent Transportation System (ITS). With the increasing demand for ALPR and the increasing number of LP types, ALPR systems face greater challenges. LP images can be collected from surveillance cameras, high-speed snap cameras, cellphones, etc. [1]. The plates in the images are likely to be distorted due to shooting angles, which directly affects the recognition of the license number. Moreover, some complex LP types such as multi-line plates make the task harder. Since there are only single-line and double-line plates at present, the multi-line plate in this paper refers only to the double-line plate.

With the success of convolutional neural network (CNN)-based methods in text recognition, object detection and recognition in the wild [2]–[4], CNN based approaches are widely used in ALPR systems, and they comprise character-level and sequence-level methods. Character segmentation based LPR methods [5]–[7] search for the location of each character and then classify them. Most multinational ALPR methods [8]–[10] are based on character detection/segmentation due to the differences in LP layouts among countries. However, any incorrect character location will result in mis-recognition of the text, even with a strong recognizer. Character segmentation is not robust to lighting, pose and noise in the image, which easily causes recognition failure.

Sequence labelling based methods [2], [11]–[13] no longer require character segmentation, which improves the reliability of LPR. However, most existing methods of this kind only recognize single-line LPs taken from a frontal and horizontal angle. Cao et al. [14] attempted to recognize double-line LPs with the 1D sequence recognition method [15], cutting the feature map at the center line and concatenating the split feature maps in the horizontal direction. However, this method assumes the ideal condition that the LP is frontal and has no deformation. In fact, LP images in real scenes are taken from arbitrary angles, so the LP in the image may be deformed. To detect and recognize arbitrary-shaped LPs in an end-to-end framework, a feature alignment module, which rectifies the deformed LP features into regular ones and connects detection and recognition, is indispensable. However, the universal text alignment methods [16], [17], which are designed for various irregular text, are not effective in ALPR.

To overcome these problems, we propose a novel end-to-end irregular license plate recognition (EILPR) method to detect and recognize LPs with multi-line text or arbitrary shooting angles. EILPR consists of these steps: LP detection, automatic perspective alignment of LP features and 2D attention based
LP recognition, where the first two modules are used as a coarse-to-fine process to extract LP features. LPR should be performed after vehicle detection, so we take cropped vehicle images as the input. First, the LP detection network predicts a coarse rectangle box and the corresponding LP class (single-line or double-line) for each LP. Based on the detection output, APAN automatically calculates the aligned LP features, which is the key to making EILPR end-to-end trainable. Finally, the aligned features are fed into the 2D attention based recognizer to predict the LP number.

An early version of this work appeared in the conference paper [18], which is only a recognition method based on detected plate images. We extend it in several ways: (i) exploiting a novel end-to-end LPR framework based on cropped vehicle images, which includes both detection and recognition, (ii) developing an alignment layer to connect detection and recognition for end-to-end training, and (iii) verifying the performance on multinational LP benchmarks. Hence the main contributions of the paper are summarized as follows:
1) An end-to-end license plate detection and recognition method for irregular plates such as multi-line plates and plates with arbitrary shooting angles is introduced.
2) Considering the characteristics of the license plate itself, a feature alignment layer, APAN, is proposed to calculate the aligned feature and connect detection and recognition into a unified framework.
3) APAN is verified to be more effective than TPS-based alignment. Meanwhile, EILPR has achieved state-of-the-art performance on several license plate benchmarks.

The rest of the paper is structured as follows: Section II briefly introduces related work on ALPR, which includes two-stage and end-to-end approaches. In Section III, we describe the proposed method in detail, including the LP detection network, APAN and the 2D attention based recognizer. The experimental evaluation is shown in Section IV, while the conclusion is in Section V.

II. RELATED WORK

With the increasing development of CNN and recurrent neural network (RNN) [19], the main methods can be divided into two kinds: two-stage and end-to-end.

A. Two-Stage LPR Methods
The two-stage LPR methods cascade detection and recognition models which are trained separately. First, the license plate is located by the detection model, and then the recognition model extracts its feature maps to predict the sequence number. LP detection usually uses an object detection model, such as Faster-RCNN [20], YOLO [21] and so on [22], and LP recognition can be classified into two groups: bottom-up and top-down approaches.

Bottom-up approaches segment characters first and then classify each character. Chao et al. [23] present an LPR method based on character-specific extremal regions (ERs) and hybrid discriminative restricted Boltzmann machines (HDRBMs). Hsu et al. Jain et al. [24] use a CNN classifier trained for individual characters along with a Spatial Transformer Network [25] for character recognition. In these approaches, accurate character segmentation plays a crucial role in the system, but it is unreliable for images taken in the wild.

Top-down approaches regard LPR as a sequence labelling problem. Thanks to improvements in text recognition, segmentation-free LPR methods have developed rapidly. Li and Shen [26] extracted CNN features and employed RNNs with Connectionist Temporal Classification (CTC) to predict the sequential labels. Cheang et al. [27] propose a unified ConvNet-RNN model to recognize license plates captured in the real world, using a Convolutional Neural Network (ConvNet) to perform feature extraction and an RNN for sequence recognition. Some arbitrary-shaped scene text recognition algorithms [16] can also be used for ALPR. However, the existing arbitrary-shaped text rectification networks are not designed specifically for LPR and do not take the rigid attribute of LPs into account.

B. End-to-End LPR Methods
The end-to-end LPR methods unify LP detection and recognition into an overall structure. Montazzolli and Jung [28] proposed an end-to-end DL-ALPR system for Brazilian license plates based on Convolutional Neural Network architectures. Laroca et al. [5] employed two CNN networks for character segmentation and recognition of Brazilian LPs with a fixed 7 characters, based on a YOLO detector. Hui et al. [29] adopted a region proposal network for detection and Bidirectional RNNs (BRNNs) to capture the context information on both sides for recognition. HyperLPR [30] is an open source Chinese license plate detection and recognition framework with high speed. The framework uses a mixture of deep neural networks and classic image processing algorithms to perform detection, segmentation and recognition. RPNet [31] is an end-to-end LP recognition model that first released the CCPD dataset. However, these end-to-end LPR methods can only handle LPs which are horizontal or have small deformation. SLPNet [32], a CNN-based lightweight segmentation-free ALPR method, is proposed to detect and recognize deformed LPs. SEE-LPR [33] is an end-to-end multinational LPR method based on semantic segmentation. Nevertheless, neither SLPNet nor SEE-LPR recognizes multi-line LPs.

To summarize, the end-to-end training framework developed recently performs well in jointly optimizing detection and recognition. The CTC and 1D attention based methods [16], which perform well in single-line LP recognition, cannot handle multi-line LPs such as the double-line LP. And the multi-line LP recognition method [14] can only recognize LPs that have no deformation. In this paper, we propose a novel end-to-end LPR method for irregular plates based on automatic perspective alignment, which is verified to achieve state-of-the-art performance.

III. METHODOLOGY

As depicted in Fig. 1, the proposed EILPR contains three modules: LP detection network, APAN and 2D attention based recognition network. The detection module predicts a coarse rectangular box location and an LP class, which is single-line or double-line. With the predicted bounding box, APAN first extracts the coarse ROI features from shared feature maps, and then calculates the fine LP features following the coarse-to-fine strategy as shown in Fig. 4. Finally, the aligned LP features are fed into a recognition network based on a location-aware 2D attention mechanism to predict the license number.

Fig. 1. Overall structure of our EILPR. The model is end-to-end trainable and consists of a detection module, a feature alignment module and a 2D attention based recognition module. Given an input vehicle image, EILPR predicts the coarse bounding box and the LP number at the same time. The detection module has two tasks: bounding box regression and LP type classification. Depending on the coarse location of the LP, the ROI pooling layer is used to obtain the ROI feature. Then, the automatic perspective alignment module is exploited to extract the LP feature for recognition. In the recognition network, the aligned LP feature is combined with the classification feature for sequence decoding. Finally, the LP number is predicted by the proposed 2D attention decoder.

A. LP Detection Network
The detection process is illustrated as the detection module in Fig. 1, which has two tasks: plate bounding box regression and plate classification.

Fig. 2. (a) Description of corner regression errors. The red quadrilateral box represents the ground truth, while the blue and green quadrilateral boxes represent different detection and recognition results. (b) The blue rectangle box is the ground truth in this paper, while the red one is the minimum bounding box of the LP. d is the margin around ground truth box.
1) Ground-Truth Generation: It is challenging to recognize a plate with serious perspective deformation or multi-line text in the image. For a plate with serious perspective deformation, some methods [32], [34] detect it by locating the four corners. However, a corner regression loss, whether an L1 loss or an IoU loss, is not conducive to the plate recognition task, as shown in Fig. 2(a). The green quadrilateral box has a detection loss similar to that of the blue one, but it fails to be recognized because part of the text is missing. Therefore, it is unwise to eliminate the influence of perspective deformation in the detection module.
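The argument around Fig. 2(a) can be illustrated with a small numeric sketch. The corner coordinates below are hypothetical (they are not taken from the paper), the boxes are axis-aligned for simplicity, and `corner_l1` and `covers` are illustrative helper names. The point is only that two predictions can receive the same corner L1 loss while one of them crops the plate.

```python
# Hypothetical corner coordinates in pixels, ordered TL, TR, BR, BL.
gt = [(10, 10), (110, 10), (110, 40), (10, 40)]
# Every corner pushed 4 px outward: the whole plate stays inside the box.
pred_outer = [(6, 6), (114, 6), (114, 44), (6, 44)]
# Every corner pulled 4 px inward: the plate border (and possibly text) is cut off.
pred_inner = [(14, 14), (106, 14), (106, 36), (14, 36)]


def corner_l1(pred, target):
    """Mean absolute error over the eight corner coordinates."""
    total = sum(abs(px - tx) + abs(py - ty)
                for (px, py), (tx, ty) in zip(pred, target))
    return total / (2 * len(target))


def covers(pred, target):
    """True if the axis-aligned box spanned by `pred` contains every corner of `target`."""
    xs = [p[0] for p in pred]
    ys = [p[1] for p in pred]
    return all(min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)
               for x, y in target)


loss_outer = corner_l1(pred_outer, gt)
loss_inner = corner_l1(pred_inner, gt)
print(loss_outer, covers(pred_outer, gt))
print(loss_inner, covers(pred_inner, gt))
```

Both predictions are equally good by the corner loss, yet only the outer one keeps the full plate, which is the motivation for regressing a coarse box with a margin instead of exact corners.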
As shown in Fig. 2(b), to ensure that the text is fully included, the ground-truth box extends the annotated box by a certain margin. The margins $d_x$ and $d_y$ are set to one-tenth of the width and the height, respectively, in this paper. The ground truth of the detection module is expressed as $B = \{x, y, w, h, c\}$, where $x, y, w, h$ represent the x and y coordinates of the center point, the width and the height of the bounding box, respectively, and $c$ is the plate class (single-line/double-line).

2) Network and Loss: The feed-forward result of the CNN-based detector is a 6-channel feature vector that includes 2-channel single-line/double-line probabilities and 4-channel rectangle box parameters. The detection module has a total of 12 convolutional layers with ReLU and Batch Normalization, as shown in Fig. 3. There are 4 max pooling layers of size 2 × 2 and stride 2. Finally, the detection block has two parallel convolutional layers: (i) one for inferring the probability, activated by a softmax function, and (ii) another for regressing the rectangle box parameters, without activation. In addition, the detection process described here is based on the detected vehicle image.

Fig. 3. Detailed network architecture of the detection module.

The training objective of the detection network can be divided into two parts: the localization loss $L_{loc}$ and the classification loss $L_{cls}$. Let $N$ be the size of a mini-batch in training. $L_{loc}$ is a Smooth L1 loss [35] between the predicted box ($pb$) and the ground truth box ($gt$), and $L_{cls}$ is a cross-entropy loss. The detection loss function $L_{det}$ is described as follows,

$$L_{loc} = \frac{1}{N} \sum_{N} \sum_{i \in \{x,y,w,h\}} \mathrm{smooth}_{L1}(pb_i - gt_i) \qquad (1)$$

$$L_{cls} = \frac{1}{N} \sum_{N} -\left[ y \cdot \log(p) + (1 - y) \cdot \log(1 - p) \right] \qquad (2)$$

$$L_{det} = L_{loc} + L_{cls} \qquad (3)$$

where $p$ denotes the prediction for the single-line LP while $(1 - p)$ denotes the double-line LP. $y$ represents the ground truth class, where single-line is 1 and double-line is 0.

B. Automatic Perspective Alignment Network (APAN)
The coarse location of the LP in the image is predicted by the LP detection module. According to the coarse-to-fine strategy, APAN is developed to calculate fine LP features based on the extracted coarse features, as shown in Fig. 4. In APAN, we use region-of-interest (ROI) pooling layers [35] to extract coarse features of interest. Then, the perspective transformation prediction network and grid sampling are performed to calculate fine LP feature maps for recognition.

Fig. 4. Detailed architecture of APAN. We show the process on the LP images for better visualization, but actually the operation is on the feature maps.

1) RoI Features: Feature maps from different layers within a network have different receptive field sizes, and the lower layers capture finer details of the input objects. Because the area of the LP is expected to be very small relative to the entire image, feature maps from relatively low layers matter for recognizing the LP. Inspired by [31], EILPR extracts feature maps at the end of three low-level layers: the second, fourth and sixth convolutional layers. The sizes of the extracted feature maps are (256 × ch) × (256 × cw) × 32, (128 × ch) × (128 × cw) × 64, (64 × ch) × (64 × cw) × 128. After these feature maps are extracted, ROI pooling layers are used to extract each ROI from the shared feature maps and convert it into a feature map with a fixed size $P_w \times P_h$ (e.g., 16 × 50 in this paper). Afterwards, these three resized feature blocks 16 × 50 × 32, 16 × 50 × 64, 16 × 50 × 128 are concatenated into one feature block of size 16 × 50 × 224 for feature alignment.

2) Perspective Transformation Prediction: The Thin-Plate-Spline [36] (TPS) transformation is mostly used for irregular text alignment as in [16], [37], and it can handle various text deformations. A TPS-based alignment method consists of 3 steps: control point prediction, TPS transformation calculation and grid sampling. This kind of method produces more complex deformations with more degrees of freedom, which is a burden for LPR. In comparison to TPS-based alignment, APAN introduces the prior information that the deformation of LPs is rigid. It predicts perspective transformation parameters directly, which strengthens the geometric constraints on the LP image according to the rigid body property of the LP. Control points are no longer needed.

As shown in Fig. 4, the perspective transformation prediction network directly predicts a 3-by-3 offset matrix $T_p$. The architecture of the prediction network is shown in Table I. The network consists of three convolutional layers. Each convolutional layer, with a 3-by-3 kernel, is followed by a batch normalization layer, a ReLU layer and a pooling layer. With the input feature map $I$ of size 16 × 50, the network predicts the offsets with 9 channels, which correspond to $T_p$. The rough perspective transformation $\hat{T}$ combines the two matrices as follows,

TABLE I
ARCHITECTURE OF PERSPECTIVE TRANSFORMATION PREDICTION NETWORK

$$\hat{T} = T_p + E \qquad (4)$$

where the 3-by-3 unit matrix $E$ is used to keep the identity mapping when the predicted $T_p$ is zero. The final perspective transformation $T$ is normalized by $\hat{T}(3, 3)$. The 3-by-3 matrix $T$ is actually a 3D transformation, establishing the location relationship between the corresponding pixels of the aligned feature map $I_r$ and the ROI feature map $I$.

Each entry of the matrix $T_p$ has a different meaning and a different magnitude, so we do not design a separate loss for APAN. Because all modules in APAN are differentiable, we combine the detection network, APAN and the 2D attention based recognition network for end-to-end training, requiring no manual annotations of the perspective transformation matrix. Inspired by STN, which makes the spatial transformation learnable end-to-end, $T$ is used to generate the sampling grid for $I_r$.

3) Grid Sampling: A sampling grid $P = \{p_i \mid p_i \in \mathbb{R}^{2 \times 1}\}$ is built on $I$ by applying $T$ to each pixel position on $I_r$. To do this, the calculation must account for the 3D property of the perspective transformation. Given a 3D point $\theta = [u, v, w]^T$ as an input, the output point $\theta' = [x, y, z]^T$ is calculated by the perspective projective mapping as follows,

$$[x', y', z']^T = T \times \theta \qquad (5)$$

$$x = \frac{x'}{z'}, \quad y = \frac{y'}{z'}, \quad z = \frac{z'}{z'} = 1 \qquad (6)$$
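A minimal sketch of Eqs. (4) to (6) in plain Python may help make the mapping concrete. It assumes that "normalized by $\hat{T}(3, 3)$" means dividing every entry of $\hat{T}$ by its bottom-right entry. The function names and the numeric offsets are illustrative only and are not taken from the paper.

```python
def predicted_transform(t_p):
    """Eq. (4): add the identity E to the 3x3 offset T_p, then divide by entry (3, 3).

    Dividing by the bottom-right entry is an assumed reading of 'normalized by';
    the sketch does not guard against that entry being zero.
    """
    t_hat = [[t_p[r][c] + (1.0 if r == c else 0.0) for c in range(3)]
             for r in range(3)]
    scale = t_hat[2][2]
    return [[value / scale for value in row] for row in t_hat]


def project(t, u, v, w=1.0):
    """Eqs. (5)-(6): map pixel (u, v) on the aligned map I_r to a sampling point (x, y) on I."""
    xp, yp, zp = (row[0] * u + row[1] * v + row[2] * w for row in t)
    return xp / zp, yp / zp


# A zero offset gives the identity mapping, so every pixel samples itself.
identity = predicted_transform([[0.0] * 3 for _ in range(3)])
same = project(identity, 12.0, 5.0)

# Hypothetical offset with a small perspective term in the last row.
t = predicted_transform([[0.1, 0.0, 2.0],
                         [0.0, 0.1, 1.0],
                         [0.01, 0.0, 0.0]])
x, y = project(t, 10.0, 4.0)
print(same, (x, y))
```

The division by $z'$ is what distinguishes a perspective mapping from an affine one: when the last row of $T$ is $[0, 0, 1]$, $z'$ equals $w$ and the mapping reduces to an affine transform.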

In Eqs. (5) and (6), the vector $[x', y', z']^T$ is an intermediate variable. The pixel location $\theta$ indicates a pixel on $I_r$: $u$ and $v$ in the vector $\theta$ are the x-coordinate and y-coordinate of the pixel, respectively, and $w$ is a constant that is usually set to 1. Then, the 2D vector $p_i = [x, y]^T$ is the location of the sampling point on $I$. The aligned features are extracted from the input feature maps with the set of sampling points, using bilinear interpolation. $I_r$ is the same size as $I$ and is generated as follows,

$$I_r = \mathrm{BilinearInterpolation}(P, I) \qquad (7)$$

Finally, the aligned feature map $I_r$ is passed into the 2D attention based recognition network.

C. 2D Attention Based Recognition Network
The current segmentation-free ALPR methods, which extract 1D sequence features, can only recognize single-line LPs. By contrast, 2D attention based approaches, which have achieved good performance in scene text recognition [38], [39], can perform decoding in 2D space.

To predict a license number sequence directly from the 2D feature map extracted by the CNN, we adopt a 2D attention mechanism inspired by [40]. As depicted in Fig. 5, the recognition module is an attention-based encoder-decoder model. The encoder is a CNN structure, while the decoder is attention-based. In this paper, the encoder is a light-weight network shown in Table II. With an input size of 16 × 50, the encoder outputs 2D feature maps of size 4 × 13. The feature maps are updated according to the plate classes, and then are fed into an attention based GRU (256 units) network.

Fig. 5. Detailed network architecture of 2D attention based recognition module.

TABLE II
ARCHITECTURE OF THE ENCODER OF THE 2D ATTENTION RECOGNITION NETWORK

We use $h = \{h_{i,j,c}\}$ to denote the feature map extracted by the CNN encoder, where $i, j, c$ indicate the positions and channels in the feature map. Then, a gated recurrent unit (GRU) [41] is used to convert the feature maps into a character sequence $(o_1, o_2, \cdots, o_L)$, where $L$ is the length of the predicted license number. At time step $t$, the final predicted distribution over characters is computed by

$$o_t = \mathrm{Softmax}(W_o s_t + b_o) \qquad (8)$$

where $s_t$ is the hidden state of the GRU at time step $t$. To get $s_t$, we generate the embedding vector $\hat{o}_{t-1} = \mathrm{Embedding}(o_{t-1})$, where $o_{t-1}$ is the ground truth of the previous time step during training or the prediction of the previous time step during testing. The hidden state $s_t$ is updated as,

$$s_t = \mathrm{GRU}(\mathrm{Concat}(\hat{o}_{t-1}, g_t), s_{t-1}) \qquad (9)$$

where the glimpse vector $g_t = \sum_{i,j} \alpha_{t,i,j} h_{i,j,c}$ combines the image features and the spatial attention mask. $\alpha_t = \{\alpha_{t,i,j}\}$ denotes the spatial attention map at time step $t$. We predict the spatial attention map based on the previous GRU state as,

$$\alpha_{t,i,j} = \frac{\exp(e_{t,i,j})}{\sum_{i,j} \exp(e_{t,i,j})} \qquad (10)$$

where $e_{t,i,j} = V_a^T \tanh(H)$ and $V_a$ is a trainable parameter.

To make $\alpha_t$ sensitive to character arrangement, especially for single-line and double-line LPs, $h_{i,j}$ is concatenated with a one-hot encoding of the simplified spatial coordinates $(i, r_j)$. $r_j$ is used as the encoded coordinate instead of $j$ as follows,

$$r_j = \begin{cases} 1, & \text{single-line} \\ \left[ \frac{j}{2} + 0.5 \right], & \text{double-line} \end{cases} \qquad (11)$$

Here $[\cdot]$, which is the same as the function $\mathrm{floor}(\cdot)$, returns the value of a number rounded downward to the nearest integer, and $j = 1, 2, 3, 4$. $H$ is updated as follows,

$$H = W_h h_{i,j} + W_s s_{t-1} + W_i f_i + W_{r_j} f_{r_j} + b \qquad (12)$$

where $W_h$, $W_s$, $W_i$, $W_{r_j}$ and $b$ are trainable parameters. $f_i$ is a one-hot encoding of coordinate $i$, while $f_{r_j}$ is that of $r_j$. More details of this process can be found in [18].

We train the model by minimizing the negative log-likelihood of the conditional probability as follows,

$$L_{reg} = - \sum_{n=1}^{N} \sum_{t=1}^{M} \log p(y_{n,t} \mid y_{n,1:t-1}, C_n, I_n, \delta) \qquad (13)$$

where $I_n$, $C_n$ and $Y_n = \{y_{n,t}\}$ are the input image, classification label and corresponding license number in the training set $B = \{I_n, C_n, Y_n\}$, $n = 1 \cdots N$. $\delta$ represents the parameters of the model. $M$ is the maximal length of the predicted license number. Note that during training, we directly perform APAN based on the ground truth detection results.

D. Training
EILPR accomplishes LP bounding box detection and LP number recognition in a single forward pass. The training involves loss functions for detection and recognition performance, as well as pre-training the recognition module before training EILPR end-to-end.
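The two loss terms mentioned here can be sketched in plain Python from Eqs. (1) to (3) and Eq. (13). This is a minimal illustration rather than the authors' implementation: the Smooth L1 threshold of 1 is the common default and is assumed, the function names are hypothetical, and all numbers are toy values.

```python
import math


def smooth_l1(residual):
    """Smooth L1 on one residual; the switch point at 1 is an assumed default."""
    d = abs(residual)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def detection_loss(pred_boxes, gt_boxes, p_single, y_single):
    """Eqs. (1)-(3): L_det = L_loc + L_cls, each averaged over the mini-batch."""
    n = len(gt_boxes)
    l_loc = sum(smooth_l1(pb - gt)
                for pbox, gbox in zip(pred_boxes, gt_boxes)
                for pb, gt in zip(pbox, gbox)) / n
    l_cls = sum(-(y * math.log(p) + (1 - y) * math.log(1 - p))
                for p, y in zip(p_single, y_single)) / n
    return l_loc + l_cls


def recognition_loss(char_probs):
    """Eq. (13): negative log-likelihood summed over plates and time steps.

    char_probs[n][t] is the probability the decoder gave to the correct
    character of plate n at step t.
    """
    return -sum(math.log(p) for plate in char_probs for p in plate)


# Toy mini-batch with one single-line plate; boxes are (x, y, w, h).
l_det = detection_loss(pred_boxes=[(0.5, 0.5, 0.4, 0.2)],
                       gt_boxes=[(0.5, 0.5, 0.5, 0.2)],
                       p_single=[0.9], y_single=[1])
l_reg = recognition_loss([[0.9, 0.8, 0.95]])
lam = 1.0  # weight between the two terms; the paper reports using 1
total = l_det + lam * l_reg
print(l_det, l_reg, total)
```

In a real training loop these terms would be computed on tensors by an autodiff framework so that gradients flow through APAN into the detector; the scalar version only shows how the terms are assembled.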


1) Training Objective: The detection and recognition sub-networks are connected by APAN for end-to-end training. No other input is required except for the vehicle image annotated with the plate bounding box, plate class and plate number. For end-to-end training of LP detection and recognition, the whole loss function can be formulated as:

$$L_{EILPR} = L_{det} + \lambda L_{reg} \qquad (14)$$

where $\lambda$ is the hyper-parameter that controls the balance between the detection and recognition tasks, which is set to 1 in this paper.

2) Pre-Training Recognition Module: EILPR is an end-to-end LPR system for multi-line LPs. Existing LP data sets for both detection and recognition do not uniformly cover the various LP classes and character categories, which greatly affects the recognition ability. To make the recognition module robust, we pre-train the recognition module with an integrated data set which consists of real single-line LP images and synthesized double-line LP images, and fine-tune the model end-to-end on different real data sets. In the pre-training phase, we extract the coarse features directly through the first three convolutional layers of the detection module as the ROI features in Fig. 4, ignoring the ROI pooling layer. We pre-train the recognition module with batches of 64 for 30 thousand iterations, and evaluate the accuracy on the validation set every 500 steps. We choose the model with the highest accuracy on the validation set as the pre-trained model.

EILPR is trained end-to-end from scratch with the ADADELTA optimizer. The learning rate is set to 1.0 at the beginning and decayed to 0.1 after 20 thousand iterations. Since not enough public double-line plate images are available, we fine-tune the model end-to-end on a private double-line plate dataset and evaluate the performance on double-line images with the SYSU-ITS [42] dataset. Due to the imbalance in the number of single-line and double-line LP image sets in real scenes, we fine-tune the pre-trained model on the two datasets separately to evaluate the end-to-end performance.

IV. EXPERIMENTS

We evaluate the end-to-end LPR performance of the proposed method on several standard benchmarks containing multinational LPs. Moreover, an analysis of each module and comparisons with previous methods are also given to demonstrate the superiority and reasonableness of EILPR.

A. Datasets
Synthetic data are generated randomly by image processing, containing 20,000 multinational LP images with various brightness, chroma, clarity and angles of view, as shown in Fig. 6(a). It contains LP images of China, America, Italy and so on. The characters on the plates have 66 categories, which comprise 32 Chinese characters, 24 letters and 10 numbers.

Chinese license plates dataset (CLPD) [43] dataset contains [...] the rest as test set. CLPD and the synthetic dataset are combined into a pre-training LP dataset (PLPD) to pre-train the recognition module in EILPR.

CCPD [31] was collected in the parking lots of Hefei, China, as illustrated in Fig. 6(b), and is divided into several sub-datasets. We randomly select 50,000 ordinary blue plate images from CCPD for experiments. As usual, the images are split into 2 subsets in the proportion of 9:1 for training and testing, respectively.

Double-line Traffic-data (DLTD) is a private double-line plate dataset collected from various surveillance cameras at traffic crossroads, as shown in Fig. 6(d). DLTD contains about 1000 images, and we split them 8:2 into a training set and a test set. Since the number of images in DLTD is far smaller than that of CCPD, we verify the single-line and double-line plates separately.

The SYSU-ITS dataset is a public LP image set provided by OpenITS. SYSU-ITS includes 1402 images containing 958 single-line LPs and 84 double-line LPs, all selected from HD images of road bayonets, as shown in Fig. 6(c). We pick out the double-line LP images as a test set. In addition, the accuracy of labels which are not in the training set is not taken into account in the evaluation. For the convenience of the experiments, we combine the double-line test sets of DLTD and SYSU-ITS into one test set named D-SYSU.

Open-ALPR consists of 114 Brazilian and 108 European LP images, and is used to verify the generalization of EILPR in multinational LP recognition.

The LP images above are annotated with rectangle boxes for detection and character sequences for recognition. We fine-tune the pre-trained model on the CCPD and SYSU-ITS training sets, and then verify the performance on Open-ALPR and CCPD for single-line plates. We also fine-tune the pre-trained model on DLTD and verify the performance on D-SYSU for double-line plates.

B. Implementation Details
We implement the proposed approach in the PyTorch framework [44]. We train the model on a workstation with two Intel Xeon(R) E5-2620 2.10GHz CPUs, a single NVIDIA GeForce GTX Titan X, and 64GB RAM. CUDA 9.0 and CuDNN v7 backends are used in our experiments so that the model is GPU-accelerated.

C. Ablation Studies
To better understand the strengths of the proposed method, we first provide ablation studies from three aspects. First, we demonstrate the benefits of end-to-end training. Second, we compare APAN and TPS-based alignment on recognition performance. Third, we compare our 2D attention based recognizer with the more widely used LSTM based one and a 1D attention based one.

1) With vs. Without End-to-End Training: In end-to-end
about 260,000 Chinese single-line LPs collected from different framework, the LP recognition is depended on the LP region
security and surveillance cameras as shown in Fig. 6(a). predicted by detection module. Meanwhile, the recognition
We randomly selected 150,000 images as training set and supervision can provide more detailed text stroke features for

LP detection. To demonstrate the effect of end-to-end training, we evaluate a variant of our method in which LP detection and recognition are trained separately.

Fig. 6. Examples of different datasets.

TABLE III
PERFORMANCE ON CCPD AND D-SYSU. Multi-line REPRESENTS WHETHER THE METHOD IS APPLICABLE TO MULTI-LINE LICENSE PLATE RECOGNITION. End-to-End REPRESENTS WHETHER THE METHOD IS TRAINED END-TO-END

Some qualitative results are shown in Fig. 8, which reveal the detection performance with and without end-to-end training. Apparently, the detection results with end-to-end training are more accurate. In Fig. 8(a), with end-to-end training, an LP whose features are not salient can also be detected accurately, because the recognition supervision has a correction effect on the detection network. Without end-to-end training, LP detection may miss some LP regions or misclassify LP-like background. As shown in Fig. 8(b), the model without end-to-end training misclassifies text whose structure is similar to that of an LP. For the recognition performance, Table III shows that the methods trained end-to-end (including EILPR and other configurations) significantly outperform our two-stage method. It can be seen that in the end-to-end training framework, the detection and recognition modules are mutually optimized.

Fig. 7. Results of aligned LP and recognition. ROI represents the detected LP from the input. TPS-based represents the aligned LP and the corresponding predicted LP number with the TPS-based alignment method. APAN represents the aligned LP generated by APAN and the corresponding LP number predicted by EILPR.

Fig. 8. Detection results with and without end-to-end training. From left to right in (a, b): detection without guidance from recognition and EILPR.

2) APAN vs. TPS-Based Alignment: ASTER [16], which is a state-of-the-art irregular text recognition method, consists of a text alignment module with TPS transformation and a recognition module with a 1D attention decoder. The TPS-based

alignment aims to transform features with the TPS transformation by STN. By analyzing the end-to-end recognition results, we argue that TPS-based alignment may be unsuitable for license plates. Some examples are shown in Fig. 7. The red characters in Fig. 7 are misclassified by the TPS-based method because they are not properly aligned. TPS-based methods depend on the prediction of dozens of control points, and for images with large deformation they may cut off part of the text. APAN aims to reduce text deformation, but it does not completely eliminate it due to the presence of prediction bias. As shown in Fig. 7, the aligned plates generated by APAN are significantly better than those of the TPS-based method and are sufficient for accurate recognition, although the alignment is not perfect.

To further explore the impact of TPS-based alignment on LPs of different shapes, we evaluate a variant of our method that replaces APAN with TPS-based alignment. Table III shows that the TPS-based method reduces the end-to-end recognition performance on largely deformed LPs, indicating that TPS transformation parameters are not suitable for irregular license plate recognition. Because the LPs in D-SYSU have small deformations, the recognition accuracy does not differ much on D-SYSU. It can be seen that APAN, which is based on perspective transformation, is better suited to describing the deformation of LP images.

3) 2D Attention Based Recognizer vs. Others: Most previous ALPR methods are based on a CTC or 1D-attention decoder, which predicts the LP number from 1D sequence features. However, these methods cannot recognize multi-line license plates. In contrast, our method adopts 2D attention to decode the multi-line LP number. To verify the effectiveness of the 2D attention based recognizer, we evaluate a variant of our method that consists of the CNN in our recognition module, a bi-directional LSTM with 256 output channels per direction, a fully-connected layer and a CTC decoder. We also evaluate another variant of our method that consists of the CNN in our recognition module and a 1D-attention based decoder. As shown in Table III, adopting CTC or 1D attention has little influence on single-line LPR performance. In contrast, the 2D attention based recognizer far outperforms the CTC based and 1D attention based recognizers in double-line LP end-to-end performance.

More specifically, CCPD has a large number of difficult samples with severe perspective deformation, while D-SYSU and Open-ALPR have fewer. As shown in Table III, the recognition module based on the 2D attention decoder performs well on both single-line and double-line plates. The attention-based methods outperform the CTC-based methods because attention-based decoders are better at handling irregular text. It can be concluded that our model performs best on both multi-line LPs and largely deformed LPs.

Fig. 9. Examples of detection and recognition results with EILPR.

D. Comparison With the State-of-the-Art

In this subsection, we compare with previous methods on several benchmarks to verify the superiority of our work. We evaluate the performance of the proposed EILPR against other publicly reported models on Chinese LP recognition. The recognition accuracy is calculated as follows: a result is considered correct only when all the characters of each LP in an image are correctly recognized. We compare with several publicly available models, including HyperLPR [30], MTCNN+LPRNet [45], RPNet [31], SLPNet [32], and SEE-LPR [33]. However, these models can only recognize single-line license plates or recognize multi-line plates based on character segmentation, so we integrate MTCNN and the method of [14] as the comparison model for multi-line LPR.

1) Experiments on Single-Line LP: As shown in Table III, the proposed method achieves state-of-the-art performance on CCPD. Because CCPD has a large number of samples with severe perspective deformation, EILPR and SLPNet, which

take into account the LP deformation in the image, outperform the others significantly. For single-line LP recognition, the two methods have similar effectiveness.

2) Experiments on Double-Line LP: As shown in Table III, the proposed method achieves state-of-the-art performance on D-SYSU, which consists of double-line LPs. With the help of the 2D attention based recognizer, EILPR outperforms other methods by a large margin. Due to the limited data volume and data balance, the performance on double-line LPs is not as good as that on single-line LPs.

3) Experiments on Multinational LP: EILPR can also be used for multinational LP detection and recognition, which is verified on the Open-ALPR benchmark as shown in Table III. SEE-LPR can only recognize single-line LPs because of its CTC-based decoding. The method of [10] recognizes multi-line LPs based on character detection, which is limited by image quality and manual annotation. By comparison, EILPR has good generalization ability because it is not limited by the length of the LP number or the spatial position of the characters.

Fig. 9 shows detection and recognition results on the benchmarks with our EILPR model. In terms of recognition speed, the model based on 2D attention is about two times slower than the CTC-based model and needs further optimization. Fortunately, the Inference Engine from Intel OpenVINO [46] has been applied to accelerate CTC-based or segmentation-based LPR models [6], [43]. Acceleration of the 2D attention based LPR model will be implemented in future research. In addition, the synthetic LP data will be improved with GAN-based methods [47], and the ALPR method can be applied in urban networks with related methods [48]. We will also improve our method under difficult conditions such as occlusion and low resolution, inspired by relevant methods [49], [50].

V. CONCLUSION

In this paper, we proposed EILPR, a novel end-to-end irregular ALPR method based on automatic perspective alignment. It can accurately detect and recognize multi-line license plates from the detected vehicle images. To weaken the influence of detection errors on the recognition results, we adopt a coarse-to-fine strategy to extract the LP features precisely. To this end, we propose the automatic perspective alignment network (APAN) to implement the feature extraction strategy and connect detection and recognition for end-to-end training. By considering the rigid body property and 2D text distribution of the LP, EILPR is verified to perform well on largely deformed and multi-line LPs. Experiments conducted on challenging multinational benchmarks verify the effectiveness of our method. As for future work, we will improve the performance of the system in difficult environments such as fog and snow. In addition, model acceleration based on OpenVINO is also left for future research.

REFERENCES

[1] C. N. E. Anagnostopoulos, I. E. Anagnostopoulos, I. D. Psoroulas, V. Loumos, and E. Kayafas, “License plate recognition from still images and video sequences: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 9, no. 3, pp. 377–391, Sep. 2008.
[2] H. Li, P. Wang, and C. Shen, “Toward end-to-end car license plate detection and recognition with deep neural networks,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 1126–1136, Mar. 2019.
[3] S. Ge, J. Li, Q. Ye, and Z. Luo, “Detecting masked faces in the wild with LLE-CNNs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2682–2690.
[4] S. Ge, C. Zhang, S. Li, D. Zeng, and D. Tao, “Cascaded correlation refinement for robust deep tracking,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 99, pp. 1276–1288, Apr. 2020.
[5] R. Laroca et al., “A robust real-time automatic license plate recognition based on the Yolo detector,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, pp. 1–10.
[6] R. D. Castro-Zunti, J. Yépez, and S. Ko, “License plate segmentation and recognition system using deep learning and OpenVINO,” IET Intell. Transp. Syst., vol. 14, no. 2, pp. 119–126, Feb. 2020.
[7] C. L. P. Chen and B. Wang, “Random-positioned license plate recognition using hybrid broad learning system and convolutional networks,” IEEE Trans. Intell. Transp. Syst., early access, Aug. 4, 2020, doi: 10.1109/TITS.2020.3011937.
[8] M. A. Raza, C. Qi, M. R. Asif, and M. A. Khan, “An adaptive approach for multi-national vehicle license plate recognition using multi-level deep features and foreground polarity detection model,” Appl. Sci., vol. 10, no. 6, p. 2165, Mar. 2020.
[9] C. Henry, S. Y. Ahn, and S.-W. Lee, “Multinational license plate recognition using generalized character sequence detection,” IEEE Access, vol. 8, pp. 35185–35199, 2020.
[10] Z. T. Soghadi and C. Y. Suen, License Plate Detection and Recognition by Convolutional Neural Networks (Pattern Recognition and Artificial Intelligence). Cham, Switzerland: Springer, 2020.
[11] L. Zhang, P. Wang, H. Li, Z. Li, and Y. Zhang, “A robust attentional framework for license plate recognition in the wild,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 99, pp. 1–10, Nov. 2020.
[12] S.-L. Chen, C. Yang, J.-W. Ma, F. Chen, and X.-C. Yin, “Simultaneous End-to-End vehicle and license plate detection with multi-branch attention neural network,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 9, pp. 3686–3695, Sep. 2020.
[13] O. Bulan, V. Kozitsky, P. Ramesh, and M. Shreve, “Segmentation- and annotation-free license plate recognition with deep localization and failure identification,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 9, pp. 2351–2363, Sep. 2017.
[14] Y. Cao, H. Fu, and H. Ma, “An end-to-end neural network for multi-line license plate recognition,” in Proc. 24th Int. Conf. Pattern Recognit. (ICPR), Aug. 2018, pp. 3698–3703.
[15] C.-Y. Lee and S. Osindero, “Recursive recurrent nets with attention modeling for OCR in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2231–2239.
[16] B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, “ASTER: An attentional scene text recognizer with flexible rectification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 9, pp. 2035–2048, Sep. 2018.
[17] L. Qiao et al., “Text perceptron: Towards end-to-end arbitrary-shaped text spotting,” Proc. AAAI Conf. Artif. Intell., vol. 34, no. 7, Apr. 2020, pp. 11899–11907.
[18] H. Xu, Z.-H. Guo, D.-H. Wang, X.-D. Zhou, and Y. Shi, “2D license plate recognition based on automatic perspective rectification,” in Proc. 25th Int. Conf. Pattern Recognit. (ICPR), Jan. 2021, pp. 202–208.
[19] X. He, C. Li, T. Huang, C. Li, and J. Huang, “A recurrent neural network for solving bilevel linear programming problem,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 4, pp. 824–830, Apr. 2014.
[20] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[21] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” in Computer Vision and Pattern Recognition. Washington, DC, USA: IEEE Comput. Soc., 2018.
[22] M. Mittal et al., “An efficient edge detection approach to provide better edge connectivity for image analysis,” IEEE Access, vol. 7, pp. 33240–33255, 2019.
[23] C. Gou, K. Wang, Y. Yao, and Z. Li, “Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 4, pp. 1096–1107, Apr. 2016.
[24] V. Jain, Z. Sasindran, A. Rajagopal, S. Biswas, H. S. Bharadwaj, and K. R. Ramakrishnan, “Deep automatic license plate recognition system,” in Proc. 10th Indian Conf. Comput. Vis., Graph. Image Process. (ICVGIP), 2016, pp. 1–8.
[25] M. Jaderberg et al., “Spatial transformer networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2015, pp. 2017–2025.

[26] H. Li, P. Wang, M. You, and C. Shen, “Reading car license plates using deep neural networks,” Image Vis. Comput., vol. 72, pp. 14–23, Apr. 2016.
[27] T. K. Cheang, Y. S. Chong, and H. T. Yong, “Segmentation-free vehicle license plate recognition using ConvNet-RNN,” in Proc. Int. Workshop Adv. Image Technol., 2017, pp. 1–5.
[28] S. Montazzolli and C. Jung, “Real-time Brazilian license plate detection and recognition using deep convolutional neural networks,” in Proc. 30th SIBGRAPI Conf. Graph., Patterns Images (SIBGRAPI), Oct. 2017, pp. 55–62.
[29] H. Li, P. Wang, and C. Shen, “Toward end-to-end car license plate detection and recognition with deep neural networks,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 1126–1136, Mar. 2019.
[30] Zeussees. High Performance Chinese License Plate Recognition Framework. Accessed: Oct. 2020. [Online]. Available: https://github.com/zeusees/HyperLPR
[31] Z. Xu, W. Yang, A. Meng, N. Lu, and L. Huang, “Towards end-to-end license plate detection and recognition: A large dataset and baseline,” in Proc. Eur. Conf. Comput. Vis., Sep. 2018, pp. 255–271.
[32] W. Zhang, Y. Mao, and Y. Han, “SLPNet: Towards end-to-end car license plate detection and recognition using lightweight CNN,” in Proc. IEEE 4th Int. Conf. Signal Image Process. (ICSIP), Oct. 2020, pp. 290–302.
[33] D. Tang, K. H., X. Meng, R. Z. Liu, and T. Lu, SEE-LPR: A Semantic Segmentation Based End-to-End System for Unconstrained License Plate Detection and Recognition. Cham, Switzerland: Springer, 2020.
[34] S. Montazzolli and C. R. Jung, “License plate detection and recognition in unconstrained scenarios,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 580–596.
[35] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015, pp. 1440–1448, doi: 10.1109/ICCV.2015.169.
[36] F. L. Bookstein, “Principal warps: Thin-plate splines and the decomposition of deformations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 6, pp. 567–585, Jun. 1989.
[37] M. Yang et al., “Symmetry-constrained rectification network for scene text recognition,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9147–9156.
[38] H. Li, P. Wang, C. Shen, and G. Zhang, “Show, attend and read: A simple and strong baseline for irregular text recognition,” in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 8610–8617.
[39] X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles, “Learning to read irregular text with attention mechanisms,” in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, p. 3.
[40] Z. Wojna et al., “Attention-based extraction of structured information from street view imagery,” in Proc. 14th IAPR Int. Conf. Document Anal. Recognit. (ICDAR), Nov. 2017, pp. 844–850.
[41] K. Cho, B. V. Merrienboer, C. Gulcehre, F. Bougares, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1724–1734, doi: 10.3115/v1/D14-1179.
[42] OpenITS. SYSU-ITS. Accessed: Oct. 2020. [Online]. Available: https://www.openits.cn/openData4/569.jhtml
[43] S. Zherzdev and A. Gruzdev, “LPRNet: License plate recognition via deep neural networks,” 2018, arXiv:1806.10447.
[44] A. Paszke et al., “Automatic differentiation in pytorch,” in Proc. NIPS-W, 2017, pp. 1–4.
[45] X. Xue. A Two Stage Lightweight and High Performance License Plate Recognition in MTCNN and LPRNet (2019). Accessed: Nov. 2019. [Online]. Available: https://github.com/xuexingyu24/License_Plate_Detection_Pytorch
[46] Intel Openvino Toolkit/Intel Software. Accessed: Sep. 2020. [Online]. Available: https://software.intel.com/en-us/articles/OpenVINO-InferEngine
[47] T. Wang, T. Zhang, L. Liu, A. Wiliem, and B. Lovell, “Cannygan: Edge-preserving image translation with disentangled features,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 514–518.
[48] K.-J. Pai, R.-S. Chang, R.-Y. Wu, and J.-M. Chang, “A two-stages tree-searching algorithm for finding three completely independent spanning trees,” Theor. Comput. Sci., vol. 784, pp. 65–74, Sep. 2019.
[49] S. Ge, C. Li, S. Zhao, and D. Zeng, “Occluded face recognition in the wild by identity-diversity inpainting,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 10, pp. 3387–3397, Oct. 2020.
[50] S. Ge, S. Zhao, C. Li, Y. Zhang, and J. Li, “Efficient low-resolution face recognition via bridge distillation,” IEEE Trans. Image Process., vol. 29, pp. 6898–6908, 2020.

Hui Xu received the B.Sc. and M.Sc. degrees in automation from the Beijing University of Posts and Telecommunications in 2009 and 2012, respectively. She is currently pursuing the Ph.D. degree with the Chongqing School, University of Chinese Academy of Sciences. She also works as an Engineer with the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences. Her research interests include scene text detection and recognition, license plate detection and recognition, and intelligent transportation systems.

Xiang-Dong Zhou received the B.S. degree in applied mathematics and the M.S. degree in management science and engineering from the National University of Defense Technology, Changsha, China, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1998, 2003, and 2009, respectively. He is currently a Professor with the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences. His research interests include handwriting recognition and ink document analysis.

Zhenghao Li received the B.S. degree in telecommunications engineering and the Ph.D. degree in instrumentation science and technology from Chongqing University, China, in 2003 and 2009, respectively. Since 2019, he has been with the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, where he is currently an Associate Professor. His current research interests include image processing, pattern recognition, and edge computing.

Liangchen Liu received the B.Eng. degree in information engineering and the M.Sc. degree in instrument science and technology from Chongqing University, Chongqing, China, in 2009 and 2012, respectively, and the Ph.D. degree from The University of Queensland, Brisbane, QLD, Australia, in 2017. He is currently a Research Fellow with The University of Melbourne. His current research interests include unsupervised learning, detection, segmentation, and visual attributes and their related applications.

Chaojie Li received the B.Eng. degree in electronic science and technology and the M.Eng. degree in computer science from Chongqing University, Chongqing, China, in 2007 and 2011, respectively, and the Ph.D. degree from RMIT University, Melbourne, Australia, in 2017. He is currently a Senior Research Associate with the School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney. His current research interests include graph representation learning, distributed optimization and control of energy storage, neural networks, and their application.

Yu Shi received the B.S. degree in computer science and technology and the M.E. degree in software engineering from Wuhan University, Wuhan, China, in 2003 and 2007, respectively. He is currently a Senior Engineer with CIGIT, CAS. He is also the Director of the Research Center for Intelligent Security Technology, CIGIT. He has published more than 20 patents and obtained four patent licenses. He received the West Light A Class award from the Chinese Academy of Sciences.
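The ablation in Section IV-C attributes the advantage of APAN to its use of a perspective transformation, while noting that TPS depends on dozens of predicted control points. The sketch below only illustrates what a perspective transformation does to the four corners of a plate region. It is not the APAN implementation: the helper `apply_homography`, the matrix `H`, and the corner coordinates are assumptions chosen for illustration. A perspective transformation is described by a 3x3 matrix defined up to scale, so it has 8 free parameters.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 perspective (homography) matrix H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ZeroDivisionError("point maps to infinity under H")
    # Dividing by w is what distinguishes a perspective map from an affine one.
    return (u / w, v / w)


# Hypothetical matrix (an assumption, not from the paper): identity plus a
# small perspective term in the bottom row that depends on x.
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.001, 0.0, 1.0],
]

# Hypothetical corners of a 100 x 50 plate region, clockwise from top-left.
corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]

# The two right-hand corners are pulled toward the origin because w > 1 there,
# while the two left-hand corners (x = 0) stay where they are.
warped = [apply_homography(H, c) for c in corners]
```

Because a rigid, planar plate viewed from an angle is related to its frontal view by exactly this kind of map, a single matrix is enough to describe the deformation, which is consistent with the argument made in the ablation.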

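As a minimal sketch of the evaluation rule stated in Section IV-D, where a result counts as correct only when every character of the plate is recognized, the whole-plate accuracy could be computed as follows. The function name `plate_accuracy` and the example strings are hypothetical and are not taken from the paper or its datasets.

```python
def plate_accuracy(predictions, ground_truths):
    """Fraction of plates whose predicted string equals the label exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    # A plate with even one wrong, missing, or extra character counts as wrong.
    correct = sum(1 for p, g in zip(predictions, ground_truths) if p == g)
    return correct / len(ground_truths)


# Hypothetical plate strings for illustration only.
preds = ["A12345", "B6789C", "XYZ001"]
labels = ["A12345", "B6789D", "XYZ001"]

# The second prediction differs from its label in the last character,
# so two of the three plates are counted as correct.
accuracy = plate_accuracy(preds, labels)
```

This metric is stricter than per-character accuracy: a single misread character makes the whole plate count as an error.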