Literature Survey
For the past several years, there has been increasing interest among researchers in
the problem of extracting text from video. Intensive research has been carried
out in this area, as is evident from the large number of technical papers. One such
Zoe Jeffrey, Xiaojun Zhai et al. have proposed an automatic number plate
recognition system based on ARM-DSP. The arithmetic capability of digital signal
processors (DSPs), the multiple peripheral interfaces and the high frequency
execution of the ARM processors make them an attractive choice for real time
embedded systems. DSPs are already widely used for applications such as audio and
speech processing, image and video processing, and wireless signal processing.
Practical applications include surveillance, video encoding and decoding, and object
tracking and detection in images and video. On the other hand, the rapid development of
Field Programmable Gate Arrays (FPGAs) offers an alternative way to provide low-cost
acceleration for computationally intensive tasks such as digital signal processing.
Most of these applications use ARM, DSPs and FPGAs due to the processing power
offered, in order to provide portability and real-time capability, and create custom
embedded architectures for different application requirements. The main goal of this
work is to design and implement efficient and novel architectures for automatic
which operates in high definition (HD) and in real time. In addition, a separate ANPR
FPGAs which accelerate digital image processing algorithms. The investigation of the
algorithm and its optimization focused on real time image and video processing for
(NPS) and optical character recognition (OCR) in particular, which are the three key
stages of the ANPR process. ANPR often forms part of intelligent transportation
systems. Its applications include identifying vehicles by their number plates for
The distance at which a vehicle plate could be identified using a specified lens at
maximum zoom is provided in the work by Mike Constant [10]. The distance can
vary from 100 meters to 300 meters in some cases. Common guidelines suggest
that, to read a number plate, the car should occupy 50% of the screen height. The height of
the vehicle is assumed to be 1.5 meters and the lens size to be 7.5-75 mm.
Michael Lidenbaum et al. have devised an algorithm for moving car license plate
license plate number. The recognition will be performed in almost real time, watching
cars passing at low speed in front of a video recording device. In the beginning, a video
was taken on a sunny day with ordinary camera settings. During development, the authors
concluded that the picture quality was excellent for the first task, cutting out the frames
containing a license plate, but the second part, number recognition, was almost
The pictures were taken with a normal exposure time, which caused smoothing of
The number was too small, with too few pixels available to analyze.
The picture with a short exposure time was also not used because the dynamic range of
the picture was too small for reliable detection of yellow color. Acceptable picture
quality was achieved when the tradeoff between exposure time and dynamic
The objects are, in fact, portions of the original image having a higher average
distance from the camera, etc.), the area containing the car license plate number proves
to be one of these objects. Once the areas of interest are thus segmented, they are smartly
binarized (with the aid of some statistical methods and using several test points), and
hypersphere classifiers, which led us several years ago to a neural-like technology,
The first training experiments focused on car license plate recognition. About
200 test images were taken in various conditions. This amount
of training ensured a recognition ratio of well over 99%. It is emphasized again that
the recognition/training system can, in fact, learn any kind of machine-printed text, if
algorithm are the features extracted for the classifier. Feature analysis determines the
descriptors, or feature set, used to describe all characters. Given a character image, the
feature extractor derives the features that the character possesses. The derived features
The process was similar to a skeletonization, with only the significant internal borders
of the objects being highlighted. Although this style of representation could greatly
on the top half of the labels are incredibly dense. So, rather than emphasizing the
shape of the characters, median thresholding often distorts them to the point that they
are unrecognizable.
The label occupies less than 80% of the image, which causes a large portion of
Many of the characters on the top half of the label face upward at an
angle, as a result of the depth of the threads. This may have important
this in future.
Finally, an iteration is performed over the image, translating each of these objects
into an individual portable bitmap to maintain the uniqueness of each of these files, which
that was able to read uppercase typewritten output at the fantastic speed of one
character per minute. During the late 1960s, the technology underwent many
dramatic developments, but OCR systems were considered exotic and futuristic, being
Today, OCR systems are less expensive, faster, and more reliable; it is not uncommon
to find PC-based OCR systems capable of recognizing several hundred
characters per minute. Less expensive electronic components and extensive research
have paved the way for these new systems. Commercial OCR systems can largely be
grouped into two categories: task-specific readers and general-purpose page readers.
The first technique considered, mean and median thresholding, works very much
like mean and median smoothing. That is, a neighborhood, typically 3x3 or 5x5,
is analyzed and, for mean filtering, the bisection is determined by the average of the
surrounding pixel values. With this procedure, though, the same issues are expected
as with mean smoothing; that is, averaging would weaken the edges of the objects we are
attempting to detect.
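As a rough illustration of the neighborhood thresholding described above, the
following Python sketch (assuming NumPy is available) compares each pixel with the
mean or median of its surrounding window. The function name local_threshold, the
replicated-border handling and the "brighter than the local level" rule are
illustrative assumptions, not details taken from the surveyed work.

```python
import numpy as np

def local_threshold(image, size=3, method="mean"):
    """Binarize a 2-D image by comparing each pixel with a statistic
    (mean or median) of its size x size neighborhood.

    Hypothetical helper for illustration; borders are handled by
    replicating the edge pixels.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd (e.g. 3 or 5)")
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    h, w = image.shape
    # Stack every shifted copy of the image so that axis 0 enumerates
    # the size*size neighbors of each pixel.
    windows = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in range(size)
        for dx in range(size)
    ])
    if method == "mean":
        level = windows.mean(axis=0)
    elif method == "median":
        level = np.median(windows, axis=0)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    # Foreground = pixels strictly brighter than their local level.
    return image > level
```

Calling local_threshold(img, size=5, method="median") switches to a 5x5 median
window. Because the mean is pulled toward bright outliers while the median is not,
the two variants can disagree near sharp edges, which is the edge-weakening concern
raised above.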
Though the classification in the bottom half of the image has been obtained, there is
text as possible, the prediction that this was not the expected result.
According to Sorin Draghici et al., an artificial vision system based on an artificial
neural network is able to analyze the image of a car given by a camera, locate the registration
plate and recognize the registration number of the car. This paper describes in detail
and the methods used to solve them. The main features of the system presented are
and online learning, self-assessment of the output reliability and high reliability based
The system proposed by Sorin Draghici et al. has been designed using a modular approach,
which allows easy upgrading and/or substitution of various submodules, thus making
it potentially suitable for a large range of vision applications. The OCR engine was
OCR engine which is suited to the particular application and to upgrade it easily in
the future. At present, there are several versions of the OCR engine. One of them is based
on a fully connected feedforward artificial neural network with sigmoidal activation
functions. This network can be trained with various training algorithms such as error
The system has shown the following performance (on average) on real-world data
Leonard G.C. Hamy et al. have described the task of recognizing Australian vehicle
number plates (also called license plates or registration plates in other countries). A
system for Australian number plate recognition must cope with wide variations in the
appearance of the plates. Each state uses its own range of designs with font variations
between the designs. There are special designs issued for significant events such as
the Sydney 2000 Olympic Games. Also, vehicle owners may place the plates inside
glass covered frames or use plates made of non-standard materials. These issues
successfully locate and read Australian vehicle number plates in digital images.
According to Serkan Ozbay and Ergun Ercelebi, Automatic Vehicle
Identification (AVI) has many applications in traffic systems (highway electronic toll
collection, red light violation enforcement, border and customs checkpoints, etc.).
License Plate Recognition is an effective form of AVI. In their study, a smart
and simple algorithm is presented for a vehicle license plate recognition system. The
2. Segmentation of characters
For extracting the plate region, edge detection algorithms and smearing algorithms
are used. In the segmentation part, smearing algorithms, filtering and some morphological
algorithms are used. Finally, statistical-based template matching is used for
recognition of plate characters. The performance of the proposed algorithm has been
Halina Kwasnicka and Bartosz Wawrzyniak have described an approach to
license plate localization and recognition. They proposed a method which is designed
conditions. The main assumption of their method is the ability to recognize all
license plates which can be found in an individual picture. To solve the problem of
localization of a license plate, two independent methods are used. The first one is
based on connected component analysis and the second one searches for the
network is used to recognize them. Finally, to separate correct license plates from
other captions in the picture, during the license plate recognition process, a syntax
analysis is used. The proposed approach is discussed together with results obtained on
a benchmark data set of license plate pictures. In this paper, examples of correct and
proposed method.
According to Hyo Jong Lee et al., although the recognition of a license plate number
or vehicle type has been researched, the recognition of vehicles using all features has
not been studied due to its complexity. In this paper, a novel method is proposed to
identify vehicles using specific information, namely color, license plate, and vehicle
model. Low-level image processing and texture descriptors are computed from the
front image of vehicles. Then, two three-layer neural networks were built and trained
Zhong et al. (1995) have located text in images of compact discs, book covers, or traffic
scenes in two steps. In the first step, approximate locations of text lines were obtained,
and then text components in those lines were extracted using color segmentation. Wu
et al. (1999) have proposed a texture segmentation method to generate candidate text
regions. A set of feature components is computed for each pixel and these are
Shivakumara et al. (2010) have proposed an algorithm to detect video text for low and
high contrast images, which are classified by analysing the edge difference between
Sobel and Canny edge detectors. After computing edge and texture features, low-
contrast and high-contrast thresholds are used to extract text objects from low and
According to Shyang-Lih Chang, Li-Shien Chen, Yun-Chung Chung, and Sei-Wan
Chen, automatic license plate recognition (LPR) plays an important role in
most of them worked under restricted conditions, such as fixed illumination, limited
vehicle speed, designated routes, and stationary backgrounds. In this study, as few
LPR technique consists of two main modules: a license plate locating module and a
attempts to extract license plates from an input image, while the latter, conceptualized
in terms of neural subjects, aims to identify the number present in a license plate.
Experiments have been conducted for the respective modules. In the experiment on
locating license plates, 1088 images taken from various scenes and under different
conditions were employed. Of these, the license plates could not be located in 23
images; the license plate location success rate is 97.9%. In the
experiment on identifying license numbers, the 1065 images in which license
plates had been successfully located were used. Of these, 47 images failed to yield
the numbers of the license plates located in the image. The identification success rate
is 95.6%. Combining the above success rates, the overall rate of success of
According to Prathamesh Kulkarni, Ashish Khatri, Prateek Banga, and Kushal Shah,
Automatic Number Plate Recognition (ANPR) is a real-time embedded system which
automatically recognizes the license number of vehicles. In this paper, the task of
recognizing number plates for Indian conditions is considered, where number plate
standards are rarely followed. The system integrates algorithms such as
‘Feature-based number plate Localization’ for locating the number plate, ‘Image
Scissoring’ for character segmentation, and statistical feature extraction for character
recognition, and is specifically designed for Indian number plates. The system can
recognize single and double line number plates under widely varying illumination
Papavassiliou et al. (2007) have proposed a parametric spectral-based method for text
verification in videos. By assuming that the horizontal projections of text regions are
periodic, the authors compute the spectrum of the projection and apply linear
prediction coefficient analysis to estimate the poles of the candidate block. The
amplitude and angle of the pole and the spectral centroid value of the projection are
used as features to classify candidate text blocks. However, if a text block is mixed
with background edges, the periodicity of the text area is spoiled and the approach
may fail.
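The following Python sketch (assuming NumPy) illustrates the kind of features this
description refers to: the horizontal projection of a candidate block, the spectral
centroid of that projection, and the magnitude and angle of the dominant pole of a
low-order linear predictor. It is only an approximation of the idea, not the authors'
implementation; the function name projection_features, the predictor order of 2 and
the autocorrelation method of estimating the coefficients are all assumptions.

```python
import numpy as np

def projection_features(block, order=2):
    """Return (pole_magnitude, pole_angle, spectral_centroid) for the
    horizontal projection of a 2-D candidate block.

    Hypothetical helper; angles and the centroid are in radians/sample.
    """
    # Horizontal projection: one value per row, with the mean removed.
    profile = block.astype(float).sum(axis=1)
    profile = profile - profile.mean()
    n = len(profile)

    # Magnitude spectrum of the projection and its centroid.
    spectrum = np.abs(np.fft.rfft(profile))
    if spectrum.sum() == 0:
        raise ValueError("flat block: projection has no spectral content")
    freqs = 2 * np.pi * np.fft.rfftfreq(n)
    centroid = float((freqs * spectrum).sum() / spectrum.sum())

    # Linear prediction (autocorrelation method): solve R a = r.
    full = np.correlate(profile, profile, mode="full")
    r = full[n - 1:n + order]  # autocorrelation at lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])

    # Poles are the roots of z^p - a1*z^(p-1) - ... - ap.
    poles = np.roots(np.concatenate(([1.0], -a)))
    dominant = poles[np.argmax(np.abs(poles))]
    return float(np.abs(dominant)), float(abs(np.angle(dominant))), centroid
```

For a block whose rows repeat with a clear period, the pole angle should land near
the corresponding angular frequency and the pole magnitude close to 1. When
background edges break the periodicity, these values drift, which is consistent
with the failure case noted above.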
Jing Zhang et al. (2008) have proposed a new edge-based text verification approach
for video. Based on the investigation of the relation between candidate blocks and their
neighbor areas, the proposed approach first detects background edges in candidate
blocks, and then erases them by an edge tracking technique, and finally the candidate
blocks containing too few remaining edges are eliminated as false alarms. Three
measures for text detection evaluation in video were used to assess the performance of
Vassilis Papavassiliou et al. (2007) have proposed a new method for verifying text
areas detected in video streams. This algorithm explores the spectral properties of the
horizontal projection of candidate text regions in order to reduce the high number of
false alarms that most text detection algorithms suffer from. The full algorithm (text
detection module produced a 94.82% recall rate but only a 51.84% precision rate. The
addition of the verification module increased the precision rate to 78.93%, keeping the
The closest related work is that of Li et al. (2000) for video text tracking. The system
includes a component for text frame classification to find the first text frame in a
video stream in order to start text tracking. The method of text frame classification is
based on a supervised learning method using a neural network classifier. The method
is thus dependent on the training set and requires considerable training time for the
use of the neural network classifier. It also serves a different objective from our
present work, as our aim is to classify a set of unknown video images into classes of
text and non-text frames. Li’s system, on the other hand, aims to locate a starting text
frame in a video stream known to contain text, using a training set of video text
text detection through Bayesian classification and a boundary growing method. They
presented a new enhancement method that uses the product of Laplacian and Sobel
operations to enhance text pixels in video. To classify true text pixels, they proposed a
Bayesian classifier that does not assume the a priori probability of the input frame but
estimates it based on three probable matrices. Three different ways of clustering are
classifier with the Canny edge map of the input frame. A boundary growing method is
introduced to traverse the multi-oriented scene text lines using text candidates. The
boundary growing method is based on the nearest neighbor concept. The
robustness of the method has been tested on a variety of datasets that include the authors'
own data (non-horizontal and horizontal text data) and two publicly available
datasets, namely the video frames of Hua and the complex scene text data of ICDAR 2003
Shivakumara et al. (2011) have proposed a Laplacian approach to multi-oriented text
detection in video. Unlike many other approaches which assume that text is
horizontally oriented, this method is able to handle text of arbitrary orientation. The
input image is first filtered with a Fourier-Laplacian filter. K-means clustering is then used to
identify candidate text regions based on the maximum difference. The skeleton of
each connected component helps to separate the different text strings from each other.
Finally, text string straightness and edge density are used for false positive
elimination.
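The first two steps of this pipeline can be sketched in Python (assuming NumPy) as
follows. This is a simplified illustration, not the published method: a plain
spatial 4-neighbor Laplacian stands in for the Fourier-Laplacian filter, the
maximum difference is taken over a horizontal 1 x 5 window, and a small
one-dimensional 2-means loop plays the role of K-means. All function names and
parameter values are hypothetical.

```python
import numpy as np

def laplacian(image):
    """4-neighbor spatial Laplacian with replicated borders
    (a stand-in for the Fourier-Laplacian filter)."""
    p = np.pad(image.astype(float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def maximum_difference(values, width=5):
    """Max minus min of `values` inside a horizontal 1 x width window
    (width is assumed to be odd)."""
    pad = width // 2
    p = np.pad(values, ((0, 0), (pad, pad)), mode="edge")
    w = values.shape[1]
    stack = np.stack([p[:, k:k + w] for k in range(width)])
    return stack.max(axis=0) - stack.min(axis=0)

def two_means(values, iterations=20):
    """1-D k-means with k=2; returns a mask of the high-valued cluster."""
    flat = values.ravel()
    low, high = flat.min(), flat.max()
    for _ in range(iterations):
        mask = np.abs(flat - high) < np.abs(flat - low)
        if mask.all() or not mask.any():
            break  # degenerate split, e.g. a completely flat input
        low, high = flat[~mask].mean(), flat[mask].mean()
    return np.abs(values - high) < np.abs(values - low)

def candidate_text_mask(image, width=5):
    """Pixels whose Laplacian response fluctuates strongly along a row."""
    return two_means(maximum_difference(laplacian(image), width))
```

Regions with dense positive and negative Laplacian transitions, as text strokes
tend to produce, fall into the high-valued cluster. The skeleton-based separation
and the straightness and edge-density checks described above are not covered by
this sketch.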
Lukas Neumann and Jiri Matas (2012) have proposed a method for real-time scene text
Extremal Regions (ERs). The ER detector is robust to blur, illumination, color and
texture variation and handles low contrast text. In the first classification stage, the
with O(1) complexity per region. Only ERs with locally maximal probability are
selected for the second stage, where the classification is improved using more
feedback loops is then applied to group ERs into words and to select the most
Shivakumara et al. (2008, 2009, 2010, and 2012) have proposed methods for text
detection in video images, as well as camera images, based on edge features and texture
features. The main focus of these methods is text detection in video, not text
detection in natural scene images. Therefore, the methods give good accuracy for
Yang Zhang et al. (2012) have proposed a new method for text verification based on
random forests. In this paper, they exploit the performance of random forests
for text verification. By combining random forests trained
with different kinds of features, they can improve the accuracy of classification.
Experimental results demonstrate that random forests are suitable for text verification,
are superior or comparable to SVM, and can improve the accuracy of classification by
In the present state of the art, the concerned authorities have to stop the vehicle and ask
the driver to produce the documents related to all this information. Some of the
information, like tax paid receipts and insurance documents, can be verified. However,
the other information mentioned above can be verified only on a case-by-case basis,
i.e., only if there is a request by the higher authorities to check it.
Tracking down these types of vehicles manually is a difficult task because the
authorities have to monitor the vehicles day and night. It is also difficult to note
down the number that is present on the vehicle, to check whether the tax payment is up to date,
and to find duplicated numbers. Automating this process
by placing a camera at a fixed position could overcome these problems. The camera
will take the pictures and, using these pictures, further processing can be done.
The main goal is to build a prototype system, which should be capable of recognizing
real time, watching cars passing at low speed in front of a video recording device.
Locating and detecting text in video is an interesting, real-time research problem
which finds many applications in multimedia-related areas. This problem is close to
human perception, as some of the strategies can be taken from human perception.
In this work, a method is proposed to locate the vehicle number written on the front
or back panel of the vehicle. The input is taken from a stationary camera, which
continuously records video of the vehicles passing by it. The problem of
and correction and segmentation. The quality of the video produced by the camera is not
noise removal, edge detection, is done on the recorded video. Any standard OCR can
be used at a later stage to identify the text. Since the domain of the characters is very
limited in the text of a vehicle number, a high recognition rate can be expected in the
algorithm, which must be as simple as possible, since the types of characters that
appear on the number plates are limited. Some of the papers which inspired us in