ECE2006-Digital Signal Processing
HELMET AND VEHICLE
LICENSE PLATE DETECTION
SYSTEM
18BEC1075 AMSHUMAN G
18BEC1150 J KIRON
18BEC1152 B VISWESHWARAN
Objective
Since a large number of motorcycle accidents happen in our country, it is necessary for riders
to wear helmets, and any violation of the rule must be penalized. With such a large
population, it would be efficient to use an automatic system that detects helmetless
riders and their number plates so that the traffic police can act on it.
Our project aims to implement exactly this, using Machine Learning to detect helmetless
riders and Optical Character Recognition (OCR) to extract the number plate.
Input and Output
The input to the system will be images of motorcycles with riders wearing or not wearing helmets.
The images will contain the license plate of the motorcycle as well.
The output of the system will be the input image with bounding boxes around motorbikes and
persons. If the rider is not wearing a helmet, the license plate number of the helmetless rider
will be displayed.
The dataset for training helmet detection was taken from
https://drive.google.com/drive/folders/1TwicJ2kMf3YLH1TwDZOZKSwyARlezcCt
Software Tool
• We used Python 3.8 to code the helmet detection and number plate recognition
using Machine Learning and Deep Learning algorithms, and OCR.
• Libraries such as Keras, NumPy, imutils, OpenCV, pytesseract, etc. were used.
Flow Diagram
Input image → Motorbike detection → Helmet classification → is a helmet present?
If yes, no further action is taken.
If no → License plate detection → License plate number via OCR.
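A minimal code-level sketch of this flow; the helper functions detect_motorbikes, classify_helmet, find_plate and ocr_plate are hypothetical placeholders for the detector, classifier and OCR stages described in the rest of this report.

def process_image(image, detect_motorbikes, classify_helmet, find_plate, ocr_plate):
    """Pipeline: motorbike detection -> helmet check -> plate detection -> OCR."""
    offending_plates = []
    for (x, y, w, h) in detect_motorbikes(image):       # motorbike/rider bounding boxes
        rider = image[y:y + h, x:x + w]
        if classify_helmet(rider):                       # helmet present -> no violation
            continue
        plate = find_plate(rider)                        # locate the license plate region
        if plate is not None:
            offending_plates.append(ocr_plate(plate))    # read the plate number
    return offending_plates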
Packages:
Keras: Keras is the high-level API of TensorFlow 2.0: an approachable, highly productive interface for solving
machine learning problems, with a focus on modern deep learning. It provides essential abstractions and
building blocks for developing and shipping machine learning solutions with high iteration velocity.
OpenCV: OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine
learning software library. OpenCV was built to provide a common infrastructure for computer vision
applications and to accelerate the use of machine perception in commercial products.
NumPy: It is a library for the Python programming language, adding support for large, multi-dimensional
arrays and matrices, along with a large collection of high-level mathematical functions to operate on these
arrays.
imutils: A series of convenience functions that make basic image processing operations such as translation,
rotation, resizing, skeletonization, displaying Matplotlib images, sorting contours, and detecting edges
easier with OpenCV and Python.
Pytesseract: Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize
and “read” the text embedded in images.
Algorithm:
Convolutional Neural Network (CNN)
A CNN image classifier takes an input image, processes it, and classifies it under certain
categories (e.g., Dog, Cat, Tiger, Lion). A computer sees an input image as an array of pixels,
whose size depends on the image resolution. Based on the resolution, it sees h x w x d
(h = height, w = width, d = depth). E.g., a 6 x 6 x 3 array is an RGB image (3 refers to the
RGB channels) and a 4 x 4 x 1 array is a grayscale image.
Technically, deep learning CNN models train and test on each input image by passing it through
a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC)
layers, and then apply a softmax function to classify the object with probabilistic values between 0 and
1.
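As an illustration of such a network, here is a minimal Keras sketch with convolution + ReLU layers, max pooling, a fully connected layer and a softmax over two classes (e.g. helmet / no helmet); the layer sizes and input shape are illustrative assumptions, not the exact model used in this project.

from tensorflow.keras import layers, models

# Illustrative CNN: convolution + ReLU, max pooling, fully connected, softmax over 2 classes
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),   # e.g. helmet / no-helmet
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])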
Layers
Convolution Layer: Convolution is the first layer used to extract features from an input image.
Convolution preserves the relationship between pixels by learning image features using
small squares of input data. It is a mathematical operation that takes two inputs: the
image matrix and a filter or kernel.
Convolution of an image with different filters can perform operations such as edge
detection, blurring, and sharpening. Applying different types of filters (kernels)
produces correspondingly different convolved images.
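For instance, a small OpenCV sketch of convolving an image with two common kernels (a sharpening kernel and a Laplacian-style edge kernel) using cv2.filter2D; the image path and the kernels are illustrative choices, not taken from the project code.

import cv2
import numpy as np

img = cv2.imread("bike.jpg")                       # placeholder path to any test image

sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=np.float32)

sharpened = cv2.filter2D(img, -1, sharpen_kernel)  # ddepth=-1 keeps the source depth
edges = cv2.filter2D(img, -1, edge_kernel)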
ReLU: It stands for Rectified Linear Unit, a non-linear operation. The output is ƒ(x) =
max(0, x). ReLU’s purpose is to introduce non-linearity into our ConvNet.
Pooling Layer: Pooling layers reduce the number of parameters when the images are
too large. Max pooling takes the largest element from the rectified feature map.
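To make the two operations concrete, here is a tiny NumPy illustration of ReLU followed by 2 x 2 max pooling on a made-up 4 x 4 feature map.

import numpy as np

feature_map = np.array([[ 1, -2,  3,  0],
                        [-1,  5, -3,  2],
                        [ 4,  0,  1, -6],
                        [ 2,  3, -1,  7]])

relu = np.maximum(feature_map, 0)                  # f(x) = max(0, x)

# 2x2 max pooling with stride 2: keep the largest element in each 2x2 block
pooled = relu.reshape(2, 2, 2, 2).max(axis=(1, 3))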
Architecture of CNN model
MobileNet SSD:
The MobileNet-SSD model is a type of CNN that combines the MobileNet model and the
SSD model. SSD is a Single-Shot Multibox Detection (SSD) network intended
to perform object detection. This model is implemented using the Caffe deep learning
framework.
• Single Shot: This means that the tasks of object localization and classification are done in
a single forward pass of the network
• MultiBox: This is the name of a technique for bounding box regression developed by
Szegedy et al.
• Detector: The network is an object detector that also classifies those detected objects.
Other Formulae Used:
Loss function (standard SSD formulation): L(x, c, l, g) = (1/N) [ L_conf(x, c) + α·L_loc(x, l, g) ],
the weighted sum of the confidence (classification) loss and the localization loss over the N matched default boxes.
Mean Subtraction: R' = R − μ_R, G' = G − μ_G, B' = B − μ_B, where μ is the per-channel mean
subtracted from every pixel before the image is fed to the network.
Scaling Factor: after mean subtraction, each pixel is multiplied by a scaling factor σ,
i.e. pixel' = σ·(pixel − μ), to normalise the input range.
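As a sketch of how these pieces fit together in OpenCV's DNN module, the Caffe MobileNet-SSD model can be loaded and fed a blob in which the mean subtraction and scaling factor above are applied; the file names, the mean value 127.5 and the scale 1/127.5 are common defaults for this model and are assumptions here, not the project's confirmed settings.

import cv2

# Load the Caffe MobileNet-SSD model (file names are placeholders)
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

image = cv2.imread("bike.jpg")                      # placeholder path
(h, w) = image.shape[:2]

# blobFromImage applies the scaling factor and mean subtraction described above
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)),
                             scalefactor=1 / 127.5,  # scaling factor
                             size=(300, 300),
                             mean=127.5)             # mean subtraction
net.setInput(blob)
detections = net.forward()  # shape (1, 1, N, 7): [_, class_id, confidence, x1, y1, x2, y2]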
What is OCR?
OCR systems transform a two-dimensional image of text, whether machine-printed or
handwritten, from its image representation into machine-readable text.
Steps Involved:
Pre-processing
Text Recognition
Post-processing
Pre-processing
The raw image is successively despeckled, de-skewed, and binarised before recognition.
Additional Pre-processing Methods Used
Bilateral filter:
BF[I]_p = (1/W_p) Σ_q G_σs(‖p − q‖) · G_σr(|I_p − I_q|) · I_q
The left-hand side is the filtered result at pixel p, and the right-hand side is a sum over all pixels q
weighted by the spatial and range Gaussian functions; I_q is the intensity at pixel q and W_p is the
normalisation factor (the sum of the weights).
Gaussian Function: for a kernel of size (2k+1) x (2k+1),
H_ij = (1/(2πσ²)) · exp( −((i − (k+1))² + (j − (k+1))²) / (2σ²) ),  1 ≤ i, j ≤ 2k+1
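A short OpenCV sketch of the bilateral filter as a pre-processing step on the grayscale plate image; the parameter values (11, 17, 17) are a common choice for license plate images and are an assumption here.

import cv2

plate_img = cv2.imread("plate.jpg")                 # placeholder path
gray = cv2.cvtColor(plate_img, cv2.COLOR_BGR2GRAY)

# d = neighbourhood diameter, followed by sigmaColor and sigmaSpace for the two Gaussians
smoothed = cv2.bilateralFilter(gray, 11, 17, 17)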
Canny Edge Detection:
It is a 4 step process.
1. Noise Reduction: First step is to remove the noise in the image with a 5x5 Gaussian
filter.
2. Intensity Gradient: Smoothened image is then filtered with a Sobel kernel in both
horizontal and vertical direction to get Gx and Gy.
Edge gradient and direction for each pixel:
Edge_Gradient (G) = √(Gx² + Gy²),  Angle (θ) = tan⁻¹(Gy / Gx)
The gradient direction is always perpendicular to edges.
3. Non-maximum Suppression:
Non-maximum suppression is applied to find the locations with the sharpest
change of intensity value.
The edge strength of the current pixel is compared with the edge strength of the pixels
in the positive and negative gradient directions.
If the edge strength of the current pixel is the largest among these, the value is
preserved; otherwise it is suppressed.
4. Hysteresis Thresholding:
This stage decides which edges are really edges and which are not. For this, we need two
threshold values, minVal and maxVal. Edges with an intensity gradient above maxVal are
sure edges, those below minVal are discarded, and those in between are kept only if they
are connected to a sure edge.
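A brief sketch of how Canny edge detection followed by a contour search can localise a rectangular plate candidate; the thresholds (30, 200), the contour-approximation factor and the four-corner test are common heuristics assumed for illustration, not necessarily the project's exact values.

import cv2
import imutils

gray = cv2.cvtColor(cv2.imread("plate.jpg"), cv2.COLOR_BGR2GRAY)  # placeholder path
smoothed = cv2.bilateralFilter(gray, 11, 17, 17)
edges = cv2.Canny(smoothed, 30, 200)          # minVal=30, maxVal=200 hysteresis thresholds

# Find contours and keep the largest ones; a plate is roughly a 4-point polygon
contours = imutils.grab_contours(
    cv2.findContours(edges.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]

plate_contour = None
for c in contours:
    approx = cv2.approxPolyDP(c, 0.018 * cv2.arcLength(c, True), True)
    if len(approx) == 4:                       # four corners -> likely the plate
        plate_contour = approx
        break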
Text Recognition
• Pattern recognition: pixel-by-pixel comparison of the character image against stored glyphs.
• Feature extraction: this method decomposes characters into "features" like lines, closed loops,
line direction, and line intersections, and uses the K-NN algorithm to match them against
stored glyph features.
pytesseract : RNN-LSTM
A recurrent neural network (RNN) is a class of artificial neural networks where
connections between nodes form a directed graph along a temporal sequence.
Internal Structure of RNN:
LSTM:
• Has four interacting layers
• Can remember information for long periods of time
Steps Involved
• Forget gate layer: decides what percentage of the data is to be kept.
• Input gate layer and tanh layer: decide what data is to be updated and added.
• Cell state update: the cell state is updated with this information.
• Output stage: a filtered version of the cell state is produced as output.
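An illustrative pytesseract call on a cropped plate image; --oem 1 selects Tesseract's LSTM engine and --psm 7 treats the crop as a single line of text. The flags and the file path are assumptions about reasonable settings, not the project's exact configuration.

import cv2
import pytesseract

plate_roi = cv2.imread("plate_crop.jpg")       # cropped license plate image (placeholder path)
gray = cv2.cvtColor(plate_roi, cv2.COLOR_BGR2GRAY)

# --oem 1 selects the LSTM engine, --psm 7 treats the image as a single text line
text = pytesseract.image_to_string(gray, config="--oem 1 --psm 7")
print(text.strip())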
Input and Output
Case 1: input image and the corresponding output image.
Case 2: input image and the corresponding output image.