A Project Skill Lab Report
Department of Computer Applications
Bachelor of Technology
By
Mr. E. Purushotham
Assistant Professor
At
Institute Vision
Institute Mission
Provide a congenial academic ambience with state-of-the-art resources for learning and
research.
[CO-PO mapping table: course outcomes CO1 to CO12 mapped against programme outcomes]
Evaluation Rubrics for Project Work (each criterion is assessed at three levels, listed from highest to lowest):

Literature Survey (CO4): Extensive literature survey with standard references / Considerable literature survey with standard references / Incomplete literature survey with substandard references.

Project Work Impact on Society (CO6): Conclusion of project work has strong impact on society / considerable impact on society / feeble impact on society.

Project Work Impact on Environment (CO7): Conclusion of project work has strong impact on environment / considerable impact on environment / feeble impact on environment.

Ethical Attitude (CO8): Clearly understands ethical and social practices / moderate understanding of ethical and social practices / insufficient understanding of ethical and social practices.

Oral Presentation (CO10): Presentation in logical sequence with key points, clear conclusion and excellent language / presentation with key points, conclusion and good language / presentation with insufficient key points and improper conclusion.

Time and Cost Analysis (CO11): Comprehensive time and cost analysis / moderate time and cost analysis / reasonable time and cost analysis.
This is to certify that the project Skill Lab work entitled "HANDWRITTEN DIGIT
RECOGNITION USING PYTHON" is carried out by S. Durga Prasad (Reg. No. 19751A0599),
R. Balaji (Reg. No. 19751A0592), N. Likhit (Reg. No. 19751A0571) and S. Ravi Kishore
(Reg. No. 19751A05A6) under my supervision and guidance during the academic year
2020-2021, in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology.
I affirm that the project Skill Lab work titled "HANDWRITTEN DIGIT
RECOGNITION USING PYTHON", submitted in partial fulfillment for the award of
Bachelor of Technology, is the original work carried out by me. It has not formed
part of any other project work submitted for the award of any degree, either in this or any
other university.
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
1.1 Aim and Objective
1.2 Motivation
1.3 Prerequisites
1.4 The MNIST Dataset
1.5 What is Deep Learning
1.6 What is Machine Learning
1.7 Deep Learning vs Machine Learning
2 INSTALLATIONS
2.1 Python Installation on Windows
2.2 Keras Installation on Windows
3 METHODOLOGY AND RELATED WORK
3.1 Dataset
3.2 Support Vector Machine
3.3 Multilayered Perceptron
3.4 Convolutional Neural Network
3.5 Visualization
3.6 Related Works
4 IMPLEMENTATION
4.1 Pre-Processing
4.2 Support Vector Machine
4.3 Multilayered Perceptron
4.4 Convolutional Neural Network
5 PROJECT BUILDING
5.1 Import the Libraries and Load the Dataset
5.2 Preprocess the Data
5.3 Create the Model
5.4 Train the Model
5.5 Evaluate the Model
5.6 Create GUI to Predict Digits
5.7 Source Code
5.7.1 Code for Model Training
5.7.2 Code for GUI
5.8 Output
6 RESULT
7 CONCLUSION AND FUTURE ENHANCEMENT
7.1 Conclusion
7.2 Future Enhancement
8 REFERENCES
ABSTRACT
The reliance of humans on machines has never been greater, to the point that
everything from object classification in photographs to adding sound to silent films
can be performed with the help of deep learning and machine learning algorithms.
Handwritten text recognition is likewise one of the significant areas of research and
development, with a growing number of possibilities that could be attained.
Handwriting recognition (HWR), also known as Handwritten Text Recognition (HTR),
is the ability of a computer to receive and interpret intelligible handwritten input
from sources such as paper documents, photographs, touch-screens and other devices [1].
In this report, we perform handwritten digit recognition on the MNIST dataset using
Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Convolutional Neural
Network (CNN) models. Our main objective is to compare the accuracy of the models
stated above, along with their execution time, to find the best possible model for
digit recognition.
LIST OF FIGURES
1.1 Deep Learning
1.2 Machine Learning
3.1 Bar graph illustrating the MNIST handwritten digit training dataset
3.2 Plotting of some random MNIST handwritten digits
3.3 Working mechanism of SVM classification
3.4 The basic architecture of the Multi-layer Perceptron
3.5 The architectural design of CNN layers
4.3.1 Sequential block diagram of the Multi-layer Perceptron model
6.1 CNN bar graph depicting accuracy comparison
LIST OF ABBREVIATIONS
CHAPTER-I
INTRODUCTION
1.1 AIM AND OBJECTIVE:
1.2 Motivation:
This project work is based on machine learning concepts. Before going deeper
into the topic, we must know some of these concepts. Machine learning is a
method of training a machine to do a job by itself, without direct human intervention.
At a high level, machine learning is the process of teaching a computer system
to make accurate predictions when fed data; those predictions form the output.
Machine learning has many sub-branches, such as neural networks and deep
learning [1].

Among these, deep learning is considered the most popular sub-branch
of machine learning. The idea of machine learning first came into existence
during the 1950s, with the definition of the perceptron [2].

The perceptron was the first machine capable of sensing and learning. It was
followed by the multilayer perceptron in the 1980s, with a limited number of hidden layers.
However, the perceptron saw little use because of its very limited
learning capability. Many years later, in the early 2000s, a new concept called neural
networks came into existence, with many hidden layers [3].

After the emergence of neural networks, many machine learning concepts such as
deep learning came into force, with multiple levels of representation. Because of these
multiple levels of representation, it has become much easier for machines to learn and
recognize patterns. The human brain is taken as a reference for building deep learning
concepts, since it similarly processes information in multiple layers [4].
A human can easily recognize and solve such problems, but the same is not true of a
machine; many techniques or methods must be implemented for a machine to work like a
human. Despite all the advancements that have been made in this area, there is still
a significant research gap that needs to be filled. Consider, for example, online
handwriting recognition versus offline recognition [5]. In online handwriting recognition,
letters are processed as they are written because stroke information is captured
dynamically [5], whereas in offline recognition the letters are not captured dynamically.
Online handwriting recognition is more accurate than offline handwriting recognition
because offline input lacks this stroke information [6]. Research can therefore be done
in this area to improve offline handwriting recognition.
1.3 PREREQUISITES
This Python project requires basic knowledge of Python programming, deep learning
with the Keras library, and the Tkinter library for building the GUI.

1.4 THE MNIST DATASET
In the MNIST dataset, images are represented as a 28×28 matrix where each cell contains
a grayscale pixel value.
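As a quick illustration (a minimal sketch, assuming Keras and its bundled MNIST loader are installed; not part of the project's own code), the following snippet loads the dataset and confirms the 28×28 grayscale representation described above:

from keras.datasets import mnist

# Load the standard MNIST split: 60,000 training and 10,000 test images
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)                        # (60000, 28, 28)
print(x_test.shape)                         # (10000, 28, 28)
print(x_train[0].min(), x_train[0].max())   # grayscale pixel values in the 0..255 range
print(y_train[0])                           # the digit label of the first training image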
1.7 DEEP LEARNING VS MACHINE LEARNING
If deep learning is a subset of machine learning, how do they differ? Deep learning
distinguishes itself from classical machine learning by the type of data it works with
and the methods by which it learns.
Machine learning algorithms leverage structured, labeled data to make
predictions—meaning that specific features are defined from the input data for the
model and organized into tables. This doesn’t necessarily mean that it doesn’t use
unstructured data; it just means that if it does, it generally goes through some
preprocessing to organize it into a structured format.
Deep learning eliminates some of the data pre-processing that is typically involved
in machine learning. These algorithms can ingest and process unstructured data, like
text and images, and they automate feature extraction, removing some of the dependency
on human experts. For example, let’s say that we had a set of photos of different pets,
and we wanted to categorize by “cat”, “dog”, “hamster”, et cetera. Deep learning
algorithms can determine which features (e.g. ears) are most important to distinguish
each animal from another. In machine learning, this hierarchy of features is established
manually by a human expert.
Then, through the processes of gradient descent and backpropagation, the deep
learning algorithm adjusts and fits itself for accuracy, allowing it to make predictions
about a new photo of an animal with increased precision.
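To make the idea of gradient descent concrete, here is a minimal, purely illustrative sketch of the update rule w := w - learning_rate * dL/dw on a toy quadratic loss L(w) = (w - 3)^2; the loss, starting weight and learning rate are all hypothetical choices, not values from this project:

# Toy gradient descent on L(w) = (w - 3)^2, whose minimiser is w = 3
w = 0.0                  # initial weight (illustrative value)
learning_rate = 0.1

for step in range(25):
    grad = 2 * (w - 3.0)             # analytic gradient dL/dw
    w = w - learning_rate * grad     # gradient-descent update rule

print(round(w, 4))       # w approaches 3 after repeated updates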
Machine learning and deep learning models are capable of different types of learning as
well, which are usually categorized as supervised learning, unsupervised learning, and
reinforcement learning. Supervised learning utilizes labeled datasets to categorize or
make predictions; this requires some kind of human intervention to label input data
correctly. In contrast, unsupervised learning doesn’t require labeled datasets, and
instead, it detects patterns in the data, clustering them by any distinguishing
characteristics. Reinforcement learning is a process in which a model learns to perform
actions in an environment, becoming more accurate based on feedback, in order to
maximize a reward.
CHAPTER 2 INSTALLATIONS
All the available versions of Python will be listed. Select the version you require
and click Download. Suppose we choose the Python 3.9.1 version.

Run the installer. Make sure to select both checkboxes at the bottom and then click
Install Now.

The installation process takes a few minutes to complete, and once the installation is
successful, the following screen is displayed.

To check whether Python is successfully installed on your system, follow the given steps:
• The installed Python version will be displayed if Python has been installed
successfully on your Windows machine.
D:\cudnn-10.1-windows10-x64-v7.5.0.56
Add the following path to your Environment variables (adjust it to your installation
path).
D:\cudnn-8.0-windows10-x64-v5.1\cuda\bin
You can either follow this Tutorial here or the following steps (for Windows 10).
Step 6.1: Open the Start Search and type in "env".
Step 6.2: Choose "Edit environment variables for your account".
Step 6.3: Under the "User variables" section (the upper half), find the row with "Path"
in the first column, and click Edit.
Turn off all the prompts. Open a new Anaconda Prompt to type the following
command(s)
echo %PATH%
You should see that the new Environment PATH is there.
Next, install the keras-gpu package.
STEP 9: TESTING
In the event that you get a TensorFlow attribute error, ensure you do the following and
then try again:
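A simple way to test the installation (a sketch assuming TensorFlow 2.x and Keras were installed as described above; this check is illustrative and not part of the project's source code) is to import the packages and list the visible GPUs:

import tensorflow as tf
import keras

print(tf.__version__)                            # installed TensorFlow version
print(keras.__version__)                         # installed Keras version
print(tf.config.list_physical_devices('GPU'))    # an empty list means CPU-only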
CHAPTER 3
METHODOLOGY AND RELATED WORK
The comparison of the algorithms (Support vector machines, Multi-layered perceptron
and Convolutional neural network) is based on the characteristic chart of each algorithm
on common grounds like dataset, the number of epochs, complexity of the algorithm,
accuracy of each algorithm, specification of the device used to execute the program
(Ubuntu 20.04 LTS, i5 7th-gen processor), and the runtime of the algorithm, under ideal
conditions.
3.1 DATASET
Figure 3.1. Bar graph illustrating the MNIST handwritten digit training dataset
(Label vs Total number of training samples).
Figure 3.3. This image describes the working mechanism of SVM classification with
support vectors and hyperplanes.
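For illustration, a hedged sketch of how an SVM classifier could be applied to MNIST with scikit-learn is given below; this is not the project's exact code, and the training-subset size and RBF kernel are assumptions made only to keep the example short:

from keras.datasets import mnist
from sklearn import svm
from sklearn.metrics import accuracy_score

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image to a 784-dimensional vector and scale to [0, 1];
# a 10,000-sample subset keeps training time short for this illustration.
x_train = x_train.reshape(-1, 784)[:10000] / 255.0
y_train = y_train[:10000]
x_test = x_test.reshape(-1, 784) / 255.0

clf = svm.SVC(kernel='rbf')      # separates classes with maximum-margin hyperplanes
clf.fit(x_train, y_train)
print(accuracy_score(y_test, clf.predict(x_test)))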
Each layer consists of several nodes, also formally referred to as neurons, and
each node is interconnected to every node of the next layer. In a basic MLP there
are 3 layers, but the number of hidden layers can increase to any number as per the
problem, with no restriction on the number of nodes. The number of nodes in the input
and output layers depends on the number of attributes and apparent classes in the dataset,
respectively. The particular number of hidden layers or the number of nodes in a hidden
layer is difficult to determine due to the model's erratic nature and is therefore selected
experimentally. Every hidden layer of the model can have a different activation function
for processing. For learning, the MLP uses a supervised learning technique called
backpropagation. In the MLP, each connection between nodes carries a weight that is
adjusted during the training process of the model [11].
Figure 3.4. This figure illustrates the basic architecture of the Multi-layer Perceptron
with a variable specification of the network.
CNN is a deep learning algorithm that is widely used for image recognition and
classification. It is a class of deep neural networks that require minimal pre-processing.
It takes the image as input in the form of small chunks rather than a single pixel at a
time, so the network can detect uncertain patterns (edges) in the image more efficiently.
A CNN contains 3 kinds of layers, namely an input layer, an output layer, and multiple
hidden layers, which include convolutional layers, pooling layers (max and average pooling),
fully connected layers (FC), and normalization layers [12]. A CNN uses a filter (kernel),
which is an array of weights, to extract features from the input image. A CNN employs
different activation functions at each layer to add non-linearity [13]. As we move deeper
into the CNN, we observe that the height and width decrease while the number of channels
increases. Finally, the generated column matrix is used to predict the output [14].
Figure 3.5. This figure shows the architectural design of CNN layers in the form of a Flow chart.
3.5 VISUALIZATION
In this research, we have used the MNIST dataset (i.e. the handwritten digit dataset)
to compare different machine learning and deep learning algorithms (i.e. SVM, MLP, CNN)
on the basis of execution time, complexity, accuracy rate, number of epochs, and number
of hidden layers (in the case of the deep learning algorithms). To visualize the
information obtained by the detailed analysis of the algorithms, we have used bar graphs
and tabular charts produced with the matplotlib module, which gives us precise visuals of
the step-by-step progress of the algorithms in recognizing the digits. Graphs are given at
each vital part of the programs to provide visuals of each part and support the outcome.
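As an illustration of the kind of visualization used (for example the label-distribution bar graph of Figure 3.1), the following hedged sketch plots the number of training samples per digit with matplotlib; it mirrors the figure's content but is not necessarily the authors' plotting code:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()

# Count how many training samples exist for each digit label 0..9
labels, counts = np.unique(y_train, return_counts=True)

plt.bar(labels, counts)
plt.xticks(labels)
plt.xlabel('Digit label')
plt.ylabel('Number of training samples')
plt.title('MNIST training set label distribution')
plt.show()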
3.6 RELATED WORKS
The following are some of the terms, concepts and prior works relevant to this research.
Our evaluation of machine learning methods (support vector machine, artificial neural
network and convolutional neural network) for handwritten digit recognition is inspired
by a few related works [55]. Applying these three classifiers (SVM, ANN and CNN) to digits
with noise has demonstrated that such systems can achieve high accuracy in recognizing
handwritten digits on document images [39]. These methods are used in this work to find
the best algorithm for handwritten digit recognition. A few drawbacks have been identified
in this research area, which makes it important to conduct a pre-study in order to
understand the work that has already been done on classifying the methods and the
limitations of existing machine learning methods [10]. The results of the literature
review point to a large body of existing research on preprocessing, segmentation,
feature extraction with specific techniques, and classification for recognizing digits.

In the paper
[93], the authors have conducted research related to “Handwritten Word Recognition
Using Multi-view Analysis”. The major contribution of this research is a solution to the
problem of efficiently recognizing handwritten words from a limited size lexicon. The
authors developed a multiple-classifier system that analyzes the words at three
different approximation levels, in order to obtain a computational approach inspired by the
human reading process. The authors of the paper [94] have conducted research related
to “Handwriting Recognition On Form Document”. The author used Freeman Chain
Code, with the division of a region into nine sub-regions, histogram normalization of
chain code as feature extraction and Artificial Neural Networks, to classify the
characters on the form document. In the paper [95], the authors have conducted research
related to “Neural Networks for Handwritten English Alphabet Recognition.” They
have developed a system to recognize handwritten English alphabets by using neural
networks. In this system, each alphabet has been represented by binary values that are
used as an input to a simple feature extraction system, whose output is fed to the neural
network system. In the paper [96], the authors have extracted the features of numerals
and mathematical operators. They have used SVM for classification as well as to remove
the noise from the dataset. A feature extraction method has been used on the NIST dataset,
which consists of uppercase, lowercase, and merged uppercase and lowercase characters.
The authors of the paper [97] “Sunspot drawings handwritten character recognition
method based on deep learning”, presented a deep learning method for scanned sunspot
drawings handwritten characters recognition. A Convolution Neural Network, which is
a type of deep learning algorithm and is truly successful in the training of multi-layer
network structure, is used to train the recognition model of handwritten character
images. The experimental results show that the method proposed by the Chinese Academy
(Yunnan) achieves a high recognition
accuracy rate. The authors of [98] “New approach for segmentation and recognition of
handwritten numeral strings” have proposed a new system for segmentation and
recognition of unconstrained handwritten numeral strings. The proposed system uses a
combination of foreground and background features for segmentation of touching digits.
In this paper [99], the authors have proposed a directional method for feature extraction
on English handwritten characters. The collected data has been classified based on the
similarity between the vector feature of data training and the vector feature of data
testing. The authors of the paper [100] “New efficient algorithm for recognizing
handwritten Hindi digits”, have presented a new algorithm for recognizing handwritten
Hindi digits, which is based on using the topological characters combined with
statistical properties of the given digits in order to extract a set of features that can be
used in the process of digit classification.
CHAPTER 4
IMPLEMENTATION
4.1 PRE-PROCESSING
Pre-processing is an initial step in machine and deep learning that focuses
on improving the input data by reducing unwanted impurities and redundancy. To
simplify the input data, we reshaped all the images in the dataset into single-channel
2-dimensional images, i.e. of shape (28, 28, 1). Each pixel value lies between 0 and 255,
so we normalized these pixel values by converting the dataset to 'float32' and dividing
by 255.0, so that the input features range between 0.0 and 1.0. Next, we performed
one-hot encoding to convert the y values into vectors of zeros and ones, making each
label categorical; for example, an output value of 4 is converted into the array
[0,0,0,0,1,0,0,0,0,0].
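The steps above can be expressed as the following hedged sketch (the exact import path of to_categorical may vary between Keras versions):

from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape each image to (28, 28, 1) and normalise pixel values to [0.0, 1.0]
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the labels, e.g. 4 -> [0,0,0,0,1,0,0,0,0,0]
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)   # (60000, 28, 28, 1) (60000, 10)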
Figure 4.3.1. Sequential block diagram of the Multi-layer Perceptron model built with
the help of the Keras module.
We used a neural network with 4 hidden layers and an output layer with 10 units (i.e.
total number of labels). The number of units in each hidden layer is kept at 512. The
input to the network is the 784-dimensional array converted from the 28×28 image. We
used the Sequential model for building the network. In the Sequential model, we can
just stack up layers by adding the desired layer one by one. We used the Dense layer,
also called a fully connected layer since we are building a feedforward network in which
all the neurons from one layer are connected to the neurons in the previous layer. Apart
from the Dense layer, we added the ReLU activation function which is required to
introduce non-linearity to the model. This will help the network learn non-linear
decision boundaries. The last layer is a softmax layer as it is a multiclass classification
problem [19].
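A hedged Keras sketch of the MLP just described (four Dense hidden layers of 512 ReLU units, a 784-dimensional input and a 10-way softmax output) is given below; the choice of the Adam optimizer is an assumption for illustration, not something stated in this report:

from keras.models import Sequential
from keras.layers import Dense

mlp = Sequential()
mlp.add(Dense(512, activation='relu', input_shape=(784,)))   # hidden layer 1
mlp.add(Dense(512, activation='relu'))                        # hidden layer 2
mlp.add(Dense(512, activation='relu'))                        # hidden layer 3
mlp.add(Dense(512, activation='relu'))                        # hidden layer 4
mlp.add(Dense(10, activation='softmax'))                      # output layer, one unit per digit

# Optimizer choice ('adam') is an assumption made for this sketch
mlp.compile(loss='categorical_crossentropy', optimizer='adam',
            metrics=['accuracy'])
mlp.summary()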
Each input image has dimensions 28 (height) × 28 (width) × 1 (number of channels). Next, we created the model, whose first layer is a
Conv layer [20]. This layer uses a matrix to convolve around the input data across its
height and width and extract features from it. This matrix is called a Filter or Kernel.
The values in the filter matrix are weights. We have used 32 filters each of the
dimensions (3,3) with a stride of 1. Stride determines the number of pixels shifts.
Convolution of filter over the input data gives us activation maps whose dimension is
given by the formula: ((N + 2P - F)/S) + 1 where N= dimension of input image, P=
padding, F= filter dimension and S=stride. In this layer, Depth (number of channels) of
the output image is equal to the number of filters used. To increase the non-linearity,
we have used an activation function that is Relu [21]. Next, another convolutional layer
is used in which we have applied 64 filters of the same dimensions (3,3) with a stride of
1 and the Relu function. Next, to these layers, the pooling layer [22] is used which
reduces the dimensionality of the image and computation in the network. We have
employed MAX-pooling which keeps only the maximum value from a pool. The depth
of the network remains unchanged in this layer. We have kept the pool-size (2,2) with
a stride of 2, so every 4 pixels will become a single pixel. To avoid overfitting in the
model, Dropout layer [23] is used which drops some neurons which are chosen
randomly so that the model can be simplified. We have set the probability of a node
getting dropped out to 0.25 or 25%. Following it, Flatten Layer [23] is used which
involves flattening i.e. generating a column matrix (vector) from the 2-dimensional
matrix. This column vector will be fed into the fully connected layer [24]. This layer
consists of 128 neurons with a dropout probability of 0.5 or 50%. After applying the
Relu activation function, the output is fed into the last layer of the model that is the
output layer. This layer has 10 neurons that represent classes (numbers from 0 to 9) and
the SoftMax function [25] is employed to perform the classification. This function
returns probability distribution over all the 10 classes. The class with the maximum
probability is the output.
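Putting the layers described above together, a hedged Keras sketch of this CNN is shown below (the optimizer choice is an assumption); the comments track the activation-map size using the formula ((N + 2P - F)/S) + 1:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

cnn = Sequential()
# ((28 + 2*0 - 3)/1) + 1 = 26 -> output 26x26x32
cnn.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
               input_shape=(28, 28, 1)))
# ((26 + 2*0 - 3)/1) + 1 = 24 -> output 24x24x64
cnn.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
# 2x2 max pooling with stride 2 halves height and width -> 12x12x64
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Dropout(0.25))                      # drop 25% of nodes to reduce overfitting
cnn.add(Flatten())                          # column vector fed to the dense layers
cnn.add(Dense(128, activation='relu'))
cnn.add(Dropout(0.5))
cnn.add(Dense(10, activation='softmax'))    # probability distribution over the 10 digits

# Optimizer choice ('adam') is an assumption made for this sketch
cnn.compile(loss='categorical_crossentropy', optimizer='adam',
            metrics=['accuracy'])
cnn.summary()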
CHAPTER-5
PROJECT BUILDING
The image data cannot be fed directly into the model, so we need to perform some
operations and process the data to make it ready for our neural network. The
dimension of the training data is (60000, 28, 28). The CNN model requires one more
dimension, so we reshape the matrix to the shape (60000, 28, 28, 1).
Now we will create our CNN model for this Python data science project. A CNN model
generally consists of convolutional and pooling layers. It works better for data that are
represented as grid structures, which is why a CNN works well for image classification
problems. The dropout layer deactivates some of the neurons during training, which
reduces overfitting of the model. We will then compile the model with the Adadelta
optimizer.
The model.fit() function of Keras will start the training of the model. It takes the
training data, validation data, epochs, and batch size.
It takes some time to train the model. After training, we save the weights and model
definition in the ‘mnist.h5’ file.
We have 10,000 images in our test dataset which will be used to evaluate how well
our model works. The testing data was not involved in training, so it is new data for
our model. The MNIST dataset is well balanced, so we can expect around 99% accuracy.
Now for the GUI, we have created a new file in which we build an interactive
window to draw digits on a canvas, and with a button we can recognize the digit. The
Tkinter library comes with the Python standard library. We have created a function
predict_digit() that takes the image as input and then uses the trained model to predict
the digit.
Then we create the App class which is responsible for building the GUI for our app.
We create a canvas where we can draw by capturing the mouse event and with a button,
we trigger the predict_digit() function and display the results.
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# Load the MNIST dataset, split into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
input_shape = (28, 28, 1)

x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255

batch_size = 128
num_classes = 10
epochs = 10

# One-hot encode the labels for use with categorical cross-entropy
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(5, 5), activation='relu',
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Second convolutional layer; the filter count (64) is assumed, as in Section 4.4
model.add(Conv2D(64, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

hist = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                 verbose=1, validation_data=(x_test, y_test))
print("The model has successfully trained")

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model.save('mnist.h5')
print("Saving the model as mnist.h5")
# GUI script: Tkinter for the window, PIL and win32gui (Windows-only) for grabbing
# the canvas contents, and the trained Keras model for prediction
from keras.models import load_model
from tkinter import *
import tkinter as tk
import win32gui
from PIL import ImageGrab
import numpy as np

model = load_model('mnist.h5')

def predict_digit(img):
    # resize image to 28x28 pixels
    img = img.resize((28, 28))
    # convert rgb to grayscale
    img = img.convert('L')
    img = np.array(img)
    # reshape to the model input shape and normalise
    img = img.reshape(1, 28, 28, 1)
    img = img / 255.0
    # predict the digit and return the label together with its confidence
    res = model.predict(img)[0]
    return np.argmax(res), np.max(res)
class App(tk.Tk):
    def __init__(self):
        tk.Tk.__init__(self)
        self.x = self.y = 0

        # Creating elements
        self.canvas = tk.Canvas(self, width=300, height=300, bg="white",
                                cursor="cross")
        self.label = tk.Label(self, text="Draw..", font=("Helvetica", 48))
        self.classify_btn = tk.Button(self, text="Recognise",
                                      command=self.classify_handwriting)
        self.button_clear = tk.Button(self, text="Clear", command=self.clear_all)

        # Grid structure
        self.canvas.grid(row=0, column=0, pady=2, sticky=W)
        self.label.grid(row=0, column=1, pady=2, padx=2)
        self.classify_btn.grid(row=1, column=1, pady=2, padx=2)
        self.button_clear.grid(row=1, column=0, pady=2)

        self.canvas.bind("<B1-Motion>", self.draw_lines)

    def clear_all(self):
        self.canvas.delete("all")
    def classify_handwriting(self):
        HWND = self.canvas.winfo_id()          # get the handle of the canvas
        rect = win32gui.GetWindowRect(HWND)    # get the coordinates of the canvas
        a, b, c, d = rect
        rect = (a + 4, b + 4, c - 4, d - 4)
        im = ImageGrab.grab(rect)              # grab the drawn digit as an image

        digit, acc = predict_digit(im)         # classify and display the result
        self.label.configure(text=str(digit) + ', ' + str(int(acc * 100)) + '%')

    def draw_lines(self, event):
        # paint a small filled circle at the current mouse position
        self.x = event.x
        self.y = event.y
        r = 8
        self.canvas.create_oval(self.x - r, self.y - r, self.x + r, self.y + r,
                                fill='black')

app = App()
mainloop()
5.8 Output
Below are the screenshots of the output obtained from the code.
CHAPTER-6 RESULT
After implementing all three algorithms (SVM, MLP and CNN), we compared their
accuracies and execution times with the help of experimental graphs for a clear
understanding. We have taken into account the training and testing accuracy of all the
models stated above. After executing all the models, we found that SVM has the highest
accuracy on the training data, while on the testing dataset CNN achieves the highest
accuracy. Additionally, we compared the execution times to gain more insight into the
working of the algorithms. Generally, the running time of an algorithm depends on the
number of operations it performs. We therefore trained our deep learning models for up
to 30 epochs and the SVM model according to standard practice to obtain a suitable
outcome. SVM took the minimum time for execution, while CNN accounted for the maximum
running time.
This table represents the overall performance of each model. The table contains
5 columns: the 2nd column gives the model name, the 3rd and 4th columns give the
training and testing accuracy of the models, and the 5th column gives the execution
time of the models.
CHAPTER-7
7.1 CONCLUSION
The techniques explored in this work can be applied to keeping track of suspicious
activity within a system, in fingerprint and retinal scanners, database filtering
applications, equipment checking for national forces, and many more problems of both
major and minor category. Advancement in this field can help us create an environment
of safety, awareness and comfort by using these algorithms in day-to-day applications
and high-level applications (i.e. at corporate or government level). Applications based
on artificial intelligence and deep learning are the future of the technological world
because of their high accuracy and their advantages over many major problems.
REFERENCES: