“Face Recognition based
Attendance System”
Submitted in partial fulfilment of the requirements
of the degree of
Bachelor of Engineering
In Instrumentation Engineering
By
Pratima Belel
Sonal Singh
Shivam Yadav
Aparna Singh
Under the Guidance of
Mr. Kader Shaikh
Department of
Instrumentation Engineering
Vivekanand Education Society's
Institute Of Technology
Collector's Colony, Chembur, Mumbai, Maharashtra 400074
May 2021
CERTIFICATE
This is to certify that the thesis entitled, “Face Recognition based
Attendance System” is a bonafide work carried out by Sonal Singh,
Pratima Belel, Shivam Yadav, Aparna Singh, submitted to the
University of Mumbai in partial fulfillment of the requirements for
the award of the degree of “Bachelor of Engineering in
Instrumentation Engineering”.
_________________________
Mr. Kader Shaikh
Supervisor /Guide
_________________________ _________________________
Mr. P.P. Vaidya Dr. J.M. Nair
Head of Department Principal
PROJECT REPORT APPROVAL FOR B. E
This thesis / dissertation / project report entitled “Face Recognition based
Attendance System” by Pratima Belel, Sonal Singh, Aparna Singh and
Shivam Yadav is approved for the award of the degree
of Bachelor of Engineering (Instrumentation Engineering).
Examiner 01
_________________________
Examiner 02
_________________________
Date :
Place:
DECLARATION
I declare that this written submission represents my ideas in my own
words and where others' ideas or words have been included, I have
adequately cited and referenced the original sources. I also declare
that I have adhered to all principles of academic honesty and
integrity and have not misrepresented or fabricated or falsified any
idea/data/fact/source in my submission. I understand that any
violation of the above will be cause for disciplinary action by the
Institute and can also evoke penal action from the sources which
have thus not been properly cited or from whom proper permission
has not been taken when needed.
Pratima Belel
_________________________
Sonal Singh
_________________________
Aparna Singh
_________________________
Shivam Yadav
_________________________
ACKNOWLEDGEMENTS
We are profoundly grateful to Prof. Kader Shaikh for his expert guidance and
continuous encouragement throughout this project, from its commencement to
its completion.
Pratima Belel
Sonal Singh
Shivam Yadav
Aparna Singh
INDEX
Chapter 01 - Introduction
1.1 Overview
1.2 Objective
Chapter 02 - Literature Review
2.1 Introduction
2.2 Face recognition techniques
2.2.1 Feature-based approach
2.2.2 Neural networks approach
2.3 What is Dlib
2.4 Basic System Architecture
Chapter 03 - Methodology
3.1 Algorithm Description
3.1.1 Histogram of Oriented Gradients
3.1.2 Convolutional neural networks
3.1.3 Support vector machines
3.2 Data Sources and Data Collection
3.3 Step by Step flow of algorithm
Chapter 04 - Implementation
4.1 Installation
4.1.1 Software requirements
4.2 Working
Chapter 05 - Screenshots of code and results
5.1 Code screenshots
5.2 Output screenshots
5.3 Conclusion
Chapter 06 - Future Scope
Chapter 07 - References
ABSTRACT
Face recognition is one of the most important applications in video
surveillance and computer vision. However, conventional face recognition
algorithms are susceptible to multiple conditions, such as lighting,
occlusion, viewing angle and camera rotation. Face recognition based on
deep learning can greatly improve recognition speed and robustness to such
external interference. In this thesis, we use convolutional neural networks
(ConvNets) and Histograms of Oriented Gradients (HOG) for face
recognition. Neural networks have the merits of end-to-end learning,
sparse connections and weight sharing.
The purpose of this thesis project is to identify different people by name
based on the location of the detected bounding box of each face.
This thesis presents the different methods and algorithms used in the dlib
face detector pipeline and compares our model's output, namely the
results for the same photo under different models of the dlib face
detection pipeline.
CHAPTER 01
INTRODUCTION
1.1 Overview
The face is our primary focus of attention in social life, playing an
important role in conveying identity and emotion. We can recognize a
number of faces learned throughout our lifespan and identify faces at a
glance even after years of separation. This skill is quite robust despite
large variations in the visual stimulus due to changing conditions, aging and
distractions such as a beard, glasses or changes in hairstyle.
The location of facial components, namely the eyes, nose, and mouth,
constitutes the prime landmarks marking the presence of a face in an
image. However, this process becomes difficult when a person
exhibits different expressions and pose variations. Some of the
commonly encountered challenges in facial detection and recognition
include variations in lighting conditions, occlusion, wearing of
spectacles, facial hair and so on.
Face recognition or verification is then carried out, which checks
whether a given test input containing a face matches any of the faces
already stored in the database or face gallery. Most facial
recognition problems deal with feature extraction and machine learning
techniques. Facial recognition tasks are performed in many areas of
image and vision applications where security is a primary concern that
cannot be compromised.
1.2 Objective
Our core objective is to create a model that can recognize faces with a
high accuracy rate while remaining lightweight, with high speed and low
computational cost, so that it can be deployed on hardware such as the
Raspberry Pi.
Firstly, we need to collect a dataset of the faces of the students in our
classroom. We collected face images of students from our classroom as
required for model creation and built a database of the students with
their images for recognition.
Secondly, under external disturbance factors, we tried to find an
algorithm that avoids the influence of environment and lighting, and
explored whether the system can recognize occluded faces and rotated
face positions.
Finally, we compare the experimental results and explore the effect
of facial proportions at different angles on recognition confidence.
CHAPTER 02
LITERATURE REVIEW
2.1 Introduction
Being a subset of Artificial Intelligence, Machine Learning enables a
system to acquire and apply knowledge from previous events.
Extrapolating from existing data and predicting future values based on
sample inputs is a major part of machine learning. It also focuses on the
development of computer programs that can access data and use it to
learn for themselves.
Machine Learning makes processing vast amounts of data more accessible.
Artificial Intelligence is a broader concept than machine learning; it
addresses the use of computers to mimic the cognitive functions of
humans. When machines perform tasks based on algorithms in an
intelligent way, that is AI. Machine Learning is a subset of Artificial
Intelligence that focuses on the ability of machines to receive a set of
data and learn for themselves, often adjusting their algorithms as they
process the data.
Machine Learning can be done in multiple ways, as per the
user's or programmer's requirements. Supervised learning is currently the
most widely used method, followed by unsupervised learning, whereas
reinforcement and semi-supervised learning are used on rarer occasions.
Supervised learning: This algorithm learns new data using examples from
the past. The past data is used as input, the desired output is known,
and these are the starting points for the algorithm to learn. The
algorithm is able to correct its errors and mistakes by comparing its
output to the given output.
Unsupervised learning: This algorithm is not given any desired output. It
has to discover the structure of its output from unlabelled data.
Like ML, deep learning (DL) is likewise a technique that extracts
features or characteristics from raw data sets. The central point of
difference is that DL does this by using a multi-layer artificial neural
network with many hidden layers stacked one after another. DL
also has somewhat more sophisticated algorithms and
requires more powerful computational resources; these are
specially built computers with high-performance CPUs or GPUs.
Artificial neural networks (ANNs) are computing systems inspired by
biological neural networks.
Over time, attention focused on matching specific mental capacities,
leading to deviations from biology, for example backpropagation, which
passes information in the reverse direction and adjusts the network to
reflect that information.
The actual goal of a neural network is to find solutions in the same
way a human brain would. The input can lead to the output
either through a linear relationship or a non-linear relationship.
2.2 Face recognition techniques:
The face recognition techniques used are:
● Feature-based approach: depends upon intensity
● Neural networks approach
2.2.1 Feature-based approach
The feature-based methodology processes the input picture to
identify and extract facial features, for example the eyes,
mouth, nose, and so forth, and then computes
the geometric relationships among those facial points, thereby
reducing the input facial image to a vector of
geometric features. This methodology is subdivided into:
• Geometric feature-based matching: These techniques are
based on the computation of a set of features from the
image of a face. The complete set can be portrayed as a vector
representing the position and size of the primary
facial features like the nose, eyes, eyebrows, mouth, chin, and
the outline of the face. The pros of the approach are that it
overcomes the issue of occlusion and does not require extensive
computational time. The disadvantage is that it does not provide a
high degree of precision.
• Elastic bunch graph: This approach is based on dynamic link
structures. A graph for an individual face is produced using
a set of fiducial points on the face; each fiducial point is a
node of a fully connected graph and is labelled with the
Gabor filters' responses. A representative set of such graphs is
combined into a stack-like structure called a face bunch graph. A
new face image is recognized by comparing its
image graph to those of all the known face images, and the one
with the highest similarity value is selected as the closest match.
Advantages:
• By focusing only on the bounded areas, these methods do not modify or
damage any information in the images.
• This approach generates better recognition outcomes than
the purely feature-based approach.
Disadvantages:
• The approach requires an immense amount of interaction between the
test and training images.
• When there is a massive difference in pose, scale, and
illumination, this technique does not perform productively.
A hybrid approach can be achieved by synthesizing multiple
techniques to produce better outputs. Adopting numerous
techniques enables us to compensate for
the cons of one approach with the pros of another.
2.2.2. Neural Networks approach
There are many neural-network-based approaches to face
recognition systems, such as hybrid methods. They use
unsupervised learning for feature extraction and supervised
learning for classifying the detected features; convolutional
neural networks are used for classification.
It has been found that error rates can be reduced by training
several neural networks and averaging their outputs,
although this consumes more time than the normal method.
2.3. What is Dlib?
Dlib is a general-purpose, cross-platform, open-source software
library written in the C++ programming language. Dlib is a
modern C++ toolkit containing machine learning algorithms and
tools for creating complex software in C++ to solve real-world
problems. It is used in both industry and academia in a wide
range of domains including robotics, embedded devices, mobile
phones, and large high-performance computing environments.
Dlib's open source licensing allows you to use it in any
application, free of charge.
In particular, it now contains software components for dealing
with networking, threads, graphical interfaces, complex data
structures, linear algebra, statistical machine learning, image
processing, data mining, XML and text parsing, numerical
optimization, Bayesian networks, and numerous other tasks. In
recent years, much of the development has been focused on
creating a broad set of statistical machine learning tools.
However, dlib remains a general purpose library and welcomes
contributions of high quality software components useful in any
domain.
Core to the development philosophy of dlib is a dedication to
portability and ease of use. Therefore, all code in dlib is designed
to be as portable as possible and similarly to not require a user to
configure or install anything. To help achieve this, all platform
specific code is confined inside the API wrappers. Everything
else is either layered on top of those wrappers or is written in
pure ISO standard C++. Currently the library is known to work
on OS X, MS Windows, Linux, Solaris, the BSDs, and HP-UX.
Dlib also contains many interesting application-specific algorithms:
for example, methods for facial recognition, tracking,
landmark detection, and others. Landmark detection
itself can be used to create a variety of other applications like
face morphing, emotion recognition, facial manipulation, etc.
For our face recognition we have used the Python face_recognition
package, which is built on dlib's state-of-the-art face recognition,
built with deep learning. The model has an accuracy of 99.38% on
the Labeled Faces in the Wild benchmark.
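As a quick illustration of how this package is used (the file names here are hypothetical examples, not files from the project), a single known face can be matched against faces in a new image:

```python
# Minimal sketch: matching a known face with the face_recognition package.
import face_recognition

known = face_recognition.load_image_file("known_student.jpg")     # hypothetical
unknown = face_recognition.load_image_file("classroom.jpg")       # hypothetical
known_enc = face_recognition.face_encodings(known)[0]             # 128-d vector

for enc in face_recognition.face_encodings(unknown):
    match = face_recognition.compare_faces([known_enc], enc)[0]
    print("Match!" if match else "No match")
```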
2.4. Basic System Architecture :
Three basic steps are used to develop a robust face recognition system:
(1) Face detection: it detects and locates the human face.
(2) Feature extraction: it extracts the feature vectors of the
human face, and
(3) Face recognition: it compares the features extracted from the human
face with all the faces in a template database to decide the
identity of the face.
Figure 2.4. Basic face recognition block diagram
Face Detection: It begins with the localization of the human faces in a
particular image. The purpose of this step is to determine whether or not
the input image contains human faces. In order to facilitate the design of
the subsequent face recognition system and make it more robust,
pre-processing steps are performed. Many techniques are used to detect
and locate faces, such as HOG, PCA, etc.
Feature Extraction: The main function of this step is to extract the
features of the face images detected in the detection step. This step
represents a face with a set of feature vectors that describe the
prominent features of the face image, such as the mouth, nose, and eyes,
with their geometric distribution. Several techniques involve extracting
the shape of the mouth, eyes, or nose to identify the face using their
size and distance; HOG is widely used to extract face features.
Face Recognition: This step takes the features extracted during the
feature extraction step and compares them with
known faces stored in a specific database. There are two general
applications of face recognition: one is called identification and the
other verification. During identification, a test face is
compared with a set of faces with the aim of finding the most likely
match. During verification, a test face is compared with a known face
in the database in order to make an acceptance or rejection decision.
CNNs are widely used to perform this task.
CHAPTER 03
METHODOLOGY
3.1 Algorithm Description:
3.1.1 Histogram of oriented gradients (HOG) :
The HOG is one of the best descriptors for shape and edge
description. The HOG technique can describe the face shape using the
distribution of edge directions or light-intensity gradients. The
technique works by dividing the whole face image into cells (small
regions or areas); a histogram of pixel edge directions or direction
gradients is generated for each cell; and, finally, the histograms of
all the cells are combined to extract the feature of the face image.
The feature vector computation by the HOG descriptor proceeds as
follows: firstly, divide the local image into regions called cells, and then
calculate the amplitude of the first-order gradients of each cell in both
the horizontal and vertical directions. The most common method is to
apply a 1D mask, [-1 0 1]:

$$G_x(x, y) = I(x+1, y) - I(x-1, y)$$
$$G_y(x, y) = I(x, y+1) - I(x, y-1)$$

where I(x, y) is the pixel value of the point (x, y), and Gx(x, y) and
Gy(x, y) denote the horizontal gradient amplitude and the vertical
gradient amplitude, respectively. The magnitude of the gradient and the
orientation of each pixel (x, y) are computed as follows:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$$
$$\theta(x, y) = \arctan\left(\frac{G_y(x, y)}{G_x(x, y)}\right)$$
The magnitude of the gradient and the orientation of each pixel in the
cell are voted into nine bins with tri-linear interpolation. The
histogram of each cell is generated pixel by pixel based on the gradient
directions and, finally, the histograms of all the cells are combined to
extract the feature of the face image.
The HOG descriptor can overcome the problem of varying
illumination, as it is invariant to lighting conditions. It is used
to extract the magnitude of edge information and works well even
under variations in pose and illumination. HOG
works well in such challenging situations because it represents the
directionality of edge information, thereby making it
significant for studying the pattern and structure of the
object of interest.
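As an illustrative sketch of the descriptor described above (this is not the project's dlib internals, and the image path is a hypothetical example), a HOG feature vector with the standard nine-bin setup can be computed with scikit-image:

```python
# Sketch: computing a HOG descriptor with scikit-image.
from skimage import color, io
from skimage.feature import hog

image = color.rgb2gray(io.imread("face.jpg"))  # load and convert to grayscale
features = hog(
    image,
    orientations=9,          # nine histogram bins per cell
    pixels_per_cell=(8, 8),  # cell size in pixels
    cells_per_block=(2, 2),  # block size used for normalization
)
print(features.shape)  # one flat feature vector describing edge structure
```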
3.1.2 Convolutional neural networks:
CNNs are made up of three different types of layers: convolution layers,
pooling layers, and fully-connected layers.
1. Convolutional layer: sometimes called the feature extractor layer
because features of the image are extracted within this layer.
Convolution preserves the spatial relationship between pixels by
learning image features using small squares of the input image.
The input image is convolved with a set of learnable
filters. This produces a feature map or activation map in the
output, after which the feature maps are fed as input data to
the next convolutional layer. The convolutional layer also contains a
rectified linear unit (ReLU) activation to convert all negative
values to zero. This makes it very computationally efficient, as few
neurons are activated each time.
2. Pooling layer: used to reduce dimensions, with the aim of reducing
processing times by retaining the most important information after
convolution. This layer basically reduces the number of parameters
and computation in the network, controlling overfitting by
progressively reducing the spatial size of the network. There are
two operations in this layer: average pooling and maximum
pooling:
- Average-pooling takes all the elements of the sub-matrix,
calculates their average, and stores the value in the output matrix.
- Max-pooling searches for the highest value in the sub-matrix
and saves it in the output matrix.
3. Fully-connected layer: in this layer, the neurons have a complete
connection to all the activations from the previous layers. It connects
neurons in one layer to neurons in another layer. It is used to classify
images between different categories by training.
The Principle of Feature Extraction in CNNs
A CNN is defined by its neuron output and activation functions, and by
model connections built from convolution kernels, pooling, and dropout.
The learning process of a CNN can be divided into feature extraction and
prediction: the convolution and pooling layers are responsible for
feature extraction, while the fully connected MLP is responsible for
prediction.
These feature maps are then converted into face encodings, which are
unique for each and every face.
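As a concrete illustration of this encoding step (the image file name is a hypothetical example), the face_recognition package's CNN-backed face_encodings function returns one 128-dimensional vector per detected face:

```python
# Sketch: obtaining 128-d face encodings with the face_recognition package.
import face_recognition

image = face_recognition.load_image_file("student.jpg")  # hypothetical file
locations = face_recognition.face_locations(image)        # detect faces first
encodings = face_recognition.face_encodings(image, locations)

for enc in encodings:
    print(enc.shape)  # (128,) -- one embedding per detected face
```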
3.1.3. Support vector machines (SVMs) :
The feature vectors extracted by any descriptor are classified by a linear or
nonlinear SVM. The SVM classifier realizes the separation of the
classes with an optimal hyperplane. To determine the latter, only the
closest points of the total learning set are used; these points are
called support vectors.
There is an infinite number of hyperplanes capable of perfectly
separating two classes, so we select the hyperplane that
maximizes the minimal distance between the learning examples and the
hyperplane (i.e., the distance between the support vectors and
the hyperplane). This distance is called the "margin". The SVM classifier is
used to calculate the optimal hyperplane that places a set of labelled
training data in the correct class.
An SVM tries to find a hyperplane that separates the samples with the
smallest error.
After training, the network learns to output encoding vectors that are
close to each other for faces of the same (similar-looking) person.
The system recognizes a face if the generated embedding is close or
similar to any stored embedding, and returns the corresponding face
label.
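A minimal sketch of this classification step, assuming the 128-d embeddings and their labels have already been collected (the file names and variables below are illustrative assumptions, not the project's artifacts):

```python
# Sketch: training a linear SVM on top of precomputed 128-d face embeddings.
import numpy as np
from sklearn.svm import SVC

embeddings = np.load("embeddings.npy")  # shape (n_samples, 128), hypothetical
names = np.load("names.npy")            # one label per embedding, hypothetical

clf = SVC(kernel="linear", probability=True)
clf.fit(embeddings, names)

# Predict the identity of a new face embedding:
new_embedding = embeddings[0].reshape(1, -1)  # stand-in for a fresh encoding
print(clf.predict(new_embedding))
```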
3.2. Data Sources and Data Collection
The primary problem in deep learning is how to train on data, so we
need to prepare training data and mark the location and
class of the face in each image. Because it is difficult to find
public datasets that meet our requirements, we chose to collect
data ourselves and to use the pre-trained weights of the
Python face_recognition package, which has already been
trained on plenty of human faces with approximately 99%
accuracy.
Figure 3.1. The student image data we collected
We collected images of the students in our class from various sources,
created a shared drive, manually labelled the data, and stored it in a
directory.
3.3. Step by Step flow of algorithm
1. Detect faces using HOG descriptor
2. Alignment and transformation of face as per facial landmarks
3. Centralize the face in the frame and crop the background
4. Compute 128-d face embeddings to quantify a face using CNN
5. Train a Support Vector Machine (SVM) on top of the embeddings
6. Recognize faces in images and video streams
7. Store the entries, mark the attendance and update the excel sheet
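A condensed sketch of steps 1-6 above, assuming a dataset directory with one folder per student (all paths and names here are hypothetical examples):

```python
# Sketch of the pipeline: HOG detection, CNN embeddings, SVM classification.
import os

import face_recognition
from sklearn.svm import SVC

# Steps 1-4: detect faces (HOG) and compute 128-d embeddings (CNN).
embeddings, labels = [], []
for name in os.listdir("dataset"):  # hypothetical: one folder per student
    for fname in os.listdir(os.path.join("dataset", name)):
        img = face_recognition.load_image_file(os.path.join("dataset", name, fname))
        boxes = face_recognition.face_locations(img, model="hog")
        for enc in face_recognition.face_encodings(img, boxes):
            embeddings.append(enc)
            labels.append(name)

# Step 5: train an SVM on top of the embeddings.
clf = SVC(kernel="linear", probability=True).fit(embeddings, labels)

# Step 6: recognize faces in a new classroom image.
frame = face_recognition.load_image_file("classroom.jpg")  # hypothetical
for enc in face_recognition.face_encodings(frame):
    print("Recognized:", clf.predict([enc])[0])
```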
CHAPTER 04
IMPLEMENTATION
4.1 Installation
The algorithm proposed above was implemented using the
Python programming language in the form of a desktop
application.
4.1.1 Software Requirements
● OS - Windows 10
● Visual Studio C++
● Programming language Python
Python version >= 3.x is required
● Need to install the below packages
Figure 4.1.1 Required packages
The dlib library, maintained by Davis King, contains our
implementation of “deep metric learning” which is used to
construct our face embeddings used for the actual recognition
process.
The face_recognition library, created by Adam Geitgey, wraps
around dlib’s facial recognition functionality, making it easier to
work with.
If one does not want the packages to be installed globally in the
system:
● Create a virtual environment so that the changes do not
affect your system:
python3 -m venv <env-name>
● Activate the environment: source <env-name>/bin/activate
● Install the package using pip: pip3 install <path-to-folder>
The module can then be installed with pip:
pip install face_recognition
along with all the packages from requirements.txt:
pip install -r requirements.txt
4.2 Working
Face Detection and Cropping - The captured image is passed in as an
array; the image is nothing but a matrix of numbers which
correspond to the pixel values. Face detection is performed on this
matrix. We use the method face_locations(image) from the
face_recognition library, where image is the captured image.
This function detects faces based on the Histogram of Oriented
Gradients, or HOG. The faces are detected, and the areas of the image
where the faces are marked are cropped. The algorithm detects all
the faces clearly visible in the captured image of the classroom.
Each student should be in an upright position and facing the
camera to avoid exclusion of their presence by the system.
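A minimal sketch of this detect-and-crop step (the capture file name is a hypothetical example); face_locations returns (top, right, bottom, left) boxes that can be used to slice out each face:

```python
# Sketch: detect and crop faces from a captured classroom image.
import face_recognition
from PIL import Image

image = face_recognition.load_image_file("classroom.jpg")  # hypothetical capture
for top, right, bottom, left in face_recognition.face_locations(image):
    face = Image.fromarray(image[top:bottom, left:right])  # crop the face region
    face.save(f"face_{top}_{left}.jpg")                    # hypothetical output name
```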
Face Recognition - The following happens after the faces are
cropped and stored.
1. Each image is taken and its face encodings are generated using
the face_encodings method from the face_recognition library; the
neural network in the backend generates these encodings.
2. Each encoding is directly compared against all the encodings of
known images stored in a JSON file.
When an image is recognized, the name is appended to the
identified-image list.
3. Recognized entries are stored whenever the algorithm finds a
match. Based on the list, we then update the corresponding field of
the person and mark the attendance with the date and time.
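A sketch of this recognition-and-attendance step; the file names, the JSON layout, and the 0.6 tolerance are illustrative assumptions rather than the project's exact values:

```python
# Sketch: compare encodings against a JSON store and append attendance rows.
import csv
import json
from datetime import datetime

import face_recognition

with open("known_encodings.json") as f:  # hypothetical: {name: [128 floats]}
    known = json.load(f)
names = list(known.keys())
known_encs = [known[n] for n in names]

image = face_recognition.load_image_file("classroom.jpg")  # hypothetical capture
present = set()
for enc in face_recognition.face_encodings(image):
    matches = face_recognition.compare_faces(known_encs, enc, tolerance=0.6)
    for name, matched in zip(names, matches):
        if matched:
            present.add(name)

with open("attendance.csv", "a", newline="") as f:  # append attendance entries
    writer = csv.writer(f)
    for name in sorted(present):
        writer.writerow([name, datetime.now().isoformat()])
```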
CHAPTER 05
SCREENSHOT OF THE CODE AND
THE RESULTS
5.1. CODE SCREENSHOT
5.2. OUTPUT SCREENSHOT
5.3. CONCLUSION
The proposed model is very effective, as it gives quick and accurate
results. Our model stores only the encodings of the images rather
than the images themselves, thereby reducing the space and time needed
to retrieve images from the database and process them each time an
image needs to be recognized. The system is also capable of batch
processing multiple images at a time. Thus, the aim of the project, to
demonstrate multiple face detection and recognition, is successfully
achieved using the dlib-built face_recognition network.
CHAPTER 06
FUTURE SCOPE
● We can deploy the code to hardware like the Raspberry Pi for
real-time attendance applications.
● We can also connect the system to a structured database such as SQL
for maintaining a proper record of each and every student.
● The system can be connected to student email addresses so that
students also receive an attendance confirmation.
● We implemented a semi-automated attendance system using the
multiple-face recognition model. This model can be further
enhanced by making it fully automatic with the available digital
identification and tracking, thus improving the safety and
security of students on campus.
● We can also deploy the system on mobile devices so that students can
mark attendance by themselves, with restrictions on IP address and GPS
so that they cannot trick the attendance system.
CHAPTER 07
REFERENCES
● Sawhney, Shreyak, et al. "Real-time smart attendance system using face recognition
techniques." 2019 9th International Conference on Cloud Computing, Data Science &
Engineering (Confluence). IEEE, 2019.
● King, Davis E. "Max-margin object detection." arXiv preprint arXiv:1502.00046 (2015).
● Khan, Sikandar, Adeel Akram, and Nighat Usman. "Real Time Automatic Attendance
System for Face Recognition Using Face API and OpenCV." Wireless Personal
Communications 113, no. 1 (2020): 469-480.
● Mehta, Preeti, and Pankaj Tomar. "An Efficient Attendance Management System based on
Face Recognition using Matlab and Raspberry Pi 2." International Journal of Engineering
Technology Science and Research IJETSR 3.5 (2016): 71-78.
● F. Schroff, D. Kalenichenko and J. Philbin, "FaceNet: A unified embedding for face
recognition and clustering," 2015 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Boston, MA, 2015, pp. 815-823.