
Application of Deep Learning for Sign Language Gesture Recognition with Efficient Hand Gesture Representation

CHAPTER 1
INTRODUCTION


Speech-impaired people use hand signs and gestures to communicate. Other people often find it
difficult to understand this language, so there is a need for a system that recognizes the
different signs and gestures and conveys their meaning to non-signing people. Such a system
bridges the gap between speech-impaired people and the rest of society.

1.1 Image Processing


Image processing is a method of performing operations on an image in order to obtain an
enhanced image or to extract useful information from it. It is a type of signal processing in
which the input is an image and the output may be either an image or characteristics/features
associated with that image. Image processing is among the most rapidly growing technologies
today and forms a core research area within engineering and computer science.
Image processing basically includes the following three steps:
• Importing the image via image acquisition tools.
• Analysing and manipulating the image.
• Output in which result can be altered image or report that is based on image analysis.
There are two types of image processing: analogue and digital. Analogue image processing is
used for hard copies such as printouts and photographs; image analysts apply various
fundamentals of interpretation when using these visual techniques. Digital image processing
techniques allow the manipulation of digital images using computers. The three general phases
that all types of data undergo in the digital technique are pre-processing, enhancement and
display, and information extraction. A minimal code sketch of the three basic steps above is
shown below.
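As a minimal illustration of these three steps, the following sketch uses OpenCV (one of the libraries used later in this project); the file names are placeholders, not part of the original report.

import cv2

# 1. Import the image via an image acquisition tool (here, reading from disk).
image = cv2.imread("input.jpg")

# 2. Analyse and manipulate the image (here, a simple grayscale conversion).
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 3. Output: an altered image or a report based on image analysis.
cv2.imwrite("output.jpg", gray)
print("mean intensity:", gray.mean())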

1.1.1 Digital image processing:


Digital image processing is the manipulation of images using digital computers. Its use has
grown rapidly over the last decades, with applications ranging from medicine to entertainment,
including geological processing and remote sensing. Multimedia systems, one of the pillars of
the modern information society, rely heavily on digital image processing. Since a digital image
is a numerical representation with finite precision, digital image processing amounts to the
manipulation of those finite-precision numbers. The processing of digital images can be divided
into several classes: image enhancement, image restoration, image analysis, and image
compression. In image enhancement, an image is manipulated, mostly by heuristic techniques, so
that a human viewer can extract useful information from it.
Digital image processing can thus be defined as subjecting a numerical representation of an
object to a series of operations in order to obtain a desired result: the conversion of a
physical image into a corresponding digital image, followed by the extraction of significant
information from the digital image by applying various algorithms.

1.1.2 Pattern recognition:


Building on image processing, it is necessary to separate objects from images using pattern
recognition technology and then to identify and classify these objects using methods provided
by statistical decision theory. When an image contains several objects, pattern recognition
proceeds in three phases, as shown in Fig. 1.1.

Fig 1.1: Phases of pattern recognition


The first phase includes image segmentation and object separation: different objects are
detected and separated from the background. The second phase is feature extraction, in which
the objects are measured. Feature measurement quantitatively estimates important properties of
the objects, and a group of these features is combined into a feature vector during feature
extraction. The third phase is classification, whose output is a decision determining which
category every object belongs to. Thus, for pattern recognition the inputs are images and the
outputs are object types and a structural analysis of the images. The structural analysis is a
description of the images that allows the important information in them to be correctly
understood and judged.
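The sketch below illustrates the three phases on a single image using OpenCV and NumPy; the file name, the chosen features and the use of Otsu thresholding are illustrative assumptions rather than the exact pipeline used in this project.

import cv2
import numpy as np

frame = cv2.imread("gesture.jpg")                      # hypothetical input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Phase 1: segmentation -- separate the object from the background.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x signature
obj = max(contours, key=cv2.contourArea)               # keep the largest detected object

# Phase 2: feature extraction -- measure the object and build a feature vector.
x, y, w, h = cv2.boundingRect(obj)
features = np.array([cv2.contourArea(obj), w / h, cv2.arcLength(obj, True)])

# Phase 3: classification -- a trained classifier would map this vector to a class label.
print("feature vector:", features)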

1.2 Sign Language


Sign language is a language that uses gestures made with the hands and other body parts,
including facial expressions and postures of the body. It is used primarily by people who are
deaf or hard of hearing. There are many different sign languages, such as British, Indian and
American Sign Language. British Sign Language (BSL) is not easily intelligible to users of
American Sign Language (ASL), and vice versa.
A functioning sign language recognition system could give deaf people the chance to communicate
with non-signing people without the need for an interpreter. It could be used to generate speech
or text, making deaf people more independent. Unfortunately there has not been any system with
these capabilities so far. In this project our aim is to develop a system that can classify sign
language accurately.
American Sign Language (ASL) is a complete, natural language that has the same linguistic
properties as spoken languages, with grammar that differs from English. ASL is expressed by
movements of the hands and face. It is the primary language of many North Americans who are
deaf and hard of hearing, and is used by many hearing people as well.

1.3 Sign Language and Hand Gesture Recognition


The process of converting the signs and gestures shown by the user into text is called sign
language recognition. It bridges the communication gap between people who cannot speak and the
general public. Image processing algorithms along with neural networks are used to map a gesture
to the appropriate text in the training data, so that raw images and videos are converted into
text that can be read and understood. Speech-impaired people are usually deprived of normal
communication with other people in society, and it has been observed that they often find it
difficult to interact with others using their gestures, as only very few of these are recognized
by most people. Since people with hearing impairment or deaf people cannot talk like hearing
people, they have to depend on some form of visual communication most of the time. Sign language
is the primary means of communication in the deaf community. Like any other language it has its
own grammar and vocabulary, but it uses the visual modality for exchanging information. The
problem arises when deaf or speech-impaired people try to express themselves to others with the
help of this sign language grammar, because most other people are unaware of it. As a result,
the communication of a speech-impaired person is often limited to his or her family or the deaf
community.
The importance of sign language is emphasized by growing public approval of and funding for
international projects. In this age of technology, the demand for a computer-based system in
this community is very high. Researchers have been attacking the problem for quite some time now
and the results are showing some promise. Interesting technologies are being developed for
speech recognition, but no real commercial product for sign recognition is currently on the
market. The idea is to make computers understand human language and to develop user-friendly
human-computer interfaces (HCI). Making a computer understand speech, facial expressions and
human gestures are steps towards this goal. Gestures are non-verbally exchanged information, and
a person can perform innumerable gestures at a time. Since human gestures are perceived through
vision, they are a subject of great interest for computer vision researchers. This project aims
to determine human gestures by creating an HCI. Coding these gestures into machine language
demands a complex programming algorithm. In our project we focus on image processing and
template matching for better output generation.

1.4 Motivation
The 2011 Indian census cites roughly 1.3 million people with "hearing impairment". In contrast,
figures from India's National Association of the Deaf estimate that 18 million people, roughly
1 per cent of the Indian population, are deaf. These statistics formed the motivation for our
project. Since speech-impaired and deaf people need a proper channel to communicate with other
people, there is a need for such a system, as not everyone can understand sign language. Our
project is therefore aimed at converting sign language gestures into text that is readable by
everyone.

1.5 Problem Statement


Speech-impaired people use hand signs and gestures to communicate. Other people often find it
difficult to understand this language, so there is a need for a system that recognizes the
different signs and gestures and conveys their meaning to non-signing people. Such a system
bridges the gap between speech-impaired people and the rest of society.


CHAPTER 2
LITERATURE SURVEY


In the literature survey we reviewed other similar works implemented in the domain of sign
language recognition. The summaries of these works are given below.
Sign Language Recognition (SLR) systems, which are required to recognize sign languages, have
been widely studied for years. The studies are based on various input sensors, gesture
segmentation, feature extraction and classification methods. One survey paper aims to analyze
and compare the methods employed in SLR systems and the classification methods that have been
used, and suggests the most promising methods for future research. Due to recent advances in
classification methods, many recently proposed works contribute mainly to the classification
stage, for example hybrid methods and Deep Learning. Based on that review, HMM-based approaches,
including their modifications, have been explored extensively in prior research, while hybrid
CNN-HMM and fully Deep Learning approaches have shown promising results and offer opportunities
for further exploration.
Chat applications have become a powerful medium that helps people communicate with each other in
different languages. There are many chat applications used by different people in different
languages, but there is no chat application that facilitates communication in sign language. One
developed system is based on Sinhala Sign Language. The system includes four main components:
text messages are converted to sign messages, voice messages are converted to sign messages,
sign messages are converted to text messages, and sign messages are converted to voice messages.
The Google voice recognition API was used to develop speech recognition for voice messages. The
system was trained on speech and text patterns using text parameters, and the signs of Sinhala
Sign Language are displayed as emoji. The emoji and signs included in this system bring hearing
people closer to the deaf community. This is a two-way communication system, but it uses a
pattern-based gesture recognition approach that is not very reliable in producing the
appropriate output.
In another paper, methods are proposed through which the recognition of signs becomes easier for
people during communication, and the recognized symbol signs are converted into text. In that
project, hand gestures are captured through a webcam and the image is converted into a grayscale
image. The segmentation of the grayscale hand gesture image is performed using the Otsu
thresholding algorithm: the image is divided into two classes, the hand and the background, and
the optimal threshold value is determined by computing the ratio between the between-class
variance and the total variance. To find the boundary of the hand gesture in the image, the
Canny edge detection technique is used, combining edge-based and threshold-based segmentation.
Otsu's algorithm is chosen because of its simple calculation and stability; however, it fails
when the global distributions of the target and background vary widely.
Computer recognition of sign language is an important research problem for enabling
communication with hearing impaired people. This project introduces an efficient and fast
algorithm for identification of the number of fingers opened in a gesture representing an
alphabet of the Binary Sign Language. The system does not require the hand to be perfectly
aligned to the camera. The project uses image processing system to identify, especially English
alphabetic sign language used by the deaf people to communicate. The basic objective of this
project is to develop a computer based intelligent system that will enable dumb people
significantly to communicate with all other people using their natural hand gestures. The idea
consisted of designing and building up an intelligent system using image processing, machine
learning and artificial intelligence concepts to take visual inputs of sign language’s hand
gestures and generate easily recognizable form of outputs. Hence the objective of this project is
to develop an intelligent system which can act as a translator between the sign language and
the spoken language dynamically and can make the communication between people with
hearing impairment and normal people both effective and efficient. The system described is
implemented for Binary Sign Language, but it can detect other sign languages given suitable
prior image processing.
One of the major drawbacks in our society is the barrier between disabled people and others.
Communication is the only medium by which we can share our thoughts or convey a message, but a
person with a hearing or speech disability faces difficulty in communicating with others. For
many deaf and speech-impaired people, sign language is the basic means of communication. Sign
language recognition (SLR) aims to interpret sign languages automatically by computer in order
to help deaf people communicate with the hearing society conveniently. The aim of this work is
to design a system that helps those who train the hearing impaired to communicate with the rest
of the world using sign language or hand gesture recognition techniques. In this system, feature
detection and feature extraction of the hand gesture are done with the help of the SURF
algorithm using image processing. All of this work is done using MATLAB software. With the help
of this algorithm, a trainer can more easily teach deaf and speech-impaired people.

Speech impairment is a disability which affects one’s ability to speak and hear. Such
individuals use sign language to communicate with other people. Although it is an effective
form of communication, there remains a challenge for people who do not understand sign
language to communicate with speech impaired people. The aim of this paper is to develop an
application which will translate sign language to English in the form of text and audio, thus
aiding communication with sign language. The application acquires image data using the
webcam of the computer, then it is preprocessed using a combinational algorithm and
recognition is done using template matching. The translation in the form of text is then
converted to audio. The database used for this system includes 6000 images of English
alphabets. We used 4800 images for training and 1200 images for testing. The system produces
88% accuracy.
This research work presents a prototype system that helps normal people recognize hand gestures
in order to communicate more effectively with special-needs people. The work focuses on the
problem of real-time recognition of the sign language gestures used by the deaf community. The
problem is addressed using Digital Image Processing techniques such as colour segmentation, skin
detection, image segmentation, image filtering and template matching. The system recognizes
gestures of ASL (American Sign Language), including the alphabet and a subset of its words.

2.1 Libraries
2.1.1 TensorFlow:
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for
machine learning applications such as neural networks. It is used for both research and
production at Google.

Features: TensorFlow provides stable Python (version 3.7 across all platforms) and C APIs, and,
without an API backwards-compatibility guarantee, C++, Go, Java, JavaScript and Swift (early
release) APIs. Third-party packages are available for C#, Haskell, Julia, MATLAB, R, Scala,
Rust, OCaml, and Crystal. "New language support should be built on top of the C API. However,
not all functionality is available in C yet." Some additional functionality is provided by the
Python API.
Applications: Among the applications for which TensorFlow is the foundation are automated
image-captioning systems such as DeepDream.

2.1.2 OpenCV:
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed
at real-time computer vision. Originally developed by Intel, it was later supported by Willow
Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and
free for use under the open-source BSD license. OpenCV's application areas include:
 2D and 3D feature toolkits
 Egomotion estimation
 Facial recognition system
 Gesture recognition
 Human–computer interaction (HCI)
 Mobile robotics
 Motion understanding
 Object identification
 Segmentation and recognition
 Stereopsis stereo vision: depth perception from 2 cameras
 Structure from motion (SFM).
 Motion tracking
 Augmented reality
To support some of the above areas, OpenCV includes a statistical machine learning library
that contains:
 Boosting
 Decision tree learning
 Gradient boosting trees
 Expectation-maximization algorithm
 k-nearest neighbor algorithm
 Naive Bayes classifier
 Artificial neural networks
 Random forest
 Support vector machine (SVM)
 Deep neural networks (DNN)
Related computer-vision software includes AForge.NET, a computer vision library for the Common
Language Runtime (.NET Framework and Mono), and CVIPtools, a complete GUI-based computer-vision
and image-processing software environment with C function libraries, a COM-based DLL, and two
utility programs for algorithm development and batch processing. In ROS (Robot Operating
System), OpenCV is used as the primary vision package.


2.1.3 Keras:
Keras is an open-source neural-network library written in Python. It is capable of running on
top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and
extensible. It was developed as part of the research effort of project ONEIROS (Open-ended
Neuro-Electronic Intelligent Robot Operating System), and its primary author and maintainer is
François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural
network model. Features: Keras contains numerous implementations of commonly used neural-network
building blocks such as layers, objectives, activation functions and optimizers, and a host of
tools for working with image and text data that simplify the coding needed for deep neural
networks. The code is hosted on GitHub, and community support forums include the GitHub issues
page and a Slack channel. In addition to standard neural networks, Keras supports convolutional
and recurrent neural networks, as well as common utility layers such as dropout, batch
normalization, and pooling.

Keras allows users to productize deep models on smartphones (iOS and Android), on the web,
or on the Java Virtual Machine. It also allows use of distributed training of deep-learning
models on clusters of Graphics processing units (GPU) and tensor processing units (TPU)
principally in conjunction with CUDA.
The Keras applications module provides pre-trained models for deep neural networks. These Keras
models can be used for prediction, feature extraction and fine-tuning.

Pre-trained models
A trained model consists of two parts: the model architecture and the model weights. Model
weights are large files, so they must be downloaded separately; the bundled models are
pre-trained on the ImageNet database and can be used to extract features. Some of the popular
pre-trained models are listed below; a minimal usage sketch follows the list.
 ResNet
 VGG16
 MobileNet
 InceptionResNetV2
 InceptionV3
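As a minimal usage sketch (assuming TensorFlow's bundled Keras and an internet connection to download the weights), a pre-trained model such as VGG16 can be loaded as a feature extractor; the dummy input is a placeholder.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
import numpy as np

# include_top=False drops the classifier head so the network acts purely as a feature extractor.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

dummy = np.random.randint(0, 256, (1, 224, 224, 3)).astype("float32")   # placeholder image batch
features = base.predict(preprocess_input(dummy))
print(features.shape)                                                    # e.g. (1, 7, 7, 512)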

2.1.4 Numpy:
NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-pee)) is a library for the
Python programming language, adding support for large, multi-dimensional arrays and matrices,
along with a large collection of high-level mathematical functions to operate on these arrays.
The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from
several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of
the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software
and has many contributors.
Features: NumPy targets the CPython reference implementation of Python, which is a non-optimizing
bytecode interpreter. Mathematical algorithms written for this version of Python often run much
slower than compiled equivalents. NumPy addresses the slowness problem partly by providing
multidimensional arrays and functions and operators that operate efficiently on arrays; this
requires rewriting some code, mostly inner loops, using NumPy. Using NumPy in Python gives
functionality comparable to MATLAB since they are both interpreted, and they both allow the user
to write fast programs as long as most operations work on arrays or matrices
instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes,
notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and
complete programming language. Moreover, complementary Python packages are available;
SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package
that provides MATLAB-like plotting functionality. Internally, both MATLAB and
NumPy rely on BLAS and LAPACK for efficient linear algebra computations. Python bindings
of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate
on data. Since images with multiple channels are simply represented as three-dimensional
arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific
pixels of an image. Using the NumPy array as the universal data structure in OpenCV for images,
extracted feature points, filter kernels and more vastly simplifies the programming workflow and
debugging. Limitations: Inserting or appending entries to an array is not as trivially possible
as it is with Python's lists. The np.pad(...) routine to extend arrays actually creates a new
array of the desired shape and padding values, copies the given array into the new one and
returns it. NumPy's np.concatenate([a1, a2]) operation does not actually link the two arrays but
returns a new one, filled with the entries from both given arrays in sequence. Reshaping the
dimensionality of an array with np.reshape(...) is only possible as long as the
number of elements in the array does not change. These circumstances originate from the fact
that NumPy's arrays must be views on contiguous memory buffers. A replacement package
called Blaze attempts to overcome this limitation.
Algorithms that are not expressible as a vectorized operation will typically run slowly because
they must be implemented in "pure Python", while vectorization may increase memory
complexity of some operations from constant to linear, because temporary arrays must be
created that are as large as the inputs. Runtime compilation of numerical code has been
implemented by several groups to avoid these problems; open source solutions that
interoperate with NumPy include scipy.weave, numexpr and Numba. Cython and Pythran are
static-compiling alternatives to these.
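A small sketch of the behaviour described above; the arrays are toy examples.

import numpy as np

a = np.arange(6).reshape(2, 3)          # reshape works while the element count stays the same
padded = np.pad(a, ((1, 1), (0, 0)))    # np.pad builds a new, larger array; `a` is unchanged
joined = np.concatenate([a, a])         # concatenate also returns a fresh array
print(a.shape, padded.shape, joined.shape)   # (2, 3) (4, 3) (4, 3)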

2.1.5 Neural Networks:


A neural network is a series of algorithms that endeavors to recognize underlying
relationships in a set of data through a process that mimics the way the human brain operates.
In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.
Neural networks can adapt to changing input; so the network generates the best possible result
without needing to redesign the output criteria. The concept of neural networks, which has its
roots in artificial intelligence, is swiftly gaining popularity in the development of trading
systems. A neural network works similarly to the human brain’s neural network. A “neuron”
in a neural network is a mathematical function that collects and classifies information
according to a specific architecture. The network bears a strong resemblance to statistical
methods such as curve fitting and regression analysis. A neural network contains layers of
interconnected nodes. Each node is a perceptron and is similar to a multiple linear regression.
The perceptron feeds the signal produced by a multiple linear regression into an activation
function that may be nonlinear.
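A minimal NumPy sketch of a single neuron as described above; the weights and inputs are made up, and a sigmoid is chosen here as the nonlinear activation.

import numpy as np

def perceptron(x, w, b):
    # Weighted sum (like multiple linear regression) fed into a nonlinear activation.
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid activation

x = np.array([0.5, -1.2, 3.0])          # example input features
w = np.array([0.4, 0.1, -0.2])          # example weights
print(perceptron(x, w, b=0.1))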
In a multi-layered perceptron (MLP), perceptrons are arranged in interconnected layers. The
input layer collects input patterns. The output layer has classifications or output signals to
which input patterns may map. Hidden layers fine-tune the input weightings until the neural
network’s margin of error is minimal. It is hypothesized that hidden layers extrapolate salient
features in the input data that have predictive power regarding the outputs. This describes
feature extraction, which accomplishes a utility similar to statistical techniques such as
principal component analysis.
Earlier versions of neural networks such as the first perceptrons were shallow, composed of
one input and one output layer, and at most one hidden layer in between. More than three
layers (including input and output) qualifies as “deep” learning. So deep is not just a buzzword
to make algorithms seem like they read Sartre and listen to bands you haven’t heard of yet. It is
a strictly defined term that means more than one hidden
layer. In deep-learning networks, each
layer of nodes trains on a distinct set of features based on the previous layer’s output. The
further you advance into the neural net, the more complex the features your nodes can
recognize, since they aggregate and recombine features from the previous layer. This is known
as feature hierarchy, and it is a hierarchy of increasing complexity and abstraction. It makes
deep-learning networks capable of handling very large, high- dimensional data sets with
billions of parameters that pass through nonlinear functions.


Above all, these neural nets are capable of discovering latent structures within unlabeled,
unstructured data, which is the vast majority of data in the world. Another word for
unstructured data is raw media; i.e. pictures, texts, video and audio recordings. Therefore, one
of the problems deep learning solves best is in processing and clustering the world’s raw,
unlabeled media, discerning similarities and anomalies in data that no human has organized in
a relational database or ever put a name to.
For example, deep learning can take a million images, and cluster them according to their
similarities: cats in one corner, ice breakers in another, and in a third all the photos of your
grandmother. This is the basis of so-called smart photo albums.
Deep-learning networks perform automatic feature extraction without human intervention,
unlike most traditional machine-learning algorithms. Given that feature extraction is a task that
can take teams of data scientists years to accomplish, deep learning is a way to circumvent the
chokepoint of limited experts. It augments the powers of small data science teams, which by
their nature do not scale. When training on unlabeled data, each node layer in a deep network
learns features automatically by repeatedly trying to reconstruct the input from which it draws
its samples, attempting to minimize the difference between the network’s guesses and the
probability distribution of the input data itself. Restricted Boltzmann machines, for example,
create so-called reconstructions in this manner. In the process, these neural networks learn to
recognize correlations between certain relevant features and optimal results – they draw
connections between feature signals and what those features represent, whether it be a full
reconstruction, or with labeled data. A deep-learning network trained on labeled data can then
be applied to unstructured data, giving it access to much more input than machine-learning
nets.

2.2 Convolution neural network:


A convolutional neural network (CNN) is a special architecture of artificial neural networks
proposed by Yann LeCun in the late 1980s. CNNs use some features of the visual cortex, and one
of their most popular uses is image classification. For example, Facebook uses CNNs for
automatic tagging algorithms, Amazon for generating product recommendations, and Google for
searching among users' photos. Instead of the image itself, the computer sees an array of
pixels: if the image size is 300 x 300, the size of the array will be 300 x 300 x 3, where 300
is the width, the next 300 is the height, and 3 is the number of RGB channel values. Each of
these numbers has a value from 0 to 255, which describes the intensity of the pixel at that
point.
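For illustration, the sketch below loads an image with OpenCV and prints its array representation; the file name is a placeholder, and note that OpenCV orders the three channels as BGR rather than RGB.

import cv2

img = cv2.imread("gesture.jpg")            # hypothetical file name
img = cv2.resize(img, (300, 300))
print(img.shape, img.dtype)                # (300, 300, 3) uint8: height, width, channels (BGR)
print(int(img.min()), int(img.max()))      # intensities lie in the range 0..255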
To classify an image, the computer looks for base-level characteristics. In human terms such
characteristics might be, for example, a trunk or large ears; for the computer, they are
boundaries or curvatures. Through groups of convolutional layers the computer then constructs
more abstract concepts. In more detail, the image is passed through a series of convolutional,
nonlinear, pooling and fully connected layers, and the output is then generated.
In face recognition, all of the collected data is compared with existing data in a database to
match a face with a name; a similar process is followed for scene labeling. Convolutional neural
networks can also be used for document analysis. This is not only useful for handwriting
analysis but also plays a major role in recognizers: for a machine to scan an individual's
writing and compare it to a wide database, it must execute almost a million comparisons a
minute. With CNNs and newer models and algorithms, the error rate has reportedly been brought
down to as low as 0.4% at the character level, though complete testing of this is yet to be
widely seen.

Fig 2.1: Layers involved in CNN


CHAPTER 3
METHODOLOGY


3.1 Existing System


Communication plays a crucial part in human life. It enables a person to convey feelings,
emotions and messages by speaking, writing or using some other medium. Sign language is the main
method of communication for speech- and hearing-impaired individuals. It is a language that uses
visually transmitted gestures, combining hand signs, movements of the hands and arms, lip
patterns, body movements and facial expressions, instead of speech or text, to express a
person's thoughts. Gestures are expressive, meaningful body movements that represent some
message or information; for hearing- and speech-impaired people they are the means by which
messages are conveyed to others. A gesture recognition system is the ability of a computer
interface to capture, track and recognize gestures and produce output based on the captured
signals. It enables users to interact with machines (HMI) without the need for any mechanical
devices. There are two kinds of sign recognition methods: image-based and sensor-based
strategies. An image-based approach is used in this project; it deals with sign language
gestures, detecting and tracking the signs and converting them into the corresponding speech and
text.

3.2 Proposed System


Our proposed system is a sign language recognition system using convolutional neural networks
which recognizes various hand gestures by capturing video and converting it into frames. The
hand pixels are then segmented, and the resulting image is sent to the trained model for
comparison. Our system is thus more robust in producing exact text labels for the letters.


Fig 3.1: Architecture of the Sign Language Recognition System

3.2.1 Training Module


Supervised machine learning: this is one of the approaches to machine learning where the model
is trained on input data together with the expected output data. To create such a model, it is
necessary to go through the following phases:
1. model construction
2. model training
3. model testing

4. model evaluation
Model construction: This depends on the machine learning algorithm; in this project's case, it
is a neural network. Such an algorithm looks like this:
1. begin with the model object: model = Sequential()
2. then add layers with their types: model.add(type_of_layer())
3. after adding a sufficient number of layers, the model is compiled. At this moment Keras
communicates with TensorFlow to construct the model. During model compilation it is important to
specify a loss function and an optimizer algorithm. Before model training it is also important
to scale the data for further use.

Model training:
After model construction it is time for model training. In this phase, the model is trained using
training data and the expected output for this data. It looks like this:
model.fit(training_data, expected_output). Progress is visible on the console while the script
runs, and at the end it reports the final accuracy of the model.

Model Testing:
During this phase a second set of data is loaded. This data set has never been seen by the model,
so its true accuracy can be verified. Once training is complete and the model is shown to produce
the right results, it can be saved with model.save("name_of_file.h5"). The saved model can then
be used in the real world; this phase is called model evaluation, meaning the model can be used
to evaluate new data. A consolidated sketch of these phases is shown below.
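The sketch below pulls these phases together, assuming TensorFlow's bundled Keras; the layer sizes, input shape, number of classes and the random placeholder data are illustrative assumptions, not the project's actual configuration.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()                                        # 1. begin with the model object
model.add(Conv2D(32, (3, 3), activation="relu",
                 input_shape=(64, 64, 1)))                  # 2. add layers by type
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(26, activation="softmax"))                  # one output per letter class (assumed)

model.compile(optimizer="adam",                             # 3. compile with a loss and an optimizer
              loss="categorical_crossentropy",
              metrics=["accuracy"])

x_train = np.random.rand(100, 64, 64, 1)                    # placeholder data; real training would use
y_train = np.eye(26)[np.random.randint(0, 26, 100)]         # the segmented gesture images and labels
model.fit(x_train, y_train, epochs=1, batch_size=16)        # training progress prints to the console

model.save("name_of_file.h5")                               # 4. save the model for later evaluation/use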

3.2.2 Preprocessing:
Uniform aspect ratio
Understanding aspect ratios:
An aspect ratio is the proportional relationship between an image's width and height;
essentially, it describes the image's shape. Aspect ratios are written as a ratio of width to
height, such as 1:1 or 2:3. For example, a square image has an aspect ratio of 1:1, since the
height and width are the same; the image could be 500px × 500px or 1500px × 1500px, and the
aspect ratio would still be 1:1. As another example, a portrait-style image might have a ratio
of 2:3. With this aspect ratio, the height is 1.5 times the width, so the image could be
500px × 750px, 1500px × 2250px, and so on.


Cropping to an aspect ratio


Beyond any built-in styling options, you may want to manually crop an image to a certain aspect
ratio. For example, if all input images share the same aspect ratio, they will all be cropped in
the same way.
Option 1 - Crop to a pre-set shape
Use a built-in image editor to crop images to a specific shape. After opening the editor, use
the crop tool to choose from preset aspect ratios.
Option 2 - Custom dimensions
To crop images to a custom aspect ratio not offered by a built-in image editor, use a
third-party editor. Since images do not need to have the same dimensions to have the same aspect
ratio, it is better to crop them to a specific ratio than to try to match their exact
dimensions. For best results, crop the shorter side based on the longer side.
• For instance, if your image is 1500px × 1200px and you want an aspect ratio of 3:1, crop the
shorter side to make the image 1500px × 500px.
• Don't scale up the longer side; this can make your image blurry.
Image scaling:
In computer graphics and digital imaging, image scaling refers to the resizing of a digital
image. In video technology, the magnification of digital material is known as upscaling or
resolution enhancement. When scaling a vector graphic image, the graphic primitives that make up
the image can be scaled using geometric transformations with no loss of image quality. When
scaling a raster graphics image, a new image with a higher or lower number of pixels must be
generated; in the case of decreasing the pixel count (scaling down), this usually results in a
visible quality loss. From the standpoint of digital signal processing, the scaling of raster
graphics is a two-dimensional example of sample-rate conversion: the conversion of a discrete
signal from one sampling rate (in this case the local sampling rate) to another.
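A small OpenCV sketch of the preprocessing described above, cropping a frame to a uniform 1:1 aspect ratio and then scaling it down; the file name and target size are placeholders.

import cv2

img = cv2.imread("frame.jpg")                 # hypothetical captured frame
h, w = img.shape[:2]

# Crop the longer side to a square (1:1 aspect ratio) before resizing,
# so every training image ends up with the same shape without distortion.
side = min(h, w)
top, left = (h - side) // 2, (w - side) // 2
square = img[top:top + side, left:left + side]

resized = cv2.resize(square, (64, 64), interpolation=cv2.INTER_AREA)
print(resized.shape)                          # (64, 64, 3)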
3.3 Datasets Used For Training

Fig 3.2: Dataset used for training the model


Fig 3.3: Sample pictures of training data

Fig. 3.4 Training data for letter A


3.3.2 Optimizer (Adam):
Adam can be viewed as a combination of RMSprop and Stochastic Gradient Descent with momentum. It
uses squared gradients to scale the learning rate, like RMSprop, and it takes advantage of
momentum by using a moving average of the gradient instead of the gradient itself, like SGD with
momentum. Adam is an adaptive learning rate method, which means it computes individual learning
rates for different parameters. Its name is derived from adaptive moment estimation: Adam uses
estimates of the first and second moments of the gradient to adapt the learning rate for each
weight of the neural network. What is a moment? The n-th moment of a random variable is defined
as the expected value of that variable raised to the power n. More formally:
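In standard notation (reproduced here for reference, not taken from the original report), the n-th moment, Adam's estimates of the first and second moments of the gradient g_t, and the bias-corrected update are:

m_n = \mathbb{E}[X^n]

m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t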

3.3.3 Loss Function (categorical cross-entropy):


Categorical cross-entropy is a loss function used for single-label categorization, i.e. when
only one category is applicable to each data point; in other words, an example can belong to one
class only.
Note: the layer before the target block must use the Softmax activation function.
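For reference, with a one-hot target vector y over C classes and softmax outputs \hat{y}, the categorical cross-entropy loss for a single example is:

L(y, \hat{y}) = -\sum_{i=1}^{C} y_i \log \hat{y}_i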

3.4 Segmentation
Image segmentation is the process of partitioning a digital image into multiple segments (sets
of pixels, also known as image objects). The goal of segmentation is to simplify and/or change
the representation of an image into something that is more meaningful and easier to analyze.
Modern image segmentation techniques are powered by deep learning technology. Taking autonomous
vehicles as an example, they need sensory input devices like cameras, radar, and lasers to allow
the car to perceive the world around it, creating a digital map; autonomous driving is not even
possible without object detection, which itself involves image classification and segmentation.
How Image Segmentation works
Image segmentation involves converting an image into a collection of regions of pixels that are
represented by a mask or a labeled image. By dividing an image into segments, you can process
only the important segments of the image instead of processing the entire image. A common
technique is to look for abrupt discontinuities in pixel values, which typically indicate edges
that define a region. Another common approach is to detect similarities in the regions of an
image; techniques that follow this approach include region growing, clustering, and
thresholding. A variety of other approaches to image segmentation have been developed over the
years, using domain-specific knowledge to effectively solve segmentation problems in specific
application areas.
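A minimal sketch of threshold-based segmentation with OpenCV, using Otsu's method to separate the hand from the background as discussed in the literature survey; the file names are placeholders.

import cv2

img = cv2.imread("frame.jpg")                                 # hypothetical webcam frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's method picks the threshold automatically, splitting the image into
# two classes (hand vs. background).
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

hand_only = cv2.bitwise_and(img, img, mask=mask)              # keep only the segmented region
cv2.imwrite("hand_mask.png", mask)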

3.5 Classification: Convolutional Neural Network


Image classification is the process of taking an input (such as a picture) and outputting its
class, or the probability that the input belongs to a particular class. Neural networks are
applied in the following steps:
1) One-hot encode the data: a one-hot encoding can be applied to the integer representation.
This is where the integer-encoded variable is removed and a new binary variable is added for
each unique integer value (see the sketch after this list).
2) Define the model: a model, in very simplified terms, is nothing but a function that takes
certain input, performs operations on it to the best of its ability (learning and then
predicting/classifying) and produces a suitable output.
3) Compile the model: The optimizer controls the learning rate. We will be using 'adam' as our
optimizer. Adam is generally a good optimizer to use for many cases. The Adam optimizer
adjusts the learning rate throughout training. The learning rate determines how fast the optimal
weights for the model are calculated. A smaller learning rate may lead to more accurate
weights (up to a certain point), but the time it takes to compute the weights will be longer.
4) Train the model: Training a model simply means learning (determining) good values for all
the weights and the bias from labeled examples. In supervised learning, a machine learning
algorithm builds a model by examining many examples and attempting to find a model that
minimizes loss; this process is called empirical risk minimization.
5) Test the model
A convolutional neural network convolves learned features with the input data and uses 2D
convolution layers.
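As referenced in step 1, a minimal one-hot encoding sketch using the to_categorical utility from TensorFlow's bundled Keras; the label values are made up.

from tensorflow.keras.utils import to_categorical

labels = [0, 2, 1, 25]                  # integer-encoded classes, e.g. letters A, C, B, Z
one_hot = to_categorical(labels, num_classes=26)
print(one_hot.shape)                    # (4, 26): one binary column per class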

3.6 Convolution Operation:


In purely mathematical terms, convolution is a function derived from two given
functions by integration which expresses how the shape of one is modified by the other.
Convolution formula:


(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau     (3.1)

Here are the three elements that enter into the convolution operation:
• Input image
• Feature detector
• Feature map
Steps to apply convolution layer:
• You place it over the input image beginning from the top-left corner within the
borders you see demarcated above, and then you count the number of cells in which the
feature detector matches the input image.
• The number of matching cells is then inserted in the top-left cell of the feature map
• You then move the feature detector one cell to the right and do the same thing. This movement
is called a stride, and since we are moving the feature detector one cell at a time, that would
be a stride of one pixel.
• What you will find in this example is that the feature detector's middle-left cell with
the number 1 inside it matches the cell that it is standing over inside the input image. That's the
only matching cell, and so you write “1” in the next cell in the feature map, and so on and so
forth.
• After you have gone through the whole first row, you can then move it over to the
next row and go through the same process.
Deriving a feature map has several uses, the most important being a reduction in the size of the
input image: the larger your strides (the movements across pixels), the smaller your feature
map. A toy sketch of this sliding-window computation is shown below.
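The toy sketch below implements this sliding-window computation in NumPy for a small binary image and a 3x3 feature detector; the sizes and values are made up.

import numpy as np

def feature_map(image, detector, stride=1):
    # Slide the detector over the image and record the match score at each position.
    ih, iw = image.shape
    kh, kw = detector.shape
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * detector)   # number of matching 1-cells for 0/1 inputs
    return out

img = np.random.randint(0, 2, (7, 7))                    # toy binary "input image"
det = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])        # toy 3x3 feature detector
print(feature_map(img, det, stride=1).shape)             # (5, 5): smaller than the input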

3.6.1 ReLU Layer:


The rectified linear unit is used to clamp values to the non-negative range: pixel and feature
values can be negative too, and in this layer they are set to 0. The purpose of applying the
rectifier function is to increase the non-linearity in our images, and the reason we want to do
that is that images are naturally non-linear. The rectifier serves to break up the linearity
even further, in order to make up for the linearity that we might impose on an image when we put
it through the convolution operation. What the rectifier function does to such an image is
remove all the black (negative) elements from it, keeping only those carrying a positive value
(the grey and white colours). The essential difference between the non-rectified version of the
image and the rectified one is the progression of colours: after we rectify the image, the
colours change more abruptly. The gradual change is no longer there, which indicates that the
linearity has been disposed of.
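In code, the rectifier is simply an element-wise maximum with zero; a one-line NumPy sketch:

import numpy as np

def relu(x):
    # Negative pixel/feature values are clamped to zero, as described above.
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]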

3.6.2 Pooling Layer:


The pooling (POOL) layer reduces the height and width of the input. It helps reduce computation
and helps make feature detectors more invariant to their position in the input; this is what
gives the convolutional neural network its "spatial invariance" capability. In addition, pooling
minimizes the size of the images as well as the number of parameters, which in turn prevents the
issue of "overfitting" from coming up. Overfitting, in a nutshell, is when an excessively
complex model is created in order to account for the idiosyncrasies just mentioned. The result
of using a pooling layer and creating down-sampled, or pooled, feature maps is a summarized
version of the features detected in the input. Pooled feature maps are useful because small
changes in the location of a feature detected by the convolutional layer still result in a
pooled feature map with the feature in the same location. This capability added by pooling is
called the model's invariance to local translation.
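A toy NumPy sketch of 2x2 max pooling, which halves the height and width of a feature map:

import numpy as np

def max_pool_2x2(fmap):
    # Down-sample a feature map by taking the maximum of each 2x2 block.
    h, w = fmap.shape
    return fmap[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool_2x2(fmap))   # 2x2 output; small shifts of a feature leave the pooled value unchanged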

3.6.3 Fully Connected Layer:


The role of the artificial neural network is to take this data and combine the features into a
wider variety of attributes that make the convolutional network more capable of classifying
images, which is the whole purpose of creating a convolutional neural network. Its neurons are
linked to each other; a neuron activates if it identifies a pattern and sends signals to the
output layer, and the output layer gives the output class based on the weight values. For now,
all you need to know is that the loss function tells us how accurate our network is, and we use
it to optimize the network in order to increase its effectiveness. That requires certain things
to be altered in the network: the weights (the blue lines connecting the neurons, which are
basically the synapses) and the feature detectors, since the network often turns out to be
looking for the wrong features and has to be reviewed multiple times for the sake of
optimization. The full connection process practically works as follows (a minimal code sketch
follows this list):
• The neuron in the fully-connected layer detects a certain feature; say, a nose.
• It preserves its value.
• It communicates this value to the classes of the trained images.
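As referenced above, a minimal Keras sketch of a fully connected head; the layer sizes are placeholders, assuming the pooled feature maps have shape 8x8x32.

from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

# The flattened pooled feature maps feed fully connected layers; softmax turns the
# final activations into class probabilities.
head = Sequential([
    Flatten(input_shape=(8, 8, 32)),
    Dense(128, activation="relu"),
    Dense(26, activation="softmax"),
])
head.summary()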

CHAPTER 4
DESIGN


4.1 Dataflow Diagram


The DFD is also known as a bubble chart. It is a simple graphical formalism that can be used to
represent a system in terms of the input data to the system, the various processing carried out
on this data, and the output data generated by the system. It maps out the flow of information
for any process or system: how data is processed in terms of inputs and outputs. It uses defined
symbols such as rectangles, circles and arrows to show data inputs, outputs, storage points and
the routes between each destination. DFDs can be used to analyse an existing system or to model
a new one. A DFD can often visually "say" things that would be hard to explain in words, and it
works for both technical and non-technical audiences. There are four components in a DFD:
1. External Entity
2. Process
3. Data Flow
4. Data Store

Fig 4.1: Dataflow Diagram for Sign Language Recognition


4.2 UML Diagrams


UML stands for Unified Modeling Language. Taking the SRS document from the analysis phase as
input to the design phase, UML diagrams are drawn. The UML is only a language, and so is just
one part of a software development method. The UML is process independent, although it is best
used in a process that is use-case driven, architecture-centric, iterative, and incremental. The
UML is a language for visualizing, specifying, constructing and documenting the artifacts of a
software-intensive system. It is based on diagrammatic representations of software components.
A modeling language is a language whose vocabulary and rules focus on the conceptual and
physical representation of a system. A modeling language such as the UML is thus a standard
language for software blueprints.
The UML is a graphical language that can be used to model all kinds of interesting systems, and
its diagrams can also represent structures that transcend what can be expressed in a programming
language. These different views are the different diagrams of UML.

4.2.1 Sequence Diagram


A sequence diagram displays the time sequence of the objects participating in an interaction. It
consists of a vertical dimension (time) and a horizontal dimension (the different objects).
Objects: Object can be viewed as an entity at a particular point in time with specific value and
as a holder of identity.
A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects and classes involved in the scenario and the sequence of messages exchanged between
the objects needed to carry out the functionality of the scenario. Sequence diagrams are
typically associated with use case realizations in the Logical View of the system under
development. Sequence diagrams are sometimes called event diagrams or event scenarios.
A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects
that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in
the order in which they occur. This allows the specification of simple runtime scenarios in a
graphical manner.
If the lifeline is that of an object, it demonstrates a role. Leaving the instance name blank can
represent anonymous and unnamed instances.
Messages, written with horizontal arrows with the message name written above them, display
interaction. Solid arrow heads represent synchronous calls, open arrow heads represent
asynchronous messages, and dashed lines represent reply messages. If a caller sends a
synchronous message, it must wait until the message is done, such as invoking a subroutine. If
a caller sends an asynchronous message, it can continue processing and doesn’t have to wait
for a response. Asynchronous calls are present in multithreaded applications, event-driven
applications and in message-oriented middleware. Activation boxes, or method-call boxes, are
opaque rectangles drawn on top of lifelines to represent that processes are being performed in
response to the message (ExecutionSpecifications in UML).
Objects calling methods on themselves use messages and add new activation boxes on top of
any others to indicate a further level of processing. If an object is destroyed (removed from
memory), an X is drawn on bottom of the lifeline, and the dashed line ceases to be drawn
below it. It should be the result of a message, either from the object itself, or another. A
message sent from outside the diagram can be represented by a message originating from a
filled-in circle (found message in UML) or from a border of the sequence diagram (gate in
UML).
UML has introduced significant improvements to the capabilities of sequence diagrams. Most of
these improvements are based on the idea of interaction fragments, which represent smaller
pieces of an enclosing interaction. Multiple interaction fragments are combined to create a
variety of combined fragments, which are then used to model interactions that include
parallelism, conditional branches and optional interactions.


Fig 4.2: Sequence diagram of sign language recognition system


CHAPTER 5
IMPLEMENTATION AND RESULTS


5.1 Screenshots

Fig. 5.1 Editor Window for IDE


Visual Studio Code is a code editor redefined and optimized for building and debugging modern web and cloud applications.


Fig. 5.2 Recognition of hand gesture as Thumbs UP


The thumb points vertically upward, so the gesture is recognized as Thumbs Up.


Fig. 5.3 Recognition of hand gesture as Thumbs Down


The gesture in the colour-segmented image must first be recognized as a single object before it can be interpreted; here it is recognized as the Thumbs Down gesture.
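A minimal OpenCV sketch of this step is given below. It assumes skin-colour segmentation in HSV space with illustrative threshold values (not the project's tuned parameters), cleans the mask with morphological operations, and keeps the largest contour as the single hand object.

```python
import cv2
import numpy as np

def largest_hand_region(frame_bgr):
    # Segment skin-like colours in HSV space (illustrative thresholds).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 30, 60]), np.array([20, 150, 255]))
    # Remove speckle noise so the hand forms one connected object.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Keep only the largest contour: the single object to be interpreted.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, mask
    hand = max(contours, key=cv2.contourArea)
    return hand, mask
```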


Fig. 5.4 Recognition of hand gesture as Call Me


The gesture in the colour-segmented image must be recognized as a single object before it can be interpreted as the Call Me gesture.


Fig. 5.5 Recognition of hand gesture as Fist


The two wrist points are the end points of the wrist line that runs across the bottom of the hand; they are important reference points for hand gesture recognition.
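As a simple illustration, assuming the hand enters the frame from the bottom, the two wrist points can be approximated as the leftmost and rightmost contour points on the lowest row the hand contour touches; this is only a sketch, not necessarily the exact method used in the project.

```python
import numpy as np

def wrist_points(hand_contour):
    # hand_contour: an OpenCV contour of shape (N, 1, 2) holding (x, y) points.
    pts = hand_contour.reshape(-1, 2)
    bottom_y = pts[:, 1].max()                            # lowest image row touched by the hand
    near_bottom = pts[np.abs(pts[:, 1] - bottom_y) <= 5]  # points within 5 px of that row
    left_wrist = tuple(near_bottom[near_bottom[:, 0].argmin()])
    right_wrist = tuple(near_bottom[near_bottom[:, 0].argmax()])
    return left_wrist, right_wrist
```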


Fig. 5.6 Recognition of hand gesture as Live Long


The back of the palm with two fingers extended is recognized as the Live Long gesture, based on the data stored in the database.


Fig. 5.7 Recognition of hand gesture as Peace


The front of the palm with two fingers extended in a V shape is recognized as the Peace gesture by the system.
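One classical OpenCV way to detect such a two-finger V shape is to count the convexity defects (the valleys between extended fingers) of the hand contour; the thresholds below are illustrative assumptions rather than the project's exact rule, and a count of two is consistent with the Peace gesture.

```python
import cv2
import numpy as np

def count_extended_fingers(hand_contour):
    # Convexity defects are the valleys between fingers along the convex hull.
    hull_idx = cv2.convexHull(hand_contour, returnPoints=False)
    defects = cv2.convexityDefects(hand_contour, hull_idx)
    if defects is None:
        return 0
    gaps = 0
    for i in range(defects.shape[0]):
        s, e, f, depth = defects[i, 0]
        start, end, far = (hand_contour[idx][0].astype(float) for idx in (s, e, f))
        a = np.linalg.norm(end - start)
        b = np.linalg.norm(far - start)
        c = np.linalg.norm(end - far)
        # A deep, narrow valley (angle below 90 degrees) lies between two fingers.
        angle = np.arccos(np.clip((b * b + c * c - a * a) / (2 * b * c + 1e-6), -1.0, 1.0))
        if angle < np.pi / 2 and depth > 10000:            # depth is in 1/256 pixel units
            gaps += 1
    return gaps + 1 if gaps else 0                         # one deep valley implies two fingers
```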


Fig. 5.8 Recognition of hand gesture as Smile


The index finger and thumb pointing in opposite directions are recognized as the Smile gesture.


Fig. 5.9 Recognition of hand gesture as Stop


The output of hand detection is a binary image in which the white pixels belong to the hand region and the black pixels belong to the background.
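Once such a binary mask is available, one common way to prepare it for a classifier is to resize it to a fixed size and flatten it into a feature vector; the 32x32 size below is an illustrative assumption, not the report's exact pipeline.

```python
import cv2
import numpy as np

def mask_to_feature_vector(binary_mask, size=(32, 32)):
    # binary_mask: uint8 image with 255 for hand pixels and 0 for background.
    small = cv2.resize(binary_mask, size, interpolation=cv2.INTER_AREA)
    # Scale to [0, 1] and flatten to a 1-D vector of length 32 * 32.
    return (small.astype(np.float32) / 255.0).ravel()
```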


CHAPTER 6
COMPARISON BETWEEN PREVIOUS SYSTEM AND PROPOSED
SYSTEM


Table 6.1 Comparison between the previous and proposed systems

Sr. No. | Previous System | Proposed System
1 | Accuracy is lower. | The prediction system achieves a higher accuracy ratio.
2 | The algorithm was not properly synchronized. | Model creation and comparison help to recognize the gesture properly.
3 | Dataset recording is not possible. | Python libraries such as pandas make it easy to label the data and create the model (see the sketch below).
4 | Fewer point-identification parameters make it difficult to recognize the points clearly. | The fingers can be identified point by point and reported properly.
5 | Data entry for recognized gestures increases complexity. | Data handling is easy with the sklearn and cv2 packages.
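As noted in row 3 of Table 6.1, dataset recording is handled with pandas. The sketch below shows one way a labelled sample might be appended to a CSV file; the function name, column layout, and file path are assumptions made for illustration, not the project's actual code.

```python
import pandas as pd

def append_labeled_sample(csv_path, features, label):
    # features: a flat list of numeric values describing one gesture frame
    #           (for example hand landmark coordinates); label: gesture name.
    row = pd.DataFrame([features + [label]])
    # Append to the dataset file without rewriting a header row each time.
    row.to_csv(csv_path, mode="a", header=False, index=False)

# The accumulated file can later be loaded for training, for example with
# pd.read_csv(csv_path, header=None) and a scikit-learn classifier.
```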

Table 6.2 Parameter comparison chart values

Parameter | Previous (Accuracy %) | Proposed (Accuracy %)
Recognition | 47 | 79
Point assembling | 59 | 81
Data retrieving | 56 | 83
Trained parameters | 65 | 91
Model creation | 55 | 85


[Bar chart comparing the previous and proposed systems' accuracy (%) across Recognition, Point assembling, Data retrieving, Trained parameters, and Model creation.]

Fig. 6.1 Graphical representation of the parameter values

CHAPTER 7
CONCLUSION AND FUTURE SCOPE


7.1 Conclusion
Nowadays, applications need several kinds of images as sources of information for elucidation and analysis. Several features have to be extracted from these images to support the various applications. When an image is transformed from one form to another, for example during digitizing, scanning, communication, or storage, degradation occurs. The output image therefore has to undergo a process called image enhancement, which consists of a group of methods that seek to improve the visual appearance of an image. Image enhancement is fundamentally about improving the interpretability or perception of the information in images for human viewers and providing better input for other automatic image processing systems. The image then undergoes feature extraction using various methods to make it more readable by the computer. The sign language recognition system is a powerful tool that brings together expert knowledge, edge detection, and inexact information from different sources; the aim of the convolutional neural network is to obtain the appropriate classification.
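As an illustration of the convolutional classifier the conclusion refers to, a minimal Keras sketch is given below; the layer sizes, input shape, and number of gesture classes are assumptions chosen for illustration rather than the exact architecture trained in this project.

```python
from tensorflow.keras import layers, models

def build_gesture_cnn(input_shape=(64, 64, 1), num_classes=10):
    # Small CNN: two convolution/pooling blocks followed by a dense classifier head.
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```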

7.2 Future work


The proposed sign language recognition system, which recognizes sign language letters, can be further extended to recognize gestures and facial expressions. Instead of displaying letter labels, it would be more appropriate to display complete sentences as a more natural translation of the language; this also improves readability. The scope can be widened to cover additional sign languages, and more training data can be added to detect the letters with higher accuracy. The project can further be extended to convert the recognized signs to speech.


REFERENCES
[1] Ahmed, Mohamed Aktham, et al. ”A review on systems-based sensory gloves for sign
language recognition state of the art between 2007 and 2017.” Sensors 18.7 (2018).
[2] Han, Rui, et al. ”A Data Glove-based KEM Dynamic Gesture Recognition Algorithm.”
International Journal of Performability Engineering 14.11 (2018).
[3] Ronchetti, Franco, Facundo Quiroga, César Armando Estrebou, and Laura Cristina Lanzarini.
"Handshape recognition for argentinian sign language using probsom." Journal of Computer
Science & Technology 16 (2016).
[4] Abhishek, Kalpattu S., Lee Chun Fai Qubeley, and Derek Ho. ”Glovebased hand gesture
recognition sign language translator using capacitive touch sensor.” Electron Devices and
Solid-State Circuits (EDSSC), 2016 IEEE International Conference on. IEEE, 2016.
[5] Ronchetti, Franco, Facundo Quiroga, César Armando Estrebou, Laura Cristina Lanzarini, and
Alejandro Rosete. "LSA64: An Argentinian Sign Language Dataset." In XXII Congreso
Argentino de Ciencias de la Computación (CACIC 2016). 2016.
[6] Das, Abhinandan, et al. ”Smart glove for Sign Language communications.” Accessibility to
Digital World (ICADW), 2016 International Conference on. IEEE, 2016.
[7] Abadi, Martín, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado et al. "TensorFlow: Large-scale machine learning on heterogeneous distributed systems." arXiv preprint arXiv:1603.04467 (2016).
[8] Lokhande, Priyanka, Riya Prajapati, and Sandeep Pansare. ”Data gloves for sign language
recognition system.” International Journal of Computer Applications (2015): 11-14.
[9] Singha, Joyeeta, and Karen Das. "Automatic Indian Sign Language Recognition for Continuous Video Sequence." ADBU Journal of Engineering Technology 2, no. 1 (2015).


[10] Tripathi, Kumud, Neha Baranwal, and G. C. Nandi. "Continuous Indian Sign Language Gesture Recognition and Sentence Formation." Procedia Computer Science 54 (2015): 523–531.
[11] Kingma, Diederik, and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv
preprint arXiv:1412.6980 (2014).
[12] Chouhan, Tushar, et al. "Smart glove with gesture recognition ability for the hearing and speech impaired." 2014 IEEE Global Humanitarian Technology Conference - South Asia Satellite (GHTC-SAS), 2014.
[13] Tavari, Neha V., A. V. Deorankar, and P. N. Chatur. ”Hand gesture recognition of indian sign
language to aid physically impaired people.” International Journal of Engineering Research
and Applications (2014): 60-66.
[14] Zhang, Chenyang, Xiaodong Yang, and YingLi Tian. "Histogram of 3D facets: A characteristic descriptor for hand gesture recognition." In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pp. 1–8. IEEE, 2013.
[15] Cooper, Helen, EngJon Ong, Nicolas Pugeault, and Richard Bowden. "Sign language
recognition using subunits." Journal of Machine Learning Research 13, no. Jul (2012): 2205-
2231.
[16] Cabrera, Maria Eugenia, Juan Manuel Bogado, Leonardo Fermin, Raul Acuna, and Dimitar
Ralev. ”Glove-based gesture recognition system.” In Adaptive Mobile Robotics, pp. 747-753.
2012.
[17] Cooper, Helen, Brian Holt, and Richard Bowden. "Sign language recognition." In Visual Analysis of Humans, pp. 539–562. Springer London, 2011.
[18] Nandy, Anup, Jay Shankar Prasad, Soumik Mondal, Pavan Chakraborty, and Gora Chand Nandi. "Recognition of isolated Indian sign language gesture in real time." Information Processing and Management (2010): 102–107.
[19] Ahmed, Syed Faiz, Syed Muhammad Baber Ali, and Sh Saqib Munawwar Qureshi.
”Electronic speaking glove for speechless patients, a tongue to a dumb.” Sustainable
Utilization and Development in Engineering and Technology (STUDENT), 2010 IEEE
Conference on. IEEE, 2010.
[20] Hahnloser, Richard H. R., Rahul Sarpeshkar, Misha A. Mahowald, Rodney J. Douglas, and H. Sebastian Seung. "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit." Nature 405, no. 6789 (2000): 947–951; Bottou, Léon. "Large-scale machine learning with stochastic gradient descent." In Proceedings of COMPSTAT'2010, pp. 177–186.
[21] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735–1780.


[22] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5, no. 2 (1994): 157–166.
