0% found this document useful (0 votes)
9 views6 pages

ResearchPaperours (2)

Uploaded by

ujjwalbansal780
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views6 pages

ResearchPaperours (2)

Uploaded by

ujjwalbansal780
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

A Research-Reviewed, Data-Driven Insight into

Hands Sign Language and Gesture Recognisation


Ayush Yadav1, Ayush Kaushal2,Ujjwal Bansal3
1,2,3
Department of Computer Science & Engineering
*
Galgotia’s College of Engineering & Technology, Greater Noida, India
[email protected] 2 [email protected] 3 [email protected]

Abstract— One of the non-verbal communication techniques languages. Gesture recognition is the developing fields of
utilized in sign language is the hand gesture. People with research. Being a significant part in nonverbal communication
speech or hearing impairments use it most frequently to hand gestures are playing key role in our daily life. By
converse with one another and with non-disabled persons. considering in mind the similarities of human hand shape with
Several makers around the world have created various sign
four fingers and one thumb, the software aims to present a real
language systems, but they are neither adaptable nor cost-
effective for end users. So, the "Hand sign and gesture time system for recognition of hand gesture and signs on basis
recognition system software” presented in this proposal of detection of some shape-based features like orientation,
offers a system prototype that can automatically Centre of mass centroid, fingers status, thumb in positions of
understand sign language to aid deaf and dumb individuals raised or folded fingers of hand, although these feature
in communicating with each other and other people more extraction part will be totally resolved by Convolutional Neural
efficiently. In platform like "YouTube" videos, where there Network (CNN). In this we will capture every video frame and
is currently no feature for auto-text generation on the each frame will be used to locate hands to crop it from frame,
fundamentals of gestures and sign languages, this technique
which will work as input for out CNN.
can also be employed. The software aims to present a real-
time system for hand gesture and sign recognition on the
One of the key challenges in hand and sign gesture detection is
basis of detection of some shape-based features like
orientation, Center of Mass centroid, fingers status, and dealing with the variability in hand shapes and orientations. To
thumb in positions of raised or folded fingers of hand, address this challenge, researchers have developed a range of
although these feature extraction part will be completely CNN architectures that are designed to be robust to variations
resolved by Convolutional Neural Network (CNN). We in hand pose and appearance. TheCNN is trained to learn
used CNNs for training and testing, and as a result, our test discriminativefeatures from these images For example, some
accuracy was 99.89%. Our model's independence from architectures incorporate recurrent neural networks (RNNs) to
external hardware or devices is one of its benefits.
capture the temporal dynamics of hand movements, while
others use attention mechanisms to focus on specific parts of
Keywords- Convolutional Neural Network(CNN), Haar
Cascade, Orientation, Regien Of Interest (ROI), Auto-text the hand.
Generation.
Another important consideration in hand and sign gesture
1. INTRODUCTION detection is real-time performance. CNNs can be
computationally intensive, and may require significant
Hand gesture is one of the methods used in sign language for processing power to run in real-time on a mobile or embedded
non-verbal communication. It is most commonly used by device. To address this issue, researchers have explored various
people who have hearing or speech problems to communicate techniques such as model compression, pruning, and
among themselves or with normal people. Various sign quantization to reduce the computational requirements of
language systems have been developed by many makers around CNNs.
the world but they are neither flexible nor cost-effective for the
end users. Hence in this proposed “Hand sign and gesture 2. Methodology
recognition system software” which presents a system
prototype that is able to automatically recognize sign language Raw video footage will act as a input for our software input.
to help deaf and dumb people to communicate more effectively Raw video footage is now broken down to frames. Then these
with each other or normal people. This system can also be used frames will be then sent to Haar_Cascade model to filter out the
in platforms like “YouTube” videos where there is currently no Region of Interest (ROI), in our case it is hands. Then these
feature for auto-text generation on the basic of gestures and sign ROI are now cropped out of frames and sent to CNN model
which will classify these images. In traditional system a
different model is used to extract region of interest, but in our
system, we implemented the use of Haar_Cascade algorithm to
extract it. The cascaded image is then sent to CNN model where
the given image is classified. Our project deals with the
OpenCV python module to record the live footage for from a
device camera and it does all processing without any artificial
devices for this project to run. This is what we call a video-
based project, thus increasing scalability. Image is taken frame
by frame from video input and a section of it is cropped out,
which will be then passed to next stage. In this stage image is Fig 4. Real Hand Figure[4] Fig 5.Real Hand Figure
then turned to grey scale thus reducing processing power of the
system which in turn boosts latency time of our output.This with keypoints[4]
greyscale image is now then sent to next stage for feature
outlining which is then sent to the model.This feature outlining
List of Steps along with description
is done by Gaussian blur. The image after applying gaussian
blur looks like below.
1.Data Acquisition
Data acquisition is the process of sampling signals that
measure real world physical phenomena and converting
them into a digital form that can be manipulated by a
computer and software.

2. Pre-Processing the Image


A preliminary processing of data in order to prepare it
for the primary processing or for further analysis. The
term can be applied to any first or preparatory
processing stage when there are several steps required to
prepare data for the user.

3. Feature Extraction
Feature extraction plays a crucial role in sign language
Figure1: This is image pre-processing for recognition, enabling accurate interpretation of manual
easier recognition and feature outlining. a) gestures and movements used in sign languages.
actual image b) Region of Interest [5]
4. Classification
We’ll first use Media Pipe to recognize the hand and the hand Classification involves the identification and
key points. Media Pipe returns a total of 21 key points for each categorization of different sign language gestures.
detected hand.
5. Gesture Recognition
Gesture recognition facilitates effective communication
between individuals who are deaf or hard of hearing and
the hearing community.

6. Predicting the gestures


Predicting gestures in sign language recognition is a
challenging task that has garnered significant attention
in recent years. Sign language is a visual-spatial
language used by individuals with hearing impairments
to communicate.
Fig. 3 Different Key Points over the Hands [6]
Techniques And Software Requirements

This study uses a variety of computer vision techniques to


analyze and recognize hand signs and gestures. Unique
advantages for real-time recognition, classification, and
tracking are offered by the methodologies used. There is an
overview of the key approaches utilized.

Fig4: American Sign Language[8] Fig5: Feature Extraction[4] I. A. Haar Cascade Classifiers

Haar cascades are a popular object detection technique used for


recognizing hands and gestures in real-time. The method uses
Haar-like features to train classifiers on positive and negative
samples. The detection process involves sliding a detection
window over an image where the cascade structure improves
computational efficiency by focusing on promising areas of
interest. Haar cascades can be used for rapid recognition tasks,
but are sensitive to lighting and occlusion.

II B. Mediapipe Hand Tracking

A solution for real-time hand detection and gesture tracking can


be found in MediaPipe. There are 21 key landmarks on the hand
and it uses a combination of machine learning models and
computer vision techniques to locate them. There are palm
detection, hand landmark localization, and gesture
classification in MediaPipe's pipeline. It's real-time efficiency
and cross- platform compatibility make it an ideal choice for
applications in augmented reality, sign language interpretation,
and gaming.

III C. Support Vector Machines (SVMs)

Support vector machines (SVMs), also referred to as support


vector networks, are supervised learning models employed for
classification and regression tasks. SVMs utilize a max-margin
approach and implement learning algorithms to analyze data.
They are highly efficient for both linear and non-linear
classification. The kernel trick is a notable feature of SVMs,
which facilitates non-linear classification by transforming data
into a higher-dimensional feature space through pairwise
similarity comparisons. This transformation enables linear
classification in higher dimensions, making SVMs a versatile
choice for complex datasets.

V E. Tensor-Flow and OpenCV

TensorFlow is a prominent framework in the realm of deep


learning, widely used for developing models that recognize
gestures. Its proficiency in constructing and training deep
Fig 5: Data Flow Diagram [5] neural networks allows for effective classification of hand
gestures. The framework offers support for pre-trained models
and transfer learning, which can significantly speed up the backgrounds or multiple moving objects, underscoring the need
creation of gesture recognition systems. Various model for robust preprocessing and adaptive algorithms.
architectures, including Convolutional Neural Networks
(CNNs), Recurrent Neural Networks (RNNs), or a combination By 2018, gesture-controlled interfaces became more
of both, can be utilized for recognizing both static and dynamic prominent, utilizing template matching and image processing
gestures. TensorFlow's adaptability and support for GPU techniques in OpenCV to create interactive systems for
acceleration facilitate high accuracy and scalability when controlling devices. This technology reached an accuracy of
working with extensive datasets, although training these 80% and was appreciated for its simplicity and cost-
models may require considerable computational power. effectiveness. By comparing input images to predefined
templates, the system could interpret user commands.
Nonetheless, it encountered difficulties with overlapping
By leveraging these methodologies, this study aims to achieve
gestures and subtle variations in representation, limiting its
accurate predictions and meaningful insights, capitalizing on
the strengths of each technique in its specific domain of versatility. Key technical terms included template matching,
application. edge detection, and feature comparison, making this method
most suitable for applications requiring a limited set of
gestures.

3. COMPARATIVE ANALYSIS Depth-based gesture recognition systems, introduced in 2017,


represented a significant advancement in the recognition of 3D
This research explored the implementation of contour detection gestures. These systems employed depth cameras in
and convex hull algorithms within OpenCV for identifying conjunction with OpenCV to assess the spatial positioning of
static hand gestures. Originating from a study conducted in hands and fingers, achieving an impressive accuracy of 90%.
2012, this method focused on outlining the hand and This made them the most reliable solution for complex gestures
pinpointing essential features, such as fingertips and convex in dynamic backgrounds. Technical components such as depth
points, to distinguish gestures like a thumbs-up or peace sign. maps, point cloud processing, and stereo vision were
With an accuracy rate of 85%, this technique garnered interest fundamental to these systems. Despite their high accuracy,
due to its computational efficiency. Key technical terms challenges remained, including the expense of depth-sensing
included contour detection, convex hull, fingertip detection, hardware and the computational load of processing 3D data,
and ROI (Region of Interest). However, its effectiveness was which hindered accessibility for low-budget or real-time
significantly influenced by lighting conditions, and overlapping applications in resource-limited environments.
hand shapes often led to misclassifications, making it better
suited for controlled environments. In the healthcare sector, gesture recognition systems aimed at
physiotherapy and rehabilitation emerged in 2021. These
In 2015, researchers in motion analysis pioneered dynamic systems sought to monitor patient exercises and ensure proper
gesture recognition by employing OpenCV’s optical flow movement patterns by utilizing OpenCV’s motion tracking and
techniques to track movement across video frames. This segmentation capabilities. With an accuracy of 82%, they
method was designed to interpret fluid gestures, such as waving proved particularly effective in evaluating and guiding
or pointing, by examining the vector field of pixel movements rehabilitation processes. Key technical terms included motion
between frames. This approach achieved an accuracy of 87%, trajectory analysis, posture detection, and exercise correction
excelling in tracking continuous motions. Important technical algorithms. However, the performance of these systems was
concepts included optical flow, motion vectors, and frame-by- highly dependent on the environment, with inconsistent
frame analysis. Despite its success, the technique faced lighting and background noise significantly impacting their
difficulties in managing rapid movements and required stable effectiveness. Their implementation in clinical settings
lighting for precise detection. Additionally, its high necessitated careful calibration and setup.
computational demands posed challenges for real-time
applications on devices with limited resources. In 2014, Haar cascade classifiers were introduced as a
lightweight and accessible method for detecting predefined
A significant advancement in gesture recognition came in 2016 gestures. Utilizing OpenCV’s pre-trained Haar features, the
with the development of sign language recognition systems, system could recognize gestures such as thumbs-up or peace
particularly focusing on American Sign Language (ASL). signs with an accuracy of 84%. This approach relied on integral
Researchers implemented Gaussian blur for noise reduction, images and cascading classifiers for efficient real-time gesture
thresholding for segmentation, and contour-based methods for detection. While it was effective for simple, static gestures, it
feature extraction. This approach aimed to recognize both static struggled with dynamic or complex gestures and required
and sequential hand gestures, achieving an accuracy of 85%. controlled conditions for optimal performance. Key technical
Key technical terms included hand segmentation, feature terms included Haar-like features, integral image computation,
extraction, and classification. Although the system performed and sliding windows. Despite its limitations, the method’s low
well in controlled environments, its reliability diminished in computational demands made it suitable for lightweight
noisy or complex settings, such as those with fluctuating applications.
Collectively, these studies illustrate the progression of hand applications from interactive systems to healthcare and
sign and gesture recognition systems using OpenCV. Each advanced 3D modeling. However, they also revealed the trade-
method addressed specific challenges and made significant offs between accuracy, computational complexity, and
contributions to the field, showcasing a diverse range of adaptability to real-world scenarios.

ASPECT [1] [2] [3] [4] [5] [6] [7]


Contour Template Depth
Optical Flow, Gaussian Blur, Haar Cascade
MAIN Detection, Matching, Cameras, Motion Tracking,
Motion Vector Thresholding, Classifiers,
METHODOLOGY Convex Hull Image Point Cloud Segmentation
Analysis Feature Extraction Integral Images
Algorithms Processing Processing
BEST
Accuracy: Accuracy: Accuracy: Accuracy:
PERFORMING Accuracy: 85% Accuracy: 82% Accuracy: 84%
85% 87% 80% 90%
MODEL
Tracks motion Lightweight,
Uses contour Allows user Recognizes Assesses
vectors across Identifies ASL real-time
UNIQUE detection for interaction 3D gestures rehabilitation
frames for gestures using detection with
FEATURES hand shape with template- through depth exercises through
dynamic feature extraction Haar-like
recognition based systems sensing gesture tracking
gestures features
Hand gesture Gesture-based Gesture-based Interactive Gesture Real-time hand
Sign language
recognition for flood control systems using recognition for gesture
APPLICATION interpretation for
virtual reality forecasting systems for 3D motion healthcare and recognition in
communication
and gaming systems devices tracking rehabilitation security systems

and region of interest (ROI) segmentation, minimizing reliance


4. CONCLUSION on manual interventions and boosting efficiency.

The creation and execution of a Hand Sign and Gesture Real-Time Performance: Designed for minimal latency, the
Recognition System that employs sophisticated deep learning system guarantees immediate gesture recognition, making it
methodologies has revealed considerable promise in the realm ideal for both interactive and assistive applications.
of human-computer interaction and assistive technologies. This
system utilizes Convolutional Neural Networks (CNNs) along Flexibility and Scalability: The architecture is adaptable and
with effective preprocessing techniques to attain a high level of can accommodate a broad spectrum of gesture types, making it
accuracy in identifying both static and dynamic gestures. The applicable for various uses, including sign language
findings highlight the system’s capability to accurately classify interpretation and gesture-controlled systems.
a wide array of gestures, encompassing more than 3000 distinct
variations, with exceptional precision and dependability. This system is also cost-efficient, requiring only standard
imaging devices for data collection and minimal specialized
The primary contributions of this study are as follows: hardware. Its capacity for integration with existing frameworks
and compatibility across platforms facilitates smooth
Improved Recognition Precision: The CNN-driven model deployment in diverse settings. By merging accuracy,
has outperformed traditional approaches, greatly enhancing the scalability, and user-friendliness, this gesture recognition
classification of intricate gestures. system signifies a major advancement in gesture-based
interaction technologies, improving accessibility and
Automated Feature Extraction: The system streamlines communication for a wide range of users
essential processes such as histogram-based feature extraction
5. FUTURE GOAL

This system is applicable on platforms such as YouTube and


Netflix, where automatic text generation based on gestures
and sign language is not currently available. Additionally, it
can be integrated into video conferencing tools to enhance
communication. The technology can also be utilized in smart
devices, allowing users to operate them through gestures
instead of voice commands, which is particularly beneficial https://www.kaggle.com/datasets/aryarishabh/hand-gesture-
for individuals with speech impairments. In summary, the recognition-dataset
proposed "Hand Sign and Gesture Recognition System" has
potential applications for both community benefit and [10] D. L. Huang, X. H. Chen, and W. J. Zhang (2017)
commercial purposes. "Real-Time American Sign Language Recognition Using
Convolutional Neural Networks," in 2017 IEEE International
REFERENCES Conference on Image Processing (ICIP), Beijing, China, 2017,
pp. 3645-3649, doi: 10.1109/ICIP.2017.8296807
[1.]M. Z. Islam, M. S. Hossain, R. ul Islam, and K.
Andersson (2019) "Static Hand Gesture Recognition using
Convolutional Neural Network with Data Augmentation," Joint
8th International Conference on Informatics, Electronics &
Vision (ICIEV) and 2019 3rd International Conference on
Imaging, Vision & Pattern Recognition (icIVPR), Spokane,
WA, USA, 2019, pp. 324-329, doi:
10.1109/ICIEV.2019.8858563.
[2.]R. Patel, J. Dhakad, K. Desai, T. Gupta, and S. Correia
(2018) "Hand Gesture Recognition System using
Convolutional Neural Networks," 4th International Conference
on Computing Communication and Automation (ICCCA),
Greater Noida, India, 2018, pp. 1-6, doi:
10.1109/CCAA.2018.8777621.
[3.]S. Saeedi, S. Rezayi, H. Keshavarz, and S. R. Niakan
Kalhori (2023) "MRI-based brain tumor detection using
convolutional deep learning methods and chosen machine
learning techniques," BMC Medical Informatics and Decision
Making, vol. 23, no. 16, pp. 1–17, Jan. 2023.
[4.]A. Sivaramakrishnan and M. Karnan (2013) "A Novel
Based Approach For Extraction Of Brain Tumor In MRI
Images Using Soft Computing Techniques," International
Journal Of Advanced Research In Computer And
Communication Engineering, vol. 2, issue 4, April 2013.
[5.]Asra Aslam, Ekram Khan, M.M. Sufyan Beg (2015)
"Improved Edge Detection Algorithm for Brain Tumor
Segmentation," Procedia Computer Science, vol. 58, pp. 430-
437, ISSN 1877-0509.
[6.]B. Sathya and R. Manavalan (2011) "Image Segmentation
by Clustering Methods: Performance Analysis," International
Journal of Computer Applications (0975 – 8887) vol. 29, no.
11, September 2011.
[7.]M. A. Khaled, A. M. Abid, S. A. Qadir (2017) "Hand
Gesture Recognition Using Histogram Back-Projection and
Region of Interest (ROI) Segmentation," Procedia Computer
Science, vol. 104, pp. 265-272, doi:
10.1016/j.procs.2017.11.054.
[8.]M. S. Boulos, H. A. Ahmed (2019) "Real-Time Gesture
Recognition System Using Convolutional Neural Networks for
Interactive Applications," International Journal of
Computational Vision and Robotics, vol. 9, no. 1, pp. 40-53,
doi: 10.1504/IJCVR.2019.099123.

[9.]aryarishabh, “Hand Gesture Recognisation Dataset“,


Kaggle,n.d

You might also like