
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“JNANA SANGAMA”, Belagavi-590018, Karnataka.

A
Mini-Project Report
on

“VIRTUAL MOUSE”

Submitted in partial fulfilment of the requirements of the VI semester of the
Bachelor of Engineering
in
Artificial Intelligence and Data Science
Submitted by:

LIKHITH GN 4YG21AD025
CHIRANTH HL 4YG21AD010
GK KISHORE URS 4YG22AD018
NIKHIL HK 4YG22AD034

Under the guidance of


Mrs. Mitha D, B.E., M.Tech.,
Assistant Professor
Department of Artificial Intelligence and Data Science

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

NAVKIS COLLEGE OF ENGINEERING, HASSAN


HASSAN-573201
2023-2024

NAVKIS COLLEGE OF ENGINEERING, HASSAN


(Affiliated to Visvesvaraya Technological University, Belagavi)

DEPARTMENT OF
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

CERTIFICATE
This is to certify that the Mini-Project work entitled “VIRTUAL MOUSE” is a bona fide
work carried out by LIKHITH GN (4YG21AD025), CHIRANTH HL (4YG21AD010), GK
KISHORE URS (4YG22AD018) and NIKHIL HK (4YG22AD034) in partial fulfilment of
the requirement of the 6th Semester Bachelor of Engineering in Artificial Intelligence and Data
Science of Visvesvaraya Technological University, Belagavi, Karnataka, during the year
2023-2024. It is certified that all corrections/suggestions indicated for the Internal Assessment
have been incorporated in the report. The Mini-Project report has been approved as it satisfies
all the academic requirements in respect of Mini-Project work prescribed for the 6th Semester
Bachelor of Engineering.

Signature of Guide Signature of HOD

Mrs. Mitha D B.E., MTech., Mr. Vivekananda B.E., M.Tech., (Ph.D),


Assistant Professor HOD
Dept. of AI & DS, Dept. of AI & DS,
NCE, Hassan NCE, Hassan
DECLARATION
We, the undersigned students of 6th semester Artificial Intelligence & Data Science, Navkis
College of Engineering, Hassan, solemnly declare that our project work entitled “VIRTUAL
MOUSE” is a bona fide work of ours. Our project is neither a copy of nor in any way a
modification of any other engineering project.

We also declare that this project was not submitted to any other university in the
past, shall remain our only submission, and will not be submitted by us to any other
university in the future.

Name USN Signature


LIKHITH GN 4YG21AD025
CHIRANTH HL 4YG21AD010
KISHORE URS 4YG22AD018
NIKHIL HK 4YG22AD034

ACKNOWLEDGEMENT
A great deal of time and effort has gone into completing this Mini-Project work
and documenting it. The many hours spent going through various books and other
materials related to the chosen topic have reaffirmed its power and utility in preparing this
report.

Several special people have contributed significantly to this effort. First of all, I am
grateful to my institution, NAVKIS COLLEGE OF ENGINEERING, Hassan, which
provided me an opportunity to fulfil my most cherished desire of reaching the goal.

I acknowledge and express my sincere thanks to our Principal, Dr. M. Venu Gopala
Rao, for the valuable suggestions and continued encouragement and support in our academic
endeavours.

I wish to express my deep sense of gratitude to Mr. Vivekananda, HOD, Department
of Artificial Intelligence and Data Science, for all the guidance; he remains a constant
driving force, motivating through innovative ideas, tireless support, and the helpful
suggestions offered during the course.

I express my sincere thanks to my guide, Mrs. Mitha D, Assistant Professor, Dept. of
Artificial Intelligence and Data Science, for her meticulous attention to detail, which has
contributed immeasurably to the quality of this report.

My sincere and hearty thanks to my beloved parents, friends and others for their all-time
support and co-operation.

LIKHITH GN (4YG21AD025)
CHIRANTH HL (4YG21AD010)
KISHORE URS (4YG22AD018)
NIKHIL HK (4YG22AD034)

ABSTRACT

The AI Virtual Mouse uses computer vision techniques to track hand movements and
translates them into cursor movements on the screen. The system is designed to be
intuitive and user-friendly, allowing users to interact with their computer without the
need for a physical mouse. The virtual mouse is developed using Python and the
OpenCV library. The project includes the implementation of various image
processing algorithms, such as hand segmentation, feature extraction, and
classification. Moreover, it is robust to various lighting conditions, backgrounds, and
hand sizes. The developed system provides an alternative to conventional mouse
devices, particularly for individuals with disabilities or those who prefer a more
natural way of interacting with their computers. The aim of this project is to build
something new in the world of technology that helps an individual work without the
help of a physical mouse, saving the user money and time. Real-time images are
continuously collected by the Virtual Mouse colour recognition program and put
through a number of filters and conversions. When this procedure is finished, the
program uses an image processing technique to extract the coordinates of the tracked
position. The system was evaluated in terms of accuracy, speed, and robustness, and
compared with existing virtual mouse systems. The trial findings demonstrated a
high degree of accuracy (97.37%); the system can operate well in real scenarios with
just one CPU. The program then compares the current colour scheme within the
frame to a list of colour combinations, where different combinations correspond to
different mouse operations. If the current colour scheme matches, the program
executes the corresponding command, which is translated into a real mouse command
on the user's computer.

CONTENTS

Chapter-1
Introduction
1.1 Introduction about the Project
1.2 Motivation
1.3 Problem Statement
1.4 Scope of the Project
1.5 Objectives
1.6 Papers Reviewed

Chapter-2
System Requirement Specification
2.1 Specific Requirements
2.2 Hardware Requirements
2.3 Software Requirements

Chapter-3
Detailed Design
3.1 Design Considerations
3.1.1 Functional Requirements
3.1.2 Non-Functional Requirements
3.1.3 Data Requirements
3.2 Data Flow Diagram
3.3 Design Overview
3.4 Functional Model
3.4.1 Palm Detection Model
3.4.2 Hand Landmark Model
3.4.3 OpenCV

Chapter-4
Implementation
4.1 Implementation Steps
4.1.1 Data Requirements
4.1.2 Programming Language Selection
4.1.3 Key Features of the Selected Programming Language
4.2 Coding Guidelines
4.3 Pseudo Code for Each Module

Chapter-5
System Testing
5.1 Test Procedures
5.2 Unit Test Cases
5.3 Integrated Test for the System

Chapter-6
Results and Analysis
6.1 Results
6.2 Some Snapshots

Chapter-7
Conclusion and Future Enhancements
7.1 Conclusion

REFERENCES

CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION ABOUT THE PROJECT

With the development of technologies in the areas of augmented reality and of the devices we
use in our daily life, these devices are becoming compact, in the form of Bluetooth or wireless
technologies. This report proposes an AI virtual mouse system that makes use of hand
gestures and hand tip detection to perform mouse functions on the computer using
computer vision. The main objective of the proposed system is to perform mouse
cursor functions and the scroll function using a web camera or a built-in camera in the computer
instead of a traditional mouse device. Hand gesture and hand tip detection using
computer vision serves as a human-computer interface (HCI) with the computer.

1.2 MOTIVATION

It is fair to say that the Virtual Mouse will soon substitute the traditional physical
mouse, as people are aiming towards a lifestyle where every
technological device can be controlled and interacted with remotely, without using any
peripheral devices such as remotes, keyboards, etc. It does not just provide convenience; it is
cost-effective as well.

1.3 PROBLEM STATEMENT

It is no surprise that all technological devices have their own limitations, especially when it
comes to computer devices. After a review of various types of physical mouse, the
problems were identified and generalized. The following describes the general problems that
the current physical mouse suffers from:
• The physical mouse is subject to mechanical wear and tear.
• The physical mouse requires special hardware and a surface to operate.
• The physical mouse is not easily adaptable to different environments, and its
performance varies depending on the environment.


• The mouse has limited functions even in present operational environments.
• Both wired and wireless mice have a limited lifespan.

1.4 SCOPE OF THE PROJECT

The Virtual Mouse will soon be introduced to replace the physical computer mouse, to
promote convenience while still being able to accurately interact with and control the computer
system. To do that, the software must be fast enough to capture and process every image in
order to successfully track the user's gesture. Therefore, this project develops a software
application with the aid of the latest software coding techniques and the open-source computer
vision library known as OpenCV. The scope of the project is as below:
• Real-time application.
• User-friendly application.
• Removes the requirement of having a physical mouse.
The application starts when the user's gesture is captured in real time by
the webcam; the captured image is then processed for segmentation to identify which
pixel values equal the values of the defined colour. After the segmentation is completed,
the overall image is converted to a binary image, in which the identified pixels appear
white while the rest are black. The position of the white segment in the image is
recorded and set as the position of the mouse pointer, thus simulating the mouse
pointer without using a physical computer mouse. The software application is compatible with
the Windows platform. The functionality of the software is coded in the Python
programming language with the integration of an external library that does the
image processing, known as OpenCV.
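
As an illustration of the pipeline described above, the following minimal sketch shows how the
HSV conversion, binary segmentation, noise filtering, and pointer-position extraction could be
combined in OpenCV. The HSV range below is an assumed example for a red marker and
would need tuning for the colour actually being tracked:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)
# Assumed HSV range for the tracked colour; tune for the colour actually used
lower = np.array([0, 120, 70])
upper = np.array([10, 255, 255])

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)   # HSV conversion
    mask = cv2.inRange(hsv, lower, upper)          # binary image: matching pixels turn white
    mask = cv2.medianBlur(mask, 5)                 # filters salt-and-pepper noise
    moments = cv2.moments(mask)
    if moments["m00"] > 0:                         # a white segment was found
        cx = int(moments["m10"] / moments["m00"])  # centroid of the white segment
        cy = int(moments["m01"] / moments["m00"])  # becomes the pointer position
        print("pointer position:", cx, cy)
    cv2.imshow("mask", mask)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()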

1.5 OBJECTIVES

The purpose of this project is to develop a Virtual Mouse application that targets a few aspects
of significant development. For starters, this project aims to eliminate the need for a
physical mouse while still allowing interaction with the computer system through a webcam,
using various image processing techniques. Beyond that, this project aims to develop a Virtual
Mouse application that can operate on all kinds of surfaces and environments.
The following describes the overall objectives of this project:


• To design the application to operate with the help of a webcam. The Virtual Mouse
application will be operational with the help of a webcam, as the webcam is responsible
for capturing the images of the user's gestures in real time.
• To design a virtual input that can operate on any surface. The Virtual Mouse application
will be operational on any surface and indoor environment, as long as the users are facing
the webcam while performing the motion gestures.
• To program the camera to continuously capture images, which will be
analysed using various image processing techniques. As stated above, the Virtual
Mouse application will continuously capture images in real time, where the
images undergo a series of processes; this includes HSV conversion, binary
image conversion, salt-and-pepper noise filtering, and more.
• To convert hand gestures/motion into mouse input that is set to a particular screen
position. The Virtual Mouse application will be programmed to detect the position of
the defined colours, which will be set as the position of the mouse pointer.
Furthermore, a combination of different colours may trigger different types
of mouse events, such as right/left clicks, scroll up/down, and more.

1.6 PAPERS REVIEWED

In 2016, S. Shetty et al. constructed a virtual mouse system using colour detection. They used a
webcam for detecting mouse cursor movement and click events using OpenCV built-in
functions. A mouse driver, written in Java, is required as well. This system fails to perform
well against a rough background. P. C. Shindhe et al. expanded a method for mouse-free cursor
control where mouse cursor operations are controlled by hand fingers. They collected hand
gestures via webcam using colour detection principles. The built-in functions of the Image
Processing Toolbox in MATLAB and a mouse driver written in Java are used in this approach.
The pointer was not very efficient in the air, as the cursor was very sensitive to motion.
In 2019, K. Hassan et al. presented a system to design and develop a hand-gesture-based
virtual mouse. They captured different gestures via webcam and performed mouse functions
according to the gestures. This system achieved 78%-90% accuracy. The system does not
work efficiently against a complex or rough background. In 2021, S. Shriram presented a
hand-gesture-based virtual mouse system using deep learning. This system achieved 99%
accuracy, but did not develop a keyboard, and there is less accuracy in the right-click function;
clicking and dragging operations are less accurate.


In 2010, Y. Adajania et al. developed a virtual keyboard using shadow analysis. This
system detects the keyboard, hand shadows and fingertips using colour. This system can
analyse 3 frames per second.
In 2011, S. Hernanto et al. built a method for a virtual keyboard using a webcam. In this
approach, two functions are used for finger detection and location. This system used two
different webcams, which are used to detect skin and location separately. The average time
per character of this virtual keyboard is 2.92 milliseconds and the average accuracy of this
system is 88.61%.
In 2015, I. Patil et al. constructed a virtual keyboard interaction system using eye gaze and eye
blinking. Their system first detects the face and then detects the eye and nose regions to
recognize an eye blink. The OpenCV Java framework is used in this approach. At a 160x120
frame size, this approach achieves 48% accuracy, and at a 1280x960 frame size, 98% accuracy
is achieved.
In 2016, Hubert Cecotti developed a system for disabled people: a multi-modal gaze-controlled
virtual keyboard. The virtual keyboard has 8 main commands for menu selection to
spell 30 different characters and a delete button to recover from errors. They evaluated the
performance of the system using the speed and information transfer rate at both the command
and application levels.
In 2017, S. Bhuvana et al. constructed a virtual keyboard interaction system using a webcam.
This system can detect the hand position over the virtual keyboard. This system provides a
white-paper virtual keyboard image and detects which character is pointed at. This approach
used built-in functions of the Image Processing Toolbox in MATLAB.


CHAPTER 2

SYSTEM REQUIREMENT SPECIFICATION

2.1 SPECIFIC REQUIREMENTS

• Gesture Recognition
  - The system shall detect various hand gestures such as pointing, clicking, dragging,
    and scrolling.
  - The system shall differentiate between different gestures with high accuracy.
• Gesture Categories
  - The system shall recognize basic gestures such as open hand, closed fist, and pointing
    finger.
  - The system shall identify specific gestures for left-click, right-click, double-click,
    scrolling, and dragging.

2.2 HARDWARE REQUIREMENTS

The following describes the hardware needed in order to execute and develop the Virtual
Mouse application:
• Computer Desktop or Laptop: The computer desktop or laptop will be utilized to run
the vision software and display what the webcam has captured. A notebook, which is a
small, lightweight and inexpensive laptop computer, is proposed to increase mobility.
The system will use: Processor: Core 2 Duo; Main Memory: 4 GB RAM; Hard Disk:
320 GB; Display: 14" Monitor.
• Webcam: A webcam is a digital video device commonly built into a computer. Its
main function is to transmit pictures over the Internet. It is popularly used with instant
messaging services and for recording images. A webcam is a video camera that feeds or
streams an image or video in real time to or through a computer network, such as the
internet. Webcams are typically small cameras that sit on a desk, attach to a user's
monitor, or are built into the hardware. Webcams can be used during a video chat session
involving two or more people, with conversations that include live audio and video.


2.3 SOFTWARE REQUIREMENTS

PYTHON (version 3.8 and above):

Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. Python is designed to be highly readable. It uses English keywords
frequently, whereas other languages use punctuation, and it has fewer syntactical
constructions than other languages.

OPENCV:
Sample code for displaying the webcam feed:

import cv2

vid = cv2.VideoCapture(0)
while True:
    ret, frame = vid.read()   # read one frame from the webcam
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
vid.release()
cv2.destroyAllWindows()

PYCHARM:
This complete project is executed in Pycharm platform which is user-friendly. PyCharm is an
integrated development environment (IDE) used in computer programming, specifically for
the Python programming language.


CHAPTER 3
DETAILED DESIGN

3.1 DESIGN CONSIDERATIONS


3.1.1 Functional Requirements

Cursor Control:
• Cursor Movement
The system shall map hand movements to the cursor on the screen, allowing users to
move the cursor by moving their hand.
The system shall ensure smooth and responsive cursor movement, minimizing lag.
• Sensitivity Adjustment
The system shall allow users to adjust the sensitivity of the cursor movement to
accommodate different user preferences and hand movements.
• Click Actions: Left-Click, Right-Click and Double-Click
The system shall recognize a distinct gesture for each click type; for example, a quick
double tap shall trigger a double-click at the cursor's location.

Scrolling:
• Vertical Scrolling
The system shall recognize gestures for vertical scrolling (e.g., an up or down swipe)
and perform the corresponding scroll action on the screen.
• Horizontal Scrolling
The system shall recognize gestures for horizontal scrolling (e.g., a left or right swipe)
and perform the corresponding scroll action on the screen.

Calibration and Configuration:


• Calibration Process
The system shall provide a calibration process to ensure accurate detection and
mapping of hand gestures.
The system shall guide users through the calibration process with clear instructions.


3.1.2 Non-Functional Requirements

• The system shall process hand gestures in real-time with minimal latency to ensure a
seamless and responsive user experience.
• The system shall operate efficiently on standard computing hardware without
significant performance degradation.
• The system shall achieve high accuracy in hand gesture recognition, with an
accuracy rate of 95% or higher.
• The system shall minimize false positives and false negatives in gesture detection.

3.1.3 Data Requirements

• The system shall be compatible with major operating systems, including Windows,
macOS, and Linux.
• The system shall integrate seamlessly with a wide range of applications and
software, ensuring broad usability.

3.2 DATA FLOW DIAGRAM

Figure 3.1: Data Flow Diagram


The proposed model for a virtual mouse using OpenCV in Python leverages computer vision
to enable mouse control through hand gestures. This model uses the MediaPipe library for
hand detection and tracking, which provides the landmarks of the hand. Capturing video from
the webcam, the model processes each frame to detect hand landmarks, particularly focusing
on the index finger tip for mouse movement. By mapping the coordinates of the index finger
tip to screen coordinates, the model can control the mouse pointer using PyAutoGUI.
Additionally, the model can recognize specific gestures, such as the distance between the
index finger and thumb, to perform mouse clicks. This allows for an intuitive and hands-free
way to interact with a computer. Enhanced features like cursor movement, left-click, right-click,
double-click and screenshot can be implemented by recognizing different gestures or
combinations of finger movements, making the system more versatile and user-friendly. To
achieve mouse clicks, the model measures the distance between the index finger tip and the
thumb tip. When this distance falls below a certain threshold, it simulates a mouse click using
PyAutoGUI. This threshold-based gesture recognition ensures that clicks are intentional
and reduces the chances of false positives. The model's design can be extended to include
more complex gestures for additional mouse functions, such as right-clicking, scrolling, and
drag-and-drop operations. For example, a right-click could be triggered by detecting a specific
hand gesture involving multiple fingers, while scrolling could be controlled by detecting
vertical hand movements, as sketched below.
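
For instance, the vertical-scroll idea mentioned above could be sketched as follows; the
threshold and scroll step are assumed values, and prev_y/curr_y are hypothetical variables
holding the fingertip's normalized y coordinate in consecutive frames:

import pyautogui

def update_scroll(prev_y, curr_y, threshold=0.02, scroll_step=40):
    # Positive dy means the fingertip moved down in the image
    dy = curr_y - prev_y
    if dy > threshold:
        pyautogui.scroll(-scroll_step)   # scroll the page down
    elif dy < -threshold:
        pyautogui.scroll(scroll_step)    # scroll the page up
    return curr_y                        # becomes prev_y for the next frame
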
Overall, this virtual mouse model offers a robust and adaptable solution for touchless
computer interaction. It is particularly beneficial for accessibility, providing an alternative for
users with limited mobility or those looking for a more hygienic way to interact with their
devices. The system's reliance on widely available hardware (a standard webcam) and
open-source software libraries makes it accessible and implementable for a broad range of
users and applications.
The key features of the proposed model for a virtual mouse using OpenCV in Python are:
1. Hand Detection and Tracking:
• Utilizes MediaPipe for robust and accurate hand detection and tracking.
• Identifies and tracks hand landmarks in real-time.
2. Mouse Pointer Control:
• Maps the coordinates of the index finger tip to screen coordinates for precise mouse
pointer movement.
• Ensures smooth and responsive cursor movement using PyAutoGUI.


3. Gesture-Based Click Detection:


• Implements gesture recognition to detect mouse clicks by measuring the distance between
the index finger tip and thumb tip.
• Uses a threshold distance to trigger left mouse clicks, ensuring accuracy and reducing
false positives.
4. Right-Click Functionality:
• Recognizes additional gestures or finger combinations to perform right-click actions.
• Enhances usability by allowing more complex interactions.
5. Scrolling Mechanism:
• Detects vertical hand movements or specific gestures to enable scrolling.
• Provides a seamless way to navigate through documents and web pages.
6. Real-Time Performance:
• Processes video frames in real-time to ensure immediate response to user gestures.
• Optimizes performance for smooth and lag-free interaction.
7. Adaptability:
• Can be customized and extended to recognize additional gestures for various mouse
functions.
• Supports different lighting conditions and hand sizes with appropriate calibration.
8. Accessibility and Hygiene:
• Provides an alternative input method for users with limited mobility.
• Offers a touchless interaction option, enhancing hygiene by reducing physical contact
with devices.

3.3 DESIGN OVERVIEW

Python is an interpreted, high-level, general-purpose programming language. Created by


Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code
readability with its notable use of significant whitespace. Its language constructs and object-
oriented approach aim to help programmers write clear, logical code for small and large-scale
projects. Python is dynamically typed and garbage-collected. It supports multiple
programming paradigms, including procedural, object-oriented, and functional programming.
Python is often described as a "batteries included" language due to its comprehensive standard
library. Python was conceived in the late 1980s as a successor to the ABC language. Python
2.0, released in 2000, introduced features like list comprehensions and a garbage collection
system capable of collecting reference cycles. Python 3.0, released in 2008, was a major
revision of the language that is not completely backward-compatible, and much Python 2 code
does not run unmodified on Python 3. The Python 2 language, i.e. Python 2.7.x, was officially
discontinued on January 1, 2020 (first planned for 2015), after which security patches and other
improvements are no longer released for it. With Python 2's end-of-life, only Python 3.5.x and
later are supported. Python interpreters are available for many operating systems. A global
community of programmers develops and maintains CPython, an open-source reference
implementation. A non-profit organization, the Python Software Foundation, manages and
directs resources for Python and Python development. Python was conceived in the late 1980s
by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a
successor to the ABC language (itself inspired by SETL), capable of exception handling and
interfacing with the Amoeba operating system. Its implementation began in December 1989.
Van Rossum shouldered sole responsibility for the project, as the lead developer, until July 12,
2018, when he announced his "permanent vacation" from his responsibilities as Python's
Benevolent Dictator for Life, a title the Python community bestowed upon him to reflect his
long-term commitment as the project's chief decision-maker. He now shares his leadership as a
member of a five-person steering council. In January 2019, active Python core developers
elected Brett Cannon, Nick Coghlan, Barry Warsaw, Carol Willing and Van Rossum to a
five-member "Steering Council" to lead the project. Python 2.0 was
released on 16 October 2000 with many major new features, including a cycle-detecting
garbage collector and support for Unicode. Python 3.0 was released on 3 December 2008. It
was a major revision of the language that is not completely backward-compatible. Many of its
major features were backported to the Python 2.6.x and 2.7.x version series. Releases of
Python 3 include the 2to3 utility, which automates (at least partially) the translation of Python
2 code to Python 3. Python 2.7's end-of-life date was initially set for 2015, then postponed to
2020 out of concern that a large body of existing code could not easily be forward-ported to
Python 3.

3.4 FUNCTIONAL MODEL

In this project we have mainly used a landmarking algorithm, which comprises a palm
detection model and a hand landmark model. The algorithm uses machine learning and is
available in the MediaPipe package. The palm model and hand landmark model are described
below.


Landmarking algorithm

3.4.1 Palm Detection Model

To detect initial hand locations, we designed a single-shot detector model optimized for mobile
real-time use, in a manner similar to the face detection model in MediaPipe Face Mesh.
Detecting hands is a decidedly complex task: our lite model and full model have to work
across a variety of hand sizes with a large scale span (~20x) relative to the image frame and be
able to detect occluded and self-occluded hands. Whereas faces have high-contrast patterns,
e.g., in the eye and mouth regions, the lack of such features in hands makes it comparatively
difficult to detect them reliably from their visual features alone.
Our method addresses the above challenges using different strategies. First, we train a palm
detector instead of a hand detector, since estimating bounding boxes of rigid objects like palms
and fists is significantly simpler than detecting hands with articulated fingers. In addition, as
palms are smaller objects, the non-maximum suppression algorithm works well even for
two-hand self-occlusion cases, like handshakes. Moreover, palms can be modelled using square
bounding boxes (anchors in ML terminology), ignoring other aspect ratios, and an
encoder-decoder feature extractor is used for bigger scene context awareness even for small
objects (similar to the RetinaNet approach). Lastly, we minimize the focal loss during training
to support the large number of anchors resulting from the high scale variance.
With the above techniques, we achieve an average precision of 95.7% in palm detection. Using
a regular cross-entropy loss and no decoder gives a baseline of just 86.22%.

3.4.2 Hand Landmark Model

After palm detection over the whole image, our subsequent hand landmark model performs
precise keypoint localization of 21 3D hand-knuckle coordinates inside the detected hand
regions via regression, that is, direct coordinate prediction. The model learns a consistent
internal hand pose representation and is robust even to partially visible hands and
self-occlusions.
To obtain ground-truth data, we manually annotated ~30K real-world images with 21 3D
coordinates, as shown below (we take the Z-value from an image depth map, if it exists for the
corresponding coordinate). To better cover the possible hand poses and provide additional
supervision on the nature of hand geometry, we also render a high-quality synthetic hand
model over various backgrounds and map it to the corresponding 3D coordinates.
The mechanism of detecting famous human-made sculptures, buildings, and monuments
inside an image is known as landmark detection. You can compare it with the well-known
Google Landmark Detection application, which is used by Google Maps. Landmarks are
preferred image features for a variety of computer vision tasks such as image measurement,
registration, camera calibration, motion analysis, 3D scene reconstruction, and object
recognition.
In this project we consider the landmarks of the hand shown in Fig 3.2. According to this, a
particular operation is performed when a particular finger is raised. This capability is built into
the MediaPipe package, which is already trained and available for the Python programming
language.

Fig 3.2: Landmarks of a hand
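
As a minimal illustration of how these landmarks can be used, the helper below sketches a
"finger raised" test; it is a hypothetical example, assuming hand_landmarks is a MediaPipe
hands result as obtained in Chapter 4:

def finger_is_raised(hand_landmarks, tip_id, pip_id):
    # In image coordinates y grows downward, so a raised fingertip
    # lies above (has a smaller y than) its PIP joint
    return hand_landmarks.landmark[tip_id].y < hand_landmarks.landmark[pip_id].y

# Example: the index fingertip is landmark 8 and its PIP joint is landmark 6,
# so finger_is_raised(hand_landmarks, 8, 6) tests whether the index finger is raised.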

The ability to perceive the shape and motion of hands can be a vital component in improving
the user experience across a variety of technological domains and platforms. For example, it
can form the basis for sign language understanding and hand gesture control, and can also
enable the overlay of digital content and information on top of the physical world in
augmented reality. While coming naturally to people, robust real-time hand perception is a
decidedly challenging computer vision task, as hands often occlude themselves or each other
(e.g. finger/palm occlusions and handshakes) and lack high-contrast patterns.
MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine
learning (ML) to infer 21 3D landmarks of a hand from just a single frame. Whereas current
state-of-the-art approaches rely primarily on powerful desktop environments for inference, our
method achieves real-time performance on a mobile phone, and even scales to multiple hands.
We hope that providing this hand perception functionality to the wider research and
development community will result in an emergence of creative use cases, stimulating new

applications and new research avenues.


MediaPipe Hands utilizes an ML pipeline consisting of multiple models working together. The
ML pipeline is described below.

3.4.3 OpenCV

OpenCV is a huge open-source library for computer vision, machine learning, and image
processing, and it now plays a major role in real-time operation, which is very important in
today's systems. By using it, one can process images and videos to identify objects, faces, or
even the handwriting of a human. When integrated with various libraries such as NumPy,
Python is capable of processing the OpenCV array structure for analysis. To identify image
patterns and their various features we use vector space and perform mathematical operations
on these features.
The first OpenCV version was 1.0. OpenCV is released under a BSD license and hence it is
free for both academic and commercial use. It has C++, C, Python and Java interfaces and
supports Windows, Linux, Mac OS, iOS and Android. When OpenCV was designed, the main
focus was real-time applications for computational efficiency.
All things are written in optimized C/C++ to take advantage of multi-core processing.

Applications of OpenCV: There are lots of applications which are solved using OpenCV;
some of them are listed below:
• Face recognition
• Automated inspection and surveillance
• Counting the number of people
• Video/image search and retrieval
• Robot and driverless car navigation and control

OpenCV functionality:
• Image/video I/O, processing, display
• Object/feature detection
• Geometry-based monocular or stereo computer vision (calib3d, stitching, videostab)
• Computational photography (photo, video, superres)
• Machine learning & clustering (ml, flann)


CHAPTER 4
IMPLEMENTATION

Implementing a virtual mouse using hand gestures involves a multifaceted approach that
integrates hardware, software, and machine learning components to create a seamless user
experience. The process begins with selecting the appropriate hardware, typically a
highresolution RGB camera or a depth sensor like Intel RealSense or Microsoft Kinect, to
accurately capture hand movements. The camera needs to be strategically positioned to ensure
a clear and unobstructed view of the user's hand. On the software side, developers must install
and configure essential libraries and frameworks. OpenCV and MediaPipe are commonly used
for computer vision tasks, while TensorFlow or PyTorch are preferred for building and
training machine learning models.
Python serves as the primary programming language due to its robust support for these
libraries. The next step involves collecting a diverse dataset of hand gestures, capturing images
or videos of various gestures under different conditions to ensure robustness. Preprocessing
this data is crucial and includes steps such as resizing images, normalizing pixel values, and
segmenting the hand from the background using tools like OpenCV. For gesture recognition,
developers must select a suitable model architecture; CNNs are effective for static gestures,
whereas Spatio-Temporal Transformers or RNNs are ideal for dynamic gestures involving
sequences of frames.
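
As a minimal sketch of the preprocessing step described above, assuming a 224x224 model
input size and [0, 1] pixel normalization (both assumed values):

import cv2
import numpy as np

def preprocess(frame, size=(224, 224)):
    resized = cv2.resize(frame, size)                # resize to the model input size
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)   # OpenCV captures frames in BGR order
    return rgb.astype(np.float32) / 255.0            # normalize pixel values to [0, 1]
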
The chosen model is trained on the preprocessed dataset, employing data augmentation
techniques to improve generalization and prevent overfitting. In real-time applications, frames
are continuously captured from the camera, preprocessed, and fed into the trained model for
gesture detection. The recognized gestures are then mapped to corresponding mouse actions,
such as cursor movement, clicking, or scrolling, using Python libraries like pyautogui.
Implementing smooth and responsive cursor control is essential to ensure a natural user
experience. A calibration process is necessary to fine-tune the system, guiding users through
predefined gestures to adjust sensitivity and recognition thresholds.

4.1 IMPLEMENTATION STEPS

Creating a virtual mouse using OpenCV in Python involves several steps. Below is an outline of
the implementation process, focusing on hand gesture to control the mouse pointer. This
example uses the Mediapipe library for hand tracking and pyautogui for controlling the mouse.


1. Install Required Libraries

pip install opencv-python
pip install mediapipe
pip install pyautogui
pip install protobuf

2. Import Libraries

import cv2
import mediapipe as mp
import pyautogui

3. Initialize Mediapipe and PyAutoGUI

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.7)
mp_draw = mp.solutions.drawing_utils

4. Capture Video from Webcam

cap = cv2.VideoCapture(0)
screen_width, screen_height = pyautogui.size()

5. Process Each Frame

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)  # flip the frame horizontally for a mirror view
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(frame_rgb)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            index_finger_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            # Fingertip position in frame pixels (useful for drawing a marker)
            x = int(index_finger_tip.x * frame.shape[1])
            y = int(index_finger_tip.y * frame.shape[0])
            # Map the normalized coordinates to the screen size; the frame is
            # already mirrored, so fingertip x maps directly to screen x
            screen_x = int(screen_width * index_finger_tip.x)
            screen_y = int(screen_height * index_finger_tip.y)
            pyautogui.moveTo(screen_x, screen_y)
    cv2.imshow("Virtual Mouse", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()

6. Gesture Recognition for Clicks (Optional)

You can add gesture recognition to simulate mouse clicks, such as detecting a specific
gesture for clicking.
def is_clicking(hand_landmarks):
    # A simple pinch test: the hand is "clicking" when the index finger tip and
    # thumb tip are close together; the 0.05 threshold (in normalized
    # coordinates) is an assumed value and may need tuning
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    distance = ((index_tip.x - thumb_tip.x) ** 2 + (index_tip.y - thumb_tip.y) ** 2) ** 0.5
    return distance < 0.05

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(frame_rgb)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            index_finger_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
            screen_x = int(screen_width * index_finger_tip.x)
            screen_y = int(screen_height * index_finger_tip.y)
            pyautogui.moveTo(screen_x, screen_y)
            if is_clicking(hand_landmarks):
                pyautogui.click()
    cv2.imshow("Virtual Mouse", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()


Explanation:

• Initialization: Mediapipe is initialized for hand tracking, and pyautogui is used for
controlling the mouse.
• Video Capture: Captures frames from the webcam and processes each frame.
• Hand Detection: Uses Mediapipe to detect hand landmarks and draw them on the
frame.
• Mouse Movement: Maps the coordinates of the index finger tip to the screen size and
moves the mouse pointer accordingly.
• Gesture Recognition: Optionally, you can implement gesture recognition to simulate
mouse clicks or other actions.
This implementation provides a basic virtual mouse. You can extend it further by adding
more gestures and refining the detection and mapping processes.


CHAPTER 5

SYSTEM TESTING

5.1 TEST PROCEDURES

Testing a virtual mouse system that uses hand gestures is a multi-phase process critical to
ensuring that the system is accurate, responsive, and user-friendly. This process involves a
series of rigorous tests, from initial unit tests of individual components to extensive user
testing to gather feedback and make iterative improvements. Each phase of testing aims to
validate different aspects of the system, ensuring that it functions correctly under various
conditions and provides a seamless user experience.

Unit Testing:
Unit testing is the first step in the testing process and involves verifying the functionality of
individual components of the system. This includes testing the camera setup, image
preprocessing functions, and the machine learning models used for gesture recognition. For
example, each function that processes video frames, extracts hand features, or classifies
gestures needs to be thoroughly tested to ensure it performs as expected. Unit tests are
typically automated using frameworks like pytest for Python, which allows for consistent and
repeatable testing. These tests check for correct input-output behavior, error handling, and
performance under different conditions. By ensuring that each component functions correctly
in isolation, unit testing helps prevent bugs and ensures a solid foundation for the more
complex interactions that occur in the system.
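
For instance, a unit test for a fingertip-distance helper might look like the sketch below; the
gestures module and fingertip_distance function are hypothetical names assumed to exist in
the project:

from types import SimpleNamespace

from gestures import fingertip_distance  # hypothetical helper under test

def test_distance_is_zero_for_identical_points():
    p = SimpleNamespace(x=0.5, y=0.5)
    assert fingertip_distance(p, p) == 0.0

def test_distance_along_one_axis():
    a = SimpleNamespace(x=0.0, y=0.0)
    b = SimpleNamespace(x=1.0, y=0.0)
    assert abs(fingertip_distance(a, b) - 1.0) < 1e-9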

Integration Testing:
Integration testing involves verifying that different components of the system work together as
intended. In the context of a virtual mouse using hand gestures, this means testing the
interaction between the camera, the gesture recognition model, and the cursor control
mechanism. Integration tests ensure that data flows correctly between components, that
preprocessing steps do not corrupt data, and that recognized gestures accurately translate into
cursor movements and clicks. For example, tests might involve simulating a series of gestures
and verifying that the cursor moves and clicks as expected.


System Testing:
System testing involves evaluating the entire system as a whole to ensure it meets the specified
requirements. This phase of testing focuses on the overall functionality, performance, and
reliability of the virtual mouse system. Testers perform a series of tasks that mimic real-world
usage scenarios, such as navigating through files, opening applications, and performing
drag-and-drop operations using hand gestures. These tests assess the system's ability to handle
various gestures, its responsiveness to rapid movements, and its stability over extended periods
of use. System testing also includes stress testing to evaluate how the system performs under
heavy load, such as when multiple applications are running simultaneously or when gestures
are performed rapidly in succession.

User Acceptance Testing (UAT):


User Acceptance Testing (UAT) involves real users interacting with the system to ensure it
meets their needs and expectations. This phase of testing is crucial for gathering feedback on
the usability and practicality of the virtual mouse system. Testers recruit a diverse group of
users, including those with varying levels of technical proficiency, to perform common tasks
using the hand gesture system. During UAT, users are asked to provide feedback on the ease of
use, accuracy of gesture recognition, and overall satisfaction with the system. This feedback is
invaluable for identifying any usability issues or areas where the system could be improved.
Developers can then make adjustments based on this feedback, such as refining gesture
recognition algorithms, improving the user interface, or adjusting the sensitivity of the system.

Performance Testing:
Performance testing is conducted to evaluate the responsiveness and efficiency of the system.
This involves measuring the system's response time to gestures, the accuracy of gesture
recognition under different conditions, and the system's ability to process gestures in real time.
Performance tests might include scenarios with varying lighting conditions, different
backgrounds, and multiple users to ensure the system performs consistently across different
environments. Tools like OpenCV can be used to simulate different conditions, and profiling
tools can help identify any performance bottlenecks in the code.
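
A minimal sketch of such a measurement is shown below, assuming process_frame is a
hypothetical per-frame gesture-processing function under test:

import time

def average_latency(process_frame, frames):
    # Times each call and returns the mean processing time per frame in seconds
    total = 0.0
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        total += time.perf_counter() - start
    return total / len(frames)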


Security Testing:
Security testing is an important aspect, especially if the virtual mouse system is intended for
environments where security is a concern, such as workplaces or public kiosks. This phase
involves ensuring that the system cannot be easily manipulated or spoofed. For example, tests
might involve trying to confuse the system with pictures of hands or gestures performed by
different people. Security testing also involves ensuring that any data captured by the system,
such as video feeds, is handled securely and does not expose users to privacy risks. This may
involve encrypting data streams, implementing secure storage for any captured data, and
ensuring compliance with relevant privacy regulations.

Regression Testing:
Regression testing is conducted whenever changes are made to the system to ensure that new
updates or bug fixes do not introduce new issues. This involves re-running previous tests to
verify that the system still functions as expected after modifications. Automated testing
frameworks are particularly useful for regression testing, as they allow for quick and
consistent retesting of the system. Regression tests ensure that the system remains stable and
reliable over time, even as new features are added or existing ones are improved.

Usability Testing:
Usability testing focuses specifically on the user experience and interface of the virtual mouse
system. This involves observing users as they interact with the system to identify any areas
where they encounter difficulties or confusion. Testers may use methods such as think-aloud
protocols, where users verbalize their thoughts while using the system, or eye-tracking to see
where users focus their attention. The goal of usability testing is to make the system as
intuitive and user-friendly as possible, ensuring that users can perform tasks efficiently and
without frustration. Feedback from usability testing can lead to improvements in the design of
the user interface, the responsiveness of the system, and the clarity of the instructions provided
to users.

Continuous Testing and Monitoring:


After the initial testing phases, continuous testing and monitoring are essential to maintain the
quality and reliability of the virtual mouse system. This involves setting up automated tests to
run regularly as changes are made to the system. Continuous monitoring can track the
performance and usage of the system in real time, identifying any issues that arise during
actual use.


CHAPTER 6
RESULTS AND ANALYSIS
6.1 RESULTS

This AI virtual mouse system and virtual keyboard can be created entirely using open-source
software, so anyone with a computer can use it anywhere; no particular training is required to
operate the framework. Users only need to know the hand gestures for each operation. This
project uses the concept of advancing HCI using computer vision. The proposed system has
no difficulty detecting different skin colours of the hand.
The proposed system uses the following tools: Python 3.8 and above, OpenCV, MediaPipe,
NumPy, AutoPy, PyAutoGUI and time. The complete process is implemented on the PyCharm
platform. Once the program is run, the camera of your device will be automatically accessed
and you can start operating your system with different hand gestures. The different hand
gestures for the computer to perform mouse operations are shown below:

Figure 6.1: Recognizing hand (No operation)


Figure 6.2: For the cursor movement

Figure 6.3: Gesture for scroll down function


Figure 6.4: To stop the movement of cursor

Figure 6.5: For the Mouse to Perform Double Click


CHAPTER 7
CONCLUSION

In conclusion, it is no surprise that the physical mouse will be replaced by a virtual,
non-physical mouse in Human-Computer Interaction (HCI), where every mouse
movement can be executed with a swipe of your fingers, anywhere and anytime, without any
environmental restrictions. This project has developed a colour recognition program with the
purpose of replacing the generic physical mouse without sacrificing accuracy and
efficiency; it is able to recognize colour movements and combinations and translate them into
actual mouse functions. Because accuracy and efficiency play an important role in making the
program as useful as an actual physical mouse, a few techniques had to be implemented.

First and foremost, the coordinates of the colours that are in charge of handling the cursor
movements are averaged over a collection of coordinates; the purpose of this technique is
to reduce and stabilize the sensitivity of cursor movements, as slight movements might lead to
unwanted cursor motion. Other than that, several colour combinations were implemented,
with the addition of distance calculations between two colours within a combination, as
different distances trigger different mouse functions. The purpose of this implementation is to
promote convenience in controlling the program without much of a hassle.
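
A minimal sketch of this averaging technique is shown below; the window size of five frames
is an assumed value:

from collections import deque

window = deque(maxlen=5)   # keeps only the most recent pointer positions

def smoothed_position(x, y):
    # Average the raw pointer coordinates over the last few frames to
    # suppress jitter caused by slight hand movements
    window.append((x, y))
    avg_x = sum(p[0] for p in window) / len(window)
    avg_y = sum(p[1] for p in window) / len(window)
    return avg_x, avg_y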

Therefore, actual mouse functions can be triggered accurately with minimal trial and error.
Furthermore, to promote efficient and flexible tracking of colours, a calibration phase was
implemented; this allows users to choose their preferred colours for different mouse
functions, as long as the selected colours do not fall within the same/similar RGB values (e.g.
blue and sky-blue). Other than that, adaptive calibration was implemented as well; it basically
allows the program to save different sets of HSV values from different angles, which are then
used during the recognition phase.


REFERENCES

[1] J. Katona, "A review of human-computer interaction and virtual reality research fields in
cognitive InfoCommunications," Applied Sciences, vol. 11, no. 6, p. 2646, 2021.
[2] P. M. Game and A. R. Mahajan, "A gestural user interface to interact with computer
system," International Journal on Science and Technology (IJSAT), vol. II, no. I, pp. 18-27,
Jan.-Mar. 2011.
[3] Erdem, E. Yardimci, Y. Atalay and V. Cetin, "Computer vision-based mouse," in
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP), 2002.
[4] J. Crowley, F. Bérard and J. Coutaz, "Finger tracking as an input device for augmented
reality," in Automatic Face and Gesture Recognition, Zurich, pp. 195-200, 1995.
[5] G. R. Bradski, "Computer video face tracking for use in a perceptual user interface," Intel
Technology Journal, 2nd Quarter, 1998.
[6] T.-H. Chang, T. Yeh and R. C. Miller, "GUI testing using computer vision," in Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1535-1544, 2010.
[7] T. Murase et al., "Gesture keyboard requiring only one camera," in Proc. of the 24th
Annual ACM Symposium Adjunct on User Interface Software and Technology, Santa Barbara,
USA, pp. 9-10, Oct. 2011.
[8] H. Cecotti, "A multimodal gaze-controlled virtual keyboard," IEEE Transactions on
Human-Machine Systems, vol. 46, no. 4, pp. 601-606, 2016.
[9] P. Nandhini, J. Jaya and J. George, "Computer vision system for food quality evaluation –
a review," in Proceedings of the 2013 International Conference on Current Trends in
Engineering and Technology (ICCTET), pp. 85-87, Coimbatore, India, July 2013.
[10] S. Hernanto and I. S. Suwardi, "Webcam virtual keyboard," in Proceedings of the 2011
International Conference on Electrical Engineering and Informatics, pp. 1-5, IEEE, 2011.
