New Final Year Project
I have read the paper, and in my opinion, it is fully adequate in scope and quality for the Jimma
Institute of Technology, Faculty of Electrical and Computer Engineering.
Advisor Signature
Mr. Sileshi A. ___________
ACKNOWLEDGEMENT
First and foremost, we would like to express our gratitude to the almighty God for keeping us
safe and providing us with the opportunity to arrive here after overcoming various hurdles along
the way. The joy that accompanies the successful completion of any endeavor would be
incomplete without the unwavering cooperation of the individuals whose support made it
possible, and whose constant guidance and encouragement crowned all efforts with success.
Special thanks to Mr. Sileshi A., our project advisor, for his guidance, inspiration, and
constructive suggestions, all of which significantly contributed to the project's success.
We also extend our gratitude to other individuals who supported the execution of this project.
Their friendship, empathy, and great sense of humor have been invaluable. We are deeply
humbled and grateful to acknowledge our indebtedness to all those who have assisted us in
developing these ideas. The successful completion of any endeavor, at any level, is impossible
without the support and guidance of our parents and friends. Therefore, we express our heartfelt
thanks to our friends and parents for their assistance in gathering information, providing financial
support, collecting data, and guiding us throughout the completion of this project.
TABLE OF CONTENTS
DECLARATION ............................................................................................................................. I
ACKNOWLEDGEMENT .............................................................................................................. II
INTRODUCTION ...........................................................................................................................1
SYSTEM DESIGN METHODOLOGY ....................................................................................... 11
REFERENCE ................................................................................................................................ 45
APPENDIX ................................................................................................................................... 47
LIST OF FIGURES
Figure 1 : Methodology of the Project ............................................................................................ 4
Figure 2 : Block Diagram of the proposed system .......................................................................11
Figure 3 : Block diagram of an AI design section .......................................................................12
Figure 4 : Image Source Resources .............................................................................................13
Figure 5 : Image dataset of Mr. Ashebir ....................................................................................... 15
Figure 6 : Image dataset of Mr. Adane ........................................................................................16
Figure 7 : Image dataset of Eng. Kris ............................................................................................16
Figure 8 : Image dataset of Mr. Gadisa ........................................................................................ 16
Figure 9 : Image dataset of Zelalem ..............................................................................................16
Figure 10 : Image dataset of Tewodros ......................................................................................... 17
Figure 11 : Image dataset of Temesgen ........................................................................................17
Figure 12 : Image dataset of Yosef ............................................................................................. 17
Figure 13 : CNN Architecture ....................................................................................................... 21
Figure 14 : Hardware Design ........................................................................................................ 23
Figure 15 : Arduino Uno ............................................................................................................... 24
Figure 16 : Mobile camera ............................................................................................................ 25
Figure 17 : Speaker ........................................................................................................................25
Figure 18 : Switch ......................................................................................................................... 25
Figure 19 : Software Design ..........................................................................................................26
Figure 20 : Flow Chart for object or person detection .................................................................. 27
Figure 21 : Flow Chart for Text-To-Speech Device ..................................................................... 28
Figure 22 : Google Colab .............................................................................................................. 29
Figure 23 : Python IDE ................................................................................................................. 30
Figure 24 : Arduino IDE ............................................................................................................... 30
Figure 25 : TensorFlow ................................................................................................................. 31
Figure 26 : Keras ..........................................................................................................................32
Figure 27 : Visual Studio Code Editor .......................................................................................... 32
Figure 28 : OpenCV ...................................................................................................................... 33
Figure 29 : Pickle ......................................................................................................................... 33
Figure 30 : Integration Part ........................................................................................................... 34
Figure 31 : The Prototype of Text-To-Speech Device ................................................................. 35
Figure 32 : Working Principle of object or face detection ............................................................ 36
Figure 33 : Working Principle of the Text-To-Speech Device ..................................................... 37
Figure 34 : Accuracy and loss ....................................................................................................... 39
Figure 36 : Result from testing the system ....................................................................................42
LIST OF TABLES
Table 1 : Data Collection Process ................................................................................................. 13
Table 2 : Face Detections .............................................................................................................. 15
Table 3 : Table of Training Data ................................................................................................... 18
Table 4 : Selected Objects ............................................................................................................. 18
Table 5 : Data Augmentation Parameters ......................................................................................20
Table 6 : Hardware Components Used .........................................................................................24
Table 7 : Hyper-Parameters ...........................................................................................................38
ACRONYMS
AI Artificial Intelligence
CNN Convolutional Neural Network
GPU Graphics Processing Unit
GUI Graphical User Interface
IDE Integrated Development Environment
LMICs Low- and middle-income countries
MLP Multilayer Perceptron
OCR Optical Character Recognition
OpenCV Open Source Computer Vision Library
TF TensorFlow
TTS Text-to-Speech
USB Universal Serial Bus
VCS Version Control Systems
VS Code Visual Studio Code
WHO World Health Organisation
mAP mean Average Precision
YOLOv8 You Only Look Once version 8
EXECUTIVE SUMMARY
The challenges faced by visually impaired individuals in navigating their environments,
recognizing people, understanding object types and distances, and accessing written materials
pose significant barriers to their independence and quality of life, especially in developing
countries with limited resources for specialized technologies and services. Visually impaired
individuals face challenges in identifying people, objects, and distances in a room, which can
hinder their daily activities and interactions. To address this issue, the proposed project aims to
provide a comprehensive solution by developing a system that can detect and recognize
individuals in a room, provide distance information, identify objects, and even read books aloud
for visually impaired individuals. By leveraging computer vision techniques, face recognition
algorithms, distance estimation technologies, and text-to-speech capabilities, the project
empowers visually impaired individuals to navigate their surroundings with greater ease and
confidence. The benefits of this project extend to society as a whole, especially in developing
countries where resources for visually impaired individuals may be limited. By enhancing the
independence and safety of visually impaired individuals, this project contributes to creating a
more inclusive and accessible environment for all individuals, regardless of their visual abilities.
By bridging the gap between perception and information, our project enhances the independence,
safety and inclusion of visually impaired individuals across various settings, including
educational institutions, workplaces, and public spaces. Through collaboration with local
communities and organizations, we aim to deploy scalable and cost-effective solutions that
significantly improve access to education, employment opportunities, and social interactions for
this marginalized population, promoting inclusivity and dignity for all members of society. This
project developed a precise Machine Vision-Based Assistance System for visually impaired
individuals, using CNNs and optimized hyperparameters for real-time identification. Testing
verified exceptional accuracy, guaranteeing effectiveness in diverse settings, enhancing
independence, safety, and inclusion by providing real-time information for social interactions,
mobility, and accessing written materials, thereby promoting a more inclusive environment.
Keywords: AI, Face Recognition, Computer vision, Image Processing, Object recognition,
Optical Character Recognition.
CHAPTER ONE
INTRODUCTION
1.1 Background
Visual impairment poses significant challenges in the daily lives of affected individuals,
impacting various aspects such as social interaction, education, and mobility safety. The inability
to see clearly affects their ability to engage socially, access education materials, and navigate
their environment safely. While traditional tools like white canes and guide dogs have been
instrumental in enhancing independence and mobility, they often fall short in providing the
comprehensive support necessary for seamless integration into society. These tools, while
beneficial, do not fully address the diverse and complex needs of visually impaired individuals.
The prevalence of visual impairment is staggering, with the World Health Organisation (WHO)
reporting a massive increase from 733 million people worldwide in 2010 to 2.2 billion in late
2019 [1]. This increase is attributed to various factors, including insufficient access to healthcare
and rehabilitative support services, especially in low- and middle-income countries (LMICs).
Studies in LMICs have highlighted the lack of access to healthcare services, medical
rehabilitation, and assistive devices, leading to delays in needed medical evaluations and
preventive care.
Visual impairment leads to disability across the severity spectrum, affecting virtually all aspects
of an individual's life. It hinders the ability to complete critical activities of daily living, reduces
mobility and social participation, and increases the risk of depression. Factors such as
physical/environmental barriers and social factors like discrimination can magnify the impact of
vision loss [2].
Recent advancements in Artificial Intelligence (AI) have ushered in a new era of assistive
technologies tailored to the unique needs of the visually impaired community. These AI-powered
systems offer sophisticated functionalities that surpass the capabilities of traditional tools. For
example, facial recognition technology has integrated advanced algorithms with real-time
distance estimation, improving social engagement and mobility. Additionally, the integration of
Optical Character Recognition (OCR) technology with real-time image capture has enabled
instant reading of printed text, enhancing information access [3].
AI has also been applied to object recognition technology, environmental sound analysis, and
AI-powered companions, further enhancing accessibility and inclusivity for visually impaired
individuals. These advancements have greatly improved the lives of visually impaired
individuals, offering them greater independence and access to information. As AI continues to
advance, there is great potential for further enhancements in assistive technologies, ensuring that
visually impaired individuals have the tools and support they need to navigate the world
confidently and independently.
1.2 Statement of the Problem
The proposed project addresses the lack of accessibility and independence for visually impaired
individuals in developing countries, hindering their ability to understand and navigate their
environment. These individuals face challenges that impede their daily activities, leading to
isolation, dependency, and limited access to resources.
Specifically, they struggle with limited awareness of individuals in their environment, resulting
in social barriers and communication difficulties. They face difficulty identifying objects and
distances, affecting their mobility and safety, and have restricted access to written materials like
books due to a lack of braille materials or assistive technologies.
In Ethiopia, an estimated 1.2 million people are visually impaired, according to the World Health
Organization. This population encounters numerous challenges daily, including navigating
crowded environments, accessing public transportation, participating in educational activities,
and engaging socially, due to the lack of accessible infrastructure, scarce resources for assistive
technologies, and societal stigma [4].
The lack of appropriate accommodations and support in educational settings can create
significant barriers for visually impaired individuals, limiting their ability to fully participate in
academic activities and access essential learning materials. Without access to braille materials,
adaptive technologies, or trained educators, visually impaired students may struggle to keep up
with their peers and face challenges in understanding and retaining information. This can result
in lower academic achievement, decreased self-esteem, and a sense of exclusion from the
educational environment.
Furthermore, it is estimated that most visually impaired individuals in Ethiopia face significant
barriers to employment, leading to a substantial loss of opportunities. This impacts their
economic independence and overall quality of life, exacerbating poverty and social exclusion.
The proposed solution is a system designed to provide real-time information about the user's
surroundings through sound cues. This innovative system enables visually impaired individuals
to detect and recognize the people present in a room, as well as understand the distance of each
person from them.
Additionally, the system can identify objects in the environment and convey this information to
the user, specifying the type of object and its distance from the individual. Furthermore, the
project includes a feature that enables the system to read books aloud to the visually impaired
person, enhancing their access to written materials.
By combining these functionalities, the project aims to significantly improve the independence
and autonomy of visually impaired individuals, allowing them to navigate their surroundings
with greater ease and engage more fully in social interactions and educational activities. This
comprehensive solution represents a significant step towards creating a more inclusive and
accessible environment for visually impaired individuals.
1.3 Objective
at a prototype level
To test the functionality of the proposed system
1.4 Methodology of the Project
To ensure efficient performance, this project was meticulously organized and coordinated
through several key stages. First, societal issues were identified through observation and
investigation. A specific real-world problem impacting visually impaired individuals in a chosen
community was then selected for deeper analysis. Following the selection of a study area, a
comprehensive review of existing literature on visually impaired individuals and related
challenges was conducted. This research informed the project's scope definition, outlining its
goals and limitations. Additionally, a suitable methodology was established to guide the project's
execution. Next, the focus shifted to system design and development. This involved in-depth
consideration of how different components would be integrated to effectively address the
identified issue. A block diagram was created to visually represent this overall system integration.
Finally, based on the defined scope, methodology, and system design, the device itself was
designed, developed and deployed in compatible environments where it could be utilized
effectively.
Figure 1: Methodology of the Project
1.5 Scope and Limitations of the Project
Cost-Effectiveness
Crucial in developing countries with limited resources for specialized technologies.
Ensures accessibility to assistive technologies for a larger segment of the visually
impaired population.
Reduces financial burdens on individuals and families.
Social Inclusion
Detects and recognizes individuals, providing information about proximity and actions.
Fosters meaningful social interactions.
Promotes mutual understanding and awareness.
Educational and Informational Access
Reads books and written materials aloud.
Enhances access to education and information regardless of geographical location or
economic status.
Promotes lifelong learning and personal development.
Opens doors to employment opportunities and active civic participation.
CHAPTER TWO
LITERATURE REVIEW
2.1 Overview
Visual impairment presents significant challenges in the daily lives of affected individuals,
affecting various aspects such as social interaction, education, and mobility safety. While
traditional tools like white canes and guide dogs have been instrumental in improving
independence and mobility, they often fall short in providing the comprehensive support required
for seamless integration into society. These tools, while effective to some extent, do not fully
address the diverse and complex needs of visually impaired individuals in navigating the world
around them. Recent advancements in Artificial Intelligence (AI) have brought about a new era
of assistive technologies specifically designed to meet the unique needs of the visually impaired
community. AI-powered systems offer innovative solutions that go beyond the capabilities of
traditional tools. These systems leverage the power of AI algorithms to provide more
sophisticated functionalities, enhancing the overall quality of life for visually impaired
individuals.
One of the key areas where AI has made significant advancements is in facial recognition
technology. Early facial recognition systems were limited in their ability to accurately estimate
distances, which is essential for effective social interaction. However, recent developments have
integrated advanced algorithms with real-time distance estimation capabilities, enabling visually
impaired individuals to assess the distance to individuals and objects in their surroundings more
accurately. This enhancement in facial recognition technology has not only improved the
accuracy of identifying individuals but has also enhanced situational awareness, leading to richer
social connections and improved overall quality of life.
Furthermore, AI has revolutionized information access technologies for the visually impaired.
While Text-to-Speech and Optical Character Recognition (OCR) tools have existed for some
time, their effectiveness has been limited by their reliance on pre-existing digital content. This
constraint has hindered the real-time accessibility of printed text for visually impaired
individuals. However, recent projects have integrated OCR technology with real-time image
capture capabilities, allowing users to instantly read printed text in their surroundings without the
need for prior digitization. Additionally, the integration of AI-based narration has further
enhanced the reading experience by providing contextually relevant information, improving
overall comprehension of the text, and overcoming the limitations of traditional methods. recent
advancements in AI have significantly improved assistive technologies for visually impaired
individuals. These advancements have not only addressed the limitations of traditional tools but
have also opened up new possibilities for enhancing independence, accessibility, and inclusivity
for the visually impaired community [5].
By combining advanced facial recognition algorithms with real-time distance estimation, these
systems can now not only identify individuals but also accurately assess the distance to them and
other objects in the surrounding environment. This integration has greatly enhanced situational
awareness for visually impaired individuals, allowing them to navigate social interactions more
effectively. Moreover, the ability to assess distances to both individuals and obstacles in the
environment has significantly improved safety and mobility for visually impaired individuals [7].
In practical terms, this advancement means that facial recognition technology can now provide
more comprehensive support to visually impaired individuals in various scenarios. For example,
in a social setting, the technology can help individuals identify and locate friends or
acquaintances accurately. In a more complex environment, such as a crowded street or public
transportation, the technology can assist individuals in navigating safely by detecting and
alerting them to obstacles in their path. The integration of advanced algorithms with real-time
distance estimation in facial recognition technology represents a significant step forward in
assistive technology for visually impaired individuals. It not only improves the accuracy of
identifying individuals but also enhances situational awareness, fostering richer social
connections, and improving overall quality of life [8].
In the realm of information access technologies, significant progress has been made to improve
accessibility for visually impaired individuals. While Text-to-Speech (TTS) and Optical
Character Recognition (OCR) tools have been available, their effectiveness has been hindered by
their reliance on pre-existing digital content. This limitation has restricted real-time accessibility
for visually impaired individuals, as they often require printed text to be digitized before it can
be read aloud. To address this challenge, several projects have focused on integrating OCR
technology with real-time image capture capabilities. Our project likewise allows users to instantly
read printed text in their surroundings without the need for prior digitization. By capturing
images in real-time and processing them through OCR algorithms, these systems convert printed
text into a format that can be read aloud by a TTS engine. This integration not only provides a
more immediate reading experience but also enhances accessibility by eliminating the need for
pre-existing digital content [9].
Moreover, the integration of AI-based narration further enhances the reading experience for
visually impaired individuals. By providing contextually relevant information and improving
overall comprehension, AI-based narration ensures that the text is more easily understood and
accessible. This combination of real-time OCR and AI-based narration offers a more seamless
and accessible reading experience, effectively overcoming the limitations of traditional methods.
These advancements in information access technologies have significantly improved accessibility
for visually impaired individuals, providing them with greater independence and autonomy in
their daily lives [10].
AI has also been applied to environmental sound analysis. These systems can identify the sound
of a car approaching or a doorbell ringing, alerting the user to potential obstacles or events in
their environment. This enhances the user's situational awareness and safety [11].
AI-powered companions are also revolutionizing the field of assistive technologies. These
companions can provide emotional support and social interaction, reducing feelings of isolation
and improving overall well-being. They can engage users in conversations, provide reminders,
and assist with daily tasks, enhancing the user's quality of life. The potential impact of AI on
assistive technologies for visually impaired individuals is significant. AI-powered systems offer
comprehensive solutions that enhance accessibility and inclusivity, enabling visually impaired
individuals to live more independently and fulfillingly. Continued research and development in
AI integration hold promise for a future where assistive technologies empower all individuals,
regardless of visual impairment, to lead more fulfilling lives [12].
CHAPTER THREE
SYSTEM DESIGN METHODOLOGY
This chapter presents the design methodology of the proposed system. At a high level, the
project incorporates two broad design sections, an AI design section and an Electrical design
section, each with its respective subsections. As shown in Figure 2, the AI section has four
subsections (data collection and preparation, data preprocessing, modeling, and testing). The
Electrical section consists of the hardware and software components in an integrated form.
3.1 AI Design Section of the Project
Figure 3: Block diagram of the AI design section
Figure 4: Image Source Resources
3.1.1 Data Collection and Preparation
The data collection process involves identifying subjects via a survey, setting up a controlled
environment, capturing images with varied conditions, manually annotating and organizing them,
applying data augmentation, and conducting a quality assurance review.
Table 1: Data Collection Process
Step | Method | Description
Identify Subjects | Survey and Selection | Identify the various objects and individuals to be included in the dataset through a preliminary survey.
Setup Environment | Controlled Environment Setup | Arrange a controlled environment with consistent lighting and background for capturing images.
Vary Conditions | Multiple Angles | Take images from different angles to ensure diversity in the dataset.
Annotate Images | Manual Labeling and Tagging | Label each image with relevant tags such as "chair", "table", "person1", "person2" using the CVAT open data annotation platform.
Organize Dataset | Structured Folder System | Store the annotated images in a structured folder system categorized by object type and individual identity.
Review Quality | Quality Assurance Check | Conduct a thorough review of the dataset to ensure high-quality and accurate annotations.
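As an illustration of the "structured folder system" step above, the following sketch shows one way such an annotated image folder could be loaded for training with TensorFlow. The folder name dataset, the 20% validation split, and the image size are illustrative assumptions rather than values taken from the project files.

import tensorflow as tf

IMAGE_SIZE = (256, 256)   # resize target assumed from the description in Section 4.2
BATCH_SIZE = 4            # batch size reported in Table 7

# One sub-folder per class, e.g. dataset/Mr_Adane/..., dataset/Chair/... (names assumed)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",                 # root of the structured folder system
    validation_split=0.2,      # hold out 20% for validation/testing
    subset="training",
    seed=123,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
)
print(train_ds.class_names)    # class labels inferred from the folder names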
In the case of tabular data, a data set corresponds to one or more database tables, where every
column of a table represents a particular variable, and each row corresponds to a given record of
the data set in question. The data set lists values for each of the variables, such as the identity of
a person, for each member of the data set. In this project, we selected eight different persons for
face detection and recognition.
Table 2: Face Detections
Number Persons
1 Yosef Delesa
2 Zelalem Kibiru
3 Tewodros Asfaw
4 Temesgen Gezahegn
5 Eng. Kris
6 Mr. Adane
7 Mr. Ashebir
8 Mr. Gadisa
We sampled eight persons for the face detection dataset. In total, there are eight (8) person
classes, labeled “Mr. Adane”, “Mr. Ashebir”, “Mr. Kris”, “Mr. Gadisa”, “Temesgen”,
“Zelalem”, “Yosef”, and “Tewodros”. We collected 100 images for each person before
augmentation. The figures below show dataset samples used for face detection.
Figure 6: Image dataset of Mr. Adane
Figure 10: Image dataset of Tewodros
We selected eleven essential objects for this project. The selection process involved consulting
with the JIT campus and reviewing various research studies. The selection criteria were based
on the presence of these objects in schools or workplaces. The list of these objects, which are
basic necessities for visually impaired individuals, is shown in Table 4. Each attribute has a
value of 0, meaning false, or 1, meaning true. The training data structure is shown in Table 3.
In this study, there are 11 labels, or classes of objects, to be trained.
Table 3: Table of Training Data
Detection 1 | 1 | 0 | Object A
Detection 2 | 0 | 1 | Object B
Detection 3 | 0 | 1 | Object C
Detection N | 1 | 1 | Object N
Table 4: Selected Objects
Number Objects
1 Chair
2 Computer
3 Blackboard
4 Keyboard
5 Mouse
6 Door
7 Window
8 Wall
9 Birr
10 Stair
11 Person
3.1.2 Data Preprocessing
Image data augmentation is a technique used to artificially expand the size of a training dataset
by creating modified versions of the images it contains. It enlarges the dataset by applying
simple techniques like rotating and zooming images. Training deep learning neural network
models on more data can result in more skillful models, and the augmentation techniques can
create variations of the images that can improve the ability of the fit models to generalize what
they have learned to new images. The dataset was augmented in this project to improve accuracy,
using the parameters listed in Table 5.
Table 5: Data Augmentation Parameters
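Because the parameter values of Table 5 are not reproduced in this text, the sketch below only illustrates how data augmentation parameters of this kind could be expressed with Keras preprocessing layers; the flip, rotation, and zoom ranges shown are assumptions, not the project's actual settings.

import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),   # rotate by up to roughly +/- 36 degrees
    tf.keras.layers.RandomZoom(0.2),       # zoom in/out by up to 20%
])

# Demonstration on a dummy batch of one 256x256 RGB image:
images = tf.random.uniform((1, 256, 256, 3))
augmented = data_augmentation(images, training=True)
print(augmented.shape)  # (1, 256, 256, 3)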
3.1.3 Modeling
During the modeling phase of the system, machine learning algorithms are utilized to perform
tasks such as identifying individual faces and detecting objects. This intricate process involves
training the model on meticulously curated datasets, enabling the algorithm to learn intricate
patterns and make accurate classifications. By consistently feeding the algorithm with such data,
it becomes adept at recognizing individuals and discerning objects within the visual field of a
visually impaired person, thereby enhancing their autonomy and safety.
Training the model on curated data is a pivotal step in ensuring its accuracy and reliability in
real-world applications. The systematic exposure to diverse datasets allows the algorithm to
generalize its learnings, thus improving its ability to identify and classify objects and faces
accurately. This iterative process of training and refining the model serves as the backbone for
developing robust and effective machine learning systems, especially in scenarios where precise
object detection and facial recognition are critical for user interactions and safety measures. The
data training phase is a critical component of the AI design section, where machine learning
algorithms are trained to perform tasks such as identifying individual faces and detecting objects.
This phase involves several key steps, each contributing to the accuracy and reliability of the
final model.
The most effective machine learning models for image processing use neural networks and deep
learning. Deep learning uses neural networks to solve complex tasks similarly to the way the
human brain solves them. Different types of neural networks can be deployed for solving
different image processing tasks, from simple binary classification (whether an image does or
doesn’t match a specific criterion) to instance segmentation.
Convolutional Neural Networks (ConvNets or CNNs) are a class of deep learning networks that
were created specifically for image processing with AI. However, CNNs have been successfully
applied on various types of data, not only images. In these networks, neurons are organized and
connected similarly to how neurons are organized and connected in the human brain. In contrast
to other neural networks, CNNs require fewer preprocessing operations. Plus, instead of using
hand-engineered filters (despite being able to benefit from them), CNNs can learn the necessary
filters and characteristics during training. CNNs are made up of layers of nodes, comprising an
input layer, one or more hidden layers, and an output layer. Each node is connected to others and
has a weight and threshold assigned to it. If a node’s output exceeds a certain threshold value,
the node is activated, and its data is sent to the next layer of the network.
Forward Pass of Convolutional Neural Network
By learning visual attributes from small squares of input data, convolution preserves the
relationship between pixels. A convolutional layer is made up of a set of learnable filters or
kernels that serve as the network’s weights. Each neuron decides whether information is passed
on based on the weighted sum of its inputs and its activation function.
CNNs are influenced by the structure of the multilayer perceptron (MLP). In contrast to MLPs,
where each neuron has its own weight vector, neurons in CNNs share weights: the same filter
weights are applied across different locations of the input. As a result of this weight sharing, the
total number of trainable weights in a CNN is reduced.
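To make the description above concrete, here is a minimal Keras sketch of a CNN of the kind described: stacked convolutional layers of learnable filters, pooling layers, and fully connected layers ending in a softmax output. The filter counts, layer depth, and class count are assumptions, not the exact architecture used in the project.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8  # e.g. the eight person classes; the object model would use 11 (assumed)

model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),             # resized RGB images, 3 channels
    layers.Rescaling(1.0 / 255),                   # normalize pixel values to [0, 1]
    layers.Conv2D(32, (3, 3), activation="relu"),  # learnable filters (kernels)
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # fully connected hidden layer
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()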
3.1.4 Testing
In the testing phase, the concern is evaluating the performance of the proposed system. By
comparing the model's predictions with the actual identities, the accuracy of the system can be
evaluated. Additionally, during testing, parameters such as thresholds, sensitivity settings, and
other variables are fine-tuned to optimize the performance of the model. This iterative process
helps ensure that the system can consistently and accurately identify objects and individuals.
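The threshold tuning mentioned above can be illustrated with a small, self-contained example in which predictions below an assumed confidence threshold are rejected before accuracy is computed; the probabilities, labels, and the 0.8 threshold are illustrative values only.

import numpy as np

# Dummy stand-ins for a trained model's output probabilities and the true labels
# of a held-out test split (e.g. the 20% split mentioned in Section 4.3).
probs = np.array([[0.05, 0.90, 0.05],   # confident, correct
                  [0.40, 0.35, 0.25],   # low confidence -> rejected
                  [0.10, 0.10, 0.80]])  # confident, correct
test_labels = np.array([1, 0, 2])

pred_ids = np.argmax(probs, axis=1)      # most likely class per image
confidence = np.max(probs, axis=1)       # confidence of that prediction

CONF_THRESHOLD = 0.8                     # assumed value, tuned during testing
accepted = confidence >= CONF_THRESHOLD  # discard uncertain predictions

accuracy = np.mean(pred_ids[accepted] == test_labels[accepted])
print(f"Accuracy on confident predictions: {accuracy:.0%}")  # 100%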
3.2 Electrical Design Section of the project
As presented under 3.2.1 to 3.2.3, this design section encompasses three subsections: hardware
design, software design, and prototype development.
Figure 14: Hardware Design
Table 6: Hardware Components Used
Arduino Uno
Arduino Uno is a microcontroller board based on the ATmega328P. It is an open-source
electronics platform that allows users to easily create interactive electronic projects by using a
simple programming language and a variety of pre-built libraries. The Arduino Uno can be used
to control various devices, such as LEDs, motors, and sensors, by sending and receiving digital
and analog signals [13]. In this project, it is used to control the servo motor.
Figure 16: Mobile camera
Speaker
The speaker in this project serves to audibly communicate crucial information to visually
impaired individuals, enhancing their system accessibility. By converting digital signals into
audio cues, it delivers real-time feedback and notifications, ensuring effective user interaction
and promoting inclusivity for visually impaired users, reflecting the project's accessibility
commitment.
Switch
A switch is a fundamental component in electrical and electronic circuits that controls the flow
of electricity by opening or closing a circuit. It is a mechanical device that allows users to easily
turn devices on or off, change the direction of current flow, or select between different circuits
[14].
The initial phase involves the input part, primarily focused on image acquisition. This step entails
capturing images of the surroundings using the mobile camera, which serves as the primary source
of visual data for further analysis. Moving on to the processing part, this stage is pivotal in the
system's functionality as it involves retrieving the trained model for image identification. The AI
algorithm detects and recognizes the faces of individuals and also detects the objects within the
surrounding environment. Finally, in the output part, the speaker is used to audibly convey
information to visually impaired individuals. This systematic approach to software design ensures
efficient processing.
Figure 19: Software Design
Figure 20: Flow chart for object or person detection
System flow chart for the Text-To-Speech Device
The system flow chart involves the user starting the process by inserting a printed page or book
into the device and turning on the switch. The device then captures the image and uses an image
processing algorithm to extract text from it. The extracted text is converted to audio using a text-
to-speech engine and played through a speaker or headphones. The device then checks whether
the switch is on or off: if the switch is on, the process continues; if not, the system shuts down.
Figure 21: Flow chart for the Text-To-Speech Device
3.3.2.2 Software Requirements
The software tools used in the project are the following:
Google Colab online code editor
Arduino IDE
Python IDE
Coding language: Arduino programming (C & C++) and Python
TensorFlow
Keras
Visual Studio Code Editor
OpenCV
Pickle
Google Colab
Collaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody
to write and execute arbitrary python code through the browser and is especially well suited to
machine learning, data analysis, and education. More technically, Colab is a hosted Jupyter
notebook service that requires no setup to use, while providing access free of charge to
computing resources including GPUs [15]. In this project, we used Colab to train our model
online with a free GPU for greater speed. We also chose this platform to avoid the challenges of
installing libraries in Python-supporting IDEs for offline use.
Python IDE
The Python IDE used in this project, PyCharm, is a specialized Integrated Development
Environment for computer programming, particularly designed for the Python programming
language. This software is developed and maintained by JetBrains, a Czech-based company
formerly known as IntelliJ. Offering an extensive array of capabilities, it encompasses
sophisticated features such as code analysis, a comprehensive graphical debugger, seamless
integration with unit testing frameworks, and cohesive support for version control systems
(VCSs). Moreover, it provides robust compatibility for web development using frameworks like
Django and supports data science work through integration with Anaconda [16]. Within the
scope of this project, the Python IDE has been used for conducting offline project demonstrations
and writing Python code for serial communication tasks. Notably, the demonstration unfolds
through two distinct pathways: first, by providing image paths, and second, by real-time image
capture through webcam integration.
Arduino IDE
The Arduino integrated development environment (IDE) is a cross-platform application written
in the Java programming language. It originated from the IDEs for the Processing and Wiring
languages. It provides a simple one-click mechanism to compile and upload programs to an
Arduino board [17]. In this project, we used the Arduino IDE to write Arduino code that receives
the label sent from Python through serial communication, controls the direction of rotation of the
servo motor, and senses touch input from the touch sensor.
TensorFlow
TensorFlow is an open-source, end-to-end platform for creating machine learning applications.
It is a symbolic math library that uses dataflow and differentiable programming to perform
various tasks focused on the training and inference of deep neural networks; in this project it
underpins the vision-assistance system for visually impaired individuals. TensorFlow handles
data sets that are arrayed as computational nodes in graph form. The edges that connect the
nodes in a graph can represent multidimensional vectors or matrices, known as tensors.
TensorFlow programs use a dataflow architecture that works with generalized intermediate
results of the computations. The TensorFlow platform is used in this project for model
development using the deep learning algorithm CNN.
Keras
Keras is an open-source software library that provides a Python interface for artificial neural
networks. Keras acts as an interface for the TensorFlow library. It is designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and
extensible. Keras contains numerous implementations of commonly used neural-network
building blocks such as layers, objectives, activation functions, optimizers, and a host of tools to
make working with image and text data easier and to simplify the necessary coding.
In addition to standard neural networks, Keras has support for convolutional and recurrent neural
networks. It supports other common utility layers like dropout, batch normalization, and pooling.
Keras allows users to productize deep models on smartphones (iOS and Android), on the web, or
on the Java Virtual Machine. Keras library is used in this project for model development using
deep learning algorithm CNN.
Figure 26: Keras
OpenCV
OpenCV, known as the Open Source Computer Vision Library, serves as a pivotal tool in our
development process for its robust capabilities in computer vision and machine learning tasks.
By leveraging OpenCV, we seamlessly integrate functionalities such as live video capture, image
preprocessing, and GUI display into our applications. This organized approach not only enhances
the visual aspects of our projects but also streamlines the implementation of complex computer
vision algorithms. The structured nature of utilizing OpenCV ensures a formal framework for
handling visual data and contributes to the overall organization of our Python development
workflow [18].
Figure 28: OpenCV
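A minimal sketch of the live-capture, preprocessing, and GUI-display loop described above is given below; the camera index, the 256x256 resize target, and the Esc exit key are assumptions, and the complete pipeline actually used in the project appears in Appendix A.

import cv2

cap = cv2.VideoCapture(0)                            # default webcam (a phone IP-camera URL could be used instead)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # preprocessing example: grayscale conversion
    resized = cv2.resize(frame, (256, 256))          # preprocessing example: resize to the model's input size
    cv2.imshow("camera", frame)                      # GUI display of the live feed
    if cv2.waitKey(10) & 0xFF == 27:                 # Esc key exits the loop
        break
cap.release()
cv2.destroyAllWindows()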
Pickle
The utilization of the Pickle module in our development process exemplifies a formal and
organized approach to managing Python object structures. By leveraging Pickle for serializing
and deserializing face encodings, we establish a systematic method for efficiently storing and
retrieving crucial facial recognition data. This structured implementation not only enhances the
organization of our codebase but also streamlines the handling of complex object structures,
contributing to the overall robustness and scalability of our applications. The formal integration
of Pickle underscores our commitment to implementing best practices in data management and
reinforces the reliability of our facial recognition system.
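The serialize/deserialize cycle described above can be sketched as follows; the file name encodings.pickle and the dictionary structure of the face encodings are assumptions made for illustration.

import pickle

# Assumed structure: a dict mapping person names to lists of face-encoding vectors.
encodings = {"Mr. Adane": [[0.12, -0.05, 0.33]], "Yosef": [[0.08, 0.21, -0.11]]}

# Serialize the encodings to disk after training ...
with open("encodings.pickle", "wb") as f:
    pickle.dump(encodings, f)

# ... and deserialize them at runtime for recognition.
with open("encodings.pickle", "rb") as f:
    loaded = pickle.load(f)
print(loaded.keys())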
Figure 30: Integration Part
Hardware-Software Interface
The Hardware-Software Interface involves establishing communication between the physical
hardware components (Camera, speaker, Switch, Arduino) and the software that controls and
processes data. This interface enables the transfer of information between the hardware and
software components, allowing for seamless operation of the system. The software processes the
incoming data to identify images and sends the processed results to the speaker, which informs
the visually impaired person about the environment.
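On the software side, this interface can be sketched with a short pyserial example that sends a recognition result to the Arduino over the serial link described in the Arduino IDE subsection; the port name, baud rate, and message format are assumptions and would have to match the Arduino sketch.

import time
import serial  # pyserial

# Port name and baud rate are assumed; they must match the Arduino configuration.
arduino = serial.Serial(port="COM3", baudrate=9600, timeout=1)
time.sleep(2)                      # give the board time to reset after the port opens

label = "Mr. Adane, 1.5 m"         # example result produced by the recognition model
arduino.write((label + "\n").encode("utf-8"))       # send the label to the hardware side

reply = arduino.readline().decode("utf-8").strip()  # optional acknowledgement from the board
print("Arduino replied:", reply)
arduino.close()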
System Testing
System Testing is a comprehensive evaluation phase that occurs after the integration of hardware
and software components. The primary objective is to validate that the integrated system
operates harmoniously and meets the specified requirements. This testing phase encompasses
various methodologies, including functional testing to verify individual functions, performance
testing to assess system response under different conditions, and stress testing to evaluate the
system's stability under extreme loads. System Testing aims to identify and rectify any issues
arising from the collaboration between hardware and software.
Deployment
Deployment marks the transition from the development and testing phases to the operational
stage. After successful testing and validation, the system is deemed ready for deployment in its
target environment. This involves the physical installation of the hardware and software
components in the designated setting. Configuration settings are adjusted to align with the
specific requirements of the operational environment. The deployment phase is a critical step in
realizing the intended benefits of the system, as it transitions from a controlled testing
environment to real-world use.
3.3.3.2 Prototype of the proposed system
The prototype of the machine has been completely implemented; it is shown in the figure below.
3.3.3.3 Working Principle of the prototype
Working principle of object or person detection and recognition
The project for assisting visually impaired individuals functions through a combination of
advanced computer vision, machine learning, and auditory feedback systems. When a visually
impaired person enters a room, the system utilizes cameras to capture the surroundings. The
captured images are processed in real-time using machine learning algorithms to detect and
recognize individuals and objects within the environment. The system identifies each person and
object, determines their distance from the user using depth sensors, and converts this information
into audio feedback. The user receives auditory notifications about the identities of individuals,
their relative distances and the types of objects present along with their locations. This
continuous auditory feedback enables the visually impaired person to navigate and interact with
their environment more effectively and independently.
Figure 32: Working principle of object or face detection
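The distance information mentioned above can be estimated from the apparent size of a detected face using the triangle-similarity helpers that appear in Appendix A; the sketch below reproduces that idea, with the 120-pixel reference width and the 90-pixel test width chosen purely for illustration.

# Triangle-similarity distance estimation, mirroring the helper functions in Appendix A.
KNOWN_DISTANCE = 76.2   # cm, distance at which the reference image was taken
KNOWN_WIDTH = 14.3      # cm, assumed real face width

def focal_length(measured_distance, real_width, width_in_rf_image):
    # focal length derived from a reference photo taken at a known distance
    return (width_in_rf_image * measured_distance) / real_width

def distance_finder(focal_length_value, real_width, width_in_frame):
    # estimated distance to the face currently seen by the camera
    return (real_width * focal_length_value) / width_in_frame

f = focal_length(KNOWN_DISTANCE, KNOWN_WIDTH, width_in_rf_image=120)  # 120 px reference width (assumed)
print(distance_finder(f, KNOWN_WIDTH, width_in_frame=90))             # about 101.6 cm for a 90 px face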
Working principle of Text-To-Speech Device
The working principle of the proposed project idea involves a systematic flow of key stages that
facilitate the accessibility of printed text for visually impaired individuals. The process begins
with the image capture stage, where the device utilizes a camera to capture images of the printed
material. These captured images are then subjected to image processing techniques, specifically
OCR, which enables the extraction of text from the images. Subsequently, the extracted text
undergoes conversion into audio format through a TTS engine. This conversion process ensures
that the visually impaired individuals can listen to the content of the printed material rather than
relying on visual cues. Finally, the converted audio output is played back through a speaker or
headphones, providing a seamless and accessible means for visually impaired individuals to
engage with printed text. By following this organized and detailed working principle, the system
effectively enhances independence and inclusivity for visually impaired individuals in various
reading scenarios.
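A condensed sketch of this capture, OCR, and text-to-speech pipeline is shown below, using the same pytesseract and gTTS libraries as Appendix C; the camera index and output file name are assumptions, and the fuller implementation (word bounding boxes, playback through the pygame mixer) is given in Appendix C.

import cv2
import pytesseract
from gtts import gTTS

# Capture one frame of the printed page (camera index 0 is an assumption).
cam = cv2.VideoCapture(0)
ok, frame = cam.read()
cam.release()

if ok:
    text = pytesseract.image_to_string(frame)   # OCR: extract text from the captured image
    if text.strip():
        speech = gTTS(text=text, lang="en")     # TTS: convert the extracted text to audio
        speech.save("page.mp3")                 # audio file played back through the speaker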
CHAPTER FOUR
RESULTS AND DISCUSSION
4.1 Introduction
We have designed, developed, and implemented a Machine Vision-Based Assistance System for
Visually Impaired Individuals. The project required a combination of both software and
hardware work, with the software portion consisting of an intelligent system used to identify
objects and individuals. We utilized Google Colab to train our model, tweaking parameters such
as “Epoch” and “Batch Size” to achieve the most accurate and reliable results possible using the
data we had at hand.
4.2 Classification Accuracy with machine vision
In the realm of machine vision, the code showcases the development of a CNN that excels in
image classification tasks. Specifically, it demonstrates high accuracy in identifying images. To
evaluate performance, accuracy serves as the primary metric, measuring the proportion of
correctly classified images. Key hyperparameters and training details include 50 epochs of
training to balance optimization with overfitting prevention, a batch size of 4 for efficient model
updates, and image resizing to 256x256 pixels for computational efficiency. Factors contributing
to the model's accuracy include: thoughtful data preprocessing, data augmentation techniques to
enhance generalization, a well-structured CNN architecture for feature extraction, the Adam
optimizer for efficient weight updates, and sparse categorical crossentropy loss for multi-class
classification. In conclusion, the code exemplifies the effectiveness of CNNs in achieving high
classification accuracy in machine vision applications. Its results underscore the potential of
CNNs in various image-based tasks, inviting further exploration and optimization.
Table 7: Hyper-Parameters
No Hyper-parameter Value
1 Input Size 118x118
2 Batch Size 4
3 Epoch 50
4 Channel 3
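Putting the reported hyper-parameters together, a training configuration along these lines could look like the sketch below; model, train_ds, and val_ds are assumed to come from sketches like those in Chapter Three, and the input size follows the 256x256 resizing described in the text rather than the 118x118 value listed in Table 7.

# Training configuration reflecting the reported hyper-parameters
# (50 epochs, batch size 4, Adam optimizer, sparse categorical cross-entropy).
# `model`, `train_ds` and `val_ds` are assumed from the earlier sketches;
# the batch size of 4 is set when the datasets are loaded, so fit() needs no batch_size.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # integer class labels
    metrics=["accuracy"],
)
history = model.fit(train_ds, validation_data=val_ds, epochs=50)
val_loss, val_acc = model.evaluate(val_ds)
print(f"Validation accuracy: {val_acc:.2%}")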
Figure 34: Accuracy and loss
The training of the CNN model reveals an outstanding level of performance, underscoring its
remarkable efficacy. Throughout the training process, the model consistently demonstrates high
accuracy and minimal loss, showcasing its ability to effectively learn and generalize from the
training data to the validation set. The progressive increase in accuracy over the epochs is
particularly noteworthy. This improvement reflects the model's capacity to continuously enhance
its ability to correctly classify images in the validation set. Notably, the model achieves a
validation accuracy exceeding 98% in several instances, a remarkable feat that underscores its
proficiency in making accurate predictions. This high level of accuracy indicates that the model
is not only learning effectively but also generalizing well to new, unseen data.
Moreover, the consistently low validation loss throughout the training process is indicative of the
model's precision and reliability. Low loss values suggest that the model makes predictions with
minimal errors, further highlighting its robustness in handling diverse and complex data. This
reliability is crucial for applications requiring high precision and low error rates.
The high accuracy and low loss demonstrated by the CNN model make it an ideal candidate for
applications designed to assist visually impaired individuals. With its exceptional performance,
the model can effectively differentiate between various types of individuals and objects. This
capability ensures that only the highest-quality images are selected for further processing.
The model's high accuracy means it can reliably identify individuals and objects, a crucial
feature for aiding visually impaired individuals. Additionally, the model's low loss indicates its
ability to make predictions with minimal errors, thereby reducing the likelihood of
misclassification. This precision and reliability in making accurate identifications contribute to
more efficient and effective identification processes. Its ability to accurately and reliably classify
images can lead to significant improvements in the quality of assistance provided to visually
impaired individuals, ensuring better outcomes and enhanced support.
4.3 Tests Performed
Figure 36: Result from testing the system
We first tested the model by splitting off 20% of the dataset for testing, and the results were very
good. As depicted in the figure, the test accuracy is exceptionally high, demonstrating the
model's ability to classify with great precision.
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1 Conclusion
In addressing the challenges faced by visually impaired individuals, our project represents a
significant step towards enhancing their independence, safety, and inclusion in society. By
leveraging cutting-edge technology, our system provides real-time auditory feedback to visually
impaired individuals, offering insights into their surroundings, including individual recognition,
and object detection. This solution not only bridges the gap between perception
and information but also empowers users to navigate diverse environments with confidence and
autonomy.
The benefits of our project extend beyond individual users to encompass broader societal
impacts. The cost-effective nature of the technology opens doors for widespread adoption,
ensuring affordability and accessibility for visually impaired individuals across diverse
socioeconomic backgrounds and geographic locations. Moreover, by promoting independence
and reducing reliance on external assistance, our project contributes to greater societal inclusion
and equal opportunities for visually impaired individuals in various facets of life, from education
to employment and social interactions.
While the project successfully addresses key challenges faced by visually impaired individuals,
there are areas that remain unexplored due to time limitations and material costs. Future work
could focus on expanding the functionalities of the project, improving accuracy and efficiency,
and incorporating advanced features to further enhance the user experience.
Despite the constraints of time and cost, the project team strived to overcome these challenges by
optimizing resources and focusing on delivering a practical solution within the given timeframe.
The project underscores the importance of problem solving and dedication in creating inclusive
solutions for individuals with disabilities and serves as a stepping stone for future developments
in assistive technology for visually impaired individuals.
5.2 Recommendation
We highly recommend focusing on enhancing the accuracy and precision of the system
within the constraints of limited resources. Despite the challenges of having a small budget that
restricts the acquisition of advanced technologies such as sensors and processors, there are
several ways to optimize the project's functionalities. Here are some recommendations to
consider:
REFERENCE
[1] Rizzo, John-Ross, et al. "The global crisis of visual impairment: an emerging global health
priority requiring urgent action." Disability and Rehabilitation: Assistive Technology 18.3 (2023):
240-245.
[3] Bustos-López, Maritza, et al. "Emotion Detection in Learning Environments Using Facial
Expressions: A Brief Review." Handbook on Decision Making: Volume 3: Trends and
Challenges in Intelligent Decision Support Systems (2022): 349-372.
[4] Berhane, Yemane, et al. "National survey on blindness, low vision and trachoma in Ethiopia:
methods and study clusters profile." Ethiopian Journal of Health Development 21.3 (2007): 185-
203.
[5] Zamir, Muhammad Farid, et al. "Smart reader for visually impaired people based on Optical
Character Recognition." Intelligent Technologies and Applications: Second International
Conference, INTAP 2019, Bahawalpur, Pakistan, November 6–8, 2019, Revised Selected Papers
2. Springer Singapore, 2020.
[6] Nagarajan, R., Sainarayanan, G., Yaacob, S. and Porle, R.R., 2004. Object Identification and
Colour Recognition for Human Blind. In ICVGIP (pp. 210-215).
[7] Pu, Ying-Hung, et al. "Aerial face recognition and absolute distance estimation using drone
and deep learning." The Journal of Supercomputing (2022): 1-21.
[8] Rahman, Md Atikur, and Muhammad Sheikh Sadi. "IoT enabled automated object
recognition for the visually impaired." Computer methods and programs in biomedicine update 1
(2021): 100015.
[9] Sarma, Minerva, et al. "Development of a Text-to-Speech Scanner for Visually Impaired
People." Design and Development of Affordable Healthcare Technologies. IGI Global, 2018.
218-238.
[10] Zamir, Muhammad Farid, et al. "Smart reader for visually impaired people based on Optical
Character Recognition." Intelligent Technologies and Applications: Second International
Conference, INTAP 2019, Bahawalpur, Pakistan, November 6–8, 2019, Revised Selected Papers
2. Springer Singapore, 2020.
[11] Shanker, Amit, and Ravi Kant. "Assistive technologies for visually impaired: Exploring the
barriers in inclusion." Research Highlights 8.3 (2021): 70.
[12] Smith, Emma M., et al. "Artificial intelligence and assistive technology: risks, rewards,
challenges, and opportunities." Assistive Technology 35.5 (2023): 375-377.
APPENDIX
Appendix A: face recognition and distance estimation
import cv2
import numpy as np
import pyttsx3

KNOWN_DISTANCE = 76.2  # centimeter
KNOWN_WIDTH = 14.3  # centimeter
WHITE = (255, 255, 255)

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trainer/trainer.yml')
# Haar cascade face detector (reconstructed; the detector setup was cut from the listing)
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
font = cv2.FONT_HERSHEY_SIMPLEX
# Names matching the trained label ids (order assumed)
name_list = ['Unknown', 'Mr. Adane', 'Mr. Ashebir', 'Eng. Kris', 'Mr. Gadisa',
             'Temesgen', 'Zelalem', 'Yosef', 'Tewodros']

# initiate id counter
id = 0
cam = cv2.VideoCapture(0)  # 'http://10.180.16.114:8080/video'
cam.set(3, 640)  # set video width

def focal_length(measured_distance, real_width, width_in_rf_image):
    focal_length_value = (width_in_rf_image * measured_distance) / real_width
    return focal_length_value

def distance_finder(focal_length_value, real_width, width_in_frame):
    # triangle-similarity distance estimate (reconstructed counterpart of focal_length)
    return (real_width * focal_length_value) / width_in_frame

# focal length calibrated from a reference face width of 120 pixels (assumed value)
focal_length_found = focal_length(KNOWN_DISTANCE, KNOWN_WIDTH, 120)

while True:
    ret, img = cam.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(
        gray,
        scaleFactor=1.2,
        minNeighbors=5,
        # minSize=(int(minW), int(minH)),
    )
    for (x, y, w, h) in faces:
        face_width_in_frame = w
        id, confidence = recognizer.predict(gray[y:y + h, x:x + w])
        if confidence < 100:
            id = name_list[id]
            confidence = " {0}%".format(round(100 - confidence))
        else:
            id = "unknown"
            confidence = " {0}%".format(round(100 - confidence))
        cv2.putText(img, str(id), (x + 5, y - 5), font, 1, (255, 255, 255), 2)
        cv2.putText(img, str(confidence), (x + 5, y + h - 5), font, 1, (255, 255, 0), 1)
        if face_width_in_frame != 0:
            Distance = distance_finder(focal_length_found, KNOWN_WIDTH, face_width_in_frame)
    cv2.imshow('camera', img)
    k = cv2.waitKey(10) & 0xff
    if k == 27:
        print("\n [INFO] Exiting Program")
        break

cam.release()
cv2.destroyAllWindows()
Appendix B : object detection and distance estimation
# Excerpt adapted from the YOLOv5 detect.py inference script
import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]  # YOLOv5 root directory (definition restored; missing in the excerpt)
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative
from ultralytics.utils.plotting import Annotator, colors, save_one_box
from models.common import DetectMultiBackend
if is_url and is_file:
source = check_file(source)
save_dir = increment_path(Path(project) / name, exist_ok=exist_ok) # increment run
(save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
device = select_device(device)
model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
stride, names, pt = model.stride, model.names, model.pt
bs = 1 # batch_size
if webcam:
view_img = check_imshow(warn=True)
dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
bs = len(dataset)
elif screenshot:
dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
else:
dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
vid_path, vid_writer = [None] * bs, [None] * bs
model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz)) # warmup
seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))
for path, im, im0s, vid_cap, s in dataset:
with dt[0]:
im = torch.from_numpy(im).to(model.device)
im = im.half() if model.fp16 else im.float() # uint8 to fp16/32
im /= 255 # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
im = im[None] # expand for batch dim
if model.xml and im.shape[0] > 1:
ims = torch.chunk(im, im.shape[0], 0)
# Inference
with dt[1]:
visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
if model.xml and im.shape[0] > 1:
pred = None
for image in ims:
csv_path = save_dir / "predictions.csv"
print_args(vars(opt))
return opt
def main(opt):
"""Executes YOLOv5 model inference with given options, checking requirements before running the
model."""
check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
run(**vars(opt))
if __name__ == "__main__":
opt = parse_opt()
main(opt)
Appendix C: Book reading
import os
import cv2
import pytesseract
from pygame import mixer
from gtts import gTTS
from playsound import playsound
import matplotlib.pyplot as plt

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

answer = "y"
mixer.init()
while answer.lower() in ["y", "yes"]:
    video = cv2.VideoCapture("https://192.168.0.43:8080/video")
    video.set(3, 640)
    video.set(4, 480)
    if video.isOpened():
        check, frame = video.read()
        if check:
            cv2.imwrite("frame.jpg", frame)
        video.release()

    data = pytesseract.image_to_data("frame.jpg")
    filewrite = open("String.txt", "w")           # recognized words collected here (reconstructed)
    for z, a in enumerate(data.splitlines()):     # iterate over the OCR result rows (reconstructed)
        if z != 0:
            a = a.split()
            if len(a) == 12:                      # rows with 12 fields contain a recognized word
                x, y = int(a[6]), int(a[7])
                w, h = int(a[8]), int(a[9])
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
                cv2.putText(frame, a[11], (x - 15, y), cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 1)
                filewrite.write(a[11] + " ")
    filewrite.close()

    fileread = open("String.txt", "r")
    language = 'en'
    line = fileread.read()
    fileread.close()
    if line != "":
        speech = gTTS(text=line, lang=language, slow=False)
        speech.save("test.mp3")
        mixer.music.load("test.mp3")
        mixer.music.set_volume(0.7)
        mixer.music.play()
        while mixer.music.get_busy():             # wait until playback finishes (reconstructed)
            pass
    answer = input("Scan another page? (y/n): ")  # assumed prompt to continue or exit
Appendix D: Arduino code
#include <Servo.h>
#include <Wire.h>
#include <Firmata.h>
SerialFirmata serialFeature;
int analogInputsToReport = 0;
unsigned long currentMillis;
unsigned long previousMillis;
struct i2c_device_info {
byte addr;
int reg;
byte bytes;
byte stopTX;
};
Servo servos[MAX_SERVOS];
byte servoPinMap[TOTAL_PINS];
byte detachedServos[MAX_SERVOS];
byte detachedServoCount = 0;
byte servoCount = 0;
void sysexCallback(byte, byte, byte*);

void wireWrite(byte data)  // function signature restored from the StandardFirmata sketch
{
#if ARDUINO >= 100
Wire.write((byte)data);
#else
Wire.send(data);
#endif
}
byte wireRead(void)
{
#if ARDUINO >= 100
return Wire.read();
#else
return Wire.receive();
#endif
}
if (IS_PIN_DIGITAL(pin)) {
Firmata.write(PIN_MODE_SERVO);
Firmata.write(14);
}
void systemResetCallback()
{
isResetting = true;
#ifdef FIRMATA_SERIAL_FEATURE
serialFeature.reset();
#endif
if (isI2CEnabled) {
disableI2CPins();
}
for (byte i = 0; i < TOTAL_PORTS; i++) {
reportPINs[i] = false; // by default, reporting off
portConfigInputs[i] = 0; // until activated
previousPINs[i] = 0;
}
for (byte i = 0; i < TOTAL_PINS; i++) {
if (IS_PIN_ANALOG(i)) {
setPinModeCallback(i, PIN_MODE_ANALOG);
} else if (IS_PIN_DIGITAL(i)) {
setPinModeCallback(i, OUTPUT);
}
servoPinMap[i] = 255;
}
for (byte i=0; i < TOTAL_PORTS; i++) {
outputPort(i, readPort(i, portConfigInputs[i]), true);
}
isResetting = false;
#ifdef FIRMATA_SERIAL_FEATURE
serialFeature.update();
#endif
}