
Jimma University

Jimma Institute of Technology


Faculty of Electrical and Computer Engineering
Title: Machine Vision-Based Assistance System for Visually
Impaired Individuals
By:
Group members ID Number
1. Yosef Delesa Ru 0243/12
2. Tewodros Asfaw Ru 0031/12
3. Zelalem Kibru Ru 1724/12
4. Temesgen Gezahegn Ru 2092/12

Advisor: Mr. Sileshi A.


A Thesis Submitted to Jimma University, Jimma Institute of Technology, Faculty of Electrical and Computer Engineering, in partial fulfillment of the requirements for the Degree of Bachelor of Science in Electrical and Computer Engineering (Computer Stream)

Submission date: May 31, 2024


Jimma, Ethiopia
DECLARATION
The project “Machine Vision-Based Assistance System for Visually Impaired Individuals” is
based on our own work carried out under the supervision and advice of Mr. Sileshi A. We affirm
that the assertions stated and the conclusions reached are the result of our project work. We
further state that the work in the report is original and that it was completed by us. There has
been no submission of the work to any other institution for any other diploma, degree or
certificate. We have acknowledged other sources whenever we have used material from them in the text of the report, and have listed their details in the reference section.

Name ID Email Signature


1. Yosef Delesa Ru 0243/12 [email protected] ________
2. Tewodros Asfaw Ru 0031/12 [email protected] ________
3. Zelalem Kibru Ru 1724/12 [email protected] ________
4. Temesgen Gezahegn Ru 2092/12 [email protected] ________

I have read the paper, and in my opinion, it is fully adequate in scope and quality as a Bachelor of Science thesis for the Faculty of Electrical and Computer Engineering, Jimma Institute of Technology.
Advisor Signature
Mr. Sileshi A. ___________

ACKNOWLEDGEMENT
First and foremost, we would like to express our gratitude to the almighty God for keeping us
safe and providing us with the opportunity to arrive here after overcoming various hurdles along
the way. The joy that accompanies the successful completion of any endeavor would be
incomplete without the unwavering cooperation of the individuals whose support made it
possible, and whose constant guidance and encouragement crowned all efforts with success.
Special thanks to Mr. Sileshi A., our project advisor, for his guidance, inspiration, and
constructive suggestions, all of which significantly contributed to the project's success.

We also extend our gratitude to other individuals who supported the execution of this project.
Their friendship, empathy, and great sense of humor have been invaluable. We are deeply
humbled and grateful to acknowledge our indebtedness to all those who have assisted us in
developing these ideas. The successful completion of any endeavor, at any level, is impossible
without the support and guidance of our parents and friends. Therefore, we express our heartfelt
thanks to our friends and parents for their assistance in gathering information, providing financial
support, collecting data, and guiding us throughout the completion of this project.

TABLE OF CONTENTS
DECLARATION ............................................................................................................................. I

ACKNOWLEDGEMENT .............................................................................................................. II

LIST OF FIGURES .......................................................................................................................... V

LIST OF TABLES ......................................................................................................................... VII

ACRONYMS ............................................................................................................................. VIII

EXECUTIVE SUMMARY ...........................................................................................................IX

CHAPTER ONE ..............................................................................................................................1

INTRODUCTION ...........................................................................................................................1

1.1 Background ............................................................................................................................1

1.2 Problem Identified and Solution Proposed ............................................................................ 2

1.2.1 Problem Identified ...........................................................................................................2

1.2.2 Solution Proposed ........................................................................................................... 3

1.3 Objective ................................................................................................................................3

1.3.1 General Objective ........................................................................................................... 3

1.3.2 Specific Objectives ......................................................................................................... 3

1.4 Methodology of the Project ................................................................................................... 4

1.5 Scope and Limitation of the project .......................................................................................5

1.5.1 Scope of the Project ........................................................................................................ 5

1.5.2 Limitation of the Project ................................................................................................. 5

1.6 Significance of the Project .....................................................................................................5

CHAPTER TWO .............................................................................................................................7

LITERATURE REVIEW .................................................................................................................. 7

2.1 Overview ................................................................................................................................7

2.2 Existing Systems ....................................................................................................................8

CHAPTER THREE .......................................................................................................................11

SYSTEM DESIGN METHODOLOGY ....................................................................................... 11

3.1 AI Design Section of the Project ......................................................................................... 12

3.1.1 Dataset Collecting and Preparation ...............................................................................12

3.1.2 Data Pre-processing ...................................................................................................... 19

3.1.3 Modeling ....................................................................................................................... 20

3.1.4 Testing ...........................................................................................................................22

3.2 Electrical Design Section of the project ............................................................................. 22

3.2.1 Hardware Design ...........................................................................................................22

3.2.2 Software Design ............................................................................................................25

3.3.3 Prototype Development .................................................................................................33

CHAPTER FOUR ......................................................................................................................... 38

RESULTS AND DISCUSSION ................................................................................................... 38

4.1 Introduction ..........................................................................................................................38

4.2 Classification Accuracy with machine vision ......................................................................38

4.3 Test Performed .....................................................................................................................41

CHAPTER FIVE ...........................................................................................................................43

CONCLUSION AND RECOMMENDATION ............................................................................ 43

5.1 Conclusion ........................................................................................................................... 43

5.2 Recommendation ................................................................................................................ 43

REFERENCE ................................................................................................................................ 45

APPENDIX ................................................................................................................................... 47

LIST OF FIGURES
Figure 1 : Methodology of the Project ............................................................................................ 4
Figure 2 : Block Diagram of the proposed system .......................................................................11
Figure 3 : Block diagram of an AI design section .......................................................................12
Figure 4 : Image Source Resources .............................................................................................13
Figure 5 : Image dataset of Mr. Ashebir ....................................................................................... 15
Figure 6 : Image dataset of Mr.Adane ........................................................................................16
Figure 7 : Image dataset of Eng. Kris ............................................................................................16
Figure 8 : Image dataset of Mr. Gadisa ........................................................................................ 16
Figure 9 : Image dataset of Zelalem ..............................................................................................16
Figure 10 : Image dataset of Tewodros ......................................................................................... 17
Figure 11 : Image dataset of Temesgen ........................................................................................17
Figure 12 : Image dataset of Yosef ............................................................................................. 17
Figure 13 : CNN Architecture ....................................................................................................... 21
Figure 14 : Hardware Design ........................................................................................................ 23
Figure 15 : Arduino Uno ............................................................................................................... 24
Figure 16 : Mobile camera ............................................................................................................ 25
Figure 17 : Speaker ........................................................................................................................25
Figure 18 : Switch ......................................................................................................................... 25
Figure 19 : Software Design ..........................................................................................................26
Figure 20 : Flow Chart for object or person detection .................................................................. 27
Figure 21 : Flow Chart for Text-To-Speech Device ..................................................................... 28
Figure 22 : Google Colab .............................................................................................................. 29
Figure 23 : Python IDE ................................................................................................................. 30
Figure 24 : Arduino IDE ............................................................................................................... 30
Figure 25 : TensorFlow ................................................................................................................. 31
Figure 26 : Keras ...........................................................................................................................32
Figure 27 : Visual Studio Code Editor .......................................................................................... 32
Figure 28 : OpenCV ...................................................................................................................... 33
Figure 29 : Pickle ......................................................................................................................... 33
Figure 30 : Integration Part ........................................................................................................... 34

Figure 31 : The Prototype of Text-To-Speech Device ................................................................. 35
Figure 32 : Working Principle of object or face detection ............................................................ 36
Figure 33 : Working Principle of the Text-To-Speech Device ..................................................... 37
Figure 34 : Accuracy and loss ....................................................................................................... 39
Figure 36 : Result from testing the system ....................................................................................42

LIST OF TABLES
Table 1 : Data Collection Process ................................................................................................. 13
Table 2 : Face Detections .............................................................................................................. 15
Table 3 : Table of Training Data ................................................................................................... 18
Table 4 : Selected Objects ............................................................................................................. 18
Table 5 : Data Augmentation Parameters ......................................................................................20
Table 6 : Hardware Components Used .........................................................................................24
Table 7 : Hyper-Parameters ...........................................................................................................38

ACRONYMS

AI Artificial Intelligence
CNN Convolutional Neural Network
GPU Graphics Processing Unit
GUI Graphical User Interface
IDE Integrated Development Environment
LMICs Low- and middle-income countries
MLP Multilayer Perceptron
OCR Optical Character Recognition
OpenCV Open Source Computer Vision Library
TF TensorFlow
TTS Text-to-Speech
USB Universal Serial Bus
VCS Version Control Systems
VS Code Visual Studio Code
WHO World Health Organization
mAP mean Average Precision
YOLOv8 You Only Look Once version 8

EXECUTIVE SUMMARY
The challenges faced by visually impaired individuals in navigating their environments,
recognizing people, understanding object types and distances, and accessing written materials
pose significant barriers to their independence and quality of life, especially in developing
countries with limited resources for specialized technologies and services. Visually impaired
individuals face challenges in identifying people, objects, and distances in a room, which can
hinder their daily activities and interactions. To address this issue, the proposed project aims to
provide a comprehensive solution by developing a system that can detect and recognize
individuals in a room, provide distance information, identify objects, and even read books aloud
for visually impaired individuals. By leveraging computer vision techniques, face recognition
algorithms, distance estimation technologies, and text-to-speech capabilities, the project
empowers visually impaired individuals to navigate their surroundings with greater ease and
confidence. The benefits of this project extend to society as a whole, especially in developing
countries where resources for visually impaired individuals may be limited. By enhancing the
independence and safety of visually impaired individuals, this project contributes to creating a
more inclusive and accessible environment for all individuals, regardless of their visual abilities.
By bridging the gap between perception and information, our project enhances the independence,
safety and inclusion of visually impaired individuals across various settings, including
educational institutions, workplaces, and public spaces. Through collaboration with local
communities and organizations, we aim to deploy scalable and cost-effective solutions that
significantly improve access to education, employment opportunities, and social interactions for
this marginalized population, promoting inclusivity and dignity for all members of society. This
project developed a Machine Vision-Based Assistance System for visually impaired individuals, using CNNs with tuned hyperparameters for real-time identification. Testing showed high accuracy, supporting its effectiveness across diverse settings. By providing real-time information for social interactions, mobility, and access to written materials, the system enhances independence, safety, and inclusion, thereby promoting a more inclusive environment.
Keywords: AI, Face Recognition, Computer vision, Image Processing, Object recognition,
Optical Character Recognition.

CHAPTER ONE
INTRODUCTION
1.1 Background

Visual impairment poses significant challenges in the daily lives of affected individuals,
impacting various aspects such as social interaction, education, and mobility safety. The inability
to see clearly affects their ability to engage socially, access education materials, and navigate
their environment safely. While traditional tools like white canes and guide dogs have been
instrumental in enhancing independence and mobility, they often fall short in providing the
comprehensive support necessary for seamless integration into society. These tools, while
beneficial, do not fully address the diverse and complex needs of visually impaired individuals.

The prevalence of visual impairment is staggering, with the World Health Organization (WHO) reporting an increase from 733 million affected people worldwide in 2010 to 2.2 billion in late 2019 [1]. This increase is attributed to various factors, including insufficient access to healthcare
and rehabilitative support services, especially in low- and middle-income countries (LMICs).
Studies in LMICs have highlighted the lack of access to healthcare services, medical
rehabilitation, and assistive devices, leading to delays in needed medical evaluations and
preventive care.

Visual impairment leads to disability across the severity spectrum, affecting virtually all aspects
of an individual's life. It hinders the ability to complete critical activities of daily living, reduces
mobility and social participation, and increases the risk of depression. Factors such as
physical/environmental barriers and social factors like discrimination can magnify the impact of
vision loss [2].

Recent advancements in Artificial Intelligence (AI) have ushered in a new era of assistive
technologies tailored to the unique needs of the visually impaired community. These AI-powered
systems offer sophisticated functionalities that surpass the capabilities of traditional tools. For
example, facial recognition technology has integrated advanced algorithms with real-time
distance estimation, improving social engagement and mobility. Additionally, the integration of
Optical Character Recognition (OCR) technology with real-time image capture has enabled
instant reading of printed text, enhancing information access [3].

AI has also been applied to object recognition technology, environmental sound analysis, and
AI-powered companions, further enhancing accessibility and inclusivity for visually impaired
individuals. These advancements have greatly improved the lives of visually impaired
individuals, offering them greater independence and access to information. As AI continues to
advance, there is great potential for further enhancements in assistive technologies, ensuring that
visually impaired individuals have the tools and support they need to navigate the world
confidently and independently.

1.2 Problem Identified and Solution Proposed

1.2.1 Problem Identified

The proposed project addresses the lack of accessibility and independence for visually impaired
individuals in developing countries, hindering their ability to understand and navigate their
environment. These individuals face challenges that impede their daily activities, leading to
isolation, dependency, and limited access to resources.

Specifically, they struggle with limited awareness of individuals in their environment, resulting
in social barriers and communication difficulties. They face difficulty identifying objects and
distances, affecting their mobility and safety, and have restricted access to written materials like
books due to a lack of braille materials or assistive technologies.

In Ethiopia, an estimated 1.2 million people are visually impaired, according to the World Health
Organization. This population encounters numerous challenges daily, including navigating
crowded environments, accessing public transportation, participating in educational activities,
and engaging socially, due to the lack of accessible infrastructure, scarce resources for assistive technologies, and societal stigma [4].

The lack of appropriate accommodations and support in educational settings can create
significant barriers for visually impaired individuals, limiting their ability to fully participate in
academic activities and access essential learning materials. Without access to braille materials,
adaptive technologies, or trained educators, visually impaired students may struggle to keep up
with their peers and face challenges in understanding and retaining information. This can result
in lower academic achievement, decreased self-esteem, and a sense of exclusion from the
educational environment.

Furthermore, it is estimated that most visually impaired individuals in Ethiopia face significant
barriers to employment, leading to a substantial loss of opportunities. This impacts their
economic independence and overall quality of life, exacerbating poverty and social exclusion.

1.2.2 Solution Proposed

The proposed solution is a system that assists visually impaired individuals by providing real-time information about their surroundings through sound cues. It enables them to effortlessly detect and recognize the individuals in a room, as well as understand the distance of each person from them.

Additionally, the system can identify objects in the environment and convey this information to
the user, specifying the type of object and its distance from the individual. Furthermore, the
project includes a feature that enables the system to read books aloud to the visually impaired
person, enhancing their access to written materials.

By combining these functionalities, the project aims to significantly improve the independence
and autonomy of visually impaired individuals, allowing them to navigate their surroundings
with greater ease and engage more fully in social interactions and educational activities. This
comprehensive solution represents a significant step towards creating a more inclusive and
accessible environment for visually impaired individuals.

1.3 Objective

1.3.1 General Objective


To design, develop and implement Machine Vision-Based Assistance System for Visually
Impaired Individuals.

1.3.2 Specific Objectives


To realize our general objective, the following are our specific objectives:
 To understand the gaps in the existing system
 To gather the desired data set for training and testing operations.
 To implement hardware and software
 To deploy a CNN model in an integrated manner with hardware for the proposed system at a prototype level
 To test the functionality of the proposed system
1.4 Methodology of the Project
To ensure efficient performance, this project was meticulously organized and coordinated
through several key stages. First, societal issues were identified through observation and
investigation. A specific real-world problem impacting visually impaired individuals in a chosen
community was then selected for deeper analysis. Following the selection of a study area, a
comprehensive review of existing literature on visually impaired individuals and related
challenges was conducted. This research informed the project's scope definition, outlining its
goals and limitations. Additionally, a suitable methodology was established to guide the project's
execution. Next, the focus shifted to system design and development. This involved in-depth
consideration of how different components would be integrated to effectively address the
identified issue. A block diagram was created to visually represent this overall system integration.
Finally, based on the defined scope, methodology, and system design, the device itself was
designed, developed and deployed in compatible environments where it could be utilized
effectively.
The flow summarized in Figure 1 proceeds through the following stages: problem identification regarding the community; choosing the necessary problem that needs a solution; proposing the solution used to solve the identified problem; reading and conceptualizing existing related works; choosing the best methodology for realizing the proposed solution; and finally designing, developing, and deploying the overall system.

Figure 1: Methodology of the Project

1.5 Scope and Limitation of the project

1.5.1 Scope of the Project


The proposed system offers an efficient, accurate, and reliable solution tailored for assisting
visually impaired individuals in their daily activities. The system is capable of aiding users in identifying individuals within their vicinity and determining their distance, providing crucial spatial awareness cues. Additionally, the system is able to detect objects within the user's surroundings, identifying them and conveying their distance from the user. Furthermore, the system possesses the capability to read aloud written materials such as books, thereby enhancing accessibility to printed content for visually impaired users. This comprehensive set of
capabilities reflects the system's versatility and potential to significantly improve the quality of
life and independence of visually impaired individuals by addressing key challenges related to
environmental awareness, social interactions, and access to information.

1.5.2 Limitation of the Project


Despite the innovative and beneficial aspects of the project for visually impaired individuals,
there are several limitations that need to be addressed. One major limitation is the unavailability
of certain components such as advanced sensors and processors, which are essential for real-time
object recognition and environmental awareness. This scarcity of components can hinder the
project's effectiveness and limit its functionality. Additionally, cost constraints pose a significant
challenge as acquiring high-quality components can be expensive, especially in developing
countries where resources are limited. Furthermore, the project team may face difficulties due to
time constraints, as developing and implementing such a complex system requires extensive
testing and optimization. Lastly, the performance of computers used in the project may be subpar,
affecting the speed and accuracy of object recognition and distance measurements. Overall, these
limitations highlight the need for adequate resources, funding, and time to overcome challenges
and enhance the project's capabilities for the benefit of visually impaired individuals.
1.6 Significance of the Project
Real-time Environmental Awareness and Object Recognition
 Empowers visually impaired individuals to navigate surroundings with ease and
confidence.
 Promotes independence and safety.

Cost-Effectiveness
 Crucial in developing countries with limited resources for specialized technologies.
 Ensures accessibility to assistive technologies for a larger segment of the visually
impaired population.
 Reduces financial burdens on individuals and families.
Social Inclusion
 Detects and recognizes individuals, providing information about proximity and actions.
 Fosters meaningful social interactions.
 Promotes mutual understanding and awareness.
Educational and Informational Access
 Reads books and written materials aloud.
 Enhances access to education and information regardless of geographical location or
economic status.
 Promotes lifelong learning and personal development.
 Opens doors to employment opportunities and active civic participation.

CHAPTER TWO

LITERATURE REVIEW

2.1 Overview

Visual impairment presents significant challenges in the daily lives of affected individuals,
affecting various aspects such as social interaction, education, and mobility safety. While
traditional tools like white canes and guide dogs have been instrumental in improving
independence and mobility, they often fall short in providing the comprehensive support required
for seamless integration into society. These tools, while effective to some extent, do not fully
address the diverse and complex needs of visually impaired individuals in navigating the world
around them. Recent advancements in Artificial Intelligence (AI) have brought about a new era
of assistive technologies specifically designed to meet the unique needs of the visually impaired
community. AI-powered systems offer innovative solutions that go beyond the capabilities of
traditional tools. These systems leverage the power of AI algorithms to provide more
sophisticated functionalities, enhancing the overall quality of life for visually impaired
individuals.

One of the key areas where AI has made significant advancements is in facial recognition
technology. Early facial recognition systems were limited in their ability to accurately estimate
distances, which is essential for effective social interaction. However, recent developments have
integrated advanced algorithms with real-time distance estimation capabilities, enabling visually
impaired individuals to assess the distance to individuals and objects in their surroundings more
accurately. This enhancement in facial recognition technology has not only improved the
accuracy of identifying individuals but has also enhanced situational awareness, leading to richer
social connections and improved overall quality of life.

Furthermore, AI has revolutionized information access technologies for the visually impaired.
While Text-to-Speech and Optical Character Recognition (OCR) tools have existed for some
time, their effectiveness has been limited by their reliance on pre-existing digital content. This
constraint has hindered the real-time accessibility of printed text for visually impaired
individuals. However, recent projects have integrated OCR technology with real-time image
capture capabilities, allowing users to instantly read printed text in their surroundings without the

need for prior digitization. Additionally, the integration of AI-based narration has further
enhanced the reading experience by providing contextually relevant information, improving
overall comprehension of the text, and overcoming the limitations of traditional methods. recent
advancements in AI have significantly improved assistive technologies for visually impaired
individuals. These advancements have not only addressed the limitations of traditional tools but
have also opened up new possibilities for enhancing independence, accessibility, and inclusivity
for the visually impaired community[5].

2.2 Existing Systems

Facial recognition technology has undergone significant advancements in recent years,


particularly in addressing the limitations of early systems. Initially, these systems were proficient
at identifying individuals but struggled with accurately estimating distances, which is crucial for
effective social interaction. This limitation posed challenges for visually impaired individuals, as
it hindered their ability to engage with others in social settings. Recent advances in facial recognition technology have successfully integrated advanced algorithms with real-time distance estimation capabilities, marking a significant improvement in the technology's functionality [6].

By combining advanced facial recognition algorithms with real-time distance estimation, these
systems can now not only identify individuals but also accurately assess the distance to them and
other objects in the surrounding environment. This integration has greatly enhanced situational
awareness for visually impaired individuals, allowing them to navigate social interactions more
effectively. Moreover, the ability to assess distances to both individuals and obstacles in the
environment has significantly improved safety and mobility for visually impaired individuals [7].

In practical terms, this advancement means that facial recognition technology can now provide
more comprehensive support to visually impaired individuals in various scenarios. For example,
in a social setting, the technology can help individuals identify and locate friends or
acquaintances accurately. In a more complex environment, such as a crowded street or public
transportation, the technology can assist individuals in navigating safely by detecting and
alerting them to obstacles in their path. The integration of advanced algorithms with real-time distance estimation in facial recognition technology represents a significant step forward in assistive technology for visually impaired individuals. It not only improves the accuracy of identifying individuals but also enhances situational awareness, fostering richer social connections and improving overall quality of life [8].

In the realm of information access technologies, significant progress has been made to improve
accessibility for visually impaired individuals. While Text-to-Speech (TTS) and Optical
Character Recognition (OCR) tools have been available, their effectiveness has been hindered by
their reliance on pre-existing digital content. This limitation has restricted real-time accessibility
for visually impaired individuals, as they often require printed text to be digitized before it can
be read aloud. To address this challenge, several projects have focused on integrating OCR technology with real-time image capture capabilities; our project likewise allows users to instantly read printed text in their surroundings without the need for prior digitization. By capturing images in real time and processing them through OCR algorithms, these systems convert printed text into a format that can be read aloud by a TTS engine. This integration not only provides a more immediate reading experience but also enhances accessibility by eliminating the need for pre-existing digital content [9].

Moreover, the integration of AI-based narration further enhances the reading experience for
visually impaired individuals. By providing contextually relevant information and improving
overall comprehension, AI-based narration ensures that the text is more easily understood and
accessible. This combination of real-time OCR and AI-based narration offers a more seamless
and accessible reading experience, effectively overcoming the limitations of traditional methods.
These advancements in information access technologies have significantly improved accessibility for visually impaired individuals, providing them with greater independence in their daily lives [10].

The integration of AI in assistive technologies has a wide-reaching impact beyond facial


recognition and information access. Object recognition technology plays a crucial role in scene
understanding by identifying and describing objects, which aids in navigation and environmental
awareness for visually impaired individuals. By utilizing AI algorithms, object recognition
systems can provide detailed descriptions of objects in the environment, enabling users to
navigate more confidently and independently. Environmental sound analysis is another area where AI is making a difference in assistive technologies. AI algorithms can differentiate between different types of sounds, providing valuable cues for safe navigation. For example, these systems can identify the sound of a car approaching or a doorbell ringing, alerting the user to potential obstacles or events in their environment. This enhances the user's situational awareness and safety [11].

AI-powered companions are also revolutionizing the field of assistive technologies. These
companions can provide emotional support and social interaction, reducing feelings of isolation
and improving overall well-being. They can engage users in conversations, provide reminders,
and assist with daily tasks, enhancing the user's quality of life. The potential impact of AI on assistive technologies for visually impaired individuals is significant. AI-powered systems offer comprehensive solutions that enhance accessibility and inclusivity, enabling visually impaired individuals to live more independently. Continued research and development in AI integration hold promise for a future where assistive technologies empower all individuals, regardless of visual ability, to lead more fulfilling lives [12].

CHAPTER THREE
SYSTEM DESIGN METHODOLOGY
This chapter presents the design methodology of the proposed system. At a high level, the project incorporates two broad design sections, an AI design section and an electrical design section, each with its respective subsections. As shown in Figure 2, the AI section has four subsections (data collection and preparation, data pre-processing, modeling, and testing), while the electrical section consists of the hardware and software components in an integrated form.

Figure 2 : Block Diagram of the proposed system

3.1 AI Design Section of the Project

The AI design section flows through four stages: data collection and preparation, data pre-processing, modeling, and testing.

Figure 3: Block diagram of an AI design section

3.1.1 Dataset Collecting and Preparation


A dataset refers to a comprehensive collection of data. In our dataset collection efforts, we have
placed particular emphasis on gathering diverse and relevant data to enhance accuracy. This
includes acquiring data from our surrounding environment. The resulting dataset is substantial
and has been meticulously curated to serve as a robust resource for training and testing across
various applications. Emphasizing the significance of meticulous dataset collection practices
enhances the quality and applicability of machine learning models.

Images captured by camera, gathered from online sources, and taken from stored sources all feed into the trained dataset.

Figure 4: Image Source Resources

The data collection process involves identifying subjects via survey, setting up a controlled
environment, capturing images with varied conditions, manually annotating and organizing them,
applying data augmentation, and conducting a quality assurance review.
Table 1: Data Collection Process

Stage (Method): Description

Identify Subjects (Survey and Selection): Identify the various objects and individuals to be included in the dataset through a preliminary survey.

Setup Environment (Controlled Environment Setup): Arrange a controlled environment with consistent lighting and background for capturing images.

Capture Images (Digital Camera/Smartphone): Use a high-resolution digital camera or smartphone to capture images of objects and individuals.

Vary Conditions (Multiple Angles): Take images from different angles to ensure diversity in the dataset.

Annotate Images (Manual Labeling and Tagging): Label each image with relevant tags such as "chair", "table", "person1", "person2" using the CVAT open data annotation platform.

Organize Dataset (Structured Folder System): Store the annotated images in a structured folder system categorized by object type and individual identity.

Data Augmentation (Image Processing Software): Enhance the dataset by applying augmentation techniques such as rotation, scaling, and cropping.

Review Quality (Quality Assurance Check): Conduct a thorough review of the dataset to ensure high-quality and accurate annotations.

3.1.1.1 Face Detection Dataset

In the case of tabular data, a data set corresponds to one or more database tables, where every
column of a table represents a particular variable, and each row corresponds to a given record of
the data set in question. The data set lists values for each of the variables, such as the identity of a person, for each member of the data set. In this project, we selected eight different persons for face detection.

Table 2: Face Detections

Number Persons

1 Yosef Delesa

2 Zelalem Kibiru

3 Tewodros Asfaw

4 Temesgen Gezahegn

5 Eng. Kris

6 Mr. Adane

7 Mr. Ashebir

8 Mr. Gadisa

We sampled eight persons for the face detection dataset, labeled as "Mr. Adane", "Mr. Ashebir", "Eng. Kris", "Mr. Gadisa", "Temesgen", "Zelalem", "Yosef" and "Tewodros". We collected 100 images for each person before augmentation. The figures below are dataset samples for face detection.

Figure 5: Image dataset of Mr. Ashebir

Figure 6: Image dataset of Mr. Adane

Figure 7: Image dataset of Eng. Kris

Figure 8: Image dataset of Mr. Gadisa

Figure 9: Image dataset of Zelalem

Figure 10: Image dataset of Tewodros

Figure 11: Image dataset of Temesgen

Figure 12: Image dataset of Yosef


3.1.1.2 Object Detections Dataset

We selected eleven essential objects for this project. The selection process involved consulting
with the JIT campus and reviewing various research studies. The criteria for selection were based
on the presence of these objects in schools or workplaces. The list of these objects, which are
basic necessities for visually impaired individuals, is shown in the following table. Each attribute has a value of 0, meaning false, or 1, meaning true. The training data structure is shown in Table 3. In this study, there are 11 labels, or classes of objects, to be trained.

Table 3: Table of Training Data

Sample        Detection 1   ...   Detection N   Object
Detection 1   1             ...   0             Object A
Detection 2   0             ...   1             Object B
Detection 3   0             ...   1             Object C
...           ...           ...   ...           ...
Detection N   1             ...   1             Object N

Table 4: Selected Objects

Number Objects

1 Chair

2 Computer

3 Blackboard

4 Keyboard

5 Mouse

6 Door

7 Window

8 Wall

9 Birr

10 Stair

11 Person

3.1.2 Data Pre-processing


The system initiates pre-processing procedures in preparation for subsequent data analysis, employing TensorFlow and Keras within a Google Colab environment. The dataset is systematically organized into batches and the class names are identified. To further refine the input data, a sequence for image resizing and rescaling is introduced using TensorFlow's Keras Sequential model. These measures ensure that the incoming image data is appropriately formatted and scaled to align with the system's predetermined requirements, establishing a robust foundation for subsequent data analysis and model training. The steps listed below were applied to all the raw input images to convert them into clean versions that could be fed to the neural network model; a minimal code sketch follows the list.
 The input image is resized to dimensions of 118 x 118 pixels.
 The images are scaled and normalized using the standard mean of TensorFlow's built-in
weights.
 The images are center-cropped to a pixel value of 118 x 118 x 3.
 The images are converted into tensors, which are similar to NumPy arrays.
 The dimensions 118 x 118 represent the width and height of the image in pixels.
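As a concrete illustration of the pre-processing described above, the sketch below loads the folder-organized images into batched datasets and applies a resize-and-rescale sequence with TensorFlow's Keras Sequential model. The directory path, batch size, and the specific Resizing/Rescaling layers are illustrative assumptions, not the project's exact script.

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 118      # width and height used in this project
BATCH_SIZE = 32     # assumed batch size for illustration

# Load the annotated, folder-organized dataset; class names are inferred
# from the sub-folder names (e.g. "chair", "person1", ...).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",                   # assumed folder path
    image_size=(IMG_SIZE, IMG_SIZE),   # resize to 118 x 118 pixels
    batch_size=BATCH_SIZE,
)
class_names = train_ds.class_names

# Resize-and-rescale sequence built with the Keras Sequential model,
# normalizing pixel values to the [0, 1] range.
resize_and_rescale = tf.keras.Sequential([
    layers.Resizing(IMG_SIZE, IMG_SIZE),
    layers.Rescaling(1.0 / 255),
])

# Apply the sequence to every batch before it reaches the model.
train_ds = train_ds.map(lambda x, y: (resize_and_rescale(x), y))
```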

3.1.2.1 Data Augmentation

Image data augmentation is a technique used to artificially expand the size of a training dataset
by creating modified versions of images in the dataset. It maximizes the dataset by applying
simple techniques like rotating and zooming images. Training deep learning neural network
models on more data can result in more skillful models, and the augmentation techniques can
create variations of the images that can improve the ability of the fit models to generalize what
they have learned to new images. The dataset was augmented in this project to improve accuracy, using the parameters shown in Table 5; a code sketch applying these settings follows the table.

Table 5: Data Augmentation Parameters

Parameter Value Justification


Rotation Range 20 Enhances model's ability to recognize objects from
different angles.
Zoom Range 0.15 Helps the model generalize by allowing it to learn from
zoomed-in or zoomed-out images.
Width Shift Range 0.2 Enables the model to learn from images with slight
horizontal shifts.
Height Shift Range 0.2 Enables the model to learn from images with slight vertical
shifts.
Shear Range 0.15 Helps the model recognize objects that are skewed or
distorted.
Horizontal Flip True Increases the diversity of training data by flipping images
horizontally.
Fill Mode "nearest" Fills in pixels that may be created during rotation or
shifting with the nearest pixel value.
Images per Class 1,000 Ensures a balanced representation of each class in the
dataset.
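The augmentation parameters in Table 5 map directly onto Keras' ImageDataGenerator. The sketch below is one common way to realize these settings; the folder path and batch size are placeholders rather than the project's actual values.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings taken from Table 5.
datagen = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest",
)

# Stream augmented 118 x 118 images from the structured folder system;
# "dataset/train" is a placeholder path.
augmented_flow = datagen.flow_from_directory(
    "dataset/train",
    target_size=(118, 118),
    batch_size=32,
    class_mode="categorical",
)
```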

3.1.3 Modeling
During the modeling phase of the system, machine learning algorithms are utilized to perform
tasks such as identifying individual faces and detecting objects. This intricate process involves
training the model on meticulously curated datasets, enabling the algorithm to learn intricate
patterns and make accurate classifications. By consistently feeding the algorithm with such data,
it becomes adept at recognizing individuals and discerning objects within the visual field of a
visually impaired person, thereby enhancing their autonomy and safety.

Training the model on curated data is a pivotal step in ensuring its accuracy and reliability in
real-world applications. The systematic exposure to diverse datasets allows the algorithm to
generalize its learnings, thus improving its ability to identify and classify objects and faces

accurately. This iterative process of training and refining the model serves as the backbone for
developing robust and effective machine learning systems, especially in scenarios where precise
object detection and facial recognition are critical for user interactions and safety measures. The
data training phase is a critical component of the AI design section, where machine learning
algorithms are trained to perform tasks such as identifying individual faces and detecting objects.
This phase involves several key steps, each contributing to the accuracy and reliability of the
final model.

3.1.3.1 Convolutional Neural Networks for Object and Person Detection

Most effective machine learning models for image processing use neural networks and deep
learning. Deep learning uses neural networks for solving complex tasks similarly to the way
human brain solves them. Different types of neural networks can be deployed for solving
different image processing tasks, from simple binary classification (whether an image does or
doesn’t match a specific criterion) to instance segmentation.

Convolutional Neural Networks (ConvNets or CNNs) are a class of deep learning networks that
were created specifically for image processing with AI. However, CNNs have been successfully
applied on various types of data, not only images. In these networks, neurons are organized and
connected similarly to how neurons are organized and connected in the human brain. In contrast
to other neural networks, CNNs require fewer preprocessing operations. Plus, instead of using
hand-engineered filters (despite being able to benefit from them), CNNs can learn the necessary
filters and characteristics during training. CNNs are made up of layers of nodes, comprising an input layer, one or more hidden layers, and an output layer. Each node is connected to others and has a weight and threshold assigned to it. If a node's output exceeds a certain threshold value, the node is activated and the data is sent to the next layer of the network.

Figure 13: CNN Architecture

Forward Pass of Convolutional Neural Network

By learning visual attributes from small squares of input data, convolution preserves the link
between pixels. A convolutional layer is made up of a set of learnable filters or kernels that serve
as the network's weights. Each neuron decides whether information is transmitted forward based on the weighted sum of its inputs and its activation function.

Backward Pass of Convolutional Neural Network

CNNs are influenced by the structure of the multilayer perceptron (MLP). In contrast to MLPs, where each neuron has its own weight vector, neurons in CNNs share weights: the same filter weights are applied across different positions of the input. As a result of this weight sharing, the total number of trainable weights in a CNN is reduced.
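To make the architecture concrete, the following is a minimal Keras CNN sketch for 118 x 118 x 3 inputs of the kind described above. The number of convolutional blocks, the filter counts, and NUM_CLASSES are illustrative assumptions; the project's actual configuration is given by its tuned hyperparameters.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 8   # e.g. the eight persons in the face detection dataset (assumed)

model = models.Sequential([
    layers.Input(shape=(118, 118, 3)),
    # Convolution and pooling blocks learn the filters during training.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Flatten the feature maps and classify with fully connected layers.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```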

3.1.4 Testing
The testing phase is concerned with evaluating the performance of the proposed system. By comparing the model's predictions with the actual identifications, the accuracy of the system can be evaluated. Additionally, during testing, parameters such as thresholds, sensitivity settings, and other variables are fine-tuned to optimize the performance of the model. This iterative process helps ensure that the system can consistently and accurately identify objects and individuals.
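A short sketch of this kind of evaluation, comparing predictions with the actual identifications on a held-out set; it assumes the model and class_names from the earlier sketches and a test_ds dataset prepared the same way as the training data.

```python
import numpy as np

# Overall accuracy on unseen data.
test_loss, test_acc = model.evaluate(test_ds)
print(f"Test accuracy: {test_acc:.2%}")

# Compare individual predictions with the actual identifications.
for images, labels in test_ds.take(1):
    predictions = np.argmax(model.predict(images, verbose=0), axis=1)
    for predicted, actual in zip(predictions, labels.numpy()):
        print("predicted:", class_names[predicted], "| actual:", class_names[actual])
```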
3.2 Electrical Design Section of the project
As presented in Sections 3.2.1 to 3.2.3, this design section encompasses three subsections: hardware design, software design, and prototype development.

3.2.1 Hardware Design


In the hardware design of the proposed system, the process is divided into three main parts. The input part includes a mobile camera, which plays a crucial role in capturing images of the surrounding area and providing visual data for further processing. The processing part is controlled by an Arduino microcontroller, which acts as the central processing unit to manage and coordinate the various components of the system. Lastly, the output part features a speaker, which is used to tell the visually impaired person what is going on in the surroundings. By integrating these components seamlessly, the hardware design aims to provide a streamlined and efficient workflow.

The hardware design comprises an input part (the camera captures real-time visual data, for example of a book page), a processing part (the Arduino, used to control the switch), and an output part (the speaker audibly reads out the book).
Figure 14: Hardware Design


3.2.1.1 Hardware Requirements
For this project we used different hardware components; mainly, we used the following materials:

Table 6: Hardware Components Used

Sensor        Controller    Actuator
Camera        Arduino Uno   Speaker
Switch

Arduino Uno
Arduino Uno is a microcontroller board based on the ATmega328P. It is an open-source
electronics platform that allows users to easily create interactive electronic projects by using a
simple programming language and a variety of pre-built libraries. The Arduino Uno can be used
to control various devices, such as LEDs, motors, and sensors, by sending and receiving digital
and analog signals [13]. In this project, it is used to read the state of the switch and coordinate the system's input and output components.

Figure 15: Arduino Uno


Mobile Camera
The camera's capability to provide high-quality, real-time visual data is the bedrock of the subsequent recognition process. It transforms the physical scene, including faces, objects, and printed pages, into digital information, allowing the system to make informed decisions. The integration of the mobile camera empowers the system to recognize people and objects and to read text aloud seamlessly, making it a key component in ensuring precision and efficiency.

Figure 16: Mobile camera
Speaker
The speaker in this project serves to audibly communicate crucial information to visually
impaired individuals, enhancing their system accessibility. By converting digital signals into
audio cues, it delivers real-time feedback and notifications, ensuring effective user interaction
and promoting inclusivity for visually impaired users, reflecting the project's accessibility
commitment.

Figure 17: Speaker

Switch
A switch is a fundamental component in electrical and electronic circuits that controls the flow
of electricity by opening or closing a circuit. It is a mechanical device that allows users to easily
turn devices on or off, change the direction of current flow, or select between different circuits
[14].

Figure 18: Switch

3.2.2 Software Design


In the software design of the system, the process is structured into three key components. The initial phase is the input part, primarily focused on image acquisition: images of the surroundings are captured using the mobile camera, which serves as the primary source of visual data for further analysis. The processing part is pivotal to the system's functionality, as it involves retrieving the trained model for image identification; the AI algorithm detects and recognizes the faces of individuals and also detects the objects within the surrounding environment. Finally, in the output part, the speaker is used to audibly convey the information to visually impaired individuals. This systematic approach in software design ensures efficient processing.

The software design comprises an input part (image acquisition), a processing part (model retrieval and image identification), and an output part (audio output).

Figure 19: Software Design

3.2.2.1 System Flow Chart


System flow chart for the object or person detection and recognition
A flowchart is a powerful visual tool that represents a workflow or process in a clear and concise way. It provides an algorithmic representation of a task, making it easier to understand.

The flow chart begins at Start and checks whether the captured image contains a person or an object. A detected person or object is then matched and recognized, its distance is measured and calculated, and the result is announced through the audio output. The loop continues while the switch is ON and stops when the switch is turned OFF.
Figure 20: Flow Chart for object or person detection

System flow chart for the Text-To-Speech Device
The system flow chart involves a user starting the process by inserting a paper or book into the
device and turning on the switch. The device then captures the image and uses an image-processing
algorithm to extract text from it. The extracted text is converted to audio using a text-to-speech
engine and played through a speaker or headphones. The device then checks whether the switch is
on or off: if the switch is on, it continues the process; if not, it shuts down the system.

Figure 21: Flow Chart for Text-To-Speech Device (Start; while the switch is ON the capture, OCR and audio cycle repeats; when the switch is OFF the process ends)

3.3.2.2 Software Requirements
Software tools used in the project are the following:
 Google Colab online code editor
 Arduino IDE
 Python IDE
 Coding languages: Arduino programming (C and C++) and Python
 TensorFlow
 Keras
 Visual Studio Code Editor
 OpenCV
 Pickle

Google Colab

Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody
to write and execute arbitrary Python code through the browser and is especially well suited to
machine learning, data analysis, and education. More technically, Colab is a hosted Jupyter
notebook service that requires no setup to use, while providing free access to computing
resources including GPUs [15]. In this project, we used Colab to train our model online with a
free GPU for more speed. We also chose this platform to avoid the difficulties of installing
libraries in Python IDEs for offline use.

Figure 22: Google Colab


Python IDE

The Python IDE (PyCharm) is a specialized integrated development environment for computer
programming, designed particularly for the Python programming language. The software is
developed and maintained by JetBrains, a Czech company formerly known as IntelliJ. It offers an
extensive array of capabilities, including code analysis, a comprehensive graphical debugger,
integration with unit testing frameworks, and cohesive support for version control systems
(VCSs). Moreover, it provides robust support for web development with frameworks like Django
and for data science work through integration with Anaconda [16]. Within the scope of this
project, the Python IDE has been used for offline project demonstrations and for writing Python
code for the serial communication tasks. The demonstration runs through two distinct pathways:
first, by providing image paths, and second, by real-time image capture through the webcam.

Figure 23: Python IDE

Arduino IDE
The Arduino integrated development environment (IDE) is a cross-platform application written in
the Java programming language. It originated from the IDEs for the Processing and Wiring
languages and provides a simple one-click mechanism to compile and upload programs to an
Arduino board [17]. In this project, we used the Arduino IDE to write the Arduino code that
receives the label sent from Python through serial communication, sets the direction of rotation
of the servo motor, and senses when people touch the touch sensor.

Figure 24: Arduino IDE

TensorFlow
TensorFlow is an open-source end-to-end platform for creating machine learning applications.
It is a symbolic math library that uses dataflow and differentiable programming to perform
various tasks focused on the training and inference of deep neural networks. TensorFlow handles
data sets that are arranged as computational nodes in graph form; the edges that connect the
nodes in a graph can represent multidimensional vectors or matrices, known as tensors.
TensorFlow programs use a dataflow architecture that works with generalized intermediate
results of the computations. In this project, the TensorFlow platform is used for model
development with the CNN deep learning algorithm.

Figure 25: TensorFlow

Keras
Keras is an open-source software library that provides a Python interface for artificial neural
networks and acts as an interface for the TensorFlow library. It is designed to enable fast
experimentation with deep neural networks and focuses on being user-friendly, modular, and
extensible. Keras contains numerous implementations of commonly used neural-network
building blocks such as layers, objectives, activation functions, and optimizers, along with a host
of tools that make working with image and text data easier and reduce the amount of code that
has to be written. In addition to standard neural networks, Keras supports convolutional and
recurrent neural networks, as well as common utility layers such as dropout, batch normalization,
and pooling. Keras also allows users to productize deep models on smartphones (iOS and
Android), on the web, or on the Java Virtual Machine. In this project, the Keras library is used
for model development with the CNN deep learning algorithm.

Figure 26: Keras

Visual Studio Code Editor


The Visual Studio Code (VS Code) editor stands out as a formal and organized tool for Python
development due to its lightweight nature, cross-platform compatibility, and user-friendly
interface. Its seamless integration with Git, debugging capabilities, and IntelliSense code
completion feature contribute to a structured and efficient coding workflow. The extensive
library of extensions available for VS Code further enhances its functionality, allowing
developers to customize their environment based on specific project requirements. Overall, the
formal and organized design of Visual Studio Code makes it a valuable asset for maintaining
code quality and improving productivity in Python development projects like ours.

Figure 27: Visual Studio Code Editor

OpenCV
OpenCV, known as the Open Source Computer Vision Library, serves as a pivotal tool in our
development process for its robust capabilities in computer vision and machine learning tasks.
By leveraging OpenCV, we seamlessly integrate functionalities such as live video capture, image
preprocessing, and GUI display into our applications. This organized approach not only enhances
the visual aspects of our projects but also streamlines the implementation of complex computer
vision algorithms. The structured nature of utilizing OpenCV ensures a formal framework for
handling visual data and contributes to the overall organization of our Python development
workflow [18].
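
As a concrete illustration of the capture, preprocessing and display loop described above, the following minimal sketch opens a camera stream with OpenCV, converts each frame to grayscale as an example preprocessing step, and shows the frame in a window. The camera index, frame size and window name are illustrative assumptions rather than the project's exact settings.

import cv2

cap = cv2.VideoCapture(0)                          # default webcam; an IP-camera URL also works
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

while cap.isOpened():
    ok, frame = cap.read()                         # grab one frame
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # example preprocessing step
    cv2.imshow("camera", frame)                    # GUI display
    if cv2.waitKey(10) & 0xFF == 27:               # ESC exits the loop
        break

cap.release()
cv2.destroyAllWindows()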

Figure 28: OpenCV

Pickle
The utilization of the Pickle module in our development process exemplifies a formal and
organized approach to managing Python object structures. By leveraging Pickle for serializing
and deserializing face encodings, we establish a systematic method for efficiently storing and
retrieving crucial facial recognition data. This structured implementation not only enhances the
organization of our codebase but also streamlines the handling of complex object structures,
contributing to the overall robustness and scalability of our applications. The formal integration
of Pickle underscores our commitment to implementing best practices in data management and
reinforces the reliability of our facial recognition system.
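
As a hedged sketch of this idea (the file name and the encoding values below are placeholders, not the project's actual data), face encodings can be cached to disk and reloaded with Pickle as follows:

import pickle

encodings_db = {
    "person_A": [0.12, -0.45, 0.33],   # placeholder encoding vectors
    "person_B": [0.98, 0.10, -0.27],
}

# Serialize the encodings to disk once, after enrollment
with open("encodings.pickle", "wb") as f:
    pickle.dump(encodings_db, f)

# Deserialize them later, e.g. when the recognition service starts
with open("encodings.pickle", "rb") as f:
    known_encodings = pickle.load(f)

print(list(known_encodings.keys()))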

Figure 29: Pickle

3.3.3 Prototype Development


3.3.3.1 Integration Part
The integration of the System involves the coordination between the hardware and software
components. The integration of these hardware and software components is essential for
ensuring the smooth operation and effectiveness of the System in accurately identifying the
images.

Figure 30: Integration Part (the software part and the hardware part are each verified through system testing before being combined)

Hardware-Software Interface
The Hardware-Software Interface involves establishing communication between the physical
hardware components (camera, speaker, switch, Arduino) and the software that controls and
processes data. This interface enables the transfer of information between the hardware and
software components, allowing for seamless operation of the system: the software processes the
captured data to identify the images and then sends the processed result to the speaker to tell
the visually impaired person about the environment.
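
A minimal sketch of the Python side of this interface is shown below. It assumes the pyserial library and a hypothetical port name (COM3) and baud rate (9600); the exact label protocol used by the project may differ.

import serial
import time

arduino = serial.Serial(port="COM3", baudrate=9600, timeout=1)   # assumed port settings
time.sleep(2)                      # allow the board to reset after the port opens

def send_label(label):
    # Send one recognised label, newline-terminated, for the Arduino to act on.
    arduino.write((label + "\n").encode("utf-8"))

send_label("person")               # e.g. the result of the recognition step
arduino.close()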
System Testing
System Testing is a comprehensive evaluation phase that occurs after the integration of hardware
and software components. The primary objective is to validate that the integrated system
operates harmoniously and meets the specified requirements. This testing phase encompasses
various methodologies, including functional testing to verify individual functions, performance
testing to assess system response under different conditions, and stress testing to evaluate the
system's stability under extreme loads. System Testing aims to identify and rectify any issues
arising from the collaboration between hardware and software.
Deployment
Deployment marks the transition from the development and testing phases to the operational
stage. After successful testing and validation, the system is deemed ready for deployment in its
target environment. This involves the physical installation of the hardware and software
components in the designated setting. Configuration settings are adjusted to align with the
specific requirements of the operational environment. The deployment phase is a critical step in
realizing the intended benefits of the system, as it transitions from a controlled testing
environment to real-world use.
3.3.3.2 Prototype of the proposed system
The prototype of the system has been fully implemented and is shown in the figure below.

Figure 31: The Prototype of Text-To-Speech Device

3.3.3.3 Working Principle of the prototype
Working principle of object or person detection and recognition
The project for assisting visually impaired individuals functions through a combination of
advanced computer vision, machine learning, and auditory feedback systems. When a visually
impaired person enters a room, the system utilizes cameras to capture the surroundings. The
captured images are processed in real-time using machine learning algorithms to detect and
recognize individuals and objects within the environment. The system identifies each person and
object, determines their distance from the user through camera-based distance estimation, and converts this information
into audio feedback. The user receives auditory notifications about the identities of individuals,
their relative distances and the types of objects present along with their locations. This
continuous auditory feedback enables the visually impaired person to navigate and interact with
their environment more effectively and independently.
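
For the distance part specifically, Appendix A estimates range from a single camera using triangle similarity; a worked sketch of that calculation, reusing the calibration constants from the appendix (76.2 cm reference distance, 14.3 cm face width) with example pixel widths, is shown below.

def focal_length(measured_distance_cm, real_width_cm, width_in_reference_image_px):
    # One-time calibration from a reference photo taken at a known distance.
    return (width_in_reference_image_px * measured_distance_cm) / real_width_cm

def distance_finder(focal_length_px, real_width_cm, width_in_frame_px):
    # Similar-triangles relation: distance = (real width x focal length) / pixel width
    return (real_width_cm * focal_length_px) / width_in_frame_px

f = focal_length(76.2, 14.3, 120)      # the reference face was 120 px wide (example value)
print(distance_finder(f, 14.3, 80))    # a face 80 px wide is about 114 cm away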

Figure 32: Working Principle of object or face detection (the mobile camera captures the environment; the trained model performs object or person detection and recognition together with distance measurement; the result is delivered as audio output)

Working principle of Text-To-Speech Device
The working principle of the proposed project idea involves a systematic flow of key stages that
facilitate the accessibility of printed text for visually impaired individuals. The process begins
with the image capture stage, where the device utilizes a camera to capture images of the printed
material. These captured images are then subjected to image processing techniques, specifically
OCR, which enables the extraction of text from the images. Subsequently, the extracted text
undergoes conversion into audio format through a TTS engine. This conversion process ensures
that the visually impaired individuals can listen to the content of the printed material rather than
relying on visual cues. Finally, the converted audio output is played back through a speaker or
headphones, providing a seamless and accessible means for visually impaired individuals to
engage with printed text. By following this organized and detailed working principle, the system
effectively enhances independence and inclusivity for visually impaired individuals in various
reading scenarios.
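
A compact sketch of this capture, OCR and TTS chain is given below (the full implementation is in Appendix C). It assumes Tesseract and gTTS are installed and that "page.jpg" is an already-captured image of the printed page.

import pytesseract
from gtts import gTTS
from PIL import Image

text = pytesseract.image_to_string(Image.open("page.jpg"))   # OCR step
if text.strip():
    gTTS(text=text, lang="en").save("page.mp3")              # text-to-speech conversion
    # The saved audio can then be played through the speaker or headphones,
    # for example with pygame.mixer as in Appendix C.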

Figure 33: Working Principle of the Text-To-Speech Device (image capture; image processing with OCR; text-to-speech conversion; audio output)

CHAPTER FOUR
RESULTS AND DISCUSSION
4.1 Introduction
We have designed, developed, and implemented Machine Vision-Based Assistance System for
Visually Impaired Individuals. The project required a combination of both software and
hardware work, with the software portion consisting of an intelligent system used to identify
objects and individuals. We utilized Google Colab to train our model, tweaking parameters such
as the epoch count and batch size to achieve the most accurate and reliable results possible using
the data we had at hand.
4.2 Classification Accuracy with machine vision
In the realm of machine vision, the code showcases the development of a CNN that excels at
image classification tasks; specifically, it demonstrates high accuracy in identifying images. To
evaluate performance, accuracy serves as the primary metric, measuring the proportion of
correctly classified images. Key hyperparameters and training details include 50 epochs of
training to balance optimization with overfitting prevention, a batch size of 4 for efficient model
updates, and image resizing to 256x256 pixels for computational efficiency. Factors contributing
to the model's accuracy include thoughtful data preprocessing, data augmentation techniques to
enhance generalization, a well-structured CNN architecture for feature extraction, the Adam
optimizer for efficient weight updates, and a sparse categorical cross-entropy loss for multi-class
classification. In conclusion, the code exemplifies the effectiveness of CNNs in achieving high
classification accuracy in machine vision applications, and its results underscore the potential of
CNNs in various image-based tasks, inviting further exploration and optimization.
Table 7: Hyper-Parameters

No Hyper-parameter Value
1 Input Size 118x118
2 Batch Size 4
3 Epoch 50
4 Channel 3
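
A minimal Keras sketch consistent with the training settings reported above (Adam optimizer, sparse categorical cross-entropy, batch size 4, 50 epochs) is shown below. The layer stack, dataset directory and class count are illustrative assumptions rather than the project's exact architecture, and the image size follows the 256x256 resize mentioned in the text (Table 7 lists an input size of 118x118, so adjust to match the actual pipeline).

import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE, BATCH_SIZE, EPOCHS, NUM_CLASSES = 256, 4, 50, 5    # class count is an assumption

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/", image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.RandomFlip("horizontal"),                          # simple data augmentation
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES),                                # logits, one per class
])

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(train_ds, epochs=EPOCHS)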

Figure 34: Accuracy and loss

The training of the CNN model reveals an outstanding level of performance, underscoring its
remarkable efficacy. Throughout the training process, the model consistently demonstrates high
accuracy and minimal loss, showcasing its ability to effectively learn and generalize from the
training data to the validation set. The progressive increase in accuracy over the epochs is
particularly noteworthy. This improvement reflects the model's capacity to continuously enhance
its ability to correctly classify images in the validation set. Notably, the model achieves a
validation accuracy exceeding 98% in several instances, a remarkable feat that underscores its
proficiency in making accurate predictions. This high level of accuracy indicates that the model
is not only learning effectively but also generalizing well to new, unseen data.

Moreover, the consistently low validation loss throughout the training process is indicative of the
model's precision and reliability. Low loss values suggest that the model makes predictions with
minimal errors, further highlighting its robustness in handling diverse and complex data. This
reliability is crucial for applications requiring high precision and low error rates.

The high accuracy and low loss demonstrated by the CNN model make it an ideal candidate for
applications designed to assist visually impaired individuals. With its exceptional performance,
the model can effectively differentiate between various types of individuals and objects,
ensuring that reliable detections are passed on for further processing and audio feedback.

The model's high accuracy means it can reliably identify individuals and objects, a crucial
feature for aiding visually impaired individuals. Additionally, the model's low loss indicates its
ability to make predictions with minimal errors, thereby reducing the likelihood of
misclassification. This precision and reliability in making accurate identifications contribute to
more efficient and effective identification processes. Its ability to accurately and reliably classify
images can lead to significant improvements in the quality of assistance provided to visually
impaired individuals, ensuring better outcomes and enhanced support.

4.3 Test Performed

Figure 36: Result from testing the system
We first tested the model by setting aside 20% of the dataset for testing, and the results were
very good. As depicted in the figure, the test accuracy is exceptionally high, showing that the
model classifies images with near-perfect precision.
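
A hedged sketch of this hold-out evaluation is shown below: 20% of the images are reserved as a validation/test subset and the saved model is scored on them. The directory, seed and model file name are example values, not the project's exact ones.

import tensorflow as tf

IMG_SIZE, BATCH_SIZE = 256, 4
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/", validation_split=0.2, subset="validation", seed=42,
    image_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE)

model = tf.keras.models.load_model("trained_model.keras")    # previously saved model
loss, accuracy = model.evaluate(test_ds)
print("Test accuracy: {:.2%}, loss: {:.4f}".format(accuracy, loss))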

CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1 Conclusion
In addressing the challenges faced by visually impaired individuals, our project represents a
significant step towards enhancing their independence, safety, and inclusion in society. By
leveraging cutting-edge technology, our system provides real-time auditory feedback to visually
impaired individuals, offering insights into their surroundings, including individual recognition,
and object detection. This problem-solving solution not only bridges the gap between perception
and information but also empowers users to navigate diverse environments with confidence and
autonomy.
The benefits of our project extend beyond individual users to encompass broader societal
impacts. The cost-effective nature of the technology opens doors for widespread adoption,
ensuring affordability and accessibility for visually impaired individuals across diverse
socioeconomic backgrounds and geographic locations. Moreover, by promoting independence
and reducing reliance on external assistance, our project contributes to greater societal inclusion
and equal opportunities for visually impaired individuals in various facets of life, from education
to employment and social interactions.
While the project successfully addresses key challenges faced by visually impaired individuals,
there are areas that remain unexplored due to time limitations and material costs. Future work
could focus on expanding the functionalities of the project, improving accuracy and efficiency,
and incorporating advanced features to further enhance the user experience.
Despite the constraints of time and cost, the project team strived to overcome these challenges by
optimizing resources and focusing on delivering a practical solution within the given timeframe.
The project underscores the importance of problem solving and dedication in creating inclusive
solutions for individuals with disabilities and serves as a stepping stone for future developments
in assistive technology for visually impaired individuals.
5.2 Recommendation
We highly recommend focusing on enhancing the accuracy and precision of the system within
the constraints of limited resources. Despite the challenges of a small budget that restricts the
acquisition of advanced technologies such as sensors and processors, there are several ways to
optimize the project's functionalities. Here are some recommendations to consider:

 Prioritize software optimization: Given the limitations in purchasing high-end hardware, focus on improving the efficiency of the software algorithms to enhance the project's performance and accuracy.
 Collaborate with local organizations: Partnering with local organizations or universities can provide access to resources, expertise, and potential funding opportunities to support the project's development.
 Utilize open-source platforms: Explore open-source platforms and tools that offer cost-effective solutions for developing and implementing the project, allowing for flexibility and customization without significant financial investment.
 Engage in community outreach: Engaging with the visually impaired community for feedback and insights can help tailor the project to better meet their needs and ensure its practicality and usability.
By strategically focusing on software optimization, leveraging local resources, utilizing open-
source platforms, engaging with the community, and seeking external funding sources, the
project can continue to evolve and make a meaningful impact in improving accessibility and
independence for visually impaired individuals. Despite financial constraints, creativity,
collaboration, and perseverance can drive the project's success and contribute to its long-term
sustainability.

REFERENCE
[1] Rizzo, John-Ross, et al. "The global crisis of visual impairment: an emerging global health priority requiring urgent action." Disability and Rehabilitation: Assistive Technology 18.3 (2023): 240-245.

[2] Giudice, Nicholas A. "Navigating without vision: Principles of blind spatial cognition." Handbook of Behavioral and Cognitive Geography. Edward Elgar Publishing, 2018. 260-288.

[3] Bustos-López, Maritza, et al. "Emotion Detection in Learning Environments Using Facial Expressions: A Brief Review." Handbook on Decision Making: Volume 3: Trends and Challenges in Intelligent Decision Support Systems (2022): 349-372.

[4] Berhane, Yemane, et al. "National survey on blindness, low vision and trachoma in Ethiopia: methods and study clusters profile." Ethiopian Journal of Health Development 21.3 (2007): 185-203.

[5] Zamir, Muhammad Farid, et al. "Smart reader for visually impaired people based on Optical Character Recognition." Intelligent Technologies and Applications: Second International Conference, INTAP 2019, Bahawalpur, Pakistan, November 6–8, 2019, Revised Selected Papers 2. Springer Singapore, 2020.

[6] Nagarajan, R., Sainarayanan, G., Yaacob, S., and Porle, R.R. "Object Identification and Colour Recognition for Human Blind." In ICVGIP (2004): 210-215.

[7] Pu, Ying-Hung, et al. "Aerial face recognition and absolute distance estimation using drone and deep learning." The Journal of Supercomputing (2022): 1-21.

[8] Rahman, Md Atikur, and Muhammad Sheikh Sadi. "IoT enabled automated object recognition for the visually impaired." Computer Methods and Programs in Biomedicine Update 1 (2021): 100015.

[9] Sarma, Minerva, et al. "Development of a Text-to-Speech Scanner for Visually Impaired People." Design and Development of Affordable Healthcare Technologies. IGI Global, 2018. 218-238.

[10] Zamir, Muhammad Farid, et al. "Smart reader for visually impaired people based on Optical Character Recognition." Intelligent Technologies and Applications: Second International Conference, INTAP 2019, Bahawalpur, Pakistan, November 6–8, 2019, Revised Selected Papers 2. Springer Singapore, 2020.

[11] Shanker, Amit, and Ravi Kant. "Assistive technologies for visually impaired: Exploring the barriers in inclusion." Research Highlights 8.3 (2021): 70.

[12] Smith, Emma M., et al. "Artificial intelligence and assistive technology: risks, rewards, challenges, and opportunities." Assistive Technology 35.5 (2023): 375-377.

[13] Wikipedia, "Arduino Uno." [Online]. Available: https://en.wikipedia.org/wiki/Arduino/. [Accessed: May 29, 2024].

[14] Wikipedia, "Switch." [Online]. Available: https://en.wikipedia.org/wiki/Switch/.

[15] Google Colab. [Online]. Available: https://www.wikipedia.org/. [Accessed: May 29, 2024].

[16] Python. [Online]. Available: https://www.componentsource.com/. [Accessed: May 29, 2024].

[17] Arduino IDE. [Online]. Available: https://www.javatpoint.com/. [Accessed: May 29, 2024].

[18] Rosebrock, A. "OpenCV Face Recognition." PyImageSearch, 2018. [Online]. Available: https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/.

APPENDIX
Appendix A: face recognition and distance estimation
# (Reconstructed listing: the Haar-cascade loading, reference-image calibration,
# recognizer.predict call, name list and main loop were truncated in the printed
# excerpt and have been restored with their standard OpenCV equivalents; the
# name list and reference-image path below are placeholders.)
import cv2
import numpy as np
import pyttsx3   # text-to-speech engine used elsewhere in the full script

KNOWN_DISTANCE = 76.2   # distance of the reference face from the camera, in centimetres
KNOWN_WIDTH = 14.3      # average real face width, in centimetres
WHITE = (255, 255, 255)
font = cv2.FONT_HERSHEY_SIMPLEX
name_list = ['None', 'Person1', 'Person2', 'Person3']   # labels assigned during training (placeholders)

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trainer/trainer.yml')   # LBPH model trained beforehand
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cam = cv2.VideoCapture(0)   # or an IP-camera stream such as 'http://10.180.16.114:8080/video'
cam.set(3, 640)   # set video width
cam.set(4, 480)   # set video height

def focal_length(measured_distance, real_width, width_in_rf_image):
    # Calibrate the focal length from a reference image taken at a known distance.
    focal_length_value = (width_in_rf_image * measured_distance) / real_width
    return focal_length_value

def distance_finder(focal_length_value, real_face_width, face_width_in_frame):
    # Triangle-similarity distance estimation.
    distance = (real_face_width * focal_length_value) / face_width_in_frame
    return distance

def face_data(image):
    # Return the pixel width of the last detected face in the image (0 if none).
    face_width = 0
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray_image, 1.2, 5)
    for (x, y, w, h) in faces:
        face_width = w
    return face_width

# One-time calibration using a reference photo taken at KNOWN_DISTANCE (placeholder path).
ref_image = cv2.imread('ref_image.jpg')
focal_length_found = focal_length(KNOWN_DISTANCE, KNOWN_WIDTH, face_data(ref_image))

while True:
    ret, img = cam.read()
    if not ret:
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    face_width_in_frame = face_data(img)
    for (x, y, w, h) in faces:
        id, confidence = recognizer.predict(gray[y:y + h, x:x + w])
        if confidence < 100:
            id = name_list[id]
            confidence = " {0}%".format(round(100 - confidence))
        else:
            id = "unknown"
            confidence = " {0}%".format(round(100 - confidence))
        cv2.putText(img, str(id), (x + 5, y - 5), font, 1, WHITE, 2)
        cv2.putText(img, str(confidence), (x + 5, y + h - 5), font, 1, (255, 255, 0), 1)
        if face_width_in_frame != 0:
            Distance = distance_finder(focal_length_found, KNOWN_WIDTH, face_width_in_frame)
            cv2.putText(img, "{} cm".format(round(Distance, 1)), (x + 5, y + h + 25), font, 1, WHITE, 2)
    cv2.imshow('camera', img)
    k = cv2.waitKey(10) & 0xff
    if k == 27:   # ESC key
        print("\n [INFO] Exiting Program")
        break

cam.release()
cv2.destroyAllWindows()
Appendix B: object detection and distance estimation
# (Annotated excerpt of the YOLOv5-based detection script; the parts of the
# original file that were omitted in the printed listing are marked with
# "..." comments, and the surviving fragments are shown inside the functions
# they belong to.)
import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]                              # project root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))                      # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))      # relative

from ultralytics.utils.plotting import Annotator, colors, save_one_box
from models.common import DetectMultiBackend
# ... further dataloader and utility imports omitted ...

def run(weights, source, data, imgsz, device, project, name, exist_ok,
        save_txt, dnn, half, vid_stride, visualize):
    # is_url, is_file, webcam and screenshot are derived from `source` in omitted lines.
    if is_url and is_file:
        source = check_file(source)                 # download a remote source file

    # Output directory for this run
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
    (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load the trained detection model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    bs = 1  # batch_size

    # Build the dataloader: webcam stream, screenshots, or image/video files
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))
    csv_path = save_dir / "predictions.csv"         # CSV file for saving predictions
    for path, im, im0s, vid_cap, s in dataset:
        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()   # uint8 to fp16/32
            im /= 255                                      # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]                              # expand for batch dim
            if model.xml and im.shape[0] > 1:
                ims = torch.chunk(im, im.shape[0], 0)

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            if model.xml and im.shape[0] > 1:
                pred = None
                for image in ims:
                    # ... per-image inference, non-maximum suppression, distance
                    # estimation and audio output omitted in the printed listing ...
                    pass

def parse_opt():
    # ... argument-parser definition omitted ...
    print_args(vars(opt))
    return opt

def main(opt):
    """Executes YOLOv5 model inference with given options, checking requirements before running the model."""
    check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
    run(**vars(opt))

if __name__ == "__main__":
    opt = parse_opt()
    main(opt)
Appendix C: Book reading
# (Reconstructed listing: the loop that walks through the OCR output and the
# pause/resume handling were partly lost in the printed excerpt and have been
# restored; the IP-camera URL and Tesseract path are the ones from the report,
# while the continue/exit prompts are restored assumptions.)
import os
import cv2
import pytesseract
from pygame import mixer
from gtts import gTTS
from playsound import playsound
import matplotlib.pyplot as plt   # imported in the original script

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
answer = "y"
language = 'en'
mixer.init()

while answer.lower() in ["y", "yes"]:
    video = cv2.VideoCapture("https://192.168.0.43:8080/video")   # IP-camera stream
    video.set(3, 640)    # width
    video.set(4, 480)    # height

    if video.isOpened():
        check, frame = video.read()
        if check:
            cv2.imwrite("frame.jpg", frame)
            video.release()

            # Run OCR and draw a box around every recognised word
            data = pytesseract.image_to_data("frame.jpg")
            filewrite = open("String.txt", "w")
            for z, row in enumerate(data.splitlines()):
                if z != 0:                       # skip the header row
                    a = row.split()
                    if len(a) == 12:             # rows with 12 fields contain recognised text
                        x, y = int(a[6]), int(a[7])
                        w, h = int(a[8]), int(a[9])
                        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
                        cv2.putText(frame, a[11], (x - 15, y), cv2.FONT_HERSHEY_PLAIN, 2, (0, 0, 255), 1)
                        filewrite.write(a[11] + " ")
            filewrite.close()

            # Convert the extracted text to speech and play it
            fileread = open("String.txt", "r")
            line = fileread.read()
            fileread.close()
            if line != "":
                speech = gTTS(text=line, lang=language, slow=False)
                speech.save("test.mp3")
                mixer.music.load("test.mp3")
                mixer.music.set_volume(0.7)
                mixer.music.play()
                while mixer.music.get_busy():
                    print("Press 'p' to pause, 'r' to resume")
                    print("Press 'e' to exit the program")
                    query = input(" ")
                    if query == "p":
                        mixer.music.pause()
                    elif query == "r":
                        mixer.music.unpause()
                    elif query == "e":
                        mixer.music.stop()
                        break
            answer = input("Read another page? (y/n): ")
        else:
            print("The frame is empty. The video capture failed.")
            break
    else:
        print("The video capture is not opened. Check the URL or the connection.")
        break

speech = gTTS(text="Goodbye.", lang=language, slow=False)
speech.save("message.mp3")
playsound("message.mp3")

Appendix D: Arduino code
// (Excerpt of the StandardFirmata-based sketch used on the Arduino Uno; sections
// that were omitted in the printed listing are marked with "..." comments, and the
// wireWrite() signature lost in printing has been restored.)
#include <Servo.h>
#include <Wire.h>
#include <Firmata.h>

SerialFirmata serialFeature;

int analogInputsToReport = 0;
unsigned long currentMillis;
unsigned long previousMillis;

struct i2c_device_info {
  byte addr;
  int reg;
  byte bytes;
  byte stopTX;
};

Servo servos[MAX_SERVOS];
byte servoPinMap[TOTAL_PINS];
byte detachedServos[MAX_SERVOS];
byte detachedServoCount = 0;
byte servoCount = 0;

void sysexCallback(byte, byte, byte*);

// Helper used by the I2C handling code to write a single byte
void wireWrite(byte data)
{
#if ARDUINO >= 100
  Wire.write((byte)data);
#else
  Wire.send(data);
#endif
}

byte wireRead(void)
{
#if ARDUINO >= 100
  return Wire.read();
#else
  return Wire.receive();
#endif
}

// ... capability-query response (fragment): report servo support on digital pins ...
//   if (IS_PIN_DIGITAL(pin)) {
//     Firmata.write(PIN_MODE_SERVO);
//     Firmata.write(14);
//   }

void systemResetCallback()
{
  isResetting = true;

#ifdef FIRMATA_SERIAL_FEATURE
  serialFeature.reset();
#endif

  if (isI2CEnabled) {
    disableI2CPins();
  }

  for (byte i = 0; i < TOTAL_PORTS; i++) {
    reportPINs[i] = false;       // by default, reporting off
    portConfigInputs[i] = 0;     // until activated
    previousPINs[i] = 0;
  }

  for (byte i = 0; i < TOTAL_PINS; i++) {
    if (IS_PIN_ANALOG(i)) {
      setPinModeCallback(i, PIN_MODE_ANALOG);
    } else if (IS_PIN_DIGITAL(i)) {
      setPinModeCallback(i, OUTPUT);
    }
    servoPinMap[i] = 255;
  }

  for (byte i = 0; i < TOTAL_PORTS; i++) {
    outputPort(i, readPort(i, portConfigInputs[i]), true);
  }

  isResetting = false;
}

// ... end of loop() (fragment): serial feature update guarded by the build flag ...
// #ifdef FIRMATA_SERIAL_FEATURE
//   serialFeature.update();
// #endif
// }

