Similarity Report ID: oid:16158:38435882

PAPER NAME: ASHWANI KUMAR SINGH-NTCC-2021-25.docx
AUTHOR: ashwani singh
WORD COUNT: 5557 Words
CHARACTER COUNT: 32351 Characters
PAGE COUNT: 32 Pages
FILE SIZE: 1009.6KB
SUBMISSION DATE: Jul 1, 2023 4:54 PM GMT+5:30
REPORT DATE: Jul 1, 2023 4:54 PM GMT+5:30

14% Overall Similarity
The combined total of all matches, including overlapping sources, for each database.
- 10% Internet database
- 1% Publications database
- Crossref database
- Crossref Posted Content database
- 11% Submitted Works database

Excluded from Similarity Report:
- Bibliographic material
- Quoted material
- Cited material
- Small Matches (less than 10 words)
Term Paper Report
on
OCR (Text Recognition and Extraction) Using CV
Submitted to
Amity University Uttar Pradesh
In partial fulfillment of the requirements for the award of the
degree of
Bachelor of Technology
in
Computer Science and Engineering
by
Ashwani Kumar Singh A705221101
Under the guidance of
Dr Bramah Hazela
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
AMITY SCHOOL OF ENGINEERING AND TECHNOLOGY
AMITY UNIVERSITY
UTTAR PRADESH
JUNE 2023
AMITY UNIVERSITY
–––––––––UTTAR PRADESH–––––––––
DECLARATION BY THE STUDENT
I, Ashwani Kumar Singh, student of B.Tech (5-C.S.E.-25(Y)), hereby declare
that the project titled “OCR (Text Recognition and Extraction) Using CV”,
which is submitted by me to the Department of Computer Science, Amity
School of Engineering and Technology, Amity University Uttar Pradesh,
Lucknow, in partial fulfillment of the requirement for the award of the degree
of Bachelor of Technology in Computer Science and Engineering, has not
previously formed the basis for the award of any degree, diploma or
other similar title or recognition.
The author attests that permission has been obtained for the use of any
copyrighted material appearing in the Dissertation/Project report other
than brief excerpts requiring only proper acknowledgement in scholarly
writing, and that all such use is acknowledged.
Lucknow
Date: Ashwani Kumar Singh
B. Tech (CS & E) 5th Sem
A7605221101
AMITY UNIVERSITY
–––––––––UTTAR PRADESH–––––––––
CERTIFICATE
On the basis of the declaration submitted by Mr. Ashwani Kumar Singh, student
of B.Tech (Computer Science and Engineering), 5th semester, I hereby certify
that the Term Paper titled “OCR (Text Recognition and Extraction) Using CV”,
which is submitted to Amity School of Engineering and Technology, Amity
University Uttar Pradesh, Lucknow, in partial fulfillment of the requirement
for the award of the degree of Bachelor of Technology in Computer Science
and Engineering, is an original contribution to existing knowledge and a
faithful record of work carried out by him under my guidance and supervision.
To the best of my knowledge this work has not been submitted in part or
full for any Degree or Diploma to this University or elsewhere.
Lucknow
Date: -
Dr Bramah Hazela
Assistant Professor- III
ASET
Amity University
AMITY UNIVERSITY
–––––––––UTTAR PRADESH–––––––––
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any
task would be incomplete without mentioning the people whose ceaseless
cooperation made it possible, and whose constant guidance and
encouragement crown all efforts with success. I would like to thank Prof.
(Dr) O.P. Singh, Head of Department, CSE, Amity University, for giving
me the opportunity to undertake this project. I would like to thank my
faculty guide, Dr Bramah Hazela, the biggest driving force behind the
successful completion of this project, who has always been there to solve
any query of mine and to guide me in the right direction regarding the
project. Without this help and inspiration, I would not have been able to
complete the project.
Also, I would like to thank my batchmates who guided me, helped me, and
gave ideas and motivation at each step.
Ashwani Kumar Singh
Table of Contents
Sl. No. Description
1 Abstract
2 Introduction
3 Methodology
3.1 Image Acquisition
3.2 Image Preprocessing
3.3 Text Recognition
3.4 Postprocessing
4 Types of OCR
5 Applications of OCR
6 Project
6.1 Aim
6.2 Setup
6.3 Descriptions
6.4 Documentation
6.5 Code and Its Working
7 Comparison with Existing Methods
8 Results
9 Limitations and Challenges
10 Future Research Directions
11 Conclusion
12 References
Abstract
This research paper focuses on the utilization of Computer Vision (CV)
techniques for Optical Character Recognition (OCR), specifically in the context
of text recognition and extraction. The primary goal of this investigation is to
develop a highly efficient Optical Character Recognition (OCR) system capable
of accurately identifying and extracting text from a wide range of sources,
including scanned documents, images, and videos.
The methodology employed in this research encompasses various essential
steps. Initially, image acquisition is conducted to gather relevant images
containing textual content from different sources. These acquired images
then undergo image preprocessing techniques aimed at enhancing quality
and optimizing the performance of the OCR system. These techniques involve
processes such as noise reduction, image resizing, and normalization.
The core element of the OCR system is the text recognition module, which
leverages CV algorithms and machine learning techniques. It involves training
a model on a sizable dataset of labeled text samples to enable accurate
recognition and classification of different characters and words. The
recognition process entails tasks such as character segmentation, feature
extraction, and pattern recognition.
Upon completion of the text recognition phase, postprocessing techniques
are applied to refine and enhance the extracted text. This entails error
correction, spell checking, and formatting to ensure the accuracy and
consistency of the recognized text.
The research paper explores various types of OCR systems and their
applications in real-world scenarios. It provides an in-depth analysis of the
experimental setup, encompassing both hardware and software
components. Detailed descriptions and implementation details of the project
setup are provided.
The acquired outcomes undergo meticulous analysis and are systematically
compared with existing OCR methodologies to assess the performance and
effectiveness of the proposed system. Furthermore, the paper discusses the
limitations and challenges encountered during the research and suggests
potential future research directions.
Introduction
OCR, a groundbreaking technology, revolutionizes the conversion of printed
or handwritten text into a format that machines can readily comprehend. By
leveraging Computer Vision (CV) techniques, OCR plays a vital role in
recognizing and extracting text, enabling businesses to digitize and efficiently
manage large volumes of documents. This digitization process leads to
enhanced productivity and streamlined workflows in modern business
environments.
Traditional documents such as forms, invoices, contracts, and handwritten
notes can be processed and analyzed with ease, eliminating the need for
manual data entry, and enabling seamless integration into digital workflows.
This technology empowers businesses to automate data extraction, conduct
analytics, and make informed decisions based on the extracted textual
information.
The methodology employed in OCR involves several key steps. The process
begins with image acquisition, where high-quality scans or images of
documents are obtained. These images then undergo preprocessing
techniques to enhance their quality, correct distortions, and improve OCR
performance. The heart of OCR lies in text recognition, where advanced CV
algorithms and machine learning methods are employed to accurately
recognize and classify characters and words. Postprocessing techniques are
applied to refine the extracted text and convert it into a computerized file for
further analysis and utilization.
The essence of OCR lies in its capacity to transform image-based text into
editable and searchable data. It empowers organizations across diverse
sectors such as banking, healthcare, logistics, and more to unlock the
potential of their textual information.
The research paper aims to investigate the effectiveness and limitations of
OCR in text recognition and extraction, examining its impact on operational
efficiency, decision-making processes, and the overall progress of document
processing. By evaluating experimental setups, detailing project specifics, and
conducting comparative analyses with existing methods, this paper seeks to
provide valuable insights and outline future research avenues in the realm of
OCR.
Methodology
The methodology employed in this research paper encompasses several key
steps and techniques aimed at achieving accurate OCR performance for text
recognition and extraction using Computer Vision (CV). The methodology can
be divided into the following stages: image acquisition, image preprocessing,
text recognition, and postprocessing.
Image Acquisition: In this stage, high-quality scans or images of the
documents to be processed are obtained. Various acquisition methods, such
as scanning devices, cameras, or video frames, can be utilized to capture the
images. It is important to ensure optimal image quality and resolution to
enhance OCR accuracy.
In our project, image acquisition is performed using the OpenCV library. The
cv2.imread() function is used to read the image from the specified path
(img_path). This function reads the image file and stores it as a multi-
dimensional NumPy array, representing the image's pixel values.
img = cv2.imread(img_path)
Image Preprocessing: - The acquired images often require
preprocessing to enhance their quality and correct any distortions or
imperfections that may hinder the OCR process. This stage includes
techniques such as deskewing to fix alignment issues, despeckling to
remove image noise or spots, and cleaning up boxes or lines in the
image to improve readability.
After image acquisition, the code applies preprocessing techniques to
enhance the image quality and improve the OCR accuracy. First, the
cv2.cvtColor() function is used to convert the image from the default
BGR color space to grayscale. This step simplifies the image and reduces
the complexity of subsequent operations.
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
Next, the preprocessed image is written to disk using the
cv2.imwrite() function. This function saves the grayscale image at the
specified location (src_path + "2.jpg") to be used for further
processing.
cv2.imwrite(src_path + "2.jpg", img)
Text Recognition: - The core of the OCR system lies in the text
recognition stage. Here, advanced CV algorithms and machine learning
techniques are employed to analyze the pre-processed images and
accurately recognize and classify characters and words. Pattern
matching algorithms can be used to compare character or word images
with stored templates. Feature extraction techniques may also be
employed to break down glyphs into lines, loops, intersections, and
other features for accurate recognition.
To perform text recognition, the code utilizes the pytesseract library.
The pytesseract.image_to_string() function is used to extract text from
the pre-processed image. It takes the pre-processed image as input
(opened using the PIL library's Image.open() function) and returns the
extracted text as a string.
result = pytesseract.image_to_string(Image.open(src_path + "2.jpg"))
Postprocessing: - After the text has been recognized, postprocessing
techniques are applied to refine and improve the extracted text data.
This may involve error correction, noise reduction, or enhancing the
overall readability of the text. Additional processing steps may include
formatting the extracted text into a desired structure or converting it
into a specific file format for further analysis and utilization.
After the text has been recognized, the code performs postprocessing
tasks. In this case, the recognized text (stored in img2txt) is printed
to the console using the print() function.
print(img2txt)
Additionally, the code converts the recognized text into an audio file
using the gTTS library. The gTTS() function takes the text (img2txt) and
language (lang) as input and generates an audio file (output1.mp3)
containing the spoken version of the text.
myobj = gTTS(text=img2txt, lang='en', slow=False)
myobj.save('output1.mp3')
Finally, the code plays the generated audio file using the operating
system's default audio player. The os.system() function executes the
command to play the audio file.
os.system('output1.mp3')
At the end of the execution, the code removes the intermediate
image file (2.jpg) and the generated audio file (output1.mp3) from
the disk using the os.remove() function.
os.remove('2.jpg')
os.remove('output1.mp3')
Types of OCR
Simple Optical Character Recognition Software: - Simple OCR engines
utilize pattern-matching algorithms to compare text images with a pre-
defined internal database of font and text image patterns. The software
matches the text character by character, and if it finds a word-level
match, it is known as optical word recognition. However, this approach
has limitations as it cannot capture and store an infinite number of font
and handwriting styles in its database.
Intelligent Character Recognition Software: - Modern OCR systems
employ intelligent character recognition (ICR) technology, which
mimics human reading behavior. These systems leverage machine
learning algorithms, such as neural networks, to analyze text images at
multiple levels. The OCR software examines various image attributes
like curves, lines, intersections, and loops. By combining the results of
these analyses, the system produces the final recognition output. ICR
processes images character by character, but it accomplishes this
quickly, providing results in seconds.
Intelligent Word Recognition: - Similar to ICR, intelligent word
recognition systems operate on the principle of analyzing whole word
images instead of processing them into individual characters. These
systems use advanced machine learning techniques to recognize and
interpret word-level patterns, enabling faster and more accurate word
recognition.
Optical Mark Recognition: -Optical mark recognition (OMR) is a
specialized form of OCR that focuses on identifying specific symbols,
marks, or patterns within a document. OMR is commonly used for tasks
such as detecting and interpreting checkboxes, tick marks, barcodes,
logos, watermarks, and other graphical elements.
It is important to note that OCR technology continues to advance, and new
variations and enhancements may emerge over time. These different types of
OCR cater to various applications and requirements, enabling automated text
recognition and extraction across a wide range of documents and media.
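As a toy illustration of the pattern-matching idea behind simple OCR: each character image is compared pixel by pixel against an internal template database, and the closest match wins. The 3x3 bitmaps and the three-glyph "database" below are invented purely for the example.

```python
# Toy template database: each glyph is a 3x3 bitmap where '#' marks ink.
TEMPLATES = {
    'T': ("###",
          ".#.",
          ".#."),
    'L': ("#..",
          "#..",
          "###"),
    'O': ("###",
          "#.#",
          "###"),
}


def match_glyph(bitmap):
    """Return the template character with the fewest mismatched pixels."""
    def distance(a, b):
        # Count positions where the two bitmaps disagree
        return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], bitmap))
```

A real engine works the same way in principle, but over much larger bitmaps and databases, which is exactly why the approach breaks down for arbitrary fonts and handwriting.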
Applications of OCR
OCR technology offers various applications across different industries,
providing numerous benefits and streamlining processes. Here are some key
applications of OCR:
1. Searchable Text: OCR enables businesses to convert their documents into
a searchable format, creating a knowledge archive. This allows for automated
data processing and analysis using data analytics software, leading to
improved knowledge management.
2. Operational Efficiency: OCR software improves efficiency by integrating
document workflows with digital processes. Examples include:
- Automated processing of hand-filled forms provides an efficient way to
verify, review, edit, and analyze data, resulting in time savings compared to
manual processing and data entry.
- Quick searching for specific terms within a database, eliminating the need
for manual sorting through physical files.
- Conversion of handwritten notes into editable texts and documents.
3. Artificial Intelligence Solutions: OCR is often incorporated into artificial
intelligence solutions to enhance decision-making and reduce costs.
Examples include:
- Reading and interpretation of number plates and road signs in self-driving
cars.
- Detection of brand logos in social media posts for marketing insights.
- Identification of product packaging in advertising images for market
analysis.
4. Banking: In the banking sector, OCR technology plays a crucial role in
processing and verifying various financial documents such as loan
applications, deposit checks, and conducting secure financial transactions.
OCR helps improve fraud prevention and transaction security. For instance,
BlueVine, a fintech company, used cloud-based OCR service Amazon Textract
to automate the processing of Paycheck Protection Program (PPP) loan forms,
assisting thousands of businesses during the COVID-19 relief stimulus
package.
5. Healthcare: Within the healthcare industry, OCR is utilized to streamline
the processing of patient records. This includes managing information related
to treatments, medical tests, hospital records, and insurance payments. By
employing OCR, healthcare institutions can ensure accurate and efficient
handling of patient data, leading to improved workflows and better patient
care. It streamlines workflows, reduces manual work, and ensures up-to-date
records. For example, the nib Group uses Amazon Textract to automate the
processing of medical invoices submitted through their mobile app,
expediting the approval of medical claims.
6. Logistics: OCR is utilized in logistics companies to track package labels,
invoices, receipts, and other documents efficiently. By automating invoice
processing, OCR improves accuracy and increases business efficiency. The
Foresight Group uses Amazon Textract to automate invoice processing in SAP,
reducing manual data entry across multiple accounting systems.
These are just a few examples of how OCR is applied in various industries to
enhance productivity, accuracy, and decision-making processes. The
technology continues to evolve and find new applications as businesses
recognize its value in data management and automation.
Project
Aim
The aim of this project is to assist visually impaired individuals in accessing
written content by providing them with an audio representation of the text
in front of them, through an optimized OCR-based text recognition and
extraction Python program. On execution, the program opens the camera,
captures an image within the glass frame, recognizes the text, converts it
into audio, and delivers the audio output through a micro speaker attached
to the glasses.
Setup
1. Equipment and Software:
- Computer system with appropriate hardware specifications.
- Webcam or camera device for capturing images.
- Operating system compatible with the required libraries and tools.
- Python programming language installed.
- OpenCV library for image processing.
- Tesseract OCR engine for text recognition.
- PIL library for image manipulation.
- gTTS library for text-to-speech conversion.
2. Dataset:
- Gather a diverse dataset of images containing text in various formats, such
as scanned documents, photographs, or screenshots.
- Ensure that the dataset covers different fonts, sizes, orientations, and
backgrounds to represent real-world scenarios.
3. Installation and Setup:
- Install the required libraries and dependencies, including OpenCV, Tesseract
OCR, PIL, and gTTS.
- Set the appropriate path for the Tesseract OCR engine in the pytesseract
module.
4. Image Acquisition:
- Use the subprocess module to open the default camera application on the
system.
- Capture an image using the camera device or import images from the
designated source path.
5. Preprocessing:
- Iterate through the images in the source path and convert them to a
standardized format (e.g., JPEG).
- Load the image using the OpenCV library.
- Convert the image to grayscale using the cv2.cvtColor() function.
6. Text Recognition:
- Save the preprocessed image as "2.jpg" in the source path.
- Utilize the pytesseract.image_to_string() function to extract text from the
image using Tesseract OCR.
- Store the extracted text result in the "img2txt" variable.
7. Text-to-Speech Conversion:
- Create a gTTS object with the extracted text and desired language (e.g.,
English).
- Save the generated audio output as "output1.mp3".
8. Results and Evaluation:
- Print the extracted text using the "img2txt" variable.
- Play the generated audio file using the os.system() function.
- Remove the temporary image and audio files ("2.jpg" and "output1.mp3")
from the system.
Descriptions
The code we’ll use utilizes several libraries and modules to perform various
tasks related to optical character recognition (OCR), image processing, and
text-to-speech conversion. Here is a detailed description of the libraries and
modules used:
1. cv2 (OpenCV):
- OpenCV is a popular computer vision library that offers tools for processing
images and videos.
- In the code, cv2 is used for reading and manipulating images, specifically
converting images to grayscale.
- It offers a range of image processing techniques, including color conversion,
filtering, and transformation.
2. numpy:
- Python's NumPy package is the foundational tool for numerical computing.
- Large, multidimensional arrays and matrices are supported, along with a
selection of mathematical operations that can be performed on these arrays.
- In the code, numpy is imported as np, but it is not directly used in the
provided code snippet.
3. pytesseract:
- pytesseract is a Python wrapper for the Tesseract OCR engine, which is a
popular open-source OCR tool.
- It allows developers to easily integrate Tesseract into their Python
applications and extract text from images.
- The code sets the path for the Tesseract OCR engine using the
pytesseract.pytesseract.tesseract_cmd attribute.
4. PIL (Python Imaging Library):
- PIL is a library that supports opening, editing, and saving images in many
different file formats.
- In the code, PIL is used in conjunction with pytesseract to preprocess images
before performing OCR.
- Specifically, it is used to open images and convert them to the required
format for OCR processing.
5. gtts (Google Text-to-Speech):
- gtts is a library that allows for text-to-speech conversion using the Google
Text-to-Speech API.
- It provides a convenient way to generate audio files from text strings.
- In the code, gtts is used to create a gTTS (Google Text-to-Speech) object and
save the generated audio as an mp3 file.
6. os:
- The os module provides a way to interact with the operating system.
- In the code, it is used to perform various file-related operations, such as
renaming, removing, and executing files.
- The os.system() function is used to play the generated audio file.
7. subprocess:
- The subprocess module allows the creation of new processes, providing
more control and flexibility over system commands.
- In the code, subprocess.Popen() is used to launch the default camera
application on the system.
- It allows capturing images from the camera device.
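A minimal sketch of how the camera launch and pause-for-capture described above fit together. This is Windows-only, since it relies on the `microsoft.windows.camera:` URI scheme and the `start` shell built-in; the function name is invented for the sketch.

```python
import subprocess


def open_camera_and_wait():
    # 'start' is a Windows shell built-in, so shell=True is required here
    subprocess.Popen('start microsoft.windows.camera:', shell=True)
    # Block until the user has captured an image in the Camera app
    input('Press Enter after capturing the image...')
```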
Documentation
The purpose of this project is to develop a Python algorithm that automates
the process of capturing images using the connected device's camera. Upon
execution, the algorithm activates the camera automatically and waits for the
user to capture an image of a specific area containing text. Once the image is
captured and the user presses any button, the algorithm resumes its
execution.
The algorithm then proceeds to analyze the files in the specified directory and
detects and stores the filename of any image file format in a variable. It
utilizes various imported Python libraries to analyze the captured image,
detect the presence of text, and extract the text from the image.
The recognized text is then displayed in the console, allowing the user to view
the extracted information. Additionally, the algorithm converts the extracted
text into an audio format, enabling automatic playback of the synthesized
speech.
Once the transcription and audio playback are complete, the algorithm
automatically deletes the captured image and the corresponding audio file.
This ensures that no additional secondary memory is occupied by
unnecessary files, optimizing the memory usage of the system.
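The directory-scanning step described above can be sketched with a simple extension check. The actual code validates each file by opening it with PIL; this simplified version, with an invented helper name, relies on file extensions alone.

```python
import os


def find_first_image(directory_path):
    """Return the path of the first JPEG/PNG/JPG file found, else None."""
    for filename in os.listdir(directory_path):
        if filename.lower().endswith(('.jpeg', '.png', '.jpg')):
            return os.path.join(directory_path, filename)
    return None
```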
Code and Its Working
This code performs optical character recognition (OCR) on an image using the
Tesseract OCR engine and converts the recognized text into speech using the
gTTS (Google Text-to-Speech) library. Let's go through the code step by step:
1. It imports the required libraries for optical character recognition (OCR),
image processing, text-to-speech conversion, and operating system
functions.
2. It launches the default camera application on Windows using the
`subprocess.Popen()` function. This opens the camera application using the
Windows shell command `start microsoft.windows.camera:`.
3. It waits for the user to press the Enter key using the `input()` function. This
allows the user to capture an image using the camera application.
4. It sets the path to the Tesseract OCR executable using
`pytesseract.pytesseract.tesseract_cmd`. This ensures that Tesseract OCR
can be accessed correctly.
5. It initializes the `directory_path` variable with the path to the directory
containing the image files.
6. It iterates over the files in the specified directory using `os.listdir()`.
7. For each file, it constructs the full file path using
`os.path.join(directory_path, filename)`.
8. It tries to open the image file using the `Image.open()` function from the
PIL (Python Imaging Library). If the image format is JPEG, PNG, or JPG, it
assigns the filename to the variable `s` and breaks out of the loop.
9. If there is an exception while opening the image file, it is caught and ignored
using the `pass` statement.
10. It defines a function called `get_string()` that takes an image path
(`img_path`) as input.
11. Inside the function, it uses Tesseract OCR to perform OCR on the image
specified by the filename stored in the `s` variable. The recognized text is
obtained using `pytesseract.image_to_string()`.
12. The recognized text is stored in the variable `img2txt`.
13. It prints a message indicating that text recognition from the image is in
progress and displays the recognized text using `print('--- Recognizing text
from image ---'+'\n'+img2txt)`.
14. It initializes a `gTTS` object with the recognized text, language (`'en'` for
English), and speed (`slow=False` for normal speed).
15. It saves the synthesized speech as an audio file named `'speech.mp3'`
using the `save()` method of the `gTTS` object.
16. It plays the audio file `'speech.mp3'` using the default audio player on the
system using `os.system('speech.mp3')`.
17. It removes the audio file `'speech.mp3'` using `os.remove('speech.mp3')`.
In summary, this code captures an image using the default camera
application, performs OCR on the captured image using Tesseract OCR,
extracts the recognized text, converts the text into speech, plays the speech
audio, and then cleans up by removing the temporary audio file generated
during the process.
Code 2.0 and Its Working
This code performs optical character recognition (OCR) on an image using the
Tesseract OCR engine and converts the recognized text into speech using the
gTTS (Google Text-to-Speech) library. Let's go through the code step by step:
1. Imports the required libraries:
- `cv2`: OpenCV library for image processing.
- `numpy`: Library for numerical operations in Python.
- `pytesseract`: OCR (Optical Character Recognition) library for extracting text
from images.
- `PIL`: Python Imaging Library for image manipulation.
- `gtts`: Google Text-to-Speech library for converting text to speech.
- `os`: Library for interacting with the operating system.
- `subprocess`: Library for creating new processes.
2. Defines a function `preprocess_image(img)` that takes an image as input
and performs the following preprocessing steps:
- Converts the image to grayscale using `cv2.cvtColor()` function.
- Applies median blur to reduce noise using `cv2.medianBlur()` function.
- Applies Gaussian blur to further smooth the image using
`cv2.GaussianBlur()` function.
- Performs thresholding to convert the image into a binary image using
`cv2.threshold()` function.
- Creates a rectangular structuring element using
`cv2.getStructuringElement()` function.
- Applies morphological closing operation on the image to fill small holes and
gaps using `cv2.morphologyEx()` function.
- Returns the preprocessed image.
3. Opens the Windows Camera application using `subprocess.Popen()`
function and the command 'start microsoft.windows.camera:'.
4. Waits for user input using the `input()` function.
5. Defines a variable `directory_path` that stores the path of the directory
where the image files are located.
6. Initializes a variable `selected_image_path` to store the path of the
selected image.
7. Iterates through each file in the directory using `os.listdir(directory_path)`.
8. Checks if the file is a valid image file (JPEG, PNG, or JPG) using the
`Image.open(file_path)` function from the PIL library.
- If a valid image is found, the `selected_image_path` is set to the path of the
selected image and the loop is exited.
9. If a valid image is found (`selected_image_path` is not None), the image is
read using `cv2.imread()` function and stored in the `img` variable.
10. The image is preprocessed using the `preprocess_image()` function
defined earlier, and the preprocessed image is saved as 'thres.png' using
`cv2.imwrite()` function.
11. The preprocessed image is passed to the `pytesseract.image_to_string()`
function to extract the text from the image.
12. If no valid image is found, the `result` variable is set to "No valid image
found".
13. The extracted text is printed on the console.
14. The extracted text is converted to speech using the `gTTS` library by
creating a `gTTS` object with the text and language parameters. The speech
output is saved as 'speech.mp3' using the `save()` method.
15. The 'speech.mp3' file is played using the `os.system()` function.
16. The 'speech.mp3' and 'thres.png' files are removed using the
`os.remove()` function to clean up the files created during the process.
In summary, this code captures an image from the camera, performs OCR on
the image, converts the recognized text to speech, plays the audio, and cleans
up the temporary files created during the process.
Comparison with Existing Methods
Already-present solution:
(For this comparison, the already-present solution is referred to as "Code-1"
and our solution as "Code-2".)
Here is the comparison:
Both Code-1 and Code-2 are OCR (Optical Character Recognition) codes that
extract text from images and convert it into speech using Tesseract OCR,
OpenCV, and other libraries. Let us compare and describe the differences
between the two codes:
1. Image Preprocessing:
- Code-1: In Code-1, the image undergoes preprocessing steps, including
converting it to grayscale, dilating and eroding to remove noise, applying
adaptive thresholding, and saving the processed image.
- Code-2: In Code-2, image preprocessing is separated into a function named
`preprocess_image()`. The function performs similar steps as in Code-1, but
it takes the image as input and returns the preprocessed image.
2. Image Selection:
- Code-1: Code-1 assumes that the target image to be processed is located in
the current working directory and named "2.png".
- Code-2: Code-2 searches for a valid JPEG, JPG or PNG image file within a
specified directory path. It iterates through the files, opens them with the PIL
library, and selects the first valid image found.
3. Code Structure and Organization:
- Code-1: Code-1 is a more compact code snippet without explicitly defined
functions. All the operations are performed within the main script.
- Code-2: Code-2 is structured with function definitions and distinct sections
for each step of the process, providing better code organization and
modularity.
Results
With regard to efficiency and optimality:
Both codes are similar in terms of the OCR functionality they provide.
However, Code-2 offers a more organized and modular structure. It separates
the preprocessing into a reusable function, handles image selection more
flexibly, and has clear sections for each step of the process.
Based on these differences, Code-2 can be considered more efficient and
optimal in terms of code structure and organization, making it easier to read,
understand, and maintain.
Limitations and Challenges

Recognition of Handwriting: OCR technology has trouble correctly
identifying handwriting, especially when the writing is illegible or
employs unusual writing styles.

Complex Structure: OCR technology may have trouble understanding the
layout of complicated documents, such as those with multiple columns of
text, tables, and graphs. As a result, errors in the retrieved data or
the loss of important data may occur.

Unpredictable Text: OCR technology may have trouble when dealing with
text written in several font sizes, styles, or orientations inside a
single document.

Picture Quality: The quality of the processed image has a big impact on
how accurate OCR technology is. Text recognition can be affected by
factors including image resolution, illumination, and the presence of
shadows.

Language Recognition: It can be difficult for OCR technology to
recognize text in various languages, especially when the material is
jumbled or uses unusual writing systems.

Mixed Technologies: Integrating OCR with other technologies poses
difficulties, including the need for specialized knowledge, machine
learning and data compatibility, and data protection concerns.
Future Research Directions
The present project successfully addresses the objective of assisting visually
impaired individuals in accessing written content through an optimized OCR-
based text recognition and extraction Python program. However, to further
enhance the functionality and usability of this system, future research can
focus on the integration of hardware components into the design.
One potential research direction involves the development of a specialized
microchip capable of housing the OCR program and other essential
functionalities. This microchip would need to be compact and lightweight,
ensuring it can be seamlessly embedded into the glasses. Additionally, touch
sensors can be incorporated into the glasses to facilitate user interaction and
input during the execution of the program.
To implement this hardware aspect successfully, the research would involve
exploring techniques to integrate touch sensors into the sides of the glasses.
These touch sensors would enable users to provide inputs and navigate
through the system effortlessly. Simultaneously, a flat transparent camera
can be positioned in front of the frame to capture the text accurately.
The microchip, with the OCR program installed, would play a crucial role in
processing the captured image and converting it into audio format. As part of
the future research, the development and integration of the microchip would
require meticulous attention to detail. The challenge lies in designing an
efficient and power-effective chip that can handle real-time image processing
and audio conversion tasks.
Furthermore, the small speakers attached to the sides of the glasses would
need to be optimized to deliver clear and high-quality audio output. Research
in this area could focus on developing advanced audio technologies that
provide enhanced sound reproduction while maintaining a compact and
unobtrusive design.
In conclusion, the project's future research focuses on integrating hardware,
embedding software in a microchip, and incorporating touch sensors, a
transparent camera, and high-quality speakers into the glasses. This aims to
overcome challenges, merge components seamlessly, ensure functionality,
and enhance the user experience for visually impaired individuals, enabling
efficient access to written content.
Conclusion
In conclusion, this research paper focused on the significant role of OCR (Text
Recognition and Extraction) technology in transforming the way we process
and manage documents. OCR, powered by Computer Vision (CV) techniques,
revolutionizes the conversion of printed or handwritten text into a machine-
readable format, enabling businesses to digitize and efficiently handle large
volumes of information.
Throughout the paper, we explored the effectiveness and limitations of OCR
in text recognition and extraction, highlighting its impact on operational
efficiency, decision-making processes, and overall document processing. By
evaluating experimental setups and conducting comparative analyses,
valuable insights were gained regarding the potential of OCR in various
industries such as banking, healthcare, logistics, and more.
OCR simplifies the processing of traditional documents like forms, invoices,
contracts, and handwritten notes. It eliminates the need for manual data
entry and enables seamless integration into digital workflows. This
automation empowers businesses to extract and analyze data, make
informed decisions, and improve productivity.
The OCR methodology involves key steps such as image acquisition,
preprocessing, text recognition using advanced CV algorithms and machine
learning, and postprocessing for refining the extracted text. The essence of
OCR lies in its ability to transform image-based text into editable and
searchable data, unlocking the potential of textual information across diverse
sectors.
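The postprocessing step mentioned above typically normalizes the raw OCR output before it is used. A minimal sketch is shown below; the specific cleaning rules (collapsing whitespace, dropping empty lines, stripping stray symbols) are illustrative assumptions rather than a fixed standard.

```python
import re

def postprocess(raw_text):
    # Normalize raw OCR output: strip characters OCR commonly
    # misreads, collapse runs of whitespace, and drop empty lines.
    lines = []
    for line in raw_text.splitlines():
        line = re.sub(r"[^\w\s.,;:!?'\"-]", "", line)  # drop stray symbols
        line = re.sub(r"\s+", " ", line).strip()       # collapse whitespace
        if line:
            lines.append(line)
    return "\n".join(lines)
```

Running this over noisy OCR output such as "Hello,   wor|d!" yields the cleaned text "Hello, word!", which is far more useful for downstream search or text-to-speech.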
Moving forward, the future of OCR holds exciting possibilities. The integration
of OCR technology with other emerging technologies such as artificial
intelligence and natural language processing can further enhance its
capabilities. This integration can lead to more accurate text recognition,
improved language understanding, and better contextual analysis.
Moreover, ongoing research in OCR can focus on addressing the challenges
associated with specific types of documents, handwriting styles, and
languages. By continually refining OCR algorithms and expanding language
support, OCR can become even more reliable and adaptable to diverse text
recognition scenarios.
Overall, OCR technology has emerged as a powerful tool for businesses
and individuals alike, enabling the efficient processing and utilization of
textual information. The research conducted in this paper highlights the
transformative potential of OCR and emphasizes the need for further
advancements in the field. By harnessing the power of OCR and continuously
pushing its boundaries, we can create a future where the conversion of
printed or handwritten text into digital format is seamless, accurate, and
accessible to all.