Reading Text from the Image using Tesseract
Last Updated: 01 Dec, 2022
Pytesseract (Python-tesseract) is an Optical Character Recognition (OCR) tool for Python. It is a wrapper for Google's Tesseract-OCR Engine, so the Tesseract engine itself must also be installed. It can read and recognize text in images, license plates, and so on. Here, we will use the pytesseract package to read the text from a given image.
There are three main steps:
- Load an image that contains text, either one already saved on your computer or one you download with a browser.
- Binarize the image (convert it to black and white).
- Pass the binarized image through the OCR engine.
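To make step 2 concrete, here is a minimal pure-Python sketch of what binarization does, assuming the image is a flat list of (B, G, R) pixels. It converts each pixel to gray with the BT.601 weights (0.299 R + 0.587 G + 0.114 B) that OpenCV documents for COLOR_BGR2GRAY, then picks a threshold with Otsu's method, which chooses the cut that maximizes the between-class variance of the dark and light pixels. The helper names bgr_to_gray, otsu_threshold and binarize are hypothetical; OpenCV uses fixed-point arithmetic internally, so its gray values may round slightly differently.

```python
# Pure-Python illustration of binarization; OpenCV does the same job far faster.

def bgr_to_gray(pixel):
    """Convert one (B, G, R) pixel to a gray level with BT.601 weights."""
    b, g, r = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def otsu_threshold(gray_values):
    """Return the threshold t (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]          # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg  # number of pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

def binarize(gray_values, t):
    """Like THRESH_BINARY with maxval 255: values above t become 255, others 0."""
    return [255 if v > t else 0 for v in gray_values]

pixels = [(20, 20, 20), (30, 30, 30), (220, 220, 220), (240, 240, 240)]
gray = [bgr_to_gray(p) for p in pixels]
t = otsu_threshold(gray)
print(gray, t, binarize(gray, t))  # [20, 30, 220, 240] 30 [0, 0, 255, 255]
```

Because the threshold is computed from the image's own histogram, the same code works for dark text on a light page and for images with uneven overall brightness, which is why the script passes 0 as the threshold and lets THRESH_OTSU choose it.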
Implementation:
The following Python code reads an image, preprocesses it, and prints the text that Tesseract recognizes in it.
Python3
# Import the necessary packages
import argparse
import os

import cv2
import pytesseract
from PIL import Image

# Construct the argument parser
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image",
                required=True,
                help="path to the input image")
ap.add_argument("-p", "--pre_processor",
                default="thresh",
                help="preprocessing step to apply: 'thresh' or 'blur'")
args = vars(ap.parse_args())

# Read the image that contains the text
image = cv2.imread(args["image"])
if image is None:
    raise SystemExit("Could not read image: {}".format(args["image"]))

# Convert the image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply the chosen preprocessing step (the result must be assigned back to gray)
if args["pre_processor"] == "thresh":
    # Otsu's method picks the threshold automatically, so the 0 is ignored
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif args["pre_processor"] == "blur":
    # A 3x3 median filter removes small speckle noise
    gray = cv2.medianBlur(gray, 3)

# Write the preprocessed image to a temporary file named after the process ID
# (PNG is lossless, so the binary image is saved without compression artifacts)
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)
with Image.open(filename) as img:
    text = pytesseract.image_to_string(img)
os.remove(filename)
print(text)

# Show the input and preprocessed images until a key is pressed
cv2.imshow("Image Input", image)
cv2.imshow("Preprocessed Output", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
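The script's alternative blur preprocessor calls cv2.medianBlur(gray, 3). As an illustration of what that computes, the following pure-Python sketch replaces each pixel with the median of its 3x3 neighbourhood, which removes isolated specks without smearing edges the way an averaging filter would. The function name median_blur_3x3 is hypothetical, and the edge-replicating border rule is an assumption of this sketch rather than a statement about OpenCV's internals.

```python
# Hypothetical helper: a pure-Python 3x3 median filter on a 2D list of gray values.

def median_blur_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Out-of-range neighbours are clamped to the nearest valid row/column
    (edge replication), an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = sorted(window)[4]  # middle value of the 9 samples
    return out

noisy = [
    [0, 0, 0],
    [0, 255, 0],  # a single bright speck
    [0, 0, 0],
]
print(median_blur_3x3(noisy))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

A lone bright pixel is outvoted by its eight dark neighbours and disappears, while a uniform region is left unchanged, which is why a median filter suits the speckle noise that often confuses OCR.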
Now, follow these steps to read text from an image:
- Save the code (the examples below name it tesseract.py) and the image you want to read in the same location; the examples keep the images in an Images subfolder.
- Open a command prompt and go to the folder where the code file and the image are saved.
- Run the command shown in each example below to view the output.
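The commands in the examples pass only --image, so argparse fills in pre_processor with its default value. The sketch below rebuilds the script's parser inside a hypothetical build_parser() function so it can be fed an explicit argument list. It also adds choices=["thresh", "blur"], which is not in the original script: with it, a misspelled preprocessor name is rejected instead of silently skipping preprocessing.

```python
import argparse

def build_parser():
    """Rebuild the script's parser; `choices` is an addition in this sketch."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image")
    ap.add_argument("-p", "--pre_processor", default="thresh",
                    choices=["thresh", "blur"],
                    help="preprocessing step to apply")
    return ap

# Equivalent of: python tesseract.py --image Images/title.png
args = vars(build_parser().parse_args(["--image", "Images/title.png"]))
print(args)  # {'image': 'Images/title.png', 'pre_processor': 'thresh'}

# Equivalent of: python tesseract.py -i Images/OCR.png -p blur
args = vars(build_parser().parse_args(["-i", "Images/OCR.png", "-p", "blur"]))
print(args["pre_processor"])  # blur
```

With choices in place, a call such as -p sharpen makes argparse print an error and exit with status 2.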
Example 1:
Run the following command to view the output:
python tesseract.py --image Images/title.png
The original image is displayed in one window.
The preprocessed grayscale image is displayed in another window.
Output:

Example 2:
Run the following command to view the output:
python tesseract.py --image Images/OCR.png
The original image is displayed in one window.
The preprocessed grayscale image is displayed in another window.
Output:
