
DigiDoc

Transforming Handwritten Prescriptions into Digital Clarity


DigiDoc: Digital Prescription Solution

Introduction
Understanding a doctor's handwriting is no easy task. When important documents such as medical prescriptions are
interpreted wrongly, the consequences can be fatal: the FDA has estimated that around 7,000 lives have been lost due
to misinterpretation of doctors' handwriting. DigiDoc is a terminal/web based application that digitizes a doctor's
prescription and produces an image/PDF file with clear fonts for better readability.

Contents of this file


● Installation and Quick Start
● User Manual
● Underlying Algorithms
● Code Architecture
● Testing

Installation and Quick Start


Python version 2.7

Dependencies installation
python get-pip.py
apt-get -y install python-opencv
virtualenv venv
source venv/bin/activate
pip install matplotlib
pip install numpy
pip install pyimagesearch
pip install reportlab
apt-get install git
git clone https://github.com/tmbdev/ocropy.git
cd ocropy
python setup.py install

The entire application comprises two subparts:


● User Interface - PHP, JavaScript, CSS
● Python Scripts - preprocessing, segmentation, detection and prediction

User Interface
The index.php file present inside the UI folder runs on a local server (localhost). An image has to be uploaded
in jpg/jpeg format.
The final processed image can be downloaded in text/pdf format once the processing is done.
A link to the history is provided on the main page, and the various prescriptions (sorted date wise) can be
downloaded in text/pdf formats.
The history can also be accessed directly by running history/history.php on a local server.

Directories present in UI
● uploads - contains all the uploaded images.
● edited_image - contains the final processed image.
● text_files - contains the prescriptions in text format.

Python Scripts
The smart_ocr.py script present in the folder runs automatically once the image has been uploaded.
It runs in the background via the command line, launched by the upload.php script in the UI folder. smart_ocr.py imports
all the other Python scripts related to scanning, perspective orientation, etc.
Underlying Algorithms
1. Image Reorientation
In order to get good results on segmentation and then character recognition, we first
preprocess the image. The goal is to remove the background from the raw image and make it
sharper: remove the background noise and then generate a "Bird's Eye View" of the image. This
is done in 3 steps.

a. Detect the edges.


i. Image converted to Grayscale
ii. Used Gaussian Blur to remove the noise
iii. Perform Canny Edge Detection
b. Use the edges to get the contour of the image
i. Find contours in the edged image
ii. Assumption - the largest contour in the image with exactly 4 points is the desired document
iii. Sort the contours by area and take the largest one
iv. If the selected contour has four points, then we have found the document.

c. Apply the perspective transform to get the top down view of the document present in the
image.
i. Used the four point transform function from pyimagesearch library
ii. Warped image is converted to grayscale
iii. Adaptive thresholding is applied to give it a scanned look, which increases the
sharpness of the image.
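
Below is a minimal sketch of this reorientation pipeline, assuming OpenCV 2.x (the apt python-opencv package) and the four point transform helper from pyimagesearch mentioned above. The import path, threshold values and fallback behaviour are illustrative assumptions; smart_ocr.py may differ.

import cv2
# four_point_transform is the pyimagesearch helper referenced above;
# its exact import path here is an assumption.
from pyimagesearch.transform import four_point_transform

def reorient(path):
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)              # a.ii remove noise
    edged = cv2.Canny(blurred, 75, 200)                       # a.iii detect edges

    # b. find contours, sort by area, keep the largest one with exactly 4 points
    contours, _ = cv2.findContours(edged.copy(), cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)
    page = None
    for c in contours:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:                                   # found the document
            page = approx.reshape(4, 2)
            break
    if page is None:
        return gray                                            # fall back to the raw grayscale

    # c. top-down view plus adaptive thresholding for a "scanned" look
    warped = four_point_transform(gray, page)
    scanned = cv2.adaptiveThreshold(warped, 255,
                                    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY, 21, 10)
    return scanned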
2. Segmentation
The next major task is to extract the text segments from the image. For this we rely on the
fact that the density of pixels near the text areas is much higher than on the rest of the page. So
rather than using any readymade tool, we designed our own 'histogram' algorithm, which is highly
tunable depending on the image.

The principle of the histogram is as follows -


● Add all the pixel values in a particular row (or column) and repeat this for all rows (or
columns)
● The rows (or columns) with text will show high peaks of pixel values, as shown in the image
below.
● Next we narrow the area down to the rows (or columns) with high values, where we estimate the
text to be, and then apply the histogram again along the other axis.
Fig: Histogram to detect text

The image above shows a graph, which is even harder to handle than regular text on paper
because the graph itself has a high density of random lines. We first added all pixels in each
column and plotted the histogram at the bottom. Then we narrowed the area down to the regions with
high peaks, eliminating narrow peaks. For the selected area we then added all pixels in each row
(over the selected region), giving us the complete coordinates of all text lines. Even in
such a noisy image we could separate out the text, which serves as a proof of concept for our algorithm.
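
A minimal sketch of this row/column projection idea, assuming the page has already been binarized so that text pixels are dark on a light background; the threshold and minimum-run values are illustrative knobs, not the project's actual settings.

import numpy as np

def projection(binary, axis):
    """Pixel-density profile: axis=1 gives one value per row, axis=0 one per column."""
    return (255 - binary).sum(axis=axis)

def dense_runs(profile, min_ratio=0.05, min_len=3):
    """Return (start, end) index ranges whose density exceeds a tunable threshold.

    min_ratio and min_len are the 'tunable capacity' knobs mentioned above:
    weak or narrow peaks are discarded.
    """
    thresh = min_ratio * profile.max()
    mask = profile > thresh
    runs, start = [], None
    for i, on in enumerate(mask):
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(mask) - start >= min_len:
        runs.append((start, len(mask)))
    return runs

# Usage: rows first, then columns within each selected row band.
# row_bands = dense_runs(projection(scanned, axis=1))
# for r0, r1 in row_bands:
#     col_bands = dense_runs(projection(scanned[r0:r1], axis=0))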

3. Text and Word Segmentation


When we apply this to a scanned prescription, some regions of the paper may be colored, which
increases the average pixel density and makes extraction difficult. So we separate out areas based
on the background color of the page and apply normalization individually (based on the top x-percentile).

Fig: Left side - before normalization, Right side - after partitioning and individual normalization

As shown in the picture above, we separate the grey colored parts from the non-colored parts and apply
normalization individually.
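
A sketch of this per-region normalization, assuming each region is a grayscale array; the 95th percentile is an illustrative choice for the "top x-percentile", not the value the project uses.

import numpy as np

def normalize_region(region, percentile=95):
    """Rescale a grayscale region so its bright background maps back to white.

    Colored regions have darker backgrounds; scaling by the top percentile
    brings them in line with the uncolored parts of the page.
    """
    region = region.astype(np.float32)
    top = np.percentile(region, percentile)
    if top <= 0:
        return region.astype(np.uint8)
    scaled = np.clip(region * (255.0 / top), 0, 255)
    return scaled.astype(np.uint8)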

3.1. Stage 1 : Finding basic text regions in page


Next we separate out the lines of text based on histograms.

Fig:

histograms along rows and corresponding generated lines


Now we select the areas containing this text and apply the histogram along columns. This gives
us the complete broad text boxes.

Fig: Sample text region

Fig: Histogram to find text region specifically

Output of the first stage:-


Fig: original grayscale image Fig: 1st stage output

3.2. Stage 2: Separate Lines within text region

Next we find lines within these text boxes, i.e. we check whether there are multiple
lines within the same text box. So we normalize and apply the histogram within each text box.

Fig: Sample text region with multiple lines Fig : segmented lines
Output of Stage 2:-
Fig: Stage 1 output Fig: stage 2 output (with segmented sublines)

3.3 Stage 3 (Final segmentation): Finding individual words in page

Now that we have the individual lines precisely, we extract individual words. We use
histograms to break the text lines at the most logical points to extract words (a sketch follows the
figures below). An example is shown in 3.4 (proof of concept).
Fig:

Stage 2 output with line accuracy Fig: stage 3 - Final words segmented

Note - The random boxes will be eliminated by OCR
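
A sketch of this stage-3 word split, reusing the projection and dense_runs helpers from the earlier histogram sketch; the gap threshold is an illustrative assumption.

def words_in_line(binary_line, min_gap=8):
    """Split one segmented text line into word boxes.

    A word boundary is declared wherever the column histogram stays near zero
    for at least min_gap consecutive columns (the 'most logical point' above).
    """
    profile = projection(binary_line, axis=0)
    bands = dense_runs(profile, min_ratio=0.02, min_len=2)
    words, current = [], None
    for start, end in bands:
        if current is None:
            current = [start, end]
        elif start - current[1] < min_gap:      # small gap: same word
            current[1] = end
        else:                                   # large gap: new word
            words.append(tuple(current))
            current = [start, end]
    if current is not None:
        words.append(tuple(current))
    return words                                # list of (col_start, col_end)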

3.4. Proof of Concept - Robustness of Our Model


Doctors save our lives regardless of circumstance, even when we suffer due to our own faults. So
the least we can do for them is to forgive their handwriting faults … and hence make it our
responsibility to adjust to their handwriting.

1. We know doctors generally write with improper alignment (slanted lines), so we separate lines
regardless of it.

Fig: Despite improper line alignment, the histograms are able to separate the two lines.

2. We know doctors love to scribble with intermixed words, so we separate out logical words
regardless of it … separating the inseparable …

Fig: Regardless of the level of scribbling, we separate out each word

3. And finally, we go beyond the page color … we go beyond language …

Fig: Regardless of page color, our normalization works. Our algorithm is unaffected by language

So our algorithm is highly robust.


4. Information Extraction
Now we have the coordinates of each box. We pass each word to our OCR (explained in a later section) and
store the OCR result along with its coordinates. We then use our knowledge of the prescription format to
make sense of the OCR data, as shown in the image.
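
A sketch of how the layout assumptions listed at the end of this document could map a recognized word box to a field; the field names and the exact thresholds are illustrative, not the project's actual data structures.

def classify_box(x, y, page_width, page_height):
    """Assign an OCR'd word to a prescription field using the fixed layout.

    Per the assumptions: remarks/advice sit in the top 30% of the page; in the
    remaining 70%, the left 70% holds medicine names and the right 30% doses.
    """
    if y < 0.30 * page_height:
        return "remark"
    if x < 0.70 * page_width:
        return "medicine_name"
    return "dose"

# results = [(text, classify_box(x, y, W, H)) for (text, x, y) in ocr_words]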

5. Offline Handwriting Recognition


This is the bottleneck in the pipeline, as the accuracy of prediction is contingent on the quality of
handwriting recognition, and the state of the art in handwriting recognition still struggles with handwriting
that is even slightly complex.

For this project, we used ocropy (https://github.com/tmbdev/ocropy), an open-source project built on the idea of
LSTMs for handwriting recognition.

We realised that for the recognition to be effective, the LSTM should be trained on the handwriting
of a particular doctor and then used to consistently digitize that doctor's handwriting.
Fig: Simple RNN

Fig: LSTM chain

These were the steps involved in Optical Character Recognition:

1. We first train a model on the handwriting of the doctor, using an image of annotated text
written by the doctor. Sample image we used:
2. Then we save the model and use it for OCR on the images passed from the segmentation
algorithm.
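
A sketch of how these two steps can be driven from Python with ocropy's command-line tools; the flags, file layout and model filename below are assumptions based on ocropy's typical usage, and the project's wrapper may invoke them differently.

import glob
import subprocess

# 1. Train an LSTM model on line images that have matching .gt.txt ground-truth
#    files (paths and the output model name are illustrative).
train_lines = glob.glob("training_lines/*.bin.png")
subprocess.check_call(["ocropus-rtrain", "-o", "doctor_model"] + train_lines)

# 2. Use the saved model to recognize the word/line images produced by segmentation.
#    ocropy writes periodic checkpoints; the exact checkpoint name will vary.
segments = glob.glob("segments/*.bin.png")
subprocess.check_call(["ocropus-rpred", "-m", "doctor_model.pyrnn.gz"] + segments)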

Example segment sent for extraction:

Result extracted:

iike he mahbe hea

6. Medicine name matching


We extracted medicine names, diseases and symptoms from online websites and built a
corpus against which the OCR output is matched to obtain the final prediction.
The matching algorithm uses the following steps (see the sketch below):
● Process the text output of the OCR to remove noise characters.
● Generate candidate sequences using consecutive n-grams from the output.
● We use up to 4-grams for better accuracy.
● For each n in the n-grams, apply fuzzy pattern matching against the entire
corpus.
● After each iteration, sort the corpus by the pattern-matching score,
then discard the bottom 50% of the corpus.
● This is repeated up to 4-grams; at the end, the word with the highest score is returned.

When tested on our corpus of ~14000 words, we can accurately match the medicine
provided the input captures at least 2-3 bigrams.
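
A minimal sketch of this matching loop, assuming a plain Python list as the corpus and difflib's similarity ratio as the fuzzy score in place of whichever matcher the project actually uses.

import difflib
import re

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_medicine(ocr_text, corpus):
    """Return the corpus entry that best matches the noisy OCR output."""
    tokens = re.sub(r"[^a-z ]", "", ocr_text.lower()).split()   # strip noise characters
    candidates = {entry: 0.0 for entry in corpus}
    for n in range(1, 5):                                        # up to 4-grams
        for gram in ngrams(tokens, n):
            for entry in candidates:
                score = difflib.SequenceMatcher(None, gram, entry.lower()).ratio()
                candidates[entry] = max(candidates[entry], score)
        # keep only the top half of the corpus after each n-gram pass
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        candidates = {e: candidates[e] for e in ranked[: max(1, len(ranked) // 2)]}
    return max(candidates, key=candidates.get)

# Example: match_medicine("amoxcilin 500", ["Amoxicillin", "Paracetamol", "Ibuprofen"])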

Flowchart of the Application


User Interface
The entire application takes the form of a website. The input format is an image file (JPG/JPEG) and the
output is produced as PDF and text. The same page allows the user to visit the history of previously processed
images.

Once the image is uploaded and sent for further processing, a progress bar is shown.
Once the preprocessing is done on the uploaded image (grayscale conversion, top-view transform), the main
algorithm, which comprises segmentation, OCR detection and prediction, runs on the preprocessed image. The
final page shows both the uploaded and the final processed image. It also offers the option to download the
file in pdf/text format.
Our application also provides access to the history section. The various prescriptions are sorted date wise, and
both the raw image and the final processed text/pdf file can be downloaded. One can also navigate back to the main
page of the application using the back button.
Assumptions:
Because doctors use a vast number of different layouts, we fix one layout, and our application is built
with that layout in mind.

Following are the assumptions for the input image.


● Image resolution should be equal to or greater than 1024*720 for higher accuracy.
● Clinic Name and Doctor’s details should be present in printed format along with some other details like
Patient name tag, age tag, sex tag.
● Remark and Advice should be present in top 30% of the page.
● Medicine details are present in the remaining 70% of the page. Left 70% contains medicine names and
right 30% contains medicine doses.
● The header contains the clinic's name.
● The footer has details like phone number and address.
● Both header and footer are in printed format.
● No diagrams/symbols should be present in the handwritten areas.
