DigiDoc - Transforming Handwritten Prescriptions Into Digital Clarity
Introduction
Understanding a doctor's handwriting is no easy task. When important documents such as medical prescriptions are
misinterpreted, the consequences can be fatal: the FDA has estimated that around 7,000 lives have been lost due
to misinterpretation of doctors' handwriting. DigiDoc is a terminal/web-based application that digitizes a doctor's
prescription and produces an image/PDF file with clear fonts for better readability.
Dependency Installation
python get-pip.py
apt-get -y install python-opencv
virtualenv venv
source venv/bin/activate
pip install matplotlib
pip install numpy
pip install pyimagesearch
pip install reportlab
apt-get install git
git clone https://github.com/tmbdev/ocropy.git
cd ocropy
python setup.py install
User Interface
The index.php file present inside the UI folder runs on a local server (localhost). An image has to be uploaded
in JPG/JPEG format.
Once processing is done, the final processed output can be downloaded in text/PDF format.
A link to the history is provided on the main page, and the various prescriptions (sorted by date) can be downloaded
in text/PDF format.
The history can also be accessed directly by running history/history.php on a local server.
Directories present in UI
● uploads - contains all the uploaded images.
● edited_image - contains the final processed image.
● text_files - contains the prescriptions in text format.
Python Scripts
The smart_ocr.py script present in the folder is run automatically once an image has been uploaded.
It is launched in the background via the upload.php script present in the UI folder. smart_ocr.py imports
all of the other Python scripts, which handle scanning, perspective orientation, and so on; a sketch of this
orchestration is given below.
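A minimal sketch of how such an orchestration script could be structured is shown below. The module and function names (scan, segment, recognize) are placeholders for illustration only and are not the actual contents of smart_ocr.py.

# Hypothetical outline of an orchestration script such as smart_ocr.py.
# Module and function names are illustrative assumptions, not the real code.
import sys

import scan        # perspective correction / "scanned look" preprocessing (assumed module)
import segment     # histogram-based line and word segmentation (assumed module)
import recognize   # LSTM-based character recognition via ocropy (assumed module)


def main(image_path, output_dir):
    # Step 1: remove the background and produce a top-down, sharpened view.
    preprocessed = scan.preprocess(image_path)

    # Step 2: extract text regions, lines, and words.
    words = segment.extract_words(preprocessed)

    # Step 3: run the trained model on each word image and collect the text.
    text = recognize.predict(words)

    with open(output_dir + "/prescription.txt", "w") as f:
        f.write(text)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])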
Underlying Algorithms
1. Image Reorientation
In order to get good results from segmentation and subsequent character recognition, we first
preprocess the image. The goal was to remove the background from the raw image and then sharpen it.
We remove the background noise and then generate the "Bird's Eye View" of the image. This
was done in 3 steps; a sketch of the transform step is given after the list below.
c. Apply the perspective transform to get the top-down view of the document present in the
image.
i. Use the four_point_transform function from the pyimagesearch library
ii. Convert the warped image to grayscale
iii. Apply adaptive thresholding to give it a scanned look and increase the
sharpness of the image.
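A minimal sketch of this transform step is shown below. It assumes the page contour has already been located (for example via edge and contour detection) and that four_point_transform is importable as in the pyimagesearch examples; the import path and the threshold parameters are assumptions, not the project's exact code.

import cv2
from pyimagesearch.transform import four_point_transform  # import path assumed


def scan_image(image, doc_contour):
    """Warp the detected document to a top-down view and give it a scanned look.

    doc_contour is the 4-point contour of the page, assumed to be in the
    same coordinate space as `image`.
    """
    # Bird's-eye view of the prescription.
    warped = four_point_transform(image, doc_contour.reshape(4, 2))

    # Grayscale conversion.
    gray = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)

    # Adaptive thresholding sharpens the text and removes background shading.
    # Block size and constant are illustrative and would be tuned per image.
    scanned = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 10)
    return scanned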
2. Segmentation
The next major task is to extract the text segments from the image. For this we rely on the
fact that the pixel density near text areas is much higher than on the rest of the page. So
rather than using any ready-made tool, we designed our own histogram-based algorithm with highly
tunable parameters depending on the image; a sketch of the procedure is given below.
The image above shows a graph, which is even harder to detect than regular
text on paper because the graph itself has a high density of random lines. We first summed all the
pixels in each column and plotted the histogram at the bottom. We then narrowed the area down
to the regions with high peaks, eliminating narrow peaks. For the selected area we then summed all the pixels
in each row (over the selected region), giving the complete coordinates of all the lines. Even in
such a noisy image we could separate out the text, which demonstrates the algorithm as a proof of concept.
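A minimal sketch of this histogram idea is shown below. It assumes a binarized image in which text pixels are 1 and background pixels are 0; the density and width thresholds are illustrative and, as noted above, would need tuning per image.

import numpy as np


def find_dense_regions(profile, min_density, min_width):
    """Return (start, end) ranges where the 1-D profile stays above min_density
    for at least min_width consecutive positions (narrow peaks are discarded)."""
    regions, start = [], None
    for i, value in enumerate(profile):
        if value >= min_density and start is None:
            start = i
        elif value < min_density and start is not None:
            if i - start >= min_width:
                regions.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_width:
        regions.append((start, len(profile)))
    return regions


def segment_text_blocks(binary, col_thresh=5, row_thresh=3, min_width=10):
    """binary: 2-D array with text pixels = 1, background = 0."""
    # Sum the ink in every column and locate horizontally dense areas.
    col_profile = binary.sum(axis=0)
    col_regions = find_dense_regions(col_profile, col_thresh, min_width)

    boxes = []
    for x0, x1 in col_regions:
        # Within the selected columns, sum the ink in every row to get the lines.
        row_profile = binary[:, x0:x1].sum(axis=1)
        for y0, y1 in find_dense_regions(row_profile, row_thresh, min_width):
            boxes.append((x0, y0, x1, y1))
    return boxes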
Fig: Left side - before normalization; right side - after partitioning and individual normalization
As shown in the picture above, we separate the grey-coloured parts from the non-coloured parts and apply
normalization to each part individually.
Next we find the lines within these text boxes, i.e. we check whether there are multiple
lines within the same text box. To do this we normalize and apply the histogram within each text box.
Fig: Sample text region with multiple lines. Fig: Segmented lines
Output of Stage 2:
Fig: Stage 1 output. Fig: Stage 2 output (with segmented sub-lines)
Now that we have the individual lines precisely, we extract the individual words. We again use
histograms to break the text lines at the most logical points and extract the words; an example is
shown in 3.4 (proof of concept), and a sketch of the word-splitting step follows the figure below.
Fig: Stage 2 output with line accuracy. Fig: Stage 3 - final words segmented
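One way to choose the "most logical" break points is to split a line wherever its column histogram stays empty for a sufficiently wide gap. The sketch below assumes a binarized line image (text pixels = 1); the minimum gap width is an illustrative parameter, not the project's exact value.

import numpy as np


def split_line_into_words(line_binary, min_gap=8):
    """Split a binarized text line (text pixels = 1) into word images at
    column gaps that are at least min_gap pixels wide."""
    col_profile = line_binary.sum(axis=0)
    words, start, gap = [], None, 0
    for x, ink in enumerate(col_profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        else:
            gap += 1
            # A wide enough empty run ends the current word.
            if start is not None and gap >= min_gap:
                words.append(line_binary[:, start:x - gap + 1])
                start = None
    if start is not None:
        words.append(line_binary[:, start:])
    return words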
1. We know that doctors generally write with improper alignment (slanted lines), so we separate the lines
regardless of the slant.
Fig: Despite the improper line alignment, the histograms are able to separate the two lines.
2. We know that doctors love to scribble with intermixed words, so we separate out the logical words
regardless ... separating the inseparable ...
Fig: Our normalization works regardless of page colour, and the algorithm is unaffected by language.
3. Handwriting Recognition
For this part of the project, we used ocropy (https://github.com/tmbdev/ocropy), an open-source project built
on top of LSTMs for handwriting recognition.
We realised that for the recognition to be effective, the LSTM should be trained on the handwriting
of a particular doctor and then used to consistently digitize that doctor's handwriting.
Fig: A simple RNN compared with an LSTM chain
1. We first train a model on the handwriting of the doctor, using an image of annotated text
written by the doctor. A sample image we used is shown below.
2. We then save the model and use it for OCR on the image passed in from the segmentation
algorithm. A sketch of these two steps is given after this list.
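The two steps above map onto ocropy's command-line training and prediction tools. The sketch below shows how they might be driven from Python; the directory layout, model name, and checkpoint filename are assumptions, and the exact flags should be verified against the ocropy README for the installed version.

import glob
import subprocess

# Step 1: train an LSTM model on the doctor's annotated line images.
# ocropy expects line images paired with ground-truth *.gt.txt transcripts.
# The "training/" directory and model name are illustrative.
subprocess.check_call(
    ["ocropus-rtrain", "-o", "doctor_model"] + glob.glob("training/*.bin.png"))

# Step 2: run the saved model on the line/word images produced by the
# segmentation stage. The checkpoint filename pattern follows ocropy's
# default naming and may differ depending on training length.
subprocess.check_call(
    ["ocropus-rpred", "-m", "doctor_model-00010000.pyrnn.gz"]
    + glob.glob("segmented/*.bin.png"))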
Result extracted:
When tested against our corpus of ~14,000 words, we can accurately match the medicine
provided the OCR output captures at least 2-3 bigrams; a sketch of such bigram matching is given below.
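Matching against the medicine corpus can be done with simple character-bigram overlap. The sketch below is a minimal illustration, assuming the corpus is a plain list of medicine names; the function names and the threshold are ours, not the project's exact code.

def bigrams(word):
    """Character bigrams of a lower-cased word, e.g. 'dolo' -> {'do', 'ol', 'lo'}."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def best_match(ocr_word, corpus, min_shared=2):
    """Return the corpus entry sharing the most bigrams with the OCR output,
    provided at least min_shared bigrams (2-3 in practice) are captured."""
    target = bigrams(ocr_word)
    best, best_score = None, 0
    for name in corpus:
        score = len(target & bigrams(name))
        if score > best_score:
            best, best_score = name, score
    return best if best_score >= min_shared else None


# Example with a toy corpus:
# best_match("amoxcilin", ["amoxicillin", "paracetamol", "ibuprofen"])
# -> "amoxicillin"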
Once the image is uploaded and sent for further processing, the progress bar loads.
After the preprocessing (grayscale conversion, top-view transform) is done on the uploaded image, the main
algorithm, which comprises segmentation, OCR detection, and prediction, runs on the preprocessed image. The
final page shows both the uploaded image and the final processed image, along with the option to download the
file in PDF/text format.
Our application also provides access to the history section. The various prescriptions are sorted by date, and
both the raw image and the final processed text/PDF file can be downloaded. One can also navigate back to the main
page of the application using the back button.
Assumptions:
Because doctors use a vast number of prescription layouts, we fix a single layout, and our application is built
with that layout in mind.