Delete comment from: Google Operating System
WOW! I was officially sock-less after I did this:
1) Scanned a 2-page paper doc (some plain text, some texty images) at 300 dpi to a 'blind' PDF [800 KB]. 'Blind' meaning no searchable text, just an image.
2) Input PDF to free-online-ocr (FOO).
3) Output as PDF [1800 KB] (yes, it sounds silly, which is why it was the last combo I tried).
4) It looked exactly like the original. Searched for words that FOO and Google Docs (GDX) had trouble converting to text formats in previous combos. No trouble. As far as I can tell EVERY word in this PDF (although not in the images in the PDF) was searchable. This is where I lost my sox.
5A) Uploaded to GDX without conversion to GDX format. Every word remained searchable.
5B) Uploaded to GDX WITH conversion to GDX format... all the trouble words were messed up again.
Moving forward, I'll convert 'blind' PDFs to searchable PDFs via FOO, then upload them to GDX without conversion for searchable storage.
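For anyone who wants to repeat the "search for the trouble words" check from step 4 without eyeballing it, here is a rough sketch. It assumes you have already pulled the text layer out of each PDF page with whatever PDF library you prefer; the function name and the sample strings are made up for illustration and are not from my actual scan.

```python
def find_missing_words(page_texts, trouble_words):
    """Return the trouble words that appear in none of the pages' text layers."""
    # Collapse whitespace and lowercase everything so a line break in the
    # text layer or a capital letter does not hide a match.
    haystack = " ".join(" ".join(text.split()) for text in page_texts).lower()
    return [word for word in trouble_words if word.lower() not in haystack]


# Hypothetical stand-ins for text extracted from two 2-page PDFs.
searchable_pdf = ["Invoice total due", "Payment terms:\nnet 30"]
blind_pdf = ["", ""]  # image-only scan: no text layer at all

print(find_missing_words(searchable_pdf, ["invoice", "net 30"]))  # []
print(find_missing_words(blind_pdf, ["invoice", "net 30"]))       # ['invoice', 'net 30']
```

An empty list means every trouble word is searchable (what I saw in steps 4 and 5A); anything left in the list is a word the OCR or the conversion mangled (what I saw in 5B).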
Jan 13, 2011, 7:51:56 PM
Posted to Google Adds OCR for PDF Files and Images