Scanning Images into Words

November 15, 2008

The world’s largest search engine is turning scanned images into words: searchable words, words that you can find. In the past, scanned documents containing only images were rarely included in Google’s search results, because their content could not be read as text. Today, that changes. Google can now perform Optical Character Recognition (OCR) on scanned documents it finds stored in Adobe’s Portable Document Format (PDF).

OCR lets Google turn an image (worth a thousand words) into a thousand actual words, words that can be searched and indexed, so these valuable documents become much easier to find.

PDF documents made up only of images are typically produced by fax machines or scanners, and until now they were not indexed. These files contain pictures of text rather than the text itself.

What does this mean? It’s a game changer for the automation, tracking, indexing and retrieval of documents online: everything from government reports, peer-reviewed journals and academic papers to certificates of insurance. To see Google’s new OCR system at work, have a look at the following two links:

Adobe Acrobat PDF scanned (image only): ACORD 25-S (7/97)
Google OCR (converted to HTML): ACORD 25-S (7/97)