Is It Document Imaging Software That Does the Real Work?
While hardware devices like the CCD in the scanner sense the light reflected from the scanned document, it is document imaging software, such as the scanner software, that does most of the work. The scanner software saves the sensed red-green-blue light information in a standard graphic format like JPEG.
Text in Document Images
The saved JPEG image might contain text characters (in a business context, most of the scanned documents would be text documents). These text characters are human-readable but not yet computer-readable.
Unless the text is computer-readable, you can't edit the document or fill in any additional data. To use the text document in your workflow, you would most likely need to convert it into a machine-readable text format such as ASCII.
Character recognition software, such as Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) programs, does this task. It makes the text characters machine-readable.
Making Documents Searchable
The machine-readable document is not yet an integral part of the enterprise content. To become one, the document needs to be indexed and its metadata included in the relevant index file.
Again, it is a piece of software, the indexing software, that does the job. It links the document to the words in its content (full-text indexing) or in the tags attached to the document (tag indexing). It then becomes possible to retrieve the document by searching for any of the linked words.
Indexing software can even work automatically if the selected option is full-text indexing. For tag-based indexing, you have to attach relevant tags (usually words that people use to look for the document) to the document.
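The difference between the two indexing options can be sketched in a few lines of Python. This is only a toy illustration, not how any particular indexing product works: the class name DocumentIndex, the document ids and the sample texts are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DocumentIndex:
    """Toy index (hypothetical): maps each word to the ids of linked documents."""

    def __init__(self):
        self.full_text = defaultdict(set)  # word from content -> document ids
        self.tags = defaultdict(set)       # manually attached tag -> document ids

    def add_document(self, doc_id, text, tags=()):
        # Full-text indexing is automatic: every word in the content is linked.
        for word in tokenize(text):
            self.full_text[word].add(doc_id)
        # Tag indexing relies on tags that a person attached to the document.
        for tag in tags:
            self.tags[tag.lower()].add(doc_id)

    def search(self, word):
        """Return sorted ids of documents linked to the word by content or tag."""
        key = word.lower()
        return sorted(self.full_text.get(key, set()) | self.tags.get(key, set()))


# Made-up sample documents, standing in for OCR output plus manual tags.
index = DocumentIndex()
index.add_document("inv-001", "Invoice for 20 office chairs", tags=["furniture"])
index.add_document("con-002", "Lease contract for office space", tags=["legal"])

print(index.search("office"))     # ['con-002', 'inv-001']
print(index.search("furniture"))  # ['inv-001']
```

The word "furniture" never appears in the text of the invoice, so only the manually attached tag makes that document retrievable under it. The word "office" is found without any manual effort because it occurs in the content of both documents.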
Fine-Tuning Images and Character Recognition
The original scanned images might not be all that good. They might contain illegible characters owing to poor contrast in the paper document, unsightly black borders, distorted characters owing to folds in the paper and so on.
Here again, document imaging software can help. Programs with sophisticated algorithms can produce images that might be better than the original paper document.
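Two of the clean-up steps mentioned above, removing black borders and improving poor contrast, can be illustrated with a small Python sketch. It assumes a grayscale image held as a list of rows of 0-255 pixel values; the function names and the darkness threshold of 40 are arbitrary choices for the example, and real imaging programs use far more sophisticated methods.

```python
def crop_black_border(image, threshold=40):
    """Drop outer rows and columns in which every pixel is darker than threshold."""
    rows = [i for i, row in enumerate(image) if any(p >= threshold for p in row)]
    cols = [j for j in range(len(image[0]))
            if any(row[j] >= threshold for row in image)]
    if not rows or not cols:
        return []  # the whole image is dark, so nothing is left after cropping
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def stretch_contrast(image):
    """Rescale pixel values linearly so they span the full 0-255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


# A made-up 4x4 scan: a low-contrast page area surrounded by a black border.
scan = [
    [0, 0, 0, 0],
    [0, 120, 180, 0],
    [0, 100, 160, 0],
    [0, 0, 0, 0],
]

cropped = crop_black_border(scan)
cleaned = stretch_contrast(cropped)
print(cropped)  # [[120, 180], [100, 160]]
print(cleaned)  # [[64, 255], [0, 191]]
```

Cropping comes first in this sketch so that the black border pixels do not count as the darkest value when the contrast is stretched.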
Character recognition could also face problems. For example, closely similar characters could be confused with each other, producing unreliable text documents. OCR programs are now available that can handle such problems and carry out highly reliable character recognition.
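As a rough illustration of the confusion problem, the Python sketch below repairs words in which look-alike characters were swapped, by comparing them against a list of known words. This is one simple idea, not the method of any particular OCR program; the confusable pairs, the word list and the function names are assumptions made for the example.

```python
# Assumed look-alike pairs: each digit is mapped to the letter it resembles.
CONFUSABLE = str.maketrans({"0": "o", "1": "l", "5": "s", "8": "b"})


def canonical(word):
    """Reduce a word to a form in which look-alike characters are identical."""
    return word.lower().translate(CONFUSABLE)


def build_lookup(vocabulary):
    """Group the known words by their canonical form."""
    lookup = {}
    for word in sorted(vocabulary):
        lookup.setdefault(canonical(word), []).append(word)
    return lookup


def correct(token, vocabulary, lookup):
    """Replace a token only if exactly one known word shares its canonical form."""
    if token.lower() in vocabulary:
        return token  # already a known word
    candidates = lookup.get(canonical(token), [])
    return candidates[0] if len(candidates) == 1 else token


# Made-up vocabulary and OCR output for the example.
vocabulary = {"invoice", "total", "bill"}
lookup = build_lookup(vocabulary)

ocr_tokens = ["inv0ice", "t0ta1", "8i11", "2024"]
print([correct(t, vocabulary, lookup) for t in ocr_tokens])
# ['invoice', 'total', 'bill', '2024']
```

A token that matches no known word, such as the number 2024, is left unchanged, so genuine digits are not turned into letters.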
Selection of Software
There are some issues you should be aware of when selecting document imaging software. The selected programs must be compatible with your existing systems - scanners, document management system, operating system and so on.
Most software is compatible with well-known scanner brands and major document management systems. You only have to check that the software you buy works with your systems. Backward compatibility can also be an issue if you have document images created with earlier versions of the software.
Cost is another criterion. Going for unnecessarily sophisticated features can push up your costs. Assess your needs and select only what you need, now and in the near future.
It is document imaging software that does most of the work related to document imaging - such as saving the image in a standard format, making text characters in the image machine-readable, indexing the documents to integrate them with the enterprise content and so on.
You should select software that is compatible with your existing systems.
