Digital Document Imaging Is Not Just Scanning
Scanning paper documents into digital images is what most of us think of when we hear the term digital document imaging. However, digital document imaging involves additional processing before the image becomes useful content.
While the scanner reads light information from the document or object being scanned, the scanner software saves this information in a graphic format like JPEG. This is the first processing task.
Often, the image quality leaves something to be desired. There might be punch hole marks, black borders, illegible characters owing to poor contrast (such as blue ink characters on a bluish background) and so on.
Sophisticated scanning software can remove undesired elements from the digital image and adjust contrast to provide legible characters (often better than on the original). Improving image quality is thus the second processing task.
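As a rough illustration of this kind of clean-up, the sketch below crops an all-dark scanner border and then stretches the contrast of what remains. It is only a toy under stated assumptions: the image is a small grayscale grid held in plain Python lists (0 is black, 255 is white), and the function names and the darkness threshold of 40 are invented for the example. Real scanning software works on full image files and uses far more robust methods.

```python
from typing import Iterable, List

Image = List[List[int]]  # grayscale rows; 0 = black, 255 = white (assumed layout)


def crop_black_border(image: Image, dark_threshold: int = 40) -> Image:
    """Drop outer rows and columns in which every pixel is darker than the threshold."""
    def is_dark(pixels: Iterable[int]) -> bool:
        return all(p < dark_threshold for p in pixels)

    # Rows that contain at least one non-dark pixel.
    rows = [r for r in range(len(image)) if not is_dark(image[r])]
    if not rows:
        return []  # the whole image is border
    top, bottom = rows[0], rows[-1]

    # Columns that contain at least one non-dark pixel within the kept rows.
    width = len(image[0])
    cols = [
        c for c in range(width)
        if not is_dark(image[r][c] for r in range(top, bottom + 1))
    ]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def stretch_contrast(image: Image) -> Image:
    """Linearly rescale pixels so the darkest becomes 0 and the lightest 255."""
    flat = [p for row in image for p in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]  # flat image, nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]


# A tiny "scan": a black border around low-contrast content (values 120 and 150).
scan = [
    [0, 0, 0, 0, 0],
    [0, 120, 150, 150, 0],
    [0, 150, 120, 150, 0],
    [0, 0, 0, 0, 0],
]
cleaned = stretch_contrast(crop_black_border(scan))
print(cleaned)
```

In this example the ink (120) and the background (150) start out only 30 levels apart. After cropping and stretching they sit at opposite ends of the scale, which is the kind of improvement that helps later character recognition.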
Where the digital image contains text, the characters are stored only as pixels, a form the computer cannot read as text. The characters need to be converted into ASCII or another text-specific format before machines can read them.
Such conversion is done by character recognition software using technologies such as OCR (optical character recognition) and ICR (intelligent character recognition). Here again, more sophisticated software can distinguish between closely similar characters and give the right output. This is the next processing task for text documents.
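To give a feel for what telling similar characters apart can involve, here is a deliberately simple sketch of one possible approach: a post-correction pass that uses the surrounding characters as context. It assumes the recognizer has already produced text, and that a token made up mostly of digits should contain only digits (and likewise for letters). The lookup tables and function names are hypothetical, and this is not a description of how any particular OCR or ICR product works.

```python
# Hypothetical confusion tables for look-alike characters.
TO_DIGIT = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
TO_LETTER = {"0": "O", "1": "l", "5": "S", "8": "B"}


def fix_token(token: str) -> str:
    """Replace look-alike characters based on what the rest of the token contains."""
    digits = sum(ch.isdigit() for ch in token)
    letters = sum(ch.isalpha() for ch in token)
    if digits > letters:
        # Mostly digits: treat stray letters such as 'O' as misread digits.
        return "".join(TO_DIGIT.get(ch, ch) for ch in token)
    if letters > digits:
        # Mostly letters: treat stray digits such as '0' as misread letters.
        return "".join(TO_LETTER.get(ch, ch) for ch in token)
    return token  # a tie gives no context to decide on, so leave it alone


def fix_line(line: str) -> str:
    """Apply the token-level correction to each space-separated token."""
    return " ".join(fix_token(t) for t in line.split(" "))


print(fix_line("INV0ICE 2O24"))  # prints "INVOICE 2024"
```

A heuristic like this will get some cases wrong, for example part numbers that legitimately mix letters and digits, which is one reason commercial software relies on richer context such as dictionaries and field definitions.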
Paper-based text documents thus need to be scanned and run through character recognition before they can become proper digital text documents. These processes might require advanced capabilities to provide consistently high quality images and accurate character recognition.
Even though the text of the document is now editable, the document cannot yet be said to be part of the electronic workflow or an integral part of the enterprise content. To achieve this status, one more processing step is needed: indexing.
Indexing Documents
Indexing involves associating the words in the document content, or the document's tags, with the document itself. It then becomes possible to retrieve the document based on those words.
You enter the words into a search box, and the search facility brings up a list of documents that have been indexed for those words. You can then select the particular document you want from the list.
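The indexing and search steps just described can be sketched with an inverted index, a common structure that maps each word to the set of documents containing it. This is a minimal sketch rather than how any particular document management system does it. The file names, sample texts, and function names are made up for illustration, and the search assumes that every query word must appear in a matching document.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of document ids whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical recognized text from three scanned documents.
documents = {
    "invoice-001.txt": "Invoice for office chairs",
    "invoice-002.txt": "Invoice for printer paper",
    "memo-17.txt": "Memo about office relocation",
}
index = build_index(documents)
print(search(index, "invoice"))         # ['invoice-001.txt', 'invoice-002.txt']
print(search(index, "office invoice"))  # ['invoice-001.txt']
```

The same structure works for tags: indexing a document under its tags instead of, or in addition to, its content words only changes what is passed to `build_index`.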
It is at this stage that the document can be said to be a true part of the enterprise content. Until then, the document might not be retrievable even though it resides on the computer's storage media.
If it is not retrievable, it cannot become part of the workflow processes.
Once the document has become part of the enterprise content, it can be made available on the intranet and extranet. People in the organization, whether they work locally or in a geographically distant location, can access it if they have the necessary access rights.
No large enterprise today can stay competitive if it tries to manage information flows using traditional paper, folders, filing cabinets, file rooms and cumbersome retrieval methods.
Digital document imaging is more than scanning. Scanned images have to be saved in a recognized graphic format. Any text in the image has to be made machine-readable using character recognition technologies. The quality of the image and the accuracy of the character recognition might have to be improved using sophisticated processing software. The text documents then need to be indexed before they become an integral part of the enterprise workflow.
