Does ICR Keep Paper Forms Viable?
A challenge for modern organizations is the need to accept data input from large numbers of individuals, businesses, or government agencies. The U.S. Internal Revenue Service, healthcare organizations, student testing services, businesses conducting global surveys, and companies interacting with a variety of individuals often must accept paper-based forms as a low-technology solution to obtaining data.
Many of those who use paper forms do not have a computer linked to electronic mail or the Internet, and even well-connected individuals do not always have a computer available when they need to place orders, fill out credit card receipts, or file requests for healthcare coverage. Paper forms are still the least expensive data capture device for many business applications where individuals without network connections must provide data for entry into a computer system. Even today's much-acclaimed Internet browser interfaces are of little help when the data must be collected from constantly changing individuals at constantly changing locations.
Processing data entry forms that contain handwritten responses requires software with character recognition capabilities that dramatically exceed those of standard optical character recognition (OCR) software. OCR software reads consistent typewritten characters; data filled in on paper forms by hand is far more difficult to decipher and recognize. Intelligent character recognition (ICR) software is designed for this harder task.
By its very nature, ICR performs a data conversion during the capture process: the ICR software recognizes characters in scanned images of data, then converts them to American Standard Code for Information Interchange (ASCII) text. To the extent that the data is converted accurately, the electronic record does not differ in content from the original paper business record. However, if a handwritten number one is misread during ICR as the lower-case letter l, then the electronic record does not reflect the content of the original paper record.
A records management question arises in the industry practice of discarding paper-based input forms once the data has been entered into a computer system. It is not uncommon for credit card receipts and other such transitory data collection devices to be discarded once data entry has occurred. In some applications where paper-based forms are used for data gathering and subsequent data entry, the original paper records are stored off-site in a commercial records storage center. In other cases, the paper records are discarded as their data is entered into the computer system. When that happens, no auditable trail is retained to show what appeared on the original form, making it more difficult to resolve disputes that arise regarding data accuracy in newly created electronic records.
Paper Is Still Practical
Much has been said about the demise of paper as a medium for human communications. Electronic document management systems, electronic mail, and word processing software are often presented as technologies that can lead to the paperless office. These computer-based solutions to the paper glut do have many advantages, including their ability to support modern office needs for comprehensive document life cycle management, fast communication, and efficient information storage and retrieval. However, paper is still the medium chosen by most individuals for reading documents, collaborating on large-format drawings, and responding to informal surveys. Most people would prefer reading an easily browsed paper magazine or report to scrolling around through screens of its electronic equivalent on a laptop computer.
Many business activities that need to gather computer input data can be frustrated by an inability to exercise control over the data source. For example, when healthcare management organizations accept insurance claims from individuals or physicians' groups, they contend with requests from entities that are without any consistent form of computer-based office automation. Doctors' offices use various computer systems and software, and some groups persist in filling out standardized healthcare request forms by hand. Individuals may also submit claims for coverage.
Even though standard forms are used, the healthcare insurance providers have virtually no control over the computing architecture used by those requesting coverage. For this reason, handwritten and typewritten forms dominate claims submission for healthcare insurance coverage.
Forms-based electronic data interchange software or Internet browser software would seem a good solution to submitting data. However, healthcare insurance providers' inability to affect how users request coverage indicates that a low-tech, paper-based form solution is still the most effective method of obtaining data.
When someone orders products without a computer available, the most convenient and effective method is to fill out a paper form, then fax or mail it. For companies to receive and process electronic product orders from the general public would require implementing computer systems and software accessible by a highly diverse population with varied computer systems of their own (Intel/Windows, UNIX, Apple, etc.), a very difficult undertaking. Many individuals still do not have regular, reliable access to any computer system and would be unable to participate in a buying scheme limited to orders placed electronically. It is much less expensive to simply accept properly formatted typewritten or handwritten forms, then perform ICR processes that recognize data on those forms.
Businesses also must interact with other businesses by processing invoices for services or products. An automobile manufacturer may receive invoices for parts from small suppliers who are unable to participate in the electronic data interchange (EDI) systems that the manufacturer has established with its larger vendors. Smaller firms may need to submit paper forms because they cannot afford the investment in computer system hardware and software that EDI requires. In such cases, paper forms and ICR technology are often the most cost-effective solution.
Organizations conducting surveys (e.g., gathering census data) have similar challenges. How could they possibly design a computer system that would allow all potential survey respondents to submit data? The variety of respondents makes such an undertaking all but impossible. In contrast, almost anyone can fill out a paper form.
Paper forms are still very much in use today for all of these reasons. There is little initial cost per form, no technology required, and, most importantly, forms are easy to use. However, the data that appears on those forms must eventually be loaded into a computer system to be useful.
The traditional business model for getting handwritten data into a computer system has been to contract for data entry services. Firms that perform this service accept large volumes of paper records, key the data into computer software, and later return it to customers as ASCII files. The files are then loaded into the customer's database for subsequent processing.
This approach to data entry is highly effective, as special software ensures that correct data is entered from the forms. Key-to-disk software, such as that produced by Viking Software Services Inc. and Lifetime Software Technologies Inc., uses extensive data entry correction rules to prevent errors such as double entries, numbers in character fields, impossible dates, and other data anomalies that can arise during manual data entry.
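The kinds of correction rules described above can be sketched in a few lines of Python. This is only an illustration, not how any of the named products work; the record layout (`form_id`, `surname`, and the date parts) and the function name `validate_record` are hypothetical.

```python
from datetime import date


def validate_record(record, seen_keys):
    """Return a list of problems found in one keyed record (hypothetical layout)."""
    problems = []

    # Double entry: the same form number has already been keyed.
    if record["form_id"] in seen_keys:
        problems.append("duplicate form_id")
    seen_keys.add(record["form_id"])

    # Numbers in a character field: a surname should not contain digits.
    if any(ch.isdigit() for ch in record["surname"]):
        problems.append("digit in surname")

    # Impossible date: date() raises ValueError for dates such as February 31.
    try:
        date(record["year"], record["month"], record["day"])
    except ValueError:
        problems.append("impossible date")

    return problems


seen = set()
good = {"form_id": "A100", "surname": "Rivera", "year": 2000, "month": 2, "day": 29}
bad = {"form_id": "A100", "surname": "R1vera", "year": 2000, "month": 2, "day": 31}
print(validate_record(good, seen))  # no problems
print(validate_record(bad, seen))   # duplicate, digit in surname, impossible date
```

Rules of this kind catch entries that cannot be right; they do not catch a plausible but wrong value, which is why the original form remains the final reference.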
Manual data entry is a slow, laborious, and expensive process compared to computer recognition of text and its subsequent processing. Data entry specialists are becoming increasingly difficult to employ in the United States because the work is tedious and boring. Frequently, data entry is outsourced to firms in Mexico, the Caribbean, and India, where labor is far less expensive and the desire to obtain any computer systems work is much higher than in America.
In fact, paper forms or their scanned images can be sent overseas, the data entered into a database, and the database file returned less expensively than if the forms were processed in the United States. Depending on the consistency and standardization of this increasingly electronic business activity, numerous records authenticity issues may arise. The creation of data entry audit trails can become quite important.
How Does ICR Work?
Intelligent character recognition software potentially reduces the need for manual entry of computer data. When used successfully, 95 percent to 99 percent of a form's text characters can be recognized and stored in the appropriate database fields. Software such as that from Associated Solutions Inc., Captiva Software Corp., and RRI Inc. uses neural-net-based ICR engines that recognize and process text very efficiently. Here's how the process works:
Place a form or document into a document scanner connected to a computer workstation. The document's appearance is captured as a raster image in the manner usually associated with document imaging software; a bit-mapped image composed of black dots on a white background is produced, usually as a tagged image file format (TIFF) file. Software then performs image analysis to properly align the image, match ICR zones with expected data fields, and begin categorizing the data as hand print, typed characters, or other data types.
Lines and boxes on the form may be dropped out, and image despeckling or other image enhancement activities may occur. An ICR template is used to identify fields of character data on the image. These fields are broken down into discrete characters that are then classified by ICR algorithms and assigned confidence values by the ICR recognition engine. The engine ranks alternate possibilities for characters and then chooses the most likely characters. Post-ICR processing routines can include data validation for certain form fields with spell-checkers, check-sum routines, and database look-up tables.
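The ranking and check-sum steps can be illustrated with a short Python sketch. The candidate lists and confidence values below are made up, and the Luhn check digit scheme is used only as a familiar example of a check-sum routine; the article does not say which routines any particular ICR product applies.

```python
def most_likely(candidates):
    """Pick the highest-confidence character at each position."""
    return "".join(max(position, key=lambda c: c[1])[0] for position in candidates)


def luhn_valid(number):
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sure(text):
    """Hypothetical helper: positions where the engine has one confident candidate."""
    return [[(ch, 0.99)] for ch in text]


# Hypothetical engine output for the account number 79927398713, in which
# the handwritten 3 at the sixth position looks slightly more like an 8.
candidates = sure("79927") + [[("8", 0.52), ("3", 0.45)]] + sure("98713")

reading = most_likely(candidates)
print(reading, luhn_valid(reading))  # the top-ranked reading fails the check
```

A reading that fails the check-sum would be routed to manual correction rather than loaded into the database; a reading that passes is more likely, though not guaranteed, to be correct.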
Numerous possibilities exist for errors in character recognition. Document scanners can misread an image that is dirty or too skewed. Lines on a form may be removed where a handwritten character overlapped them, thus changing the character's final appearance and causing it to be read as a different character. Characters squashed together may be interpreted as a different, larger character. Characters read without contextual analysis may be interpreted as letters, when only numbers should exist in a field.
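The last of these errors is the easiest to guard against in software. As a minimal sketch, assuming the engine exposes its ranked candidates as (character, confidence) pairs, a field's expected type can be used to filter the candidates before the best one is chosen. The function name `best_char` and the confidence values are hypothetical.

```python
import string


def best_char(candidates, allowed):
    """Return the highest-confidence candidate the field type allows, or None."""
    permitted = [c for c in candidates if c[0] in allowed]
    if not permitted:
        return None
    return max(permitted, key=lambda c: c[1])[0]


# Hypothetical ranking for a handwritten vertical stroke.
stroke = [("l", 0.48), ("1", 0.46), ("I", 0.06)]

# Without context, the lower-case letter l wins.
print(best_char(stroke, string.ascii_letters + string.digits))

# In a numeric-only field, the digit 1 is chosen instead.
print(best_char(stroke, string.digits))
```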
For these reasons, most ICR processing runs produce a "reject" file of characters or fields that must be manually corrected. Once manual corrections are made, the data file can be treated as complete and accurate. Only then can the data be loaded into a computer system with a high degree of confidence that the electronic records contain the same data that originally resided on the paper forms.
ICR accuracy is described from several perspectives:
* Acceptance rate is the percentage of characters that the ICR software considers identifiable.
* Rejection rate is the percentage of characters that the ICR software is not confident about attempting to identify.
* Accuracy rate is the percentage of accepted characters that are actually identified correctly.
* Substitution error rate is the percentage of characters incorrectly identified. Substitution errors are the most dangerous; during these errors, an incorrect character is substituted for the correct character (Gingrande 1998). An ICR software substitution error on the number $1,000,000 might change the first 0 to an 8, thus giving a new number of $1,800,000. This could radically change an invoice or claim form amount.
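The four rates can be computed from three counts, as the Python sketch below shows. The counts are invented for illustration, and the sketch assumes that the substitution error rate is measured against all characters on the form rather than only the accepted ones; the definition above leaves the denominator open.

```python
def icr_rates(total, accepted, correct):
    """Compute the four rates, in percent, from raw character counts.

    total    -- characters on the forms
    accepted -- characters the engine was confident enough to identify
    correct  -- accepted characters that were identified correctly
    """
    rejected = total - accepted
    substituted = accepted - correct
    return {
        "acceptance": 100 * accepted / total,
        "rejection": 100 * rejected / total,
        "accuracy": 100 * correct / accepted,
        # Assumption: substitution errors are measured against all characters.
        "substitution": 100 * substituted / total,
    }


# Hypothetical batch: 1,000 characters, 950 accepted, 940 of those correct.
rates = icr_rates(total=1000, accepted=950, correct=940)
for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

Note that the 50 rejected characters are a known cost, since they go to manual correction, while the 10 substituted characters enter the data file unnoticed unless a later validation step catches them.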
An ICR recognition engine assigns a confidence value to each character recognized. Confidence thresholds may be modified within the software for certain fields or characters, and it is normal to test ICR software against any new forms to assure optimal performance. The goal of system tuning is to reduce the ICR recognition errors as much as possible.
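Per-field thresholds might look like the following sketch. The threshold values, field names, and the `route` function are all hypothetical; real products expose such settings in their own ways.

```python
# Hypothetical settings: a default threshold, tightened for money fields.
DEFAULT_THRESHOLD = 0.90
FIELD_THRESHOLDS = {"amount": 0.98}


def route(field_name, chars):
    """Accept a field only if every character meets the field's threshold.

    chars is a list of (character, confidence) pairs from the engine.
    Returns ("accept" or "reject", recognized text).
    """
    threshold = FIELD_THRESHOLDS.get(field_name, DEFAULT_THRESHOLD)
    text = "".join(ch for ch, _ in chars)
    if all(conf >= threshold for _, conf in chars):
        return "accept", text
    return "reject", text


# The same recognition result, with one less certain character.
chars = [("1", 0.99), ("8", 0.93), ("0", 0.99)]
print(route("quantity", chars))  # meets the default threshold
print(route("amount", chars))    # falls below the stricter threshold
```

Raising a threshold trades substitution errors for rejections: more fields go to manual correction, but fewer wrong values pass silently. Testing against sample forms is how a suitable balance is found.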
In post-processing, special automated dictionaries or data validation routines can greatly reduce the need for subsequent manual cleanup. Clean forms that are filled in properly deliver accuracy rates as high as 99 percent. However, dirty forms, folded forms, and handwriting that is atypical or fails to stay within prescribed data boxes can result in accuracy rates as low as 50 percent of characters read. Data entry costs rise dramatically when recognition rates fall below 90 percent and considerable manual correction of data is required.
Could ICR Alter Business Records?
Considering the reduced costs for data entry and the overall accuracy of ICR software and recognition engines, ICR processing of forms is of tremendous value to many industries. Auto-indexing of business documents, medical claims processing, catalog order entry processing, automated reading of government forms, and scanning surveys or questionnaires for data are all activities that take advantage of ICR technology every day. For applications where the ability to control data entry is limited or the variety of respondents is great, paper forms and ICR technology are an accepted, industry-standard method.
Considering that the data on the original paper business records can be altered during the ICR conversion process or when manual keystrokes correct data during quality assurance checking, it seems prudent to implement two standard records management practices. First, all business processes that contribute to transforming the data from paper form to electronic record should be thoroughly documented and established procedures rigorously followed. Should any concerns be raised about the data's authenticity or accuracy during the use of ICR-generated electronic records, business practices can be audited for compliance with expected (documented) procedures.
Second, all originally submitted paper forms should be covered by a comprehensive set of records retention procedures that specify how the original records are managed and preserved. Either the original paper forms or electronic images of the scanned forms should be preserved for a duration that exceeds any usual need to verify the data as submitted on the forms.
Temporarily retaining these original form-based records in paper or image format provides an audit trail to determine authenticity and accuracy of electronic records used within the eventual host computer system. This archived set of records might be preserved for a duration as short as three months or as long as regulatory requirements or business concerns dictate.
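One way to tie an electronic record back to its retained image is to log, for every field, what the ICR software read, what value was finally stored, and a fingerprint of the scanned image. The Python sketch below is one possible shape for such a log entry, not an established practice described in this article; the field names, the operator identifier, and the use of a SHA-256 hash are assumptions.

```python
import hashlib
import json


def audit_entry(image_bytes, field, icr_value, final_value, operator=None):
    """Build one audit record linking a stored value to the scanned image."""
    return {
        # Fingerprint of the retained image, so later tampering is detectable.
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "field": field,
        "icr_value": icr_value,      # what the ICR software read
        "final_value": final_value,  # what was loaded into the host system
        "corrected": icr_value != final_value,
        "operator": operator,        # who made the manual correction, if any
    }


# Placeholder bytes stand in for the contents of a scanned form image.
entry = audit_entry(b"scanned form image bytes", "amount",
                    "1,800,000", "1,000,000", operator="qa-07")
print(json.dumps(entry, indent=2))
```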
Paper is still the best medium for some business communications. However, as electronic data and records become standard for conducting business, it is important to assure that any new records created from the transformation of paper to electronic format accurately depict the original content, if not the format, intended by the original documents' creators.
Editor's Note: The companies and products named in this article are provided as examples by the author and do not constitute endorsement by ARMA International.
Gingrande, Arthur. Forms Automation: From ICR to E-Forms to the Internet. Silver Spring, MD: AIIM International, 1998.
John T. Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be contacted at email@example.com
Author: PHILLIPS, JOHN T.
Publication: Information Management Journal
Date: Apr 1, 2000