Does ICR Keep Paper Forms Viable?


Computer technology is capable of performing amazing feats when applied to some otherwise mundane human endeavors. The ability of intelligent character recognition (ICR) software to interpret and read written characters enables paper forms to continue their prevalence in many special industries.

A challenge for modern organizations is the need to accept data input from large numbers of individuals, businesses, or government agencies. The U.S. Internal Revenue Service, healthcare organizations, student testing services, businesses conducting global surveys, and companies interacting with a variety of individuals often must accept paper-based forms as a low-technology solution to obtaining data.

Many of those using paper forms do not have a computer linked to electronic mail or the Internet, and even well-connected individuals do not always have a computer available when they need to place orders, fill out credit card receipts, or file requests for healthcare coverage. Paper forms are still the least expensive data capture device for many business applications where individuals without network connections must provide data for entry into a computer system. Even today's much-acclaimed Internet browser interfaces do not help computer applications when the data must be collected from constantly changing individuals and constantly changing locations.

Processing data entry forms that contain handwritten responses requires software with character recognition capabilities that dramatically exceed those of standard optical character recognition (OCR) software. Whereas OCR software reads typewritten characters, data on paper forms filled out by human hands is far more difficult to decipher and recognize than consistent typewritten text.

By its very nature, ICR performs a data conversion during the capture process: the ICR software recognizes scanned images of data then converts it to American Standard Code for Information Interchange (ASCII) text. To the extent that the data is converted accurately, it does not differ from the original paper business record. However, if a handwritten number one is misread during ICR as the lower-case alphabet letter l, then the electronic record does not reflect the content of the original paper record.

A records management question arises in the industry practice of discarding paper-based input forms once the data has been entered into a computer system. It is not uncommon for credit card receipts and other such transitory data collection devices to be discarded once data entry has occurred. In some applications where paper-based forms are used for data gathering and subsequent data entry, the original paper records are stored off-site in a commercial records storage center. In other cases, the paper records are discarded as their data is entered into the computer system. No auditable trail is retained to show what appeared on the original form, making it more difficult to resolve disputes that arise regarding data accuracy in newly created electronic records.

Paper Is Still Practical

Much has been said about the demise of paper as a medium for human communications. Electronic document management systems, electronic mail, and word processing software are often presented as technologies that can lead to the paperless office. These computer-based solutions to the paper glut do have many advantages, including their ability to support modern office needs for comprehensive document life cycle management, fast communication, and efficient information storage and retrieval. However, paper is still the medium chosen by most individuals for reading documents, collaborating on large-format drawings, and responding to informal surveys. Most people would prefer reading an easily browsed paper magazine or report to scrolling around through screens of its electronic equivalent on a laptop computer.

Many business activities that need to gather computer input data can be frustrated by an inability to exercise control over the data source. For example, when healthcare management organizations accept insurance claims from individuals or physicians' groups, they contend with requests from entities that are without any consistent form of computer-based office automation. Doctors' offices use various computer systems and software, and some groups persist in filling out standardized healthcare request forms by hand. Individuals may also submit claims for coverage.

Even though standard forms are used, the healthcare insurance providers have virtually no control over the computing architecture used by those requesting coverage. For this reason, handwritten and typewritten forms dominate claims submission for healthcare insurance coverage.

Forms-based electronic data interchange software or Internet browser software would seem a good solution to submitting data. However, healthcare insurance providers' inability to affect how users request coverage indicates that a low-tech, paper-based form solution is still the most effective method of obtaining data.

When someone orders products without a computer available, the most convenient and effective method is to fill out a paper form, then fax or mail it. For companies to receive and process electronic product orders from the general public would require implementing computer systems and software accessible by a highly diverse population with varied computer systems of their own (Intel/Windows, UNIX, Apple, etc.), a very difficult undertaking. Many individuals still do not have regular, reliable access to any computer system and would be unable to participate in a buying scheme limited to orders placed electronically. It is much less expensive to simply accept properly formatted typewritten or handwritten forms, then perform ICR processes that recognize data on those forms.

Businesses also must interact with other businesses by processing invoices for services or products. An automobile manufacturer may receive invoices for parts from small suppliers who are unable to participate in the electronic data interchange (EDI) systems that the manufacturer has established with its larger vendors. Smaller firms may need to submit paper forms because they cannot afford the investment in computer system hardware and software that EDI requires. In such cases, paper forms and ICR technology are often the most cost-effective solution.

Organizations conducting surveys (e.g., gathering census data) have similar challenges. How could they possibly design a computer system that would allow all potential survey respondents to submit data? The variety of respondents makes such an undertaking inherently impossible. In contrast, almost anyone can fill out a paper form.

Paper forms are still very much in use today for all of these reasons. There is little initial cost per form, no technology required, and, most importantly, forms are easy to use. However, the data that appears on those forms must eventually be loaded into a computer system to be useful.

The traditional business model for getting handwritten data into a computer system has been to contract for data entry services. Firms that perform this service accept large volumes of paper records, key the data into computer software, and later return it to customers as ASCII files. The files are then loaded into the customer's database for subsequent processing.

This approach to data entry is highly effective, as special software ensures that correct data is entered from the forms. Key-to-disk software, such as that produced by Viking Software Services Inc. and Lifetime Software Technologies Inc., uses extensive data entry correction rules to prevent errors such as double entries, numbers in character fields, impossible dates, and other data anomalies that can arise during manual data entry.
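The kinds of correction rules described above can be illustrated with a minimal sketch. It is not taken from any of the products named; the field names (form_id, last_name, service_date) and the rule set are hypothetical, chosen only to mirror the three error types mentioned: double entries, numbers in character fields, and impossible dates.

```python
from datetime import date


def validate_record(record, seen_ids):
    """Return a list of problems found in one keyed record.

    record   -- dict with hypothetical keys form_id, last_name, service_date
    seen_ids -- set of form IDs that have already been keyed
    """
    problems = []

    # Double entry: the same form has already been keyed.
    if record["form_id"] in seen_ids:
        problems.append("duplicate form_id")

    # Numbers in a character field.
    if any(ch.isdigit() for ch in record["last_name"]):
        problems.append("digit in last_name")

    # Impossible dates such as February 30 (expects YYYY-MM-DD).
    try:
        year, month, day = (int(part) for part in record["service_date"].split("-"))
        date(year, month, day)
    except ValueError:
        problems.append("impossible service_date")

    return problems
```

A data entry program built this way would presumably refuse to accept a record until the returned list is empty, which is how such rules prevent bad data from reaching the customer's ASCII file.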

Manual data entry is a slow, laborious, and expensive process compared to computer recognition of text and its subsequent processing. Data entry specialists are becoming increasingly difficult to employ in the United States because the work is tedious and boring. Frequently, data entry is outsourced to firms in Mexico, the Caribbean, and India, where labor is far less expensive and the desire to obtain any computer systems work is much higher than in America.

In fact, paper forms or their scanned images can be sent overseas, the data entered into a database, and the database file returned less expensively than if the forms were processed in the United States. Depending on the consistency and standardization of this increasingly electronic business activity, numerous records authenticity issues may arise. The creation of data entry audit trails can become quite important.

How Does ICR Work?

Intelligent character recognition software potentially reduces the need for manual entry of computer data. When used successfully, 95 percent to 99 percent of a form's text characters can be recognized and stored in the appropriate database fields. Software such as that from Associated Solutions Inc., Captiva Software Corp., and RRI Inc. uses neural-net-based ICR engines that recognize and process text very efficiently. Here's how the process works:

Place a form or document into a document scanner connected to a computer workstation. The document's appearance gets captured as a raster image in the manner usually associated with document imaging software; a bit-mapped image comprised of detailed black dots on a white background is produced, usually as a tagged image format (TIF) file. Software then performs image analysis to properly align the image, match ICR zones with expected data fields, and begin categorizing the data as hand print, typed characters, or other data types.

Lines and boxes on the form may be dropped out, and image despeckling or other image enhancement activities may occur. An ICR template is used to identify fields of character data on the image. These fields are broken down into discrete characters that are then classified by ICR algorithms and assigned confidence values by the ICR recognition engine. The engine ranks alternate possibilities for characters and then chooses the most likely characters. Post-ICR processing routines can include data validation for certain form fields with spell-checkers, check-sum routines, and database look-up tables.
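The ranking and post-processing steps just described can be sketched in a few lines. This is an illustration under stated assumptions, not the method of any particular engine: the candidate lists and confidence values are invented, and the Luhn check (the check-digit scheme used on payment card numbers) stands in for whatever check-sum routine a given form field would actually use.

```python
def recognize_field(candidates_per_char):
    """Pick the top-ranked candidate for each character position.

    candidates_per_char -- list of dicts, one per character position,
                           mapping a candidate character to a confidence
                           between 0 and 1 (hypothetical engine output)
    Returns the recognized text and the confidence of each chosen character.
    """
    text = []
    confidences = []
    for candidates in candidates_per_char:
        best = max(candidates, key=candidates.get)
        text.append(best)
        confidences.append(candidates[best])
    return "".join(text), confidences


def luhn_ok(number):
    """Return True if a string of digits passes the Luhn check-sum."""
    if not number.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, ch in enumerate(reversed(number)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A field that fails its check-sum after recognition is a good candidate for manual review, since at least one character in it must have been read or written incorrectly.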

Numerous possibilities exist for errors in character recognition. Document scanners can misread an image that is dirty or too skewed. Lines on a form may be removed where a handwritten character overlapped them, thus changing the character's final appearance and causing it to be read as a different character. Characters squashed together may be interpreted as a different, larger character. Characters read without contextual analysis may be interpreted as letters, when only numbers should exist in a field.
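The last of these errors is the one most easily addressed in software. As a rough sketch, assuming the engine exposes its ranked candidates for each character (the dictionary below is invented for illustration), contextual analysis can be as simple as discarding candidates that the field type does not permit:

```python
import string


def best_in_context(candidates, allowed):
    """Return the highest-confidence candidate the field allows, or None.

    candidates -- dict mapping a candidate character to a confidence
    allowed    -- string of characters that are valid in this field
    """
    permitted = {ch: conf for ch, conf in candidates.items() if ch in allowed}
    if not permitted:
        return None  # nothing plausible: leave the character for manual review
    return max(permitted, key=permitted.get)


# A handwritten "1" that looks slightly more like a lower-case "l".
stroke = {"l": 0.55, "1": 0.40, "I": 0.05}
numeric_choice = best_in_context(stroke, string.digits)
unconstrained_choice = max(stroke, key=stroke.get)
```

Without the field constraint the letter wins; with it, the digit is chosen even though its raw confidence is lower.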

For these reasons, most data processed by ICR software creates a "reject" file that must be manually corrected. Once manual corrections are made, the data file is complete and accurate. Only then can the data be loaded into a computer system with a high degree of confidence that the electronic records contain the same data that originally resided on the paper forms.

ICR accuracy is described from several perspectives:

* Acceptance rate is the percentage of characters that the ICR software considers identifiable.

* Rejection rate is the percentage of characters that the ICR software is not confident about attempting to identify.

* Accuracy rate is the percentage of accepted characters that are actually identified correctly.

* Substitution error rate is the percentage of characters incorrectly identified. Substitution errors are the most dangerous; during these errors, an incorrect character is substituted for the correct character (Gingrande 1998). An ICR software substitution error on the number $1,000,000 might change the first 0 to an 8, thus giving a new number of $1,800,000. This could radically change an invoice or claim form amount.
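As a worked illustration of these four measures, the sketch below computes them from three character counts. The function and its parameter names are hypothetical, and one assumption is made where the definition above is silent: the substitution error rate is taken as a percentage of all characters on the form, not only of accepted ones.

```python
def icr_rates(total, accepted, correct):
    """Compute the four ICR accuracy measures as percentages.

    total    -- number of characters on the forms
    accepted -- characters the engine considered identifiable
    correct  -- accepted characters that were identified correctly
    """
    if total <= 0 or accepted <= 0:
        raise ValueError("total and accepted must be positive")
    if not correct <= accepted <= total:
        raise ValueError("expected correct <= accepted <= total")

    rejected = total - accepted        # engine declined to identify these
    substituted = accepted - correct   # accepted but wrong
    return {
        "acceptance": 100 * accepted / total,
        "rejection": 100 * rejected / total,
        "accuracy": 100 * correct / accepted,
        # Assumption: measured against all characters, not accepted ones.
        "substitution": 100 * substituted / total,
    }


rates = icr_rates(total=1000, accepted=950, correct=940)
```

With these counts, 50 characters go to manual correction where an operator will see them, while 10 wrong characters pass through silently, which is why substitution errors are the more dangerous kind.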

An ICR recognition engine assigns a confidence value to each character recognized. Confidence thresholds may be modified within the software for certain fields or characters, and it is normal to test ICR software against any new forms to assure optimal performance. The goal of system tuning is to reduce the ICR recognition errors as much as possible.
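Per-field thresholds of this kind might look like the following sketch. The threshold values and field names are invented for illustration; real products expose their own tuning controls, and suitable values would come from testing against the actual forms.

```python
DEFAULT_THRESHOLD = 0.80                  # hypothetical default
FIELD_THRESHOLDS = {"amount": 0.95}       # stricter for money fields (assumed)


def route_field(field_name, confidences):
    """Decide whether a recognized field is accepted or sent for review.

    confidences -- per-character confidence values from the engine
    Returns ("accept", []) or ("reject", [positions of doubtful characters]).
    """
    threshold = FIELD_THRESHOLDS.get(field_name, DEFAULT_THRESHOLD)
    doubtful = [i for i, conf in enumerate(confidences) if conf < threshold]
    if doubtful:
        return "reject", doubtful
    return "accept", []
```

Raising a threshold trades more manual review for fewer substitution errors, so the strictest settings belong on the fields where a silent error would be most costly.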

In post-processing, special automated dictionaries or data validation routines can greatly reduce the need for subsequent manual cleanup. Clean forms that are filled in properly deliver accuracy rates as high as 99 percent. However, dirty forms, folded forms, and handwriting that is atypical or fails to stay within prescribed data boxes can result in accuracy rates as low as 50 percent of characters read. Data entry costs rise dramatically when recognition rates fall below 90 percent and considerable manual correction of data is required.

Could ICR Alter Business Records?

Considering the reduced costs for data entry and the overall accuracy of ICR software and recognition engines, ICR processing of forms is of tremendous value to many industries. Auto-indexing of business documents, medical claims processing, catalog order entry processing, automated reading of government forms, and scanning surveys or questionnaires for data are all activities that take advantage of ICR technology every day. For applications where the ability to control data entry is limited or the variety of respondents is great, paper forms and ICR technology are an accepted, industry-standard method.

Considering that the data on the original paper business records can be altered during the ICR conversion process or when manual keystrokes correct data during quality assurance checking, it seems prudent to implement two standard records management practices. First, all business processes that contribute to transforming the data from paper form to electronic record should be thoroughly documented and established procedures rigorously followed. Should any concerns be raised about the data's authenticity or accuracy during the use of ICR-generated electronic records, business practices can be audited for compliance with expected (documented) procedures.
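One way to support such an audit is to record every manual correction alongside the value the ICR engine originally produced. The structure below is a minimal sketch with hypothetical field names, not a description of any product's audit facility.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    """One manual change made during quality assurance checking."""
    form_id: str
    field_name: str
    icr_value: str          # what the ICR engine produced
    corrected_value: str    # what the operator keyed instead
    operator: str
    corrected_at: datetime


def log_correction(log, form_id, field_name, icr_value, corrected_value, operator):
    """Append an immutable correction entry to the audit log and return it."""
    entry = Correction(
        form_id=form_id,
        field_name=field_name,
        icr_value=icr_value,
        corrected_value=corrected_value,
        operator=operator,
        corrected_at=datetime.now(timezone.utc),
    )
    log.append(entry)
    return entry
```

Paired with the retained form image, entries like these let an auditor see both what the software read and who changed it, and when.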

Second, all originally submitted paper forms should be covered by a comprehensive set of records retention procedures that specify how the original records are managed and preserved. Either the original paper forms or electronic images of the scanned forms should be preserved for a duration that exceeds any usual need to verify the data as submitted on the forms.

Temporarily retaining these original form-based records in paper or image format provides an audit trail to determine authenticity and accuracy of electronic records used within the eventual host computer system. This archived set of records might be preserved for a duration as short as three months or as long as regulatory requirements or business concerns dictate.

Paper is still the best medium for some business communications. However, as electronic data and records become standard for conducting business, it is important to assure that any new records created from the transformation of paper to electronic format accurately depict the original content, if not the format, intended by the original documents' creators.

Editor's Note: The companies and products named in this article are provided as examples by the author and do not constitute endorsement by ARMA International.

References

Gingrande, Arthur. Forms Automation: From ICR to E-Forms to the Internet. Silver Spring, MD: AIIM International, 1998.

John T. Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. He can be contacted at jtpitd@usit.net
COPYRIGHT 2000 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:PHILLIPS, JOHN T.
Publication:Information Management Journal
Geographic Code:1USA
Date:Apr 1, 2000
Words:2388