Strategies for improving electronic recordkeeping performance.


The "big bucket" approach to records retention consolidates disparate retention categories so that the total number of categories is drastically reduced. This approach can cut hundreds--sometimes even thousands--of categories down to just 100 or so. The reduction has four positive implications for electronic recordkeeping systems.

The use of big buckets in electronic recordkeeping ultimately means that classification accuracy will rise, and the software systems will perform better.

Increasing Manual End User Classification Accuracy

In an electronic recordkeeping system, the accuracy of records classification is a make-or-break factor. If the overall classification accuracy, expressed as the percentage of total records that are known to be correctly classified, does not meet an acceptable threshold (typically in the 80%-90% range), it is pointless to run disposition against the records, as the error rate would be unacceptably high. Too many records would be destroyed too early or too late.
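The accuracy measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout (record ID mapped to category), and the 85% default threshold are hypothetical, chosen from within the 80%-90% range mentioned in the text.

```python
# Hypothetical sketch: accuracy is measured on a sample of records whose
# correct category is already known (the "reference" classification).
ACCEPTABLE_THRESHOLD = 85.0  # assumed value within the 80%-90% range


def classification_accuracy(assigned, reference):
    """Return the percentage of reference records that were classified correctly."""
    if not reference:
        raise ValueError("reference sample is empty")
    correct = sum(
        1 for record_id, category in reference.items()
        if assigned.get(record_id) == category
    )
    return 100.0 * correct / len(reference)


def safe_to_run_disposition(assigned, reference, threshold=ACCEPTABLE_THRESHOLD):
    """Disposition is only worth running when accuracy meets the threshold."""
    return classification_accuracy(assigned, reference) >= threshold
```

A sample in which 9 of 10 reviewed records match the reference gives an accuracy of 90%, which clears the assumed threshold; 7 of 10 does not.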

Worse still, the sheer volume of electronic records (easily into the millions in many organizations) makes it impractical to review, let alone correct, the misclassified records once the error rate is high. Hence, it is critical that the overall classification accuracy rate be maintained above the acceptable threshold at all times.

Most classification in an electronic recordkeeping system is still done manually, in that the end user has to decide in which category the record belongs. This is particularly true of e-mail, where the user has to decide on the fly where to store and/or categorize an e-mail that constitutes a business record.

The software might present a list of all categories to choose from or a list of personalized "most-often-used" categories. Or, the user might drag an e-mail/document onto a folder that represents a category. Regardless of which technique is used, the fewer the categories, the easier it is to decide which is the proper choice, and overall accuracy of classification is certain to increase.

For example, suppose an organization begins with three similar categories: "Safety--Incidents," "Safety, General," and "Safety--Procedures." An e-mail describing a procedure followed during a safety infraction could potentially belong in any of the three categories, undoubtedly leading to classification errors as different users interpret the e-mail as "General," an "Incident," or a "Procedure."

An organization following the big bucket approach might consolidate the three categories into a single, new, bigger bucket, "Safety." Over time, as users classify safety-related e-mails and documents into that single big bucket, the classification error rate will drop as users no longer need to pause, carefully evaluate the original three choices, and make an educated guess as to the appropriate category.
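The consolidation in this example can be expressed as a simple mapping from the original categories to the new bucket. The sketch below is illustrative only: the function names are hypothetical, and the list of user choices is made-up input (not measured data), used to show that disagreement among the three merged categories disappears once they share one bucket.

```python
# Hypothetical mapping from the three original categories to one big bucket.
BIG_BUCKETS = {
    "Safety--Incidents": "Safety",
    "Safety, General": "Safety",
    "Safety--Procedures": "Safety",
}


def to_big_bucket(category):
    """Map a fine-grained category to its big bucket (unmapped names pass through)."""
    return BIG_BUCKETS.get(category, category)


def agreement_rate(choices, reference):
    """Fraction of user choices that match the reference category."""
    return sum(1 for choice in choices if choice == reference) / len(choices)


# Made-up example: four users file the same e-mail under different categories.
user_choices = [
    "Safety--Incidents",
    "Safety, General",
    "Safety--Procedures",
    "Safety--Procedures",
]
reference = "Safety--Procedures"

before = agreement_rate(user_choices, reference)
after = agreement_rate(
    [to_big_bucket(choice) for choice in user_choices],
    to_big_bucket(reference),
)
```

With this input, `before` is 0.5 and `after` is 1.0. The sketch says nothing about errors between "Safety" and unrelated buckets; it only shows the effect of merging near-duplicate categories.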

Improving List Navigation and Selection

Bigger buckets have a rather obvious positive impact on the theoretical overall classification accuracy rate by making the decision easier for the user. But what about the tool people use to make this decision? The software itself can facilitate the right decision by making it easier for users to translate their mental decision into a physical selection on the screen. The software's job is to present the selections clearly and unambiguously for review and to make the process of selecting the desired category simple, fast, and as error-free as possible.

A smaller quantity of bigger buckets means smaller lists to select from. In any Windows-based user interface, shorter lists are easier to work with. A shorter list makes it easier to see all possible choices in a single window, with less vertical scrolling. Even the slider in the scroll bar at the window margin is taller, and therefore easier to grab and drag, when fewer items are listed. This translates to less time and trouble to make the selection onscreen, contributing to consistency in selection.

An even greater positive impact of big buckets is on the so-called "tree display," a hierarchical list of folders (such as e-mail folders) in which a given folder can be opened (branch expanded) to reveal subfolders or closed (branch collapsed) as the user navigates up and down the hierarchy. Tree displays can be rather long, and examining all the possible choices can take a great deal of branch opening and closing. Big buckets mean a smaller tree to display and less onscreen navigation. Selection becomes easier and faster for the end user, which will undoubtedly result in an increase in overall classification accuracy.

Improving Machine-Driven Classification

Some newer electronic recordkeeping software systems and utilities deliver so-called "auto-classification" capability. The software "reads" the content of a target e-mail or document, understands the subject of the document, then classifies it by selecting a retention category that most closely matches the document subject. All this happens without any involvement from the user.

The success or failure of auto-classification software ultimately depends on the classification accuracy it can deliver--it must be above the customer's acceptability threshold. There is no doubt that big buckets will dramatically increase the classification accuracy of this emerging technology. The reason is precisely the same as with human classification--fewer choices mean less interpretation, which translates to higher accuracy of selection.

Internally, auto-classification software establishes "confidence levels": for each decision it makes, it establishes how confident it is that the decision was correct, as compared to a known accurate reference decision. The customer can then establish arbitrary confidence thresholds that determine if the selection should be rejected or accepted, based on the internal confidence.

For instance, if the confidence is less than 80 percent, or "somewhat uncertain," the customer might wish to route the decision to an expert for review and possible manual correction. If the confidence level is less than 50 percent, or "uncertain," the customer might wish to reject the decision entirely and classify the record manually instead.
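The two thresholds in this example can be sketched as a small routing function. This is a minimal illustration, not the logic of any particular product: the function name, the outcome labels, and the representation of confidence as a number between 0.0 and 1.0 are all assumptions.

```python
# Thresholds taken from the example in the text; a customer would set their own.
REVIEW_THRESHOLD = 0.80  # below this: "somewhat uncertain", send to an expert
REJECT_THRESHOLD = 0.50  # below this: "uncertain", reject and classify manually


def route_decision(confidence):
    """Return 'accept', 'review', or 'reject' for one auto-classification decision."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < REJECT_THRESHOLD:
        return "reject"
    if confidence < REVIEW_THRESHOLD:
        return "review"
    return "accept"
```

Under these assumptions, a decision made with 0.65 confidence would be routed to an expert, while one made with 0.92 confidence would be accepted without review.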

Big buckets mean fewer categories to choose from. It will be more obvious to the software which category is the correct choice, and the confidence of the decisions will rise accordingly. Not only will overall classification accuracy increase, but the rejection level of uncertain choices will drop, resulting in less administrative overhead.

Improving Internal System Efficiency

Electronic recordkeeping today is largely delivered as a capability within document management (DM) or its more sophisticated version, enterprise content management (ECM), software solutions. The use of big buckets will ultimately make the recordkeeping operations of such systems internally more efficient.

At their core, these DM/ECM systems are sophisticated database systems that organize themselves internally into a series of databases, or tables. All documents must be assigned metadata that specifies, among other things, the type of document (document or content type), security access privileges, date stored, author, subject, and many more possible data elements. These data elements are stored in a database as tables.

These tables, and even the document content itself, are indexed for fast searching and retrieval. The tables and the data within them are related to each other in ways that allow powerful access to and manipulation of the data. For instance, users can easily access all documents submitted by a specified author and change the security level of all those documents in a single operation.

All DM/ECM systems are organized at their most basic level by document type (a metadata field) and folder. Where the system's recordkeeping capability has been fully and appropriately used, all folders (with the exception of some non-records) are linked to a retention category.

For example, all documents in the folder "Safety" and/or all documents where "Subject = Safety," and/or "Document Type = Safety," or perhaps all documents where "Author = lane" and "Department = Safety," would be categorized against the official "Safety" category from the retention schedule. Many of the system's internal database tables are directly or indirectly linked to a category in this retention table. Millions of stored documents are therefore individually linked, either directly or indirectly, through their metadata or some other table to the retention category.
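The linking rules in this example can be pictured as a function over a document's metadata. The sketch below is hypothetical: real DM/ECM systems resolve these links through their internal database tables, and the field names used here (`folder`, `subject`, `document_type`, `author`, `department`) are assumptions made for illustration.

```python
def retention_category(doc):
    """Return the retention category a document resolves to, or None if no rule applies.

    `doc` is a plain dict of metadata; all keys are hypothetical field names.
    """
    is_safety = (
        doc.get("folder") == "Safety"
        or doc.get("subject") == "Safety"
        or doc.get("document_type") == "Safety"
        or (doc.get("author") == "lane" and doc.get("department") == "Safety")
    )
    return "Safety" if is_safety else None
```

A document stored in the "Safety" folder resolves to the "Safety" category regardless of its other metadata, while a document by the same author in another department resolves to nothing under these rules.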

These links have to be maintained throughout the life of the documents as they are stored, moved, modified, and ultimately destroyed by the records disposition process. A smaller overall quantity of retention categories means the links can be internally resolved faster and more efficiently. The more links and the greater the amount of data that must be indexed and processed for each individual operation, the less efficient the operation.

The retention disposition process is particularly computation-intensive. For each document considered for destruction, the system must conduct many checks to determine its eligibility (30 or more individual checks are not uncommon), and this can take a very long time to process. Bigger buckets, particularly in a retention schedule with fewer branches (that is, one broken down into fewer smaller buckets), will have a dramatically positive impact on the overall performance of disposition processing--it will run faster and therefore complete sooner.
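The per-document checks described above can be sketched as a list of small functions that must all pass. The three checks shown are hypothetical examples (the article does not enumerate the 30 or more checks a real system might run), and the metadata keys are assumptions.

```python
from datetime import date


def add_years(start, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def has_category(doc, today):
    return doc.get("category") is not None


def retention_elapsed(doc, today):
    closed, years = doc.get("closed_on"), doc.get("retention_years")
    if closed is None or years is None:
        return False
    return add_years(closed, years) <= today


def not_on_hold(doc, today):
    return not doc.get("legal_hold", False)


# Hypothetical checks; every document is run through each of them.
CHECKS = [has_category, retention_elapsed, not_on_hold]


def failed_checks(doc, today):
    """Return the names of the checks this document fails."""
    return [check.__name__ for check in CHECKS if not check(doc, today)]


def eligible_for_destruction(doc, today):
    return not failed_checks(doc, today)
```

Because every check runs for every candidate document, the cost of a disposition run grows with both the number of documents and the amount of data each check has to resolve.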

Of the four implications of big buckets, two are usage factors (how the end user interacts with the software):

1. Manual end user classification accuracy will increase.

2. List navigation and selection will be improved.

Two are technical factors (how the software performs):

3. Machine-driven classification will improve.

4. Internal system efficiency will improve.

Bruce Miller, founder and president of RIMtech Inc., is widely regarded as the inventor of modern electronic recordkeeping software. He founded TrueArc in 1989, where he pioneered ForeMost, the world's first commercial electronic recordkeeping software, and Tarian Software in 1999, where he pioneered the world's first e-records software engine. Miller, who holds a diploma in electronics engineering technology and a master's in business administration from Queen's University, now consults widely on electronic recordkeeping technology implementation. He may be contacted at

COPYRIGHT 2008 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2008 Gale, Cengage Learning. All rights reserved.

Article Details
Author: Miller, Bruce
Publication: Information Management Journal
Geographic Code: 1USA
Date: Sep 1, 2008