
The challenge of web site records preservation: managing electronic records in fast-paced, technology-driven web environments has frustrated information management professionals for more than a decade.

The "free and open" Internet has transformed personal techniques for gathering and storing information and, as a result, has changed workplace expectations about document availability. Using Internet browser software from a personal computer, one can quickly access enormous volumes of documents placed around the world on Internet Web servers.

In data- and document-intensive office environments, it is appealing to place information on intranet-accessible Web servers rather than to use paper-based distribution methods. Printed newsletters, bound policy/procedure manuals, and cork-backed bulletin boards nailed to lunchroom walls have been largely replaced with flexible and easily updated digital Web sites. These electronic newsrooms offer organizational agendas, changes in official policies, modifications to administrative procedures, immediate safety alerts, and links to local resources that may be of interest to employees.

The 24-hours-a-day, 7-days-a-week, 365-days-a-year (24x7x365) company "uptime" concept has given rise to expectations that there is to be no interruption in information accessibility. Corporate multinationals, software help desks, and a variety of crisis centers must operate without allowing time zones, personnel work shifts, or professional staff availability to reduce access to information or services. Internet- or intranet-accessible Web sites have positively revolutionized the way modern organizations create, store, and share information. However, the forward-thinking business planners and technologists that championed a stampede toward the increasing use of Web-based documents have largely forgotten two critical aspects of basic records management: information accessibility and viability over time.

For mankind to benefit from accumulated organizational and cultural experiences, recorded information must be properly preserved. Writings on clay tablets and paper-based media have withstood centuries of use and abuse and remain readable and decipherable after thousands of years. In stark contrast, it is ironic that the very business planning sentiments that insist on the 24x7x365 organizational availability equation may be ensuring that information will not be available in as little as five years--24x7x365x5. It is this last number in the information availability equation that will determine if the enormous investment in information technology and resources will be lost to posterity. Will there be a "virtual archives" that stores for future generations the digital records being created by our governments, cultural institutions, and workplaces? What viral monsters lurk ready to gobble up unprotected data where there is no long-term information disaster protection plan? Will Web site resident records that require long-term retention be adequately protected and preserved to ensure meeting legal, statutory, and regulatory business requirements? Most importantly, are any action plans or procedures in place to ensure these issues will be addressed so that data, documents, and records stored on Web sites will not simply evaporate or become unusable over time?

Preservation of officially designated documents (records) that are posted on Web browser-accessible computers requires considerable advance planning. Due to both the technical complexities of the Web server environment and the informal undocumented work streams that often are used to populate these computer systems with documents, most Web servers have little or no application of records management techniques and principles to their repositories of potentially valuable records. In addition, there is no single easy technology or management solution that can be installed or practiced that will immediately result in the protection and preservation of electronic records.

To be successful, the management of records and information requires the application of professional concepts and practices with sufficient management support to be influential and organizationally effective.

Technology Drives Web Records

Records placed on Web sites can exist in a variety of forms. These documents are "hosted" by Web server software and processes that also must be documented and managed. This is a much more complex medium for information delivery than the two-dimensional static paper or microfilm media traditionally used for recording information. The technology format, informational content, document structure, and context in which the records are produced all interact to create records "quality" issues that impact Web records' accuracy, integrity, and authenticity.

A simple, "static," or unchanging Web page is typically made up of hypertext markup language (HTML)-coded (tagged or marked up) text and graphics that can be read by Internet browser software, such as Microsoft Internet Explorer or Netscape Navigator. According to Preston Gralla's book How the Internet Works, "Markup languages are the road signs of a Web page ... These instructions, called tags or markups, are embedded in the source document that creates the Web page."

Unfortunately, on a computer without the correct software version, a page of HTML may display information formatted differently than the author intended. Document fonts, heading styles, table formatting, and graphics can appear exaggerated, off-color, incorrectly aligned, or with any number of presentation anomalies, and any one of these anomalies may cast aspersions on the accuracy or authenticity of a record. However, static postings of documents to Web sites do not pose the level of threat to good management of records as does the creation of Web pages that include hyperlinks or embedded documents for download.

A major advantage of the Internet is the ability to link multiple pages together with "hyperlinks" by placing hypertext transfer protocol (http://) references in Web page documents, so that the reader may click on the hyperlink and quickly "jump" visually to another document page, potentially on another Internet server in another location. This ability to link multiple pages of a "virtual" or "compound" document with an easy-to-access appearance of being one record volume can make it difficult to ensure that all components of the virtual document are preserved together in a logical manner if there is no organized method of tracking changes to all the components and to the linked pages that are external to the main document.
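As a rough illustration of tracking the components of a compound document, the sketch below lists every hyperlink on a page and separates links on the same server from links to external servers. It uses only Python's standard library; the class name `LinkCollector`, the function `list_components`, and the example URLs are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href target of every <a> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative references against the page's own address.
                self.links.append(urljoin(self.base_url, href))


def list_components(base_url, html_text):
    """Return (internal, external) link lists for one page."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    host = urlparse(base_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == host]
    external = [u for u in collector.links if urlparse(u).netloc != host]
    return internal, external


# Hypothetical page with one link on the same server and one external link.
page = ('<p><a href="/policy/ch2.html">Chapter 2</a> '
        '<a href="http://other.example.org/form.html">Form</a></p>')
internal, external = list_components(
    "http://intranet.example.com/policy/ch1.html", page)
```

The external list is the part of the compound document the organization does not control, so those pages are the ones most likely to change or disappear without notice.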

Some Web sites use even more advanced Web page programming techniques by taking advantage of extensible markup language (XML). Gralla writes that XML "is used only to convey information about the content, not about the presentation of the content." This technique allows personal computers, personal digital assistants, and cellular phones, for instance, to format the received documents themselves. Under these circumstances, it may be difficult later to determine exactly how information was displayed on a particular receiving communications device.
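The separation of content from presentation can be sketched in a few lines. In this minimal example, which assumes a made-up `<notice>` document and two hypothetical rendering functions, the same XML content yields two different displays. Preserving the XML alone would not record which of the two a reader actually saw.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record: it carries content only, with no layout information.
NOTICE_XML = ("<notice><title>Safety Alert</title>"
              "<body>Exit B is closed.</body></notice>")


def render_desktop(xml_text):
    """Format the notice as HTML, as a desktop browser might receive it."""
    root = ET.fromstring(xml_text)
    return "<h1>{}</h1><p>{}</p>".format(
        root.findtext("title"), root.findtext("body"))


def render_phone(xml_text):
    """Format the same notice as one line of plain text for a small screen."""
    root = ET.fromstring(xml_text)
    return "{}: {}".format(
        root.findtext("title").upper(), root.findtext("body"))


desktop_view = render_desktop(NOTICE_XML)
phone_view = render_phone(NOTICE_XML)
```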

In other cases, according to Gralla, Web sites take advantage of techniques such as common gateway interface (CGI) scripts, a communications protocol used to reach other software applications such as databases. Using this technology, a formatted report that appears on a Web page may contain data that was dynamically drawn from a database. For instance, when accessing an Internet Web site to order a book or other item, the Web site may display the current price of that item or the number of items currently in stock. To deliver this data to the dynamic Web page, CGI or other similar code accesses the database in the background based on the Web page user's selecting options on a form. The Web page then queries the database to present the data immediately to the user's computer screen, thus giving a quick virtual report that may, in fact, be a record of a transaction. Unfortunately, from a records management perspective, unless the user preserves a copy of the computer screen by saving the screen to a file or printing it to paper, the record of the transaction may be lost.
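The following sketch shows why a dynamically generated page leaves no record unless one is captured on purpose. It is not CGI code; it uses an in-memory SQLite database as a stand-in for the background database, and the table layout, item title, price, and function names are all invented for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def build_catalog():
    """Create a stand-in database holding one hypothetical catalog item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (title TEXT, price REAL, in_stock INTEGER)")
    conn.execute(
        "INSERT INTO items VALUES (?, ?, ?)", ("Example Title", 19.50, 12))
    conn.commit()
    return conn


def render_item_page(conn, title):
    """Build the page from whatever the database holds at this moment."""
    row = conn.execute(
        "SELECT title, price, in_stock FROM items WHERE title = ?",
        (title,)).fetchone()
    return ("<html><body><h1>{}</h1><p>Price: ${:.2f}</p>"
            "<p>In stock: {}</p></body></html>").format(*row)


def capture_record(page_html, captured_at=None):
    """Keep the page exactly as delivered, with the time of capture."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {"captured_at": captured_at.isoformat(), "content": page_html}


conn = build_catalog()
page = render_item_page(conn, "Example Title")
record = capture_record(page)
```

Once the price or stock level in the database changes, `render_item_page` can no longer reproduce what the user saw; only the captured `record` can.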

Research Agendas Tackle Fundamentals

The challenge of managing electronic records in fast-paced, technology-driven environments has been a source of frustration for information management professionals for more than a decade. Although in the late 1980s and early 1990s many programmatic efforts to accept electronic records on optical and magnetic media were partially successful, it soon became apparent that taking custody of the burgeoning volume of electronic records on desktop computers, in e-mail systems, and embedded in departmental software applications was doomed to failure. It is quite possible to categorize and classify a limited number of word processing documents received by a records center on a floppy disk and to then attach a label to the disk indicating the disk contents, relevant records series, and retention periods. However, such a "custodial" role for records management services can never be successful when there is an assumption that records to be archived will be reviewed (accessioned) individually, due to the enormous quantities of electronic records now being created daily by computer systems users. In addition, the dependency of Web sites on fully functioning software and hardware architectures to create, store, read, organize, and retrieve these information "objects" began to arise as an equally daunting concern.

It soon became obvious that something as simple as changing software versions over time could render Web site resident documents unreadable. As these challenges were realized by the information management community, several introductory research efforts attempted to quantify and describe the magnitude of the problem.

Quantifying Challenges

An important early example of the effort to quantify Web records management challenges was a study conducted by J. Timothy Sprehe and Charles R. McClure for the National Historical Publications and Records Commission with a report published in January 1998. "Analysis and Development of Model Quality Guidelines for Electronic Records for Management of State and Federal Websites" had these goals:

1. "To provide a theoretical and conceptual framework within which to understand records management and historical preservation issues related to government Websites.

2. To provide a statement of records management and historical preservation principles as they apply to government Websites, based on an empirical assessment of state and federal Website activities.

3. To provide model guidelines for webmasters and records managers concerning management and preservation of electronic records on government Websites.

4. To promote awareness in, and education of, archivists and records managers concerning measures to be taken in order to manage and preserve historically valuable records on government Websites."

The report effectively documented the enormous variety of professional and organizational opinions and perspectives in play at that time regarding electronic records management (ERM). It noted that "literature on ERM of Web sites lacks clear, consistent use of key ERM concepts and terms, however. Regarding key issues, the literature is often contradictory, confusing, contentious, and nebulous, and it lacks coherent organization and understanding of key concepts. Much research and writing is still needed concerning ERM of Web sites."

Unfortunately, this remains true to some extent today, although there has been some progress within certain sectors to define Web site policies and procedure alternatives within limited settings. No broadly accepted best practices exist for this area, and most implementations of solutions today are still "home grown" as everyone tries to cope with the intersection of professional, technological, and organizational challenges.

An indication of the progress that has been made in resolving issues for some specific Web site management settings, such as governmental sites, is the recent publication of "Archiving Web Resources: Guidelines for Keeping Records of Web-based Activity in the Commonwealth Government," produced by the National Archives of Australia in 2001. This document actually proposes specific strategic and technological options that can be used to generate relevant policies and procedures for organizations.

Challenges discussed include divergent organizational responsibilities for managing Web records, varying Web technologies used to host records, dynamically changing documents, and systems management issues. Assessing the risk to records is covered, as well as the factors that influence considerations regarding hardware, software, and media obsolescence. However, a highly innovative aspect of the document is the direct guidance in the section "Determining the Best Option." Though primarily oriented toward arriving at a Web records management solution for Australian government settings, action plans are presented that can be followed to assess Web site activities, implemented technologies, levels of risk, and relevant recordkeeping requirements. Then, a high-level set of factors to assess is presented that encourage each agency to establish its own system of recordkeeping checks and balances.

According to the report, "There is no generic solution for creating and maintaining records of Web-based activity. The best option will depend on the outcome of an analysis of the particular circumstances. Each agency should assess a number of factors ...," which are listed. The message clearly communicated is one of following general professional guidelines and creating a "combination of approaches" that best fit the organization.

This is a remarkably refreshing way to empower records and information managers to take charge of recordkeeping initiatives without expecting them to wait on divine guidance from an overseeing records management authority. The moving target of managing information properly in technology-driven environments often precludes waiting for industry standards or best practices to be firmly established. Increasingly, professional records and information managers will need to take the initiative to advise their management of recordkeeping solutions, even when elements of those solutions will need to be developed internally.

This development is especially important due to the convergence of professional practices and theories in this area. In many cases, archivists, as well as records managers, will have some responsibility for determining the Web content that is preserved.

Although less a factor in the private sector, archivists in government are especially concerned about preserving records of historical value that may appear on Web sites. It was their concern about protecting cultural history that initiated a call on January 12, 2001, by Lewis J. Bellardo, deputy archivist of the United States, for U.S. government chief information officers (CIOs) to take a snapshot of their agencies' public Web sites in order to "ensure that we are able to document at least in part agency use of the Internet at the end of the Clinton Administration."

Web Records Retention Solutions

Bellardo's memorandum contained only minimal technical direction on how to perform the snapshots or how to store them on appropriate electronic media. The computer systems administrators in charge of these sites were not comfortable with this minimal amount of technical direction, and archivists themselves were equally unsure of how the minimal procedural detail actually protected records. Clearly, to be successful, records retention requirements, archival science-based document preservation issues, and computer systems operating procedures must all cooperatively converge in a team effort to arrive at comprehensive records management guidelines in these complex environments.

To protect Web site records, responsible team members must first understand the supporting technologies so that records at risk can be identified. Various data formats used to create Web pages or post documents to Web sites, such as HTML, XML, Adobe Portable Document Format (PDF), word processing documents (such as Microsoft Word), or graphics interchange format (GIF), should be familiar to team members. A basic understanding of how Web servers host and display documents is also desirable. Once the fundamental technologies are understood, the team can get down to work.

The records types resident on Web pages must be categorized according to the risk for loss. For instance, Web sites that contain static documents are at less risk for records loss than Web sites that use forms for data entry into databases or Web sites that take advantage of advanced dynamic document programming technologies. An initial inventory of Web site contents must be performed to determine who created the content in order to establish responsibility. A description of Web pages by such metadata as title, author/source, the date the information was posted, the date the information was removed, and any identifiable relevant software issues or document version numbers can be used to create an initial registry for managing Web site resources. In cases where the data is populated dynamically from a source, that source must be entered into the registry of responsibility.
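One way to picture such a registry is as a list of simple entries, one per Web page. The sketch below is a hypothetical data structure, not a standard schema: the field names follow the metadata listed above, and the `risk_level` rule simply encodes the observation that dynamically populated pages are at greater risk of loss than static ones.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RegistryEntry:
    """One Web page in a hypothetical registry of Web site resources."""
    url: str
    title: str
    author_source: str                    # who created or supplied the content
    date_posted: date
    date_removed: Optional[date] = None   # None while the page is still posted
    document_version: str = ""
    software_notes: str = ""              # relevant software issues, if any
    dynamic_source: Optional[str] = None  # database or feed behind the page

    def risk_level(self):
        """Dynamically populated pages carry a higher risk of records loss."""
        return "higher" if self.dynamic_source else "lower"


# Hypothetical entries: one static policy page and one database-driven page.
registry = [
    RegistryEntry("http://intranet.example.com/policy/travel.html",
                  "Travel Policy", "Human Resources", date(2002, 6, 3),
                  document_version="2.1"),
    RegistryEntry("http://intranet.example.com/orders/status",
                  "Order Status", "Sales Systems", date(2002, 9, 15),
                  dynamic_source="orders database"),
]
at_risk = [entry.title for entry in registry if entry.risk_level() == "higher"]
```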

Specific policies and procedures then must be developed to direct appropriate personnel to assume responsibility for capturing the Web site records into a recordkeeping system. This will vary with the organization and the decisions of the Web site records management team. In some cases, the author/source will be required to ensure that records are captured into a recordkeeping system before they are posted to a Web site, thus potentially eliminating the need to snapshot the Web site. In other cases, specific procedures will need to be in place that direct information technology (IT) personnel to capture records into a Web site management system that will, in turn, transfer records to a recordkeeping system. In other cases, snapshots of the Web site will be taken over time to capture precisely what was displayed during any given time period.

The need for managing records from the beginning of the records life cycle will become evident when managing Web site records, as metadata that is assigned to documents quickly becomes invaluable for tracking, managing, storing, and preserving Web site records. Metadata contains information that enables establishing records value and, therefore, retention over time. This metadata also can be searched automatically to schedule candidate Web records for retention, removal, or destruction. As with electronic mail and electronic document management systems, it may be preferable to transfer Web site records to a separate digital repository, rather than try to retain the records long-term in their application of origin. This means, of course, that a separate recordkeeping system for electronic records must exist and be funded.
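A minimal sketch of such an automatic search follows. It assumes each record carries a posting date and a retention period in whole years; the record layout, titles, and retention periods are invented for illustration, and a real retention schedule would come from the organization's own records series.

```python
from datetime import date


def retention_end(date_posted, retention_years):
    """Return the date on which the retention period expires."""
    try:
        return date_posted.replace(year=date_posted.year + retention_years)
    except ValueError:
        # Posted on 29 February and the target year is not a leap year.
        return date_posted.replace(
            year=date_posted.year + retention_years, day=28)


def due_for_review(records, today):
    """List titles whose retention period has expired as of `today`."""
    return [r["title"] for r in records
            if retention_end(r["date_posted"], r["retention_years"]) <= today]


# Hypothetical metadata for two Web records.
records = [
    {"title": "Safety Alert 14", "date_posted": date(2001, 3, 1),
     "retention_years": 3},
    {"title": "Travel Policy", "date_posted": date(2003, 1, 1),
     "retention_years": 7},
]
candidates = due_for_review(records, today=date(2005, 1, 1))
```

The function only nominates candidates; whether a record is retained, removed, or destroyed remains a decision for the records management team.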

Web Records Management Programs

As if the challenge of consistently capturing well-defined records from Web sites were not enough, there is also the challenge of ensuring that Web sites are updated and managed consistently. The frequency with which Web pages change can cause tremendous problems in establishing what was presented and when it was presented. To establish for documentation or historical purposes the records and information shown to the public or to an internal workgroup on a project team, each version of a document that appears on the site must be preserved (by capturing or tracking). The alternative is to risk being unable to understand the information that was made available at any particular time. The inability to produce what was displayed on a Web site at a particular time means that the organization has lost at least some of the records for that Web site.
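Version capture of this kind can be sketched with a content digest: a new version is stored only when the page content actually changes, and the log can then answer what was displayed at a given time. The class `VersionLog` and its methods are hypothetical, and the sketch assumes captures are made in chronological order.

```python
import hashlib
from datetime import datetime


class VersionLog:
    """Keeps every distinct version of each page, in capture order."""

    def __init__(self):
        self.versions = {}  # url -> list of (timestamp, sha256 hex, content)

    def capture(self, url, content, timestamp):
        """Store the content if it differs from the last captured version."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the previous capture
        history.append((timestamp, digest, content))
        return True

    def as_displayed(self, url, when):
        """Return the latest version captured at or before `when`."""
        current = None
        for timestamp, _digest, content in self.versions.get(url, []):
            if timestamp <= when:
                current = content
        return current


log = VersionLog()
url = "http://intranet.example.com/policy/travel.html"  # hypothetical page
log.capture(url, "Mileage rate: 30 cents", datetime(2002, 1, 1))
log.capture(url, "Mileage rate: 30 cents", datetime(2002, 2, 1))  # no change
log.capture(url, "Mileage rate: 35 cents", datetime(2002, 3, 1))
shown_in_february = log.as_displayed(url, datetime(2002, 2, 15))
```

A log like this can only answer questions about moments at or after the first capture; changes made between two captures are still lost, which is why capture frequency matters.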

In addition to issues related to Web site records format, presentation, version management, and compound document challenges, system security breaches can have an undesirable impact on systems integrity and claims about records authenticity. When organizations begin to rely on Internet Web sites for information that is used in decision making, the accuracy and integrity of the records displayed must be verifiable and defensible--potentially in court. System security breaches may give rise to questions about what or when information was displayed, or even cast doubt on whether the information was displayed at all. It is critical to protect the content of Web sites with rigorous computer system security measures so that the integrity or authenticity of Web site records will be beyond reproach.
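One supporting technique, offered here as an assumption rather than something the cited guidance prescribes, is to seal each captured record with a keyed digest so that later alteration can be detected. The sketch uses HMAC-SHA256 from Python's standard library; the function names and the key are hypothetical, and a digest by itself does not establish legal authenticity.

```python
import hashlib
import hmac


def seal(content, key):
    """Compute a keyed digest of the record content (both given as bytes)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content, key, expected_digest):
    """Check that the content still matches the digest stored at capture."""
    return hmac.compare_digest(seal(content, key), expected_digest)


# Hypothetical key; in practice it would be stored apart from the Web server.
key = b"records-office-secret"
original = b"<html><body>Evacuation route: Stairwell A</body></html>"
stored_digest = seal(original, key)

tampered = b"<html><body>Evacuation route: Stairwell C</body></html>"
still_intact = verify(original, key, stored_digest)
was_altered = not verify(tampered, key, stored_digest)
```

Keeping the key and the stored digests outside the Web server means that someone who breaches the server cannot also rewrite the evidence of what the record originally contained.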

This convergence of computer systems operations, recordkeeping technologies, and organizational responsibilities for maintaining records required by regulation, statute, or business rules means that the management of Web site records must be a collaborative team activity. To be effective, records management initiatives must be implemented by working with IT personnel, who are in charge of the systems. Management priorities and perspectives will influence the assignment of recordkeeping programmatic responsibilities as well as serve as a major incentive for departmental records creators to accept the need for addressing records inventories and implementing records management policies. Management interest in employee adherence to records management policies will become particularly notable during any electronic records discovery requests from opposing legal counsel.

To sell the idea of properly managing Web site records, records management personnel and/or the Web site records management team must present a clear business case to executive management. Any corporate accountability factors, legal liability concerns, public records responsibilities, and potential financial damages from e-commerce initiatives should be clearly illustrated in the business case. The adverse consequences of not managing Web site records appropriately must be presented thoroughly as well. In a government environment, the responsibility to preserve historical records, as was illustrated in the National Archives and Records Administration's attempt to preserve the records of the Clinton presidency, also must be factored into rationales. Webmasters, departmental content creators, legal counsel, and records management personnel must all equally support the business case so that sufficient resources will be provided to ensure Web site management program success.

A fundamental premise of records and information protection is that both information accessibility and viability must be protected over time. If software technology challenges are not considered, document components are inaccessible due to hyperlinks to non-existent references, or systems management procedures fail to adequately capture required data and supporting software, enormous volumes of valuable irreplaceable organizational assets may be lost. Properly preserving Web site records will require immediate attention in most organizations to prevent a continuing loss of these important information resources.

At the Core

This article:

* Explains the need for Web site records management and retention

* Examines the challenges of Web site records management

* Discusses Web records retention solutions


Bellardo, Lewis J. Memorandum to Chief Information Officers: Snapshot of Agency Public Web Sites. 12 January 2001. Available at memo_to_cios.html (accessed 25 November 2002).

Gralla, Preston. How the Internet Works. Indianapolis: QUE, 2002.

National Archives of Australia. Archiving Web Resources: Guidelines for Keeping Records of Web-based Activity in the Commonwealth Government. March 2001. Available at (accessed 25 November 2002).

National Archives and Records Administration (NARA). Federal Web Site Snapshot Information. 2001. Available at (accessed 25 November 2002).

Sprehe, J. Timothy and Charles R. McClure. Analysis and Development of Model Quality Guidelines for Electronic Records for Management of State and Federal Websites. National Historical Publications and Records Commission, Washington D.C., 1998. Available at (accessed 25 November 2002).

John T. Phillips, CRM, FAI, is a Senior Consultant at Information Technology Decisions. Over the past 25 years, he has worked as a management consultant, data systems project manager, computer research associate, librarian, and records manager in a variety of information resources management activities. He may be reached at
COPYRIGHT 2003 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Phillips, John T.
Publication:Information Management Journal
Geographic Code:1USA
Date:Jan 1, 2003