Digital Preservation: A Global Information Management Problem

In an e-mail, Svanhildur Bogadottir, city archivist of the Reykjavik Municipal Archives, Reykjavik, Iceland, writes: "At my archives, we are now receiving information in digital form almost every week. In endless sizes and formats, up to 15 to 20 years old. Most of it is impossible to read for us today and some of it has permanent value, but it is not available on paper, so it is really lost. It is a great and GRAVE problem how we can be sure to preserve our information today for the future. This is not a problem for the future but for today."

This brief statement underscores digital preservation as a global information management problem. If the city of Reykjavik experiences problems reading and processing older digital media, what about the large archival repositories operated by national and state/provincial governments throughout the world? And what about the untold thousands of electronic recordkeeping systems in business corporations that have long-term retention requirements but no formal policy or practices in place to assure their preservation?

For many years, archival institutions throughout the world have operated programs to preserve records possessing archival value that are worthy of permanent preservation. These programs have been organized around physical records: paper, microfilm, and other visible record media. Now, however, archival repositories worldwide find themselves in a long transition from paper to electronic records as the predominant recordkeeping medium, a transition that began during the '70s, accelerated during the '80s and '90s, and will not come to fruition for another decade or more. Archivists, records managers, and other information management specialists throughout the world face the task of reinventing their professional practices to ensure permanent or long-term preservation of electronic records.
It is no hyperbole to say that this is the biggest challenge the records and information community has yet confronted. The essence of the digital preservation problem is how to ensure that the electronic records being created today can still be processed and read many years from now.

The digital preservation problem is of greater magnitude than the problems associated with preserving physical records. With physical records, the information is contained on durable media. Moreover, physical records can be read by sight or with relatively simple viewing devices. Deterioration is generally visible, and there is a time window, usually measured in years if not decades, during which conservation measures can be undertaken if required.

Technically, there is nothing particularly complex about preserving physical records. Once appraised as archival, they are accessioned into the archives, finding aids are prepared for them, and they are housed in environmentally appropriate space. When these steps are completed, the records should retain their integrity and usability for many decades, even centuries, under proper environmental conditions. Finally, assuming the steps have been properly completed, the preservation of physical records is generally a once-and-done task: no further measures should be required.

With electronic records, however, the situation is drastically different: the recording media are not durable, and the records are not human-readable. The hardware and software required to read electronic records have even shorter life expectancies than the media on which they are recorded. Thus, the tasks required to ensure long-term preservation of digital records are technically difficult, requiring specialized expertise as well as the requisite hardware, software, and systems documentation. Moreover, these tasks must be performed periodically for as long as retention is required. In short, there is no ideal once-and-done solution to the permanent preservation of electronic archives.

However, archival institutions throughout the world do not have the luxury of doing nothing while they wait for such a solution to appear. Electronic records of archival value are getting older by the day, and they may face threats to their integrity and retrievability. Their owners may even abandon them as old data. As a matter of top priority, archival institutions worldwide must have plans for dealing with this situation, and they must implement those plans as aggressively as possible.

Basic Requirements

The National Media Lab, located in St. Paul, Minnesota, has performed studies for many years on what it takes to preserve digital data. Dr. John W. C. Van Bogart, the lab's principal investigator for media stability studies, states: "Digital archives should be transcribed every 10 to 20 years."
To realize lifetimes greater than this, one would be required to maintain all of the following:

* recording system and media
* system hardware and software
* operating system
* operations manuals
* ample spare parts

This requirement suggests that digital preservation facilities must adopt the museum approach, saving these processing components indefinitely. That approach may not be necessary, however. Current state-of-the-art best practices in digital preservation employed by archival institutions throughout the world include:

* selecting the storage media most appropriate for long-term data retention
* converting data to standard formats to facilitate processing on a variety of computing platforms
* migrating data to new technology platforms when the computing environment is upgraded
* preserving the systems documentation required to process the data
* copying or recopying the data onto new storage media at regular intervals
* taking steps to store and maintain these media properly

These measures are labor-intensive, error-prone, and costly, and they must be performed periodically for as long as the data are retained. Thus, the long-term preservation of digital data is a substantial undertaking. Although new storage media in development have excellent potential for long-term data retention, they are still a long way from an ideal solution. These matters will have to be worked out over the next 10 years or so.
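Because recopying is described above as error-prone, each copy onto new media is usually paired with a fixity check, so the old media is retired only after the new copy is proven bit-identical. The Python sketch below illustrates that idea only; it is not drawn from any archive's actual procedure. It assumes the files are reachable as ordinary filesystem paths and uses SHA-256 as the checksum, and the function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recopy_with_verification(source: Path, target_dir: Path) -> str:
    """Copy an archived file onto new storage and confirm the copy is bit-identical.

    Returns the verified digest so it can be stored with the record's metadata.
    If the digests differ, the bad copy is deleted and OSError is raised,
    signalling that the old media must not be retired yet.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)  # copies contents plus metadata such as timestamps
    if sha256_of(target) != expected:
        target.unlink()
        raise OSError(f"fixity check failed for {source}")
    return expected


if __name__ == "__main__":
    # Throwaway directories stand in for the old and new storage media.
    with tempfile.TemporaryDirectory() as tmp:
        old_media = Path(tmp) / "old_media"
        old_media.mkdir()
        record = old_media / "record.dat"
        record.write_bytes(b"archival payload")
        print(recopy_with_verification(record, Path(tmp) / "new_media"))
```

Returning the digest lets it be recorded alongside the file, so the next scheduled recopy can check the media against a known value rather than trusting whatever is currently stored.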
The Biggest Problem: Sustained Organizational Commitment

Nearly all commentators agree that technological obsolescence represents a far greater threat to the preservation of digital archives than does media longevity. In all hardware and software market sectors, service lives of less than five years are commonplace. Even the most fragile media will likely outlast the availability of the devices needed to read them. Thus, efforts to preserve physical media provide only a short-term, partial solution.

Yet technological obsolescence may be a lesser problem than the lack of organizational commitment and of willingness to allocate sufficient resources to preservation efforts. The vicissitudes of organizational life make it highly problematic to sustain any management program in perpetuity. Organizations, and the people who run them, change frequently. Any organization that contemplates establishing a program for the permanent preservation of digital data must take this reality into account.

One Example

Pfizer in the United Kingdom provides a specific example of a digital preservation initiative. Pfizer is a United States-based international pharmaceutical company employing some 46,000 people in 57 countries.
The company's annual sales are about $14.7 billion, and Fortune magazine ranks it the world's sixth-largest pharmaceutical company. As creator, user, and custodian of electronic records, Pfizer needs long-term, secure, managed storage for identified business-critical records.

In the pharmaceutical industry, new drug products may take 10 years or longer to develop. Products are subject to rigorous inspection by national regulatory bodies to establish their safety for human (or animal) use and their effectiveness for intended purposes. Records and data that support a marketed drug's development must be available for regulatory inspection throughout the compound's lifetime, a period generally assumed to be 40 years or longer.

In response to these digital preservation requirements, Pfizer's central research department at the company's Sandwich facility designed and implemented a central electronic archive (CEA) to provide long-term, secure, managed storage of the company's critical business records. The CEA is a product of close cooperation among several units: the computing services (IT) department, the pre-clinical information technology department, the records management unit (RMU), and several other interested departments. Under development for about four years, the CEA became operational in 1996.

Pfizer's CEA supports critical business records that are generated electronically and require long-term storage in a managed archival environment. Storing these records on paper is neither practical nor an acceptable alternative to storing them in their native electronic format.
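Pfizer's retention requirement of 40 years or longer can be set against Van Bogart's advice that digital archives be transcribed every 10 to 20 years. The short Python sketch below is an illustration, not anything taken from the CEA: it lists the years at which a fresh copy would fall due, assuming transcription happens at a fixed interval measured from the original write and that no copy is needed once the retention period ends. The function name is hypothetical.

```python
def transcription_years(retention_years: int, interval_years: int) -> list[int]:
    """Years after the initial write at which a fresh copy falls due.

    Assumes a copy is made every `interval_years` and that no copy is
    needed at or after the end of the retention period.
    """
    if interval_years <= 0:
        raise ValueError("interval_years must be positive")
    return list(range(interval_years, retention_years, interval_years))


if __name__ == "__main__":
    # 40 years: the minimum retention period cited for Pfizer's drug records.
    for interval in (10, 20):  # Van Bogart's 10- to 20-year transcription range
        due = transcription_years(40, interval)
        print(f"every {interval} years: transcribe at years {due}")
```

Under these assumptions, a 40-year retention period calls for three transcriptions at a 10-year interval but only one at a 20-year interval. That count covers media alone; it does not include the platform and format migrations that long retention also demands.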
The CEA is not, however, an electronic document management (EDM) system, since such software solutions typically support short-term retrieval and workflow needs rather than the myriad requirements for long-term management of electronic records. The CEA is a custom-designed application. Pfizer conducted a global survey of the archival and records management systems market and determined that no single packaged software solution would meet the company's requirements. The system had to provide direct user access to online and near-line storage of records, supported by retention schedule functionality and by system and record migration plans. Several systems developers and integrators worked with Pfizer during various stages of the CEA's development.

The CEA's major features include:

Records management functionality -- The CEA is a managed archive with all the records management functions that this term implies. Each record must be indexed by various metadata elements and be properly designated for retention scheduling and archive management actions under rules established by the records management unit. These rules treat records archived to the CEA in the context of their value to the whole company, not just to the department that created them.

Data/document types management -- The CEA manages a wide variety of documents and data created on Pfizer's many computing platforms: word processing documents, spreadsheets, scanned images, raw data, processed data, and many others. Most records are created and managed while active on Pfizer's DEC VAX/Alpha servers.

User requirements -- To describe and store records, users must complete various metadata fields in an on-screen window. Users assign a records series type to every record or group of records; the records series type links the record to a departmental retention schedule maintained by the records management unit.

User-controlled archiving -- When records become inactive and are ready for storage, users copy them from the VAX to a format that permits their management by the CEA. Graphical user interface software, installed on desktop PCs, activates the process, which captures the metadata and archives the selected files. The software preserves the files' integrity (i.e., their evidential and informational value) so that they remain identifiable, accurate, and meaningful in the context of the purpose for which they were created.

File restoration -- The CEA's functional components identify file formats and the software version that created each file.
Files can be restored to any server as long as security restrictions have been satisfied and the server can accommodate files of that type. Restored files can be read by the source software (the software that created them) or by any other software version able to read that particular file format.

Records management unit responsibilities -- The RMU is responsible for overall management of the CEA's records retention aspects. This unit periodically reviews the CEA's holdings, applies the retention schedules, and provides ongoing user training and education.

Hardware and software components -- Three distinct device types support the CEA application: 1) desktop PC clients, which have the CEA client software installed; 2) the OpenVMS system, which provides file services over the network to the PC clients; and 3) the CEA server, which stores all the archived files. The CEA application runs on a Sun UNIX system supported by an optical disk jukebox for bulk data storage. The file index (i.e., the metadata describing each file) resides in an Oracle database, which also runs on the Sun server.

Software for managing data migration -- Hierarchical storage management (HSM) software automates the migration process and supports the CEA.
The HSM software separates active and inactive data, automatically migrating inactive data from primary storage devices (online disk drives) to near-line or off-line secondary or tertiary storage devices. The HSM software is Epoch, from EMC, a leading provider of data storage products.

Modular system design -- Because Pfizer's computing environment changes constantly, the CEA has a modular design to facilitate hardware and software upgrades. For example, an additional desktop platform (such as Windows NT Workstation) can be added by modifying the desktop client software. The CEA also incorporates a plan to migrate electronic records forward in time when new technology platforms are deployed.

Future plans include linking CEA record metadata with other corporate databases so that the metadata can be validated or captured automatically. Pfizer is exploring intelligent software links, free text searches, or HTML as ways of adding metadata elements without additional user effort when records are archived. Other long-term preservation plans include migration of source software versions, migration of records to appropriate data standards, and the purchase of "virtual" computers to run obsolete software via emulation protocols.

Subsequent columns will explore the digital preservation problem's other dimensions as well as global solutions to it. They will delve more deeply into best global practices for digital preservation in both the archival and IT communities. These columns will also examine the emergence of digital preservation service providers from the vendor community who may offer new solutions to this global information management problem.

David Stephens, CRM, CMC, FAI, is vice president of the records management consulting firm Zasio Enterprises Inc. He has been a consultant in the field of records management for more than 18 years and has published books and articles about information management in the United States and abroad. The author may be reached at dostephens@zasio.com.