The state of the art and practice in digital preservation.


The goal of digital preservation is to ensure long-term access to digitally stored information. In this paper, we present a survey of techniques used in digital preservation. We also introduce representative digital preservation projects and case studies that provide insight into the advantages and disadvantages of different preservation strategies. Finally, the pros and cons of current strategies, critical issues for digital preservation, and future directions are discussed.

Key words: digital preservation; emulation; encapsulation; migration; standardization; XML.

1. Introduction

Information preservation is one of the most important issues in human history, culture, and economics, as well as the development of our civilization. While earliest information was recorded in carvings on stone, ceramic, bamboo, or wood, the development of civilization paved the way for new storage media and techniques for recording information, such as writing on silk or printing on paper. Eventually we were able to put photographic images on film and music on records. A revolutionary change occurred in the information storage field with the invention of electronic storage media.

With the advent of high-performance computing and high-speed networks, the use of digital technologies is increasing rapidly. Digital technologies enable information to be created, manipulated, disseminated, located, and stored with increasing ease. Ensuring long-term access to the digitally stored information poses a significant challenge, and is increasingly recognized as an important part of digital data management (1,2).

The evolution of data storage media and the development of the preservation technology can be described as shown in Fig. 1. This diagram lists the various media used in data storage (both digital and analog) and the techniques needed to ensure that the data on them is preserved. It also highlights the trend from analog to digital/optical storage media and indicates the transfer of data from one generation of media to the next. It is clear that while it is easier to create, amend, and distribute digital data, the media storing this data such as optical discs are not as robust as traditional analog media such as paper or film. In view of modern information preservation requirements, this paper will focus on the aspects of the technical strategies used in digital information preservation.

Digital preservation involves the retention of both the information object and its meaning. It is therefore necessary that preservation techniques be able to understand and re-create the original form or function of the object to ensure its authenticity and accessibility. Preservation of digital information is complex because of the dependency digital information has on its technical environment. Furthermore, as newer digital technologies rapidly appear and older ones are discontinued, information that relies on obsolete technologies soon becomes inaccessible. Therefore, digital resources present more difficult problems than conventional analog media such as paper-based books (3).

Recently, several approaches for digital preservation have been identified and presented. Conventional methods are mainly technology emulation, information migration, and encapsulation (4,5,6,7,8). However, there is a lack of proven preservation methods to ensure that the information will continue to be readable.

To promote a solid understanding of the pros and cons of different preservation techniques, this paper presents a comprehensive survey of them through a review of a wide range of literature and representative projects. We suggest a preservation strategy based on XML (eXtensible Markup Language) [9] for interoperability and interchangeability of preserved digital information. The most appropriate preservation strategy should be determined by considering various aspects including cost-effectiveness, legal restrictions, and user access requirements. However, as stated earlier, this paper particularly focuses on the technical issues of preservation.

This paper is organized as follows. Section 2 summarizes various techniques for digital preservation. Their advantages and disadvantages are presented through a literature survey. By introducing current projects and case studies, the application of these preservation techniques is described in detail in Sec. 3. Finally, points of discussion and a summary are given in Sec. 4.

2. Digital Preservation Techniques

This section introduces the current strategies for digital preservation. Techniques for the preservation of digital information include technology preservation, technology emulation, information migration, and encapsulation.

Digital resources can be stored on any medium that can represent their binary digits, or bits, such as a CD-ROM or a DVD. Rothenberg [10] defines a bit stream as an intended meaningful sequence of bits with no intervening spaces, punctuation, or formatting. To preserve that bit stream, the first requirement is to ensure that the bit stream is stored on a stable medium. If the digital medium deteriorates or becomes obsolete before the digital information has been copied onto another medium, the data will be lost.

Therefore, digital preservation involves copying the digital information onto newer media before the old media become so obsolete that the data cannot be accessed. This is referred to as copying or refreshing (6,8). This process preserves the integrity of the digital information. Well-established techniques for preserving the integrity of the digital information exist at this level (5).
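
As a minimal sketch of refreshing, assuming the old and new media are both visible as ordinary file paths, a copy can be checked by comparing checksums of the bit stream before and after the transfer. The function names `sha256_of` and `refresh` are illustrative only and do not come from any of the cited projects.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a bit stream to a new location and verify it is unchanged.

    Hypothetical helper: raises IOError if the copy differs from the source.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"refresh failed: checksum mismatch for {source}")
    return after
```

Storing the returned digest alongside the copy would let a later refresh cycle detect silent deterioration of the medium, not only errors during copying.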

For timely refreshing, the lifetime of digital storage media must be predicted. The lifetime of a medium determines the period of time in which the information recorded on the medium is stored safely without loss. The specification of the lifetime of the medium will prompt librarians or archivists to refresh their media before medium deterioration. Recently, Zwaneveld (11) suggested that media could be rated into five classes according to certain criteria including the lifetime.

However, simply copying the digital information is not always sufficient as a preservation strategy (5). One has to ensure that the information can be retrieved and processed in the future. Retrieving a bit stream requires a hardware device such as a disk drive for reading the physical representation of the bits from the medium, as well as driver software.

Existing preservation strategies can be broadly classified into two main approaches as shown in Fig. 2. The first is the more conservative approach where the original technological environment is fully preserved for decoding the digital information in the future. This approach can be further divided into two preservation techniques. The first is to preserve the working replicas of all computer hardware and software platforms for future use. This is referred to as the technology preservation strategy (5). The other is to program the newer computer systems to emulate on demand the older obsolete platforms and operating systems. This is the so-called technology emulation strategy (10).

The second approach is to overcome the technical obsolescence of file formats. It may also be classified into two techniques. The first is to transform or convert the old digital resource to a format that is independent of the particular hardware and software that were used to create it. This is called the information migration strategy (5). The second technique, termed encapsulation, is where a digital object and anything else necessary to provide access to that object are grouped together and preserved (12).

2.1 Technology Preservation

This solution supposes that complete museums of obsolete equipment could be maintained in order to replicate any old configuration of hardware and software (13). This strategy involves preserving an original application program, operating system software, and hardware platform (6).

The advocates of this strategy emphasize that the original environment needs to be run to really preserve the behavior as well as the look and feel of the digital object. For some digital objects, this may be the best solution at least in the short-term because it ensures that the material is accessible by preserving the access tools as well as the object itself.

However, various issues including space, maintenance, and costs may make this impossible in the longer term. Specifically, equipment ages and breaks, documentation disappears, vendor support vanishes, and the storage medium as well as the equipment deteriorates (14). This strategy also limits the portability of the resource since it will depend on hardware stored in specific locations (3).

2.2 Technology Emulation

This strategy has a lot in common with the technology preservation strategy. It involves preserving the original application program. Emulator programs can be designed and run on future computer platforms. The emulator is programmed to mimic the behavior of old hardware platforms and operating system software, such as for games and executable files (6,16). However, this strategy does not involve preserving ageing hardware and original operating system software.

The goal of emulation is to preserve the look and feel of the digital object as well as its functionality. The essence of this strategy is to copy the technical context of the resource allowing the original object or a refreshed copy of the original object to be used in the future. Rothenberg illustrates the conceptual view of the relationships among elements of the emulation process as shown in Fig. 3 (15). This approach decouples application programs from the platform via a virtual machine with the result that applications can run on different platforms by migrating the virtual machine to those platforms. The Java virtual machine might be an example.
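
The decoupling described above can be illustrated with a toy virtual machine. This is only a sketch under stated assumptions: the instruction set (`PUSH`, `ADD`, `MUL`, `PRINT`) and the names `run` and `PRESERVED_PROGRAM` are invented for illustration and do not correspond to Rothenberg's design or to the Java virtual machine. The point is that the preserved program is plain data, and only the small interpreter has to be rewritten when the host platform changes.

```python
from typing import List, Tuple

# One instruction is an opcode plus an integer operand (ignored by some opcodes).
Instruction = Tuple[str, int]


def run(program: List[Instruction]) -> List[int]:
    """Interpret a toy stack-machine program and return the values it prints."""
    stack: List[int] = []
    output: List[int] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# Hypothetical "preserved application": computes (2 + 3) * 4 and prints it.
PRESERVED_PROGRAM: List[Instruction] = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", 0),
    ("PUSH", 4), ("MUL", 0), ("PRINT", 0),
]
```

Calling `run(PRESERVED_PROGRAM)` on any platform with a working interpreter should yield the same result, which is the property emulation relies on.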

Hendley sees emulation as a short to medium term strategy or a specialist strategy where the need to maintain the physical presence of the original digital resource is of great importance to the users. Again, he sees technology emulation being primarily used in cases where digital resources cannot be converted into software independent formats and migrated forward.

Waugh et al. (12) indicate that the application software itself could contain viruses that would result in the loss of information over time. Particularly, they note that emulation requires preserving a significant amount of information. They consider it useful if the goal is to preserve the software as an artifact itself and if future user organizations lack sufficient knowledge to understand the format of the digital information. Granger (16) discusses its potential advantages such as the re-creation of the look and feel of the resource. He identifies the possible disadvantages as the undefined nature of technological change and the complexity of creating emulator specifications. His conclusion is that emulation is not a complete digital preservation solution but a partial one.

On the contrary, Russell (3) states that emulation potentially offers the best solution for very long-term preservation of digital material, especially for resources for which the value is unknown and where future use of the material is unlikely. Woodyard (1) points out that emulation could be a more suitable solution than migration for long-term access to complex digital resources such as executable files. Gilheany (18) discusses the need for emulators to permanently preserve the functionality of computers. Rothenberg advocates and recommends emulation as the best possible solution (10,18) and presents results from the first phase of an emulation experiment (15) in the Networked European Deposit Library (NEDLIB) (1) project (19).

Holdsworth and Wheatley (20) consider the method of Rothenberg, in which the production of an emulator can be postponed by instead producing an emulator specification at the time of platform obsolescence, as a very risky approach to preserving information forever. This is because there is no guarantee that a specification produced can be used to produce an emulator in the future. They present some illustrative guidelines for the use of emulation and argue that emulation is a valid method of digital preservation, at least for systems where the documentation is not available in electronic form. Holdsworth (21) also suggests the use of widely available programming languages such as C in the implementation of emulation.

The emulation approach requires detailed specifications for the outdated hardware and operating system software (10). Therefore, standard formats for the emulation specification need to be developed. Whether emulation should mimic the original technical environment entirely or emulate only those components necessary to access the data remains an issue for debate.

2.3 Information Migration

The migration of digital information refers to the periodic transfer of digital materials from one hardware and software configuration to another, or from one generation of computer technology to a subsequent generation (5). The purpose of migration is to preserve the integrity of digital objects and to retain the ability of users to retrieve, display, and use them in the face of constantly changing technology.

The Open Archival Information System (OAIS) model (22), developed by the Consultative Committee for Space Data Systems (CCSDS) (2), breaks migration into four categories: refreshment, replication, repackaging, and transformation. Refreshment ensures that a reliable copy of the bit stream of a digital object is maintained while replication and repackaging ensure that a manageable package of the object is available. On the other hand, transformation actually modifies the bit stream of a digital object and this is what is considered as the process of migration in this paper.

Rather than focus on the technology, information migration tends to focus on the intellectual content and on ensuring its accessibility using current technology (3). This strategy could be facilitated through copying the digital information to an analog medium or through application programs that are backward compatible. It could also be facilitated through the conversion of digital resources into a small number of standard formats that are hardware and software independent (6).

The first solution involves the transfer of digital resources from less stable to more stable analog media such as paper or microfilm (23). While analog media have better proven long-term reliability, they may not provide adequate representations of the original object. This solution may also lead to severe loss in functionality and presentation of the original digital object. It is not possible, for example, to microfilm the equations embedded in a spreadsheet, to print out an interactive video, or to preserve a multimedia document as a flat file (5,18,24).

The second solution, software that is backward compatible, will simplify migration. The latest versions of most popular word processing packages, for example, will be able to decode files created on earlier versions of the same package. While this strategy may work over the short term for simple digital resources created on some of the leading application packages, it cannot be relied upon over the medium to long term or for more complex digital resources. Meanwhile, interoperability of systems will also facilitate migration because a specific program is not needed to access a digital resource. However, these features become harder to achieve with greater software complexity. Hence, any information migration through backward compatibility or interoperability between application software packages would represent a short-term strategy (6).

Another approach entails initially migrating digital information from the great multiplicity of formats to a smaller, more manageable number of standard formats. These are less volatile than the wide array of nonstandard formats and can still encode the complexity of structure and form of the original resource (5). Decisions on which formats to convert digital resources into should be based on the structure of the digital resources, on the objectives set by the collection manager, and on the requirements of the users of that collection (6). For example, one important issue is whether the top priority is given to preserving the ability to process the digital resource or to preserving the format or visual presentation of the digital resource.

A report (5) from the task force of the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG) makes the point that there are a variety of migration strategies. Particularly, Wheatley (25) attempted to break migration into more specific cases in the Creative Archiving at Michigan and Leeds: Emulating the Old and the New (CAMiLEON) (3) project: minimum migration, preservation migration, recreation, human conversion migration, and automatic conversion migration. He applied each case to test materials and discussed its usefulness for digital preservation.

Recently, the concept of migration on request has been conceived by the Consortium of University Research Libraries Exemplars for Digital Archives (CEDARS) (4) project (3), in which original objects are maintained and preserved in addition to a migration tool that runs on a current computing platform and can be employed by users as necessary. When the current platform becomes obsolete, the migration tool will no longer work. Therefore, the preservation problem in this case is focused on the maintenance of the migration tool. Preserving the original bit stream and a migration tool is compatible with the emulation strategy.
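
A minimal sketch of migration on request, assuming a registry of converters keyed by a format label: the original bit stream is stored once and never rewritten, and conversion happens only when a user asks for access. All names here (`ArchivedObject`, `MIGRATION_TOOLS`, `access`, and the `text/latin-1` label) are hypothetical and are not taken from the CEDARS project.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A migration tool turns a preserved bit stream into a currently usable form.
Migrator = Callable[[bytes], str]


@dataclass(frozen=True)
class ArchivedObject:
    original: bytes      # preserved bit stream, never modified
    source_format: str   # label identifying the original format


def latin1_text_to_unicode(data: bytes) -> str:
    """Example tool: interpret legacy Latin-1 text as Unicode."""
    return data.decode("latin-1")


# Only this registry has to be maintained as platforms change.
MIGRATION_TOOLS: Dict[str, Migrator] = {
    "text/latin-1": latin1_text_to_unicode,
}


def access(obj: ArchivedObject) -> str:
    """Migrate the object at access time, leaving the original untouched."""
    try:
        tool = MIGRATION_TOOLS[obj.source_format]
    except KeyError:
        raise LookupError(f"no migration tool for {obj.source_format}") from None
    return tool(obj.original)
```

In this arrangement a format with no registered tool fails loudly at access time, which mirrors the observation that the preservation burden shifts to keeping the migration tool working.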

The task force report and Hendley consider this migration strategy the most promising for the future. Bearman (26) also believes that it is the most promising strategy for preservation of electronic records and the only one that has worked to date. Russell (3) points out that the migration strategy is the most practical approach at least for the short and medium term.

However, for resources where it is difficult to disentangle format from content such as complex multimedia documents, this is not an easy option. Multiple components may require separate migration activities and this can be very complex. Indeed, for some multimedia resources, migration may not be possible without significant compromises in functionality. Russell states that the costs of migration may, in the long run, exceed those costs necessary for preserving either the technology itself or the detailed technical specification that will allow future emulation.

Waugh et al. believe that the key to successful migration is the knowledge of the original data format and a close match in functionality between the original and replacement format. They consider migration the simplest over the short and medium term for digital information that is actively managed. On the other hand, Rothenberg dismisses the strategy of migrating electronic records systematically before they become inaccessible (18).

Lawrence et al. (27) have attempted to quantify the risks involved in the use of migration and have analyzed several commercially available migration tools for their relative accuracy. This practical investigation has led to the identification of some key requirements for migration software: access to the source file specification, analysis of the differences between it and the target format, identification of the degree of risk in the case of a mismatch, accurate conversion of the source file to a target specification, and so on.
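
To make the idea of analyzing differences between source and target formats concrete, the sketch below treats each format as a set of supported features and reports what would be lost. This is an illustrative simplification, not the method of Lawrence et al.; the function names and the feature labels are assumptions.

```python
from typing import Set


def features_at_risk(source_features: Set[str], target_features: Set[str]) -> Set[str]:
    """Return the source features that the target format cannot represent."""
    return source_features - target_features


def risk_ratio(source_features: Set[str], target_features: Set[str]) -> float:
    """Fraction of source features that would be lost in migration (0.0 to 1.0)."""
    if not source_features:
        return 0.0
    lost = features_at_risk(source_features, target_features)
    return len(lost) / len(source_features)
```

For a hypothetical spreadsheet format with features `{"text", "formulas", "macros", "charts"}` migrated to a target supporting only `{"text", "charts"}`, the features at risk would be `formulas` and `macros`.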

However, none of the current migration methods or tools comes close to meeting all of these requirements. It is important to realize that technical standards may change rapidly and that this strategy may not ensure that digital information remains accessible. Recently, however, XML has become widely accepted as a universal standard format in various fields such as digital libraries, electronic commerce, and the Web. Standardization based on XML could be helpful in addressing the digital preservation problem.
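
As a small illustration of migrating data into an XML representation, the sketch below converts comma-separated text into XML using Python's standard library. The element names (`records`, `record`, `field`) are assumptions chosen for the example and do not follow any particular preservation schema.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, record_tag: str = "record") -> str:
    """Convert CSV text with a header row into a simple XML document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("records")
    for row in reader:
        record = ET.SubElement(root, record_tag)
        for field, value in row.items():
            # Column names go into an attribute so that names which are not
            # valid XML tag names (spaces, leading digits) still work.
            ET.SubElement(record, "field", name=field).text = value
    return ET.tostring(root, encoding="unicode")
```

The result is self-describing in the sense that each value carries its column name, so a future reader does not need the original application to know what the fields mean.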

2.4 Encapsulation

Encapsulation aims to overcome the problems of the technological obsolescence of file formats by making the details of how to interpret the digital object part of the encapsulated information. This strategy involves creating the original application that was used to create or access the digital object on future computer platforms. Part of the process of encapsulation may be to migrate the record to a more easily documented format.

The concept of encapsulation is similar to the Bento (5) (28) container, which was developed to increase compatibility of data between computer applications. Bento is a specification for storage and interchange of compound content, and is designed to be platform and content neutral. Thus, it provides a convenient container for transporting any type of compound content between multiple platforms.

Encapsulation can be achieved by using physical or logical structures called containers or wrappers to provide a relationship between all information components such as the digital object and some supporting information including metadata (29,30). The reference model for the OAIS also describes the types of supporting information that should be included in an encapsulation. They include the representation information used to interpret the bits appropriately, the provenance to describe the source of the object, the context to describe how the object relates to other information outside the container, a reference to one or more identifiers to uniquely identify the object, and fixity to provide evidence that the object has not been altered.
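
The five kinds of supporting information listed above can be sketched as a simple XML wrapper around the digital object. This is a minimal illustration only: the element names and the choice of SHA-256 for fixity and base64 for the payload are assumptions made for the example, not part of the OAIS reference model or of any specific container format.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(content: bytes, identifier: str, representation: str,
                provenance: str, context: str) -> str:
    """Wrap a digital object together with its supporting information."""
    root = ET.Element("container")
    ET.SubElement(root, "reference").text = identifier
    ET.SubElement(root, "representation").text = representation
    ET.SubElement(root, "provenance").text = provenance
    ET.SubElement(root, "context").text = context
    # Fixity: a digest recorded at wrapping time to detect later alteration.
    fixity = ET.SubElement(root, "fixity", algorithm="sha256")
    fixity.text = hashlib.sha256(content).hexdigest()
    payload = ET.SubElement(root, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def verify(wrapper_xml: str) -> bytes:
    """Extract the object and check it against the recorded fixity value."""
    root = ET.fromstring(wrapper_xml)
    content = base64.b64decode(root.findtext("content"))
    if hashlib.sha256(content).hexdigest() != root.findtext("fixity"):
        raise ValueError("fixity check failed: object has been altered")
    return content
```

A wrapper built this way keeps the interpretation hints next to the bits, at the cost of repeating that documentation in every container, which is the storage overhead discussed later in this section.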

The Universal Preservation Format (UPF) (31,32,33) is a method being developed for digital preservation, based on the theory of encapsulation. It is a self-describing storage technology, which uses a wrapper to hold the digital object and the metadata together to protect against technological obsolescence. The Digital Rosetta Stone (34) is a method for storing the representation information needed to interpret the digital content of an object separate from the encapsulation to avoid duplication of effort and inefficient use of storage space.

Encapsulation has been widely promoted by Rothenberg who is a strong advocate of encapsulation and emulation methods for digital preservation. Day (30) and Shepard (35) have also supported the encapsulation approach. However, Bearman (26) disputes the theory of Rothenberg stating that it is not clear as to how metadata encapsulation strategies may be practically implemented.

Waugh et al. argue that encapsulation is the best basis for long-term preservation. They introduce three challenges of encapsulation: the requirements for applications to generate encapsulated records, the potential storage overhead of including documentation about the format within each record, and information about unpublished data formats. However, even if format specifications are publicly available, they are often incomplete and substantial components of file specifications often consist of nonstandard elements.

Encapsulation can be considered to be a type of migration technique. Although documentation may delay the need for migration for a long time, the encapsulated information will eventually need to be migrated. Therefore, encapsulation techniques can be applied to the digital resources whose format is well known and that are unlikely to be accessed actively.

2.5 The Digital Tablet

Kranch (36,37) proposes developing a digital tablet for preservation. The digital tablet technique does not precisely fit into the above categories. The tablet would have a self-contained power source, present the stored digital information on a screen as glyphs from a written language appropriate for the information, and have touch-sensitive controls to change the presentation and manipulate the information. Additionally, it should be able to withstand millennia of neglect under harsh conditions yet cost no more than a few dollars to produce and encode. It should have a storage capacity of dozens or hundreds of terabytes, and contain a simple serial read-only port to download the original digital information into an external system along with instructions on how to do it. However, the digital tablet may be considered to be another technology preservation method.

3. Projects and Case Studies

This section introduces representative projects for digital preservation, describes their goals and results, and presents their specific preservation strategies. Recently, Cloonan and Sanett (38) presented a report on the initial phase of projects involved in developing, evaluating, and/or implementing digital preservation strategies in Europe, Australia, and the United States.

3.1 Australian Projects

Australia has been examining digital preservation issues since 1994. Several projects are aiming to preserve various digital materials such as electronic records, online publications, digital audio resources, theses, and cartographic materials (1,12,39). The Victorian Electronic Records Strategy (VERS) (6) project (12) produced a standard for management and preservation of electronic records (40,41). The standard proposed by the VERS project recommends encapsulating the documents and their context in a single object based on XML.

The Preserving and Accessing Networked Documentary Resources of Australia (PANDORA) (7) project by the National Library of Australia has led the way in archiving the Web (42,43). The primary objective of PANDORA is to capture, archive, and provide long-term access to significant online publications. The project aims at addressing both archiving and preservation processes. The PANDORA archiving processes refer to the collection and provision of immediate access to the publications, while preservation processes involve managing the materials and applying appropriate strategies (e.g., migration) to ensure long-term access. The archiving processes have been developed, while techniques for managing long-term access to these digital resources are still being developed.

The project has also embarked on migration experiments with some HTML pages. The format of these pages is not yet obsolete, but the HTML specification (44) has declared a number of mark-up tags as dead and not to be supported in this or future versions. The aim of these trials is to make changes in the HTML source code to remove tags declared as dead and replace them with current tags, effectively migrating the source code to a different version to reduce problems of future compatibility with Web browsers. Additionally, the National Archives of Australia is undertaking a project to develop advice for commonwealth agencies on using migration as a preservation strategy for electronic records (45).
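
The kind of tag-level migration described above can be sketched in a few lines of Python. The mapping below is an illustrative assumption (three elements that HTML 4.01 marks as deprecated, paired with plausible replacements); it is not the list used in the PANDORA trials, and a production migration would more likely use a real HTML parser than a regular expression.

```python
import re

# Hypothetical mapping from deprecated tags to current replacements.
TAG_MAP = {"strike": "del", "dir": "ul", "menu": "ul"}

# Match the tag name in opening or closing position; attributes are left untouched.
_TAG_PATTERN = re.compile(r"<(/?)(%s)\b" % "|".join(TAG_MAP), re.IGNORECASE)


def migrate_tags(html: str) -> str:
    """Rename deprecated tags, keeping everything else in the source unchanged."""
    return _TAG_PATTERN.sub(
        lambda m: "<" + m.group(1) + TAG_MAP[m.group(2).lower()], html
    )
```

Keeping the rewrite limited to tag names leaves the text content and attributes byte-for-byte intact, which makes it easier to check afterwards that nothing other than the mark-up version changed.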

3.2 CEDARS

The Consortium of University Research Libraries, which represents both university and national libraries across the UK and Ireland, leads the CEDARS project. The project is based on the OAIS model described earlier. Essentially it is an archival model for an archiving system but does not explicitly include a preservation module. Current projects investigating the use of the model include NEDLIB in Europe, CEDARS in the UK, and PANDORA in Australia.

The CEDARS project aims to address the strategic, methodological, and practical issues of digital preservation as well as to provide guidance for libraries in best practices. Specifically, the project is producing guidelines for developing digital collection management policies and preserving different classes of digital resources. Additionally, it is performing an analysis of the cost implications of digital preservation. The project is running pilot projects to test and promote the chosen strategy for digital preservation. The CEDARS Data Preservation Strategies Working Group is looking at preservation issues that are related to migration, emulation, and data refreshing.

Specifically, the project has done some work comparing migration and emulation across older digital materials in conjunction with the CAMiLEON project (20,25). They suggest that both migration and emulation strategies are viable for different types of digital materials. They also believe that no strategy is a panacea, and the strategy adopted for providing access to preserved resources will very much depend on the nature of the resource itself and the reason for its preservation. More work is planned to investigate the information loss associated with each strategy.

3.3 CAMiLEON

The CAMiLEON project is funded by the Joint Information Systems Committee (JISC) in the UK and the National Science Foundation (NSF) in the USA. The project looks at the issues surrounding the implementation of technology emulation as a digital preservation strategy. The project recognizes the potential of emulation for the retention of the functionality and the look and feel of digital objects [20]. It aims to develop tools, guidelines, and costs for emulation as compared to other digital preservation options.

The project is performing user testing of various digital resources both in their original environments and in emulated environments. The project presents guidelines for use of emulation and argues that emulation is a valid method for both complex digital resources that include executable files and resources for which the documentation is not available in electronic form.

3.4 NEDLIB

The NEDLIB project was initiated by a permanent standing committee of the Conference of European National Libraries (CENL) in 1998, with funding from the European Commission's Telematics Application Programme. Eight national libraries in Europe, one national archive, and major publishers are participating in the project. The National Library of the Netherlands leads the project.

The project aims to develop a common architectural framework and basic tools for building deposit systems for electronic publications. The project has also adopted the OAIS migration model described earlier. The main objective of the NEDLIB project is to provide better insight into the merit and weakness of different long-term preservation strategies. The project defines the characteristics of electronic publications and other categories of digital deposit material and associated preservation and authenticity requirements. It is recognized that many aspects including cost-effectiveness, legal restrictions, agreements with publishers, and user access requirements ultimately need to be taken into account when policy choices for preservation strategies are set. However, the project focuses on the technical issues of preservation.

The project has taken a first step to test the technicalities of the preservation mechanisms by starting an emulation experiment. The fundamental idea of the work is to test whether emulating obsolete computer hardware on future systems could be used to ensure long-term access to digital publications. Rothenberg has performed the first phase of this experimental work (15). It involved developing a prototype experimental environment for trying out emulation-based preservation and using commercial emulation tools to provide an initial proof-of-concept. The experimental results indicate that emulation should work in principle, assuming that suitable emulators for obsolete computing platforms can be hosted on future platforms.

3.5 Kulturarw³ Heritage

The Kulturarw³ Heritage Project [8] of the Royal Library in Sweden is testing methods of collecting, archiving, and providing access to Swedish electronic documents. Web crawlers or robots are used in order to collect all the Swedish Web pages automatically. Although the project currently does not focus on preservation, it is growing into a broader Nordic initiative that may explore the long-term preservation of this archive.

3.6 Library of Congress

Recently, the Computer Science and Telecommunications Board (CSTB) of the National Academies (46,47) convened the Committee on the Information Technology Strategy for the Library of Congress for advice on digital preservation. The committee report includes specific recommendations for enhancing technology infrastructure, particularly in the area of networks, databases, and information technology security.

The Library of Congress' pilot project, working with the Internet Archive (9), has worked through all aspects of archiving the Web in the area of political Web sites. This project uses the Digital Library SunSITE Collection and Preservation Policy (10) from the University of California, Berkeley, which provides several digital collecting levels, as guidance.

3.7 NARA: Persistent Archives and Electronic Records Management

The NARA (National Archives and Records Administration) project, (11) which is led by the San Diego Supercomputer Center and funded by NARA, aims to develop a persistent archive to support ingestion, archival storage, information discovery, and preservation of digital collections. One of its premises is the importance of preserving the organization of digital collections simultaneously with the digital objects that comprise the collection (48).

The ultimate goal is to preserve not only the original data, but also the context that permits the data to be interpreted. The project proposes an approach for maintaining digital data for hundreds of years through development of an environment that supports migration of collections onto new software systems. The proposed infrastructure combines elements from supercomputer centers, digital libraries, and distributed computing environments. The project emphasizes the synergy that is achieved through the identification of the unique capabilities provided by each environment, and the construction of interoperability mechanisms for integrating these environments. According to this project, collection-based persistent archives are now feasible and can manage massive amounts of information based on XML (49).

3.8 InterPARES

The International Research on Permanent Authentic Records in Electronic Systems (InterPARES) (12) project [38] is a multinational research initiative, in which archival scholars, computer engineering scholars, national archival institutions, and private industry representatives are collaborating to develop the theoretical knowledge and methodology required for the permanent preservation of authentic records created using electronic systems [50].

The research areas include identifying the electronic record elements that need to be maintained, developing criteria to appraise electronic records for preservation, and formulating principles for development of international, national, and organizational preservation strategies. Specifically, the research areas are divided into four complementary domains: authenticity, appraisal, preservation, and strategies. In terms of authenticity, the purpose is to identify the specific elements of electronic records that must be preserved, over time and across technologies, in order to verify the record's authenticity. As a first step, a template to guide the analysis of electronic records was developed and is being evaluated. The project is based in the School of Library, Archival and Information Studies at the University of British Columbia in Canada. The current phase of the project would end on December 31, 2001.

3.9 PRISM

The Preservation, Reliability, Interoperability, Security, Metadata (PRISM) (13) project [29] at Cornell University is a 4-year project funded by the Digital Library Initiative to investigate and develop policies and mechanisms needed for information integrity in digital libraries. The project focuses on long-term survivability of digital information, reliability of information resources and services, interoperability, security, and metadata.

The current direction of the project is toward developing techniques for monitoring the integrity of distributed Web-based information resources and enforcing preservation policies set by the owners and users of collections. Monitoring resources will involve both the automated capture of information using a specialized Web crawler and the manual gathering of data on the organizational status of particular resources and collections. The ultimate objective is to develop a cost-effective and event-based metadata scheme that will enable users to define preservation policies and enforce them automatically.

3.10 Canadian Projects

E-preservation (14) was developed through a cooperative effort between the National Library of Canada and the Canadian Initiative on Digital Libraries (CIDL). E-preservation is intended 1) to provide Canadians with easy access to policies and 2) to perform research on the creation, use, and preservation of digital collections. The project includes guidelines about various aspects including acquiring digital materials, formats, and metadata.

3.11 Preservation Projects at the National Institute of Standards and Technology (NIST)

The earliest work on data preservation at NIST can be traced back to the 1980s when Podio performed research on the lifetime measurement of compact discs [51]. His work provided a basis for a standard methodology for the lifetime measurement of optical discs. With the increasing usage of digital storage in libraries and the archiving of the government agencies, the great importance of digital preservation became clear to NIST's Information Technology Laboratory, (15) and accordingly new projects on the study of digital data preservation have started in the following aspects:

* Longevity testing. This project initially consisted of an examination of the effects of heat and humidity on the lifetime of optical discs and was later extended to include the effects of light exposure. The focus is not only on the lifetime itself, but also on the deterioration process. The results may be useful both for new disc production and for the classification of existing recorded discs.

* Testing of interchangeability and interoperability of optical discs for use in high-density storage systems such as optical disc "Jukeboxes." Combined with the application and further development of XML, a new preservation strategy may be developed. This program is being conducted in collaboration with the High Density Storage Association (HDSA). (16) An open testing laboratory is being developed that will include interoperability and interchangeability testing as well as testing of the suitability of various types of high capacity storage systems for different applications including preservation.

* Development of the Turbo coding system [52]. Unlike the traditional digital preservation techniques that aim to keep the information readable in the long term, this new technique aims to develop a method for finding and recovering useful information from failed discs. The failure of many commercial error correction codes to support retrieval of important information from aging or damaged digital storage devices has indirectly resulted in the reduction of the rated life expectancy and archival properties of digital storage devices. Turbo code has shown promise in retrieving information previously inaccessible. Turbo codes consist of two parallel, recursive, systematic convolutional codes. These codes are used for channel coding and decoding in order to detect and correct the errors that may occur in the transmission of digital data through different channels. The iterative decoding scheme helps error correction performance approach the theoretical (Shannon) limit [53]. This program is being conducted in cooperation with Carnegie-Mellon University.
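
To make the phrase "two parallel, recursive, systematic convolutional codes" more concrete, the following Python sketch shows only the encoder side of such a scheme. The generator polynomials (feedback 1+D+D^2, feedforward 1+D^2), the rate of 1/3, and the caller-supplied interleaver are assumptions chosen for illustration; they are not taken from the NIST work, and the iterative decoder, trellis termination, and puncturing are all omitted.

```python
def rsc_parity(bits):
    """Parity stream of a recursive systematic convolutional (RSC) encoder.

    Assumed generators: feedback 1+D+D^2 and feedforward 1+D^2.
    """
    s1 = s2 = 0  # two-bit shift register, initially zero
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # recursive feedback bit
        parity.append(a ^ s2)  # feedforward taps at 1 and D^2
        s1, s2 = a, s1         # shift the register
    return parity


def turbo_encode(bits, interleaver):
    """Return (systematic, parity1, parity2) for a parallel concatenation.

    `interleaver` is a permutation of range(len(bits)) supplied by the caller.
    """
    permuted = [bits[i] for i in interleaver]
    return list(bits), rsc_parity(bits), rsc_parity(permuted)
```

The second encoder sees the same data in a different order, so a burst of errors that damages one parity stream tends to be spread out in the other; a decoder can exploit this by passing estimates back and forth between the two component codes.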

4. Discussion and Summary

In this paper, three main preservation techniques have been discussed in detail. Each of them has advantages and disadvantages as shown in Table 1. Different strategies are viable for different types of digital materials. In cases of complex resources and application software such as games and executable files, emulation is a suitable approach. In particular, emulation is the most feasible choice where there is a lack of sufficient knowledge regarding the format of the digital information and where the look and feel of digital information is important.

On the other hand, migration or encapsulation is appropriate for digital resources where knowledge of the format is sufficient and where the resource is relatively simple. Specifically, migration is appropriate for resources that are actively accessed and managed. Encapsulation is suitable for resources that are unlikely to be actively accessed. Fig. 4 shows a schematic diagram to select the suitable preservation techniques for digital resources according to the type and complexity of digital information, the availability of the data format, and its usage.
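
The selection logic described in the two paragraphs above can be paraphrased as a small decision function. This is a sketch of the prose only, not a transcription of Fig. 4; the function name, parameter names, and the order in which the conditions are tested are assumptions.

```python
def choose_strategy(is_complex: bool, format_known: bool,
                    look_and_feel_matters: bool, actively_accessed: bool) -> str:
    """Suggest a preservation technique for one digital resource."""
    # Complex resources, unknown formats, or a need to keep the original
    # look and feel all point toward emulation.
    if is_complex or not format_known or look_and_feel_matters:
        return "emulation"
    # Simple, well-understood resources: the choice depends on usage.
    return "migration" if actively_accessed else "encapsulation"
```

For example, a plain document in a well-documented format that is consulted regularly would map to "migration", while the same document held in a rarely used archive would map to "encapsulation".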

As mentioned before, the objective of long-term digital preservation is to ensure continuing access to stored digital information. Future users will be able to access digital information, which will be preserved by digital libraries, through their own computing environments or portable reading devices. This means that the digital information may have to be migrated for long-term digital preservation even if some emulation techniques have already been applied. Therefore, long-term preservation may involve several preservation techniques. In other words, for successful preservation, combined application of different strategies should be taken into account. These techniques are not mutually exclusive, that is, one approach may include some aspects of other approaches.

The preservation of physical artifacts generally occurs through developing and maintaining collections in a decentralized manner. By contrast, the preservation of digital resources does not lend itself well to such a process and no single institution can preserve all digital information. Fortunately, the Internet enables digital libraries worldwide to communicate and share information online. For this to happen successfully, the interoperability of archives and digital library systems and the interchangeability of preserved digital information are necessary.

Standard formats for preservation of digital information should be developed. Standards can provide optimum interoperability and interchangeability, well-constructed tool sets for developers, and solutions for complex systems. Furthermore, standard solutions are likely to be available for use for far longer than nonstandard solutions. Migration from standard formats is much easier, cheaper, and more accurate than migration from nonstandard formats.

Standards are required for both emulation and encapsulation. Conventional standards were based on various kinds of encoding schemes according to the types of digital resources, with the result that quite a few software packages support these standards. Meanwhile, XML is being adopted as a standard in various fields due to the recognition of its advantages, including its independence of hardware and software, and due to its wide dissemination. Although XML is a relatively new technology for the Web, SGML (Standard Generalized Markup Language) [54], which is a superset of XML, is a proven technology that has been tested and evaluated since 1986, when it was accepted as an international standard. Furthermore, standards based on XML will facilitate support for the Internet and Web. For example, a standard [55] for digital representation of paper-based books is based on XML.

Since the use of XML is expected to grow wider in the areas of interoperability and interchangeability, preservation standards based on XML are needed. A single standard format is desirable when implementing a preservation system.

The most successful preservation strategies will contain elements of migration based on standardization. While techniques for the migration of digital resources comprised of simple data appear to be widely accepted and followed, the preservation community is only beginning to address migration of more complex digital objects. Additional research on migration is needed to test the technical feasibility of various approaches to migration, determine the costs associated with these approaches, and establish benchmarks and best practices.

Furthermore, a test bed is needed in which specific migration techniques can be prototyped. This test bed should be used to quantitatively test the performance of migration tools that are currently available on the market and new tools being developed. Results from this test bed shall give good guidelines for choosing appropriate tools to migrate digital resources that are in different formats. For this purpose, systematic categorization of digital resources must be developed.

Other critical technical issues must be resolved for preserving digital information. They include requirements relating to metadata, authenticity and integrity, cost modeling, content and structure, format and styles, storage media, and workflow process. Brief descriptions of some critical issues are as follows:

Metadata

The metadata for digital preservation is critical but complex. Metadata types include descriptive metadata for resource discovery, administrative metadata for the preservation process itself, technical metadata, and rights metadata to describe copyright information [3]. Metadata must enable access to the intellectual content of the object (whether by migration or emulation), find the object, manage the object, and allow other versions of the object to be produced. There have been several specifications for preservation metadata by CEDARS [56], the National Archives and Library of Australia [57,58], and NEDLIB [59]. Recently, collaborative efforts between RLG and the Online Computer Library Center (OCLC) have produced a report [60] identifying common goals and approaches to these digital preservation metadata. The metadata should cover the requirements of various types of organizations and be standardized for interoperability. Efforts to develop tools for automatically creating metadata are needed.

Interoperability and Interchangeability

As mentioned before, the interoperability of archives and digital libraries as well as the interchangeability of digital information are important issues, particularly in a distributed environment such as the Internet. Standard formats and procedures should therefore be developed.

Authenticity and Integrity

Authentication allows the user to be certain of the originality of the digital resource when it is needed. Although several techniques are used for authentication, including cryptography, hashing, and time stamping [15], authenticity and integrity carry multiple meanings and implications. Therefore, creating a common understanding of authenticity and integrity is critical in the digital environment [61]. Dollar [62] identifies maintaining the authenticity of a digital object as a key aim in the migration process.
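
Of the techniques mentioned above, hashing is the simplest to illustrate. The sketch below assumes SHA-256 from Python's standard `hashlib` module, which is a choice made here and not one named in the paper; the function names and the sample text are likewise hypothetical. It shows only an integrity check against a previously recorded digest and does not by itself establish authenticity in the broader senses discussed above.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at ingest time."""
    return sha256_digest(data) == recorded_digest


# The digest is computed once, when the object enters the archive.
original = b"Minutes of the board meeting, 29 November 2001."
recorded = sha256_digest(original)

# Later checks recompute the digest; any altered bit changes the result.
print(is_unchanged(original, recorded))
print(is_unchanged(original + b" ", recorded))
```

The recorded digest has to be stored and protected separately from the object; otherwise a modification of both would go unnoticed.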

Cost Model

Decisions about preserving information should consider the costs [6]. However, there are no proven techniques for estimating the costs of long-term preservation of digital information. Russell and Weinberger [63] postulate that the ongoing costs of digital preservation span a more extended timeframe than those of traditional preservation and will therefore require resource commitments of a different nature. Different strategies may necessitate different costing timeframes and schedules. They state that current cost models have yet to reflect this more complex environment.

5. Appendix A. Glossary

The following is a list of important words that are used in this paper. Brief definitions of these words are summarized here for readers' convenience. For more specific definitions refer to the URLs.

Emulation

The process of setting up a system to perform in the same way as another system of a different type in order to run its programs.

<http://www.nla.gov.au/download/dsp/appendices.pdf>

Encapsulation

A technique of grouping together a digital object and anything else necessary to provide access to that object.

<http://www.nla.gov.au/padi/topics/20.html>

Migration

The periodic transfer of digital materials from one hardware/software configuration to another or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology.

<http://info.wgbh.org/upf/glossary.html>

Open Archival Information System (OAIS)

An archive, consisting of an organization of people and systems, that has accepted responsibility to preserve information and make it available for one or more designated communities.

<http://www.kb.nl/coop/nedlib/glossary.pdf>

Refreshing

To copy digital information from one long-term storage medium to another of the same type, with no change whatsoever in the bit-stream (e.g. from a decaying 800bpi tape to a new 800bpi tape, or from an older 5 1/4" floppy to a new 5 1/4" floppy).

<http://info.wgbh.org/upf/glossary.html>
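
Because refreshing must leave the bit-stream unchanged, a copy can be confirmed by comparing the two files byte by byte. The following is a minimal sketch using Python's standard `shutil` and `filecmp` modules; the function name `refresh` and the file names are hypothetical, and temporary files stand in for the old and new storage media.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def refresh(source: Path, target: Path) -> bool:
    """Copy source to target and confirm the two bit-streams are identical."""
    shutil.copyfile(source, target)
    # shallow=False forces a byte-by-byte comparison of the file contents.
    return filecmp.cmp(source, target, shallow=False)


with tempfile.TemporaryDirectory() as workdir:
    old_copy = Path(workdir) / "old_medium.bin"
    new_copy = Path(workdir) / "new_medium.bin"
    old_copy.write_bytes(bytes(range(256)) * 4)  # stand-in for archived data
    refreshed_ok = refresh(old_copy, new_copy)
    print(refreshed_ok)
```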

XML (eXtensible Markup Language)

Subset of SGML designed to be transmissible over the Internet in such a way that document browsers do not need to access the document type definition to validate the document before display. As well as requiring all elements to be "well-formed," e.g., have both start and end tags present, the specification provides XML-specific attributes and processing instructions that can be used to control the way documents are presented to users.

<http://www.w3.org/TR/PR-xml.html>
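
The well-formedness requirement can be checked without any document type definition. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Every start tag has a matching end tag.
good = "<record><title>Annual report</title></record>"
# The <title> element is never closed.
bad = "<record><title>Annual report</record>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This parser checks well-formedness only; it does not validate a document against a document type definition.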

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

Table 1

Advantages and disadvantages of preservation techniques

Technique      Advantages               Disadvantages

Emulation      Maintains the look       The complexity of creating
               and feel.                emulator specifications.

                                        The large amount of
                                        information that must be
                                        preserved.

                                        Archaic software required
                                        to access information.

Migration      Does not need to retain  Significant cost for long-
               original applications.   term preservation.

               Supports active access   Information degradation.
               and management.
                                        Lack of preservation metadata.

                                        Need for continued diligence
                                        on the part of archivists.

Encapsulation  Maintains preservation   Knowledge about the format
               information.             must be preserved.

                                        Systems required for capturing
                                        the digital information.

Technique      Domain

Emulation      Application software.

               Complex digital resources
               such as those that contain
               executable files.

               Resources for which there
               is a lack of sufficient
               knowledge.

               Resources for which the
               value is unknown and for
               which future use is
               unlikely.

               Resources whose look and
               feel are important.

Migration      Resources that are actively
               accessed and managed, such as
               scientific data or database.

               Resources whose formats are
               sufficiently well known.

Encapsulation  Resources that are unlikely
               to be accessed and managed
               actively.

               Resources whose formats are
               sufficiently well known.

Acknowledgments

The authors would like to thank Mr. John Roberts, Dr. Albert Paul, and Dr. Theodore Vorburger for their helpful and constructive comments and suggestions.

Accepted: November 29, 2001

(1.) http://www.konbib.nl/nedlib/

(2.) http://www.ccsds.org/

(3.) http://www.si.umich.edu/CAMILEON/

(4.) http://www.leeds.ac.uk/cedars/

(5.) Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

(6.) http://www.prov.vic.gov.au/vers/

(7.) http://www.nla.gov.au/policy/plan/pandora.html

(8.) http://kulturarw3.kb.se/html/projectdescription.html

(9.) http://www.archive.org/

(10.) http://sunsite.berkeley.edu/

(11.) http://www.sdsc.edu/NARA/

(12.) http://www.interPARES.org/

(13.) http://www.prism.cornell.edu/

(14.) http://www.nlc-bnc.ca/cidl/preserv-conserv/index.htm

(15.) http://www.itl.nist.gov/

(16.) http://www.hidensity.org/

6. References

(1.) D. Woodyard, Digital Preservation: The Australian Experience, Proc. Third Conf. Digital Library: Positioning the Fountain of Knowledge, Malaysia (2000), http://www.nla.gov.au/nla/staffpaper/dw00l004.html.

(2.) D. M. Levy, Heroic Measure: Reflections on the Possibility and Purpose of Digital Preservation, Proc. Conf. ACM Digital Libraries, Pittsburgh (1998) pp. 152-161.

(3.) K. Russell, Digital Preservation and the CEDARS Project Experience, Proc. Int'l Conf. Preservation and Long Term Accessibility of Digital Materials, York, England (2000) pp. 139-154.

(4.) S-S. Chen, The Paradox of Digital Preservation, IEEE Computer 34 (3), 24-28 (2001).

(5.) J. Garrett and D. Waters, Preserving Digital Information, Report of the Task Force on Archiving of Digital Information, The Commission on Preservation and Access and The Research Libraries Group, Washington DC and Mountain View CA, May 1996, 64 pp.

(6.) T. Hendley, Comparison of Methods & Costs of Digital Preservation, British Library Research and Innovation Report 106, British Library Research and Innovation Center, West Yorkshire (1998) 121 pp.

(7.) M. Fresko and K. Tombs, Digital Preservation Guidelines, European Commission DG XIII/E, Luxembourg (1998) 69 pp.

(8.) D. Woodyard, Farewell My Floppy: A Strategy for Migration of Digital Information, Proc. Conf. VALA, Melbourne (1998), http://www.nla.gov.au/nla/staffpaper/valadw.html.

(9.) W3C Recommendation, Extensible Markup Language (XML) 1.0 (Second Edition), World Wide Web Consortium (2000), http://www.w3c.org/TR/REC-xml.

(10.) J. Rothenberg, Ensuring the Longevity of Digital Documents, Sci. Amer. 272 (1), 42-47 (1995).

(11.) E. H. Zwaneveld, Standards and New Technology Strategies to Preserve Content on Magnetic and Disc Media, Point of View, SMPTE J. 109, 628-635 (2000).

(12.) A. Waugh et al., Preserving Digital Information Forever, Proc. Conf. ACM Digital Libraries, San Antonio (2000) pp. 175-184.

(13.) D. Bearman, Collecting Software: A New Challenge for Archives & Museums, Technical Reports 1 (2), Archival Informatics (1987) 80 pp.

(14.) P. S. Graham, Intellectual Preservation in the Electronic Environment, After the Electronic Revolution, Will You Be the First to Go?, Arnold Hirshon, ed., American Library Association, Chicago (1993).

(15.) J. Rothenberg, An Experiment in Using Emulation to Preserve Digital Publications, Koninklijke Bibliotheek, The Hague, Netherlands (2000) 85 pp.

(16.) S. Granger, Emulation as a Digital Preservation Strategy, D-Lib Mag. 6 (10), (2000), http://www.dlib.org/dlib/october00/granger/10granger.html.

(17.) S. Gilheany, Preserving Information Forever and a Call for Emulators, Proc. Digital Libraries Conference and Exhibition: The Digital Era: Implications, Challenges and Issues, Singapore (1998), http://www.archivebuilders.com/aba0l0.html.

(18.) J. Rothenberg, Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation (A Report to the Council on Library and Information Resources), Council on Library and Information Resources, Washington, DC (1999), http://www.clir.org/pubs/reports/rothenberg/contents.html.

(19.) T. Werf-Davelaar, Long-term Preservation of Electronic Publications, D-Lib Mag. 5 (9), (1999), http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html.

(20.) D. Holdsworth and P. Wheatley, Emulation, Preservation and Abstraction (2000), http://www.leeds.ac.uk/CAMiLEON/dh/ep5.html.

(21.) D. Holdsworth, Emulation: C-ing Ahead (2000), http://129.11.152.25/CAMiLEDN/dh/cingahd.html.

(22.) Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-R-1, Consultative Committee for Space Data Systems, May 1999, 140 pp., http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-1.pdf.

(23.) F. L. Eaton, The National Archives and Electronic Records for Preservation, Preservation of Electronic Formats & Formats for Electronic Preservation, Janice Mohlhenrich, ed., Ft. Atkinson, Highsmith (1993).

(24.) D. Bearman, The Implications of Armstrong vs. Executive Office of the President for the Archival Management of Electronic Records, Amer. Archivist 56, 150-160 (1993).

(25.) P. Wheatley, Migration--A CAMiLEON Discussion Paper (2000), http://www.personal.leeds.ac.uk/-issprw/camileon/migration.htm.

(26.) D. Bearman, Reality and Chimeras in the Preservation of Electronic Records, D-Lib Mag. 5 (4), (1999), http://www.dlib.org/dlib/april99/bearman/04bearman.html.

(27.) G. W. Lawrence et al., Risk Management of Digital Information: A File Format Investigation, The Council on Library and Information Resources, Washington DC, June 2000.

(28.) J. Harris and I. Ruben, Bento Specification, Apple Computer Inc. July 1993, http://info.wgbh.org/upf/pdfs/BentoSpec1_0d5.pdf.

(29.) O. Y. Rieger, Project PRISM: Preservation Metadata Research, Proc. Conf. Information Infrastructures for Digital Preservation, York, England (2000), http://www.rlg.org/events/pres-2000/infopapers.html.

(30.) M. Day, Issues and Approaches to Preservation Metadata, Proc. RLG/NPO Conf. Guidelines for Digital Imaging, University of Warwick (1998), http://www.rlg.org/preserv/joint/day.html.

(31.) T. Shepard and D. MacCarn, The Universal Preservation Format: Background and Fundamentals, Proc. Sixth Delos Workshop, Tomar, Portugal (1998), http://www.ercim.org/publication/ws-proceedings/DELOS6/upf.pdf.

(32.) D. MacCarn, Toward a Universal Data Format for the Preservation of Media, SMPTE J. 196, 477 (1997).

(33.) T. Shepard and D. MacCarn, The Universal Preservation Format: A Recommended Practice for Archiving Media and Electronic Records, WGBH Educational Foundation, Boston (1999).

(34.) A. R. Heminger and S. B. Robertson, Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents, Proc. Sixth Delos Workshop, Tomar, Portugal (1998), http://www.ercim.org/publication/ws-proceedings/DELOS6/rosetta.pdf.

(35.) T. Shepard, Universal Preservation Format (UPF): Conceptual Framework, RLG DigiNews 2 (6), (1995), http://www.rlg.org/preserv/diginews/diginews2-6.html.

(36.) D. A. Kranch, Preserving Electronic Documents, Proc. Conf. ACM Digital Libraries, Pittsburgh (1998) pp. 295-296.

(37.) D. A. Kranch, Beyond Migration: Preserving Electronic Documents with Digital Tablets, Inform. Technol. Libraries 17 (3), 138-148 (1998).

(38.) M. V. Cloonan and S. Sanett, Comparing Preservation Strategies and Practices for Electronic Records, Proc. Int'l Conf. Preservation and Long Term Accessibility of Digital Materials, York, England (2000), http://www.rlg.org/events/pres-2000/cloonan.html.

(39.) A. R. Kenney and O. Y. Rieger, The National Library of Australia's Digital Preservation Agenda: An Interview with Colin Webb, RLG DigiNews 5 (1), (2001), http://www.rlg.org/preserv/diginews/diginews5-1.html.

(40.) PROS 99/007 Standard for the Management of Electronic Records, Public Record Office Victoria (1999), http://www.prov.vic.gov.au/gservice/standard/pros9907.htm.

(41.) Victorian Electronic Records Strategy Final Report, Public Records Office Victoria (1999), http://www.prov.vic.gov.au/vers/final.htm.

(42.) M. E. Phillips, Ensuring Long-term Access to Online Publications, J. Electron. Publ. 4 (4), (1999), http://www.press.umich.edu/jep/04-04/phillips.html.

(43.) C. Webb, Towards a Preserved National Collection of Selected Australian Digital Publications, Proc. Int'l Conf. Preservation and Long Term Accessibility of Digital Materials, York, England (2000), http://www.rlg.org/events/pres-2000/webb.html.

(44.) W3C Recommendation, Hypertext Markup Language (HTML) 4.01, World Wide Web Consortium (1999), http://www.w3.org/TR/html4/.

(45.) Electronic Records: Preservation and Migration of Electronic Records, National Archives of Australia (2000), http://www.naa.gov.au/recordkeeping/er/summary.html#preservation.

(46.) A. S. Inouye, A Digital Strategy for the Library of Congress, Commun. ACM 44 (5), 43 (2001).

(47.) Committee on an Information Technology Strategy for the Library of Congress, LC21: A Digital Strategy for the Library of Congress, National Academy Press, Washington, DC (2001).

(48.) R. Moore et al., Collection-Based Persistent Digital Archives: Part 1, D-Lib Mag. 6 (3), (2000), http://www.dlib.org/dlib/march00/moore/03moore-pt1.html.

(49.) R. Moore et al., Collection-Based Persistent Digital Archives: Part 2, D-Lib Mag. 6 (4), (2000), http://www.dlib.org/dlib/april00/moore/04moore-pt2.html.

(50.) L. Duranti, The Impact of Technological Change on Archival Theory, Proc. Int'l Congress of Archives, Seville (2000), http://www.interpares.org/documents/Id_sevilla_2000.pdf.

(51.) F. Podio, Development of a Testing Methodology to Predict Optical Disk Life Expectancy Values, NIST Special Publication 500-200, NIST, Gaithersburg, MD (1991).

(52.) V. K. Bhagavatula, Turbo Coding for Data Storage, Carnegie Mellon University (2001), http://amp.ece.cmu.edu/ECESeminar/slides/kumar/kumar.pdf.

(53.) C. E. Shannon, A Mathematical Theory of Communication, Bell Sys. Tech. J. 27, 379-423 (1948); ibid. 27, 623-656 (1948).

(54.) ISO/IEC 8879, Information Processing--Text and Office Systems--Standard Generalized Markup Language (SGML), International Organization for Standardization (1986).

(55.) OEB Publication Structure 1.0, Open eBook Forum, 1999, http://www.openebook.org/OEB1.html.

(56.) The CEDARS Project Team and UKOLN, Metadata for Digital Preservation: The CEDARS Project Outline Specification, March 2000, http://www.leeds.ac.uk/cedars/MD-STR~5.pdf.

(57.) National Archives of Australia, Recordkeeping Metadata Standard for Commonwealth Agencies Version 1.0, May 1999, http://www.naa.gov.au/govserv/TECHPUB/rkms/intro.htm.

(58.) National Library of Australia, Preservation Metadata for Digital Collections: Exposure Draft, October 1999, http://www.nla.gov.au/preserve/pmeta.html.

(59.) C. Lupovici and J. Masanes, Metadata for Long Term Preservation, July 2000, http://www.kb.nl/coop/nedlib/results/preservationmetadata.pdf.

(60.) OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 2001, http://www.oclc.org/difitalpreservation/presmeta_wp.pdf.

(61.) C. T. Cullen et al., Authenticity in a Digital Environment, The Council on Library and Information Resources, Washington, DC, May 2000, 84 pp.

(62.) C. Dollar, Authentic Electronic Records: Strategies for Long-Term Access, Cohasset Associates, Chicago (1998).

(63.) K. Russell and E. Weinberger, Cost Elements of Digital Preservation (2000), http://www.leeds.ac.uk/cedars/documents/CIW01r.html.

About the authors: Kyong-Ho Lee, Oliver Slattery, and Richang Lu are guest researchers in the Convergent Information Systems Division of the NIST Information Technology Laboratory. Xiao Tang is the Group Leader of the Information Storage and Integrated Systems Group within the Convergent Information Systems Division. Victor McCrary is the Chief of the Convergent Information Systems Division. The National Institute of Standards and Technology is an agency of the Technology Administration, U.S. Department of Commerce.
COPYRIGHT 2002 National Institute of Standards and Technology
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2002, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:McCrary, Victor
Publication:Journal of Research of the National Institute of Standards and Technology
Geographic Code:1USA
Date:Jan 1, 2002
Words:9366