Preserving the last copy: building a long-term digital archive.


The problem of preserving digital records for long-term access seems straightforward but requires careful consideration about process and technology. Until today there has been no storage platform that can be trusted to store the last existing copy of a critical electronic record. Preserving digital information is more difficult than preserving records on materials such as paper or film. The sheer volume and the volatility introduced by digital demand new software architecture capable of scaling and of preventing accidental changes to the records. Procedures need to be put in place to identify, classify, move, evolve, access and occasionally dispose of digital records. Library science and traditional archival practice provide an extensive body of knowledge that can be leveraged with technology to create a true modern archive.

IT departments have never thought of themselves as performing library functions but now find themselves in the role of traditional archivists having to guarantee not only backups but the long-term access to information. The risks associated with being unable to produce critical records during litigation have put pressure on IT to deliver continuous access to historical information. The new role of compliance officer has been created to integrate the legal and technology responsibilities.

This article should help IT managers and compliance officers navigate this complex new field. It discusses the practical considerations necessary to create a digital archive. It combines traditional concepts in library science with new software architecture to enable a modern archive that guarantees the long-term preservation and access to digital records.

Archives Matter

Archives are repositories for organizational records that are no longer in use but that may need to be accessed in the future. These are not working documents that can be modified. They are fixed content files that have become records and that should not be modified. The primary goal of a digital archive is to preserve these records from change. But since saving information and later being unable to access it renders the archive meaningless, an archive also needs to provide the necessary means to find and retrieve the records it is preserving.

Forget Tape. Keep Everything Online

In IT we have come to think of an archive as a tape library that stores backups. The tapes are stored in robotic tape silos and later trucked to a warehouse. The shortcomings of dropping everything into tape become apparent when someone needs to access a file that has been offline (i.e., at the warehouse) for some time. The moment a digital record leaves a computer connected via Ethernet to the corporate network, it begins a much riskier life as a physical asset. The process of recovering a document stored in a tape cartridge, buried under a stack of cartridges, can be compared to throwing every record into the Grand Canyon in the hope that because we threw everything in there, one day we may be able to recover what we want.

The simplest and most cost-effective means of providing continuous long-term access to digital records is to keep them online. Files on the network are easier to track, faster to index and less cumbersome to migrate than files stored to tape. And while digital files are generally considered a risky physical asset, even while still on the network, because their potential for corruption is high, the technology exists today to allow organizations to create cost-effective, reliable, usable and practically unlimited online archives. Inexpensive servers connected using Ethernet provide the physical infrastructure. Storage software running on clusters of these servers protects records from change while enabling instant access.

Improve Corporate Governance

Archives matter. Any organization suffering the effects of failing to find records during discovery in litigation understands this. Many organizations are opting to create archives to support better corporate governance. Compliance is only one aspect of governance that has driven awareness as to the importance of retaining access to records; there are operational advantages as well. Knowledge workers want to have unlimited inboxes because it helps them get their work done. The ability to quickly find records improves project planning and decision making. When embarking on a new project, an organization can look at what it has already done in the past. Was this project or a similar one done before? By whom was it done, how and with what results? Records contain the experience and history of an organization, and can help organizations make smarter decisions.

The Lessons from Traditional Archives

Meet the Archivist

Traditional archives benefit from having a person or group in charge of the selection, organization and storage of and eventual access to the records being archived. The archivist preserves the records and their context as evidence of what an organization did, and provides access to the records and their context through printed or online tools, and through reference services. Traditional archives have, over time, developed a number of general theories about the characteristics of archival records and best practices for managing them. An archivist's governing principles are provenance and original order. Provenance establishes that records generated by a person or group be kept together so that their context will be preserved. Original order requires that the sequence in which these records were found be maintained as well.

Traditional archival records are unique. There is never more than one copy of a single record within an archive, and the record can usually be found nowhere but in the archive. Archival records were not originally created for the purpose for which they may be used in the future. The records were created to move the business along, not specifically to give information to workers many years later. This means that they have value as evidence of what the organization did and how it did it as much as they are useful for directly communicating information.

Records are interdependent. Nearly every record is related to other records, and few records can be fully understood without the context of other records. Archival records are kept permanently, and should be handled in such a way that they can be accessed, understood and used 50 or even 100 years from now. An archivist deals mostly with groups of records and will accept a single record only in extraordinary cases. Archivists do not see individual records as the most useful unit of information.

Provenance and Original Order

Records in archives are arranged according to provenance, a principle that states records of different people or groups should never be mixed. Along with preserving the context of a file, provenance serves two other functions. It maintains the chain of custody. The chain of custody names the previous curators for a particular record. When a record's chain of custody is unclear, that record's value as evidence is lessened considerably.

Retaining chain of custody is also an important intellectual property consideration. Without contextual information, much of the record often cannot be understood, or might be misunderstood, particularly after a long time has passed. Provenance also helps in information retrieval, especially over time. The questions that users bring to an archive form a diverse group. Questions will vary from person to person and from year to year. Because researchers come to the archive with so many different perspectives, reorganizing archives into subject-based, format-based or other groupings, which might at first glance seem like the most logical approach, often makes finding records more difficult. A useful subject category one year will not be useful to later researchers who have different research questions. In a traditional archive, records can only be in one place on the shelf--so they can only really be in one context within the archive. Provenance is the best single place for records to be. Provenance has for many years been a central principle of traditional archives.

The second principle of traditional archival organization is original order. This principle dictates that records should be kept arranged in the order in which they were found. Unlike the rule of provenance, the rule of original order can sometimes be broken--for example, if records were not kept in good order, or were kept in no discernible order. In addition to providing contextual information, original order is an existing system of organization, and saves spending time and resources to create a new system that would probably be less helpful. Original order is a universal organizational principle; hence it is the most likely arrangement to be understood in the future.

These two principles, provenance and original order, are central theories in library science. Both have proven their value over time.

The Modern Archive

Digital records present extraordinary opportunities and challenges. Digital allows for perfect copies of the records to be made and, because digital records require no physical space, allows enormous amounts of information to be preserved. The records can also be indexed in various ways simultaneously to ensure instant retrieval of files even in a repository the virtual size of the Grand Canyon. But the challenges are very real. The ones and zeroes representing digital records are inherently unstable. There is no direct access to a digital record; computers must be used to convert the raw data into information. Data formats evolve over time, encumbering our ability to decode old records. These challenges can be overcome. Digital signatures can prevent records from being changed. Clusters of networked computers can replicate and distribute the records to ensure that there will always be enough copies and computers to access every record in the repository. Software running inside the cluster can evolve data from legacy formats to new standards while preserving the chain of custody.
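As a rough illustration of how software can detect that a record has changed, the sketch below stores a SHA-256 digest when a record is archived and recomputes it later. This is an assumption made for illustration only: a plain digest is simpler than a true digital signature, which would also bind the digest to a signing key, and the function names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, stored_fingerprint: str) -> bool:
    """True if the record still matches the fingerprint taken when archived."""
    return fingerprint(data) == stored_fingerprint


# Hypothetical record content, used only to exercise the functions.
original = b"Board minutes, 12 March"
stored = fingerprint(original)  # recorded once, when the record is archived

print(is_unchanged(original, stored))                    # True: intact copy
print(is_unchanged(b"Board minutes, 13 March", stored))  # False: altered copy
```

Because the digest depends on every byte, any replica in the cluster can be checked against the same stored value.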

[ILLUSTRATION OMITTED]

More Data Than Ever ...

The amount of data that needs to be archived by an organization is exploding. Organizations spend massive amounts of time and energy creating, reading, transmitting, maintaining hardware and software, and otherwise using these records. Much of this data, once its most active life is over, becomes fixed content that has to be stored. In fact, eighty percent of the information in a typical organization is fixed content. IT managers have known for decades that when it comes to digital preservation, we have been writing in the sand. Hardware obsolescence, changing data formats and the sheer volume of data all play their part in making the long-term archival of digital records a daunting task. Many law firms still print important documents that need to be preserved but organizations today cannot afford to print every record that may be needed in the future.

... Stored Forever

The modern archive is an online repository that preserves and provides long-term access to digital records. It protects records against hardware failure and obsolescence, and user or application errors. It integrates with archive sources in the organization, such as email, document repositories, file systems, etc. It provides multiple means of access to its vast amount of information. A modern archive is more than a storage device: it is an organized collection of records that are being preserved so that they can be accessed and understood tomorrow and a hundred years from now.

The modern archive is a platform that complies with government regulations on electronic record preservation. It is a system that is policy driven, portable and low cost. Policies automate the management of digital records. Policies check data integrity, protect records, evolve data formats and can eventually dispose of the information. The policies are different from the mechanisms for storage necessary to implement policies. For instance, retention may be implemented at the logical level in software, at the storage controller or at the media layer, but this should not affect its specification. The mechanisms for storage are the traditional equivalent to bookshelves. The modern archive needs to be portable or hardware/software agnostic. Hardware agnostic implies that the software required to run the core preservation layer must be portable and not tie the organization to any specific hardware vendor. Software agnostic implies that the digital records being preserved must be able to exist beyond the life of the applications that created them, and that the core archive platform must provide standard gateways to migrate its data and metadata.

The archive should be flexible and make it possible to integrate with the many applications that generate records; however, it should be rigid in requiring that vital information, particularly about provenance, be preserved, so that these carefully maintained records can be understood. Finally, the archive must be low cost. IT managers should avoid being locked to a single vendor, hardware dependencies, proprietary protocols or any other methods by which vendors try to keep the price of storage artificially high. There is just too much data that needs to be archived. Organizations should seek to ride the commodity trend towards low-cost disk and servers.

An IT manager or compliance officer seeking to build a true modern archive should consider the following questions:

* How do the records get in? (Ingestion)

* How are the records being preserved? (Preservation)

* How does the archive provide access to these records? (Access)

Ingestion

Ingestion offers an archivist the opportunity to control how records are organized, and to impose provenance and original order. Ingestion is divided into appraisal and accession.

During appraisal the modern archivist determines what information sources in the organization should be publishing to the archive. The archivist is aided by applications responsible for extracting data out of the production systems. The original sources of records include e-mail servers, file systems and databases. The archive connects to the applications and ingests the stream of information that comes from that source. Each record has metadata and on entering the archive generally acquires more life-cycle metadata, such as when the record was archived. In ingestion, records are automatically checked to ensure that they are not corrupt or infected. Instead of examining individual records, the modern archivist manages information streams. The metadata added at this stage will become crucial to enabling later access to the records.

Unlike IT managers, who have access to or control over a computer network, traditional archivists do not have simple ways of querying what kinds of records are being created and where they are located. So traditional archivists must constantly scan the organization or work with records management to ensure that noncurrent records are actually being sent to the archives for appraisal. When the information assets are valued at appraisal, archivists ask questions like:

* Do these records fit the purpose of the archive?

* Are there duplicates?

* Do we want them?

* Do we have the resources to take care of them in our archive?

* Are they worth the use of those resources?

These same questions need to be asked by the modern IT archivist; however, IT managers do have access to all of the resources in the network, and software can be used to assess, classify and value the records stored in the infrastructure.

The modern archivist needs to identify these sources of files and associate them with applications that harvest the raw data and create digital records. These records typically contain data (the original raw data) and metadata (information about the data that has been added by the archival application). The digital records are now ready to be published into the archive.

Accession is the actual movement of the records into the archive. Traditional archivists often do not accession all the records that are given them; archivists may determine that some of the records are not of permanent value or do not fit the archive's collection policy. Modern archivists have different challenges and more choices. In a digital archive the incremental cost of holding another record is negligible; digital records take no space. It is important that modern archivists do not pollute their archives by saving redundant or inaccurate information; however, from a cost perspective it is almost always more expensive to determine what to store than simply to store everything.

During accession the application publishing to the archive needs to encapsulate the record into an object that contains all the information that will be necessary to interpret the record. Long-term preservation means having the ability to interpret digital records well after the applications that created them have stopped being used. For instance, preserving a Web page or a database transaction involves not only encapsulating an HTML page or a database row but also creating an object that has all of the data that is referenced by the Web page or in the database transaction.

Traditionally, when records are accessioned, they are physically and legally handed over to the archive and marked as such. It is critical that the technology supporting the modern archive provide this same type of atomic transaction and audit trail at the point of accession. At accession, archivists record information such as where and when the records were handed over to the archives, by whom, whose intellectual property the record is, any access restrictions, and briefly what the records are and how big the group of records is. Because many archives have processing backlogs, meaning that the material is set aside for an often lengthy period of time before being filed, the accession record can become the only intellectual memory of the record.
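The accession information listed above can be pictured as a small immutable record kept in an append-only log. The sketch below is only one possible shape: the class names, field names and sample values are hypothetical, and it illustrates the audit trail only, not the atomic hand-over itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessionRecord:
    """What the archive knows about a group of records at hand-over."""
    received_at: datetime
    received_where: str
    transferred_by: str
    rights_holder: str
    description: str
    record_count: int
    access_restrictions: tuple[str, ...] = ()


class AccessionLog:
    """Append-only audit trail: entries can be added but not edited."""

    def __init__(self) -> None:
        self._entries: list[AccessionRecord] = []

    def accession(self, record: AccessionRecord) -> int:
        """Store the record and return its accession number."""
        self._entries.append(record)
        return len(self._entries) - 1

    def entries(self) -> tuple[AccessionRecord, ...]:
        """Return a read-only snapshot of the log."""
        return tuple(self._entries)


# Hypothetical example values.
log = AccessionLog()
number = log.accession(AccessionRecord(
    received_at=datetime(2007, 1, 15, tzinfo=timezone.utc),
    received_where="mail gateway",
    transferred_by="records manager",
    rights_holder="Example Corp",
    description="Finance department e-mail, fiscal year 2006",
    record_count=1200,
    access_restrictions=("legal hold",),
))
```

Freezing the dataclass means an entry cannot be quietly altered after the fact, which matters when the accession record is the only memory of a backlogged group of records.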

The last step is often disposition, or the disposal of records, though disposition decisions can be made and remade at many different points in the process. Keeping the digital records online allows the archival software to retain control over disposition. At the end of their life cycle a compliance officer can decide if the records need to be kept onsite, or if they can be stored at a remote facility or even destroyed.

Preservation

The preservation of records involves both ensuring that the record contains the same information it did when it was archived and ensuring that the record can be viewed using existing technology. It also involves maintaining security and intellectual property rights.

The primary responsibility of the preservation layer is to keep the digital record intact. Because digital information is intangible, this is an enormous concern. Each record must be periodically refreshed, and when the hardware that created a record becomes obsolete, the record must be automatically moved to a new device. As data formats become outdated, records must be evolved to support the new standards. Keeping the records safe is an important aspect of preservation--organizational archives must be secure from either malicious or accidental intrusion. Government regulations have articulated requirements for preserving electronic records, and complying with these regulations has also been a recent focus in IT archive preservation.

When a digital record enters the preservation layer it becomes an archive object. The archive object contains the original data and metadata, as well as the policies needed to preserve the digital record. The metadata includes:

* Application-specific or custom metadata: e.g., email subject, to, from, POSIX parameters

* Policy parameters: e.g., retention period, replication, protection

* Intellectual property rights

* Links to derivative data

* Access logs
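One way to picture an archive object is as a single structure that keeps the data, the metadata categories listed above and the access log together. The sketch below is a minimal assumption of what such an object could look like; the class name, field names and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveObject:
    """A preserved record: original data plus the metadata that travels with it."""
    data: bytes                      # the original raw data
    app_metadata: dict[str, str]     # application-specific or custom metadata
    policy: dict[str, object]        # policy parameters (retention, replication, ...)
    rights: str                      # intellectual property rights holder
    derivative_ids: list[str] = field(default_factory=list)  # links to derivative data
    access_log: list[str] = field(default_factory=list)      # who read the record

    def read(self, who: str) -> bytes:
        """Return the data and note the access in the object's own log."""
        self.access_log.append(who)
        return self.data


# Hypothetical e-mail record.
email = ArchiveObject(
    data=b"Quarterly figures attached.",
    app_metadata={"subject": "Q3 results",
                  "from": "cfo@example.com",
                  "to": "board@example.com"},
    policy={"retention_days": 2555, "replicas": 3},
    rights="Example Corp",
)
content = email.read("auditor")
```

Keeping the log and policy parameters inside the object, rather than in a separate system, is what lets the metadata move with the data when the record is migrated.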

The preservation layer is responsible for keeping the metadata linked to the data it describes. The policies are the programs that perform the work of ensuring that records are preserved and available. Their tasks include:

* Verifying the authenticity of digital records

* Ensuring that there are sufficient replicas of objects to endure hardware failures

* Eliminating duplicates

* Enforcing retention periods

* Evolving data formats

* Enforcing intellectual property rights
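Two of the tasks above, enforcing retention periods and eliminating duplicates, are simple enough to sketch as small programs. The functions below are illustrative assumptions, not a description of any particular product; duplicate detection here relies on a SHA-256 content digest, which is one common choice among several.

```python
import hashlib
from datetime import date, timedelta


def retention_expired(archived_on: date, retention_days: int, today: date) -> bool:
    """True once the retention period counted from the archive date has passed."""
    return today >= archived_on + timedelta(days=retention_days)


def find_duplicates(records: dict[str, bytes]) -> dict[str, list[str]]:
    """Group record IDs by content digest; groups of two or more are duplicates."""
    by_digest: dict[str, list[str]] = {}
    for record_id, data in records.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest.setdefault(digest, []).append(record_id)
    return {d: ids for d, ids in by_digest.items() if len(ids) > 1}


# Hypothetical records: "a" and "b" carry identical content.
records = {"a": b"annual report", "b": b"annual report", "c": b"memo"}
duplicates = find_duplicates(records)
print(list(duplicates.values()))  # [['a', 'b']]
```

A policy engine would run checks like these on a schedule against every archive object, using the policy parameters stored in each object's metadata.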

The preservation layer must be reliable and be able to scale beyond the limits of today's single-host systems. It is not uncommon for an archive to contain hundreds of millions of records. Each record must be verified, protected, replicated and so on. A Redundant Array of Independent Nodes (RAIN) architecture distributes the workload among a collection of servers that are interconnected in a LAN. RAIN architectures have the ability to automatically compensate for failures. Unlike Distributed File Systems, a RAIN-based archive tracks digital records rather than provisioning storage capacity. Provisioning in RAIN systems becomes an autonomous piece of the technology that ensures that there is always sufficient capacity in the system to store and track all the digital records.
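The idea of compensating for failures can be sketched as a repair pass that restores the desired number of replicas after a node drops out. This is a deliberately simplified assumption: the function name, node names and the alphabetical choice of spare nodes are hypothetical, and a real cluster would also weigh capacity and load.

```python
def repair_replicas(placement: dict[str, set[str]],
                    live_nodes: set[str],
                    target: int) -> dict[str, set[str]]:
    """Return a new placement with `target` live replicas of each record,
    wherever enough live nodes exist."""
    repaired = {}
    for record_id, nodes in placement.items():
        holders = nodes & live_nodes           # ignore replicas on failed nodes
        spares = sorted(live_nodes - holders)  # deterministic choice for the sketch
        while len(holders) < target and spares:
            holders.add(spares.pop(0))         # copy the record to a spare node
        repaired[record_id] = holders
    return repaired


# Hypothetical cluster: node n2 has failed and n4 has been added.
placement = {"r1": {"n1", "n2"}, "r2": {"n2", "n3"}}
live = {"n1", "n3", "n4"}
repaired = repair_replicas(placement, live, target=2)
```

The loop works per record rather than per volume, which mirrors the point that a RAIN-based archive tracks records instead of provisioning capacity.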

In order to maintain long-term access to digital records the preservation tier needs to be able to evolve data formats or create derivatives from the original records. It is imperative that these derivatives be permanently linked to the original records and that the programs used for the transformation be certified.
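A permanent link between a derivative and its original can be as simple as storing the original's digest and the name of the converter alongside the new data. The sketch below assumes this approach for illustration; the class, the function and the converter name are hypothetical, and the text encoding change stands in for any format migration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Derivative:
    """A converted copy that records where it came from."""
    data: bytes
    source_digest: str  # SHA-256 of the original record
    converter: str      # name and version of the transformation program


def derive(original: bytes,
           convert: Callable[[bytes], bytes],
           converter_name: str) -> Derivative:
    """Apply a conversion and link the result back to the original."""
    return Derivative(
        data=convert(original),
        source_digest=hashlib.sha256(original).hexdigest(),
        converter=converter_name,
    )


# Hypothetical migration: a Latin-1 text record evolved to UTF-8.
original = "caf\u00e9".encode("latin-1")
derivative = derive(
    original,
    lambda raw: raw.decode("latin-1").encode("utf-8"),
    "latin1-to-utf8 1.0",
)
```

The original is left untouched, so a later reader can always verify the derivative's lineage by recomputing the digest of the original.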

Preserving provenance and original order of records, making their context clear and keeping chain of custody information are all part of preserving the record's authenticity and value.

Access

A modern archive is useful because it enables users and programs to access digital records. Archives are not data graveyards. A modern archive should be thought of as a component in systems that enforce compliance, improve knowledge transfer and in general enable better corporate governance.

One of the least intuitive aspects of access is that the needs of the users cannot be anticipated when the archive is created. Users will want to query the records in an archive in ways that were not anticipated by the curators of the archive; hence, the importance of providing access as a separate tier from preservation. Like a traditional archive, the preservation layer should make no assumptions as to how the records will be accessed, and the access layer should accommodate multiple access methods and be able to adapt to the needs of its users.
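Keeping access separate from preservation means new ways of finding records can be added later without touching what is stored. As a minimal sketch under that assumption, the function below builds an index over any metadata field on demand; the store layout, record IDs and field names are hypothetical.

```python
from collections import defaultdict


def build_index(store: dict[str, dict[str, str]], key: str) -> dict[str, list[str]]:
    """Index record IDs by one metadata field, without modifying the store."""
    index = defaultdict(list)
    for record_id, metadata in store.items():
        if key in metadata:
            index[metadata[key]].append(record_id)
    return dict(index)


# Hypothetical preserved records, keyed by ID, with their metadata.
store = {
    "r1": {"author": "lee", "year": "2005"},
    "r2": {"author": "kim", "year": "2005"},
    "r3": {"author": "lee"},
}

by_author = build_index(store, "author")  # an index planned from the start
by_year = build_index(store, "year")      # an index nobody anticipated, added later
```

Each index is a disposable view: it can be rebuilt or replaced as user needs change, while the preserved records stay exactly as they were.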

IT archives that are offline can only be accessed physically, but access to online archives, like other networked resources, is available to groups or individuals who have been assigned rights by the administrator. However, although groups or individuals can technically access the records, they currently have limited tools to help them find records. The network metadata attached to the record is usually all that is available for querying. Quite often users will simply forward particular requests to IT departments, who find themselves performing a reference function.

IT departments find themselves delivering archival functions without any of the tools an archivist has, such as detailed finding aids or experience with record content. Finding anything in many or most digital archives today is often very difficult and time-consuming.

Understanding what kinds of searches users make is crucial to figuring out the best ways to create tools for them to find information. Over a century ago, Charles Cutter established a methodology for creating library catalogs called Rules for a Dictionary Catalog. Cutter believed that library users, using the catalog and without ever going to a collection, should be able to find a book they wanted if they knew the subject, author or title; to see what a library has on a subject, by an author or in a type of literature; and to evaluate a book in terms of its edition and literary type.

In 1996, Donald Hawkins discussed online searching. Hawkins stated that all searchers were using one of three methods: hunting, grazing or browsing. "Hunters" are searching for specific items or information, which would be analogous to Cutter's idea of library users who are searching for a book when they know the title, subject or author. Within this category, there is a further distinction between searching for a particular record, when no other record will do, and searching for a particular piece of information, which may be contained in a number of different records. "Grazers" have information fitting a certain query "pushed" to them. While this information pushing is something we associate more with news feeds and similar broadcasting models, it does apply to archives. Finally, "browsers" often have a general information need, but they have not yet formulated a particular research question that can be answered with just one record.

Modern archives need to support all these categories, as well as a few more access functions not explicitly mentioned above. Sometimes users or administrators would like to query a database so that they can find and organize a large number of records. For example, an IT department might want statistical information on what formats the company's records have been in for the past ten years to examine how trends have changed over time. If the query matches an existing metadata field, it can be a simple procedure; however, if the query concerns material not captured in an existing metadata field, it currently requires going through every single record manually to see whether it matches.
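The simple case, where the query matches an existing metadata field, might look like the following Python sketch. The row layout (record id, ingest year, format) and the sample values are assumptions made for this example and are not taken from any particular archive product.

```python
from collections import Counter, defaultdict

# Hypothetical metadata rows: (record id, ingest year, format).
records = [
    ("r1", 2003, "doc"),
    ("r2", 2003, "pdf"),
    ("r3", 2004, "pdf"),
    ("r4", 2004, "pdf"),
    ("r5", 2005, "xml"),
]


def formats_by_year(rows):
    """Count records per format for each year, using metadata only."""
    counts = defaultdict(Counter)
    for _record_id, year, fmt in rows:
        counts[year][fmt] += 1
    return {year: dict(c) for year, c in sorted(counts.items())}


stats = formats_by_year(records)
```

The function never opens a record's content. That is exactly why it is cheap, and also why it cannot answer a question about something the metadata does not capture.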

One other important search-related function of archives is data mining, when the results of a query are retained as a new category. In a traditional archive, this might mean creating a new paper or electronic index for a collection; in a digital archive, it requires a query to be persisted so that past and future objects matching the criteria can be collected.
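A persisted query of this kind can be sketched in a few lines of Python. The `Archive` class, its method names and the "legal-hold" example are hypothetical. The point is only that a saved query is evaluated once against existing records and then again on every later ingest.

```python
from typing import Callable, Dict, List


class Archive:
    """Toy archive that keeps saved queries up to date as records arrive."""

    def __init__(self) -> None:
        self.records: List[dict] = []
        self.saved: Dict[str, Callable[[dict], bool]] = {}
        self.collections: Dict[str, List[str]] = {}

    def persist_query(self, name: str,
                      predicate: Callable[[dict], bool]) -> None:
        # Evaluate once against everything already in the archive.
        self.saved[name] = predicate
        self.collections[name] = [
            r["id"] for r in self.records if predicate(r)
        ]

    def ingest(self, record: dict) -> None:
        # Check each saved query against the new record only.
        self.records.append(record)
        for name, predicate in self.saved.items():
            if predicate(record):
                self.collections[name].append(record["id"])


archive = Archive()
archive.ingest({"id": "r1", "dept": "legal"})
archive.ingest({"id": "r2", "dept": "sales"})
archive.persist_query("legal-hold", lambda r: r["dept"] == "legal")
archive.ingest({"id": "r3", "dept": "legal"})
```

After these calls, the "legal-hold" collection holds one record that predates the query and one that arrived after it, which is the behavior described above.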

The RAIN architecture required by the preservation layer can be leveraged to provide more scalable access to digital records. A search engine that is distributed among the nodes of a cluster would be able to index billions of objects without requiring additional management. The indexes, typically half the size of the original text, can be partitioned and distributed along with the records. The result would be a high-availability search engine that can perform sub-second searches on the contents of the entire repository.
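The idea of partitioning the index along with the records can be illustrated with a small in-memory Python sketch. It assumes records are placed on nodes by a hash of their identifier and that a query is sent to every node and the partial results merged. The `Node` and `Cluster` classes are hypothetical, and the sketch leaves out replication, failure handling and networking entirely.

```python
from collections import defaultdict
import zlib


class Node:
    """One cluster node holding some records and the index for them."""

    def __init__(self) -> None:
        self.records = {}
        self.index = defaultdict(set)   # term -> ids of local records

    def add(self, record_id: str, text: str) -> None:
        self.records[record_id] = text
        for term in text.lower().split():
            self.index[term].add(record_id)

    def search(self, term: str) -> set:
        return set(self.index.get(term.lower(), ()))


class Cluster:
    def __init__(self, size: int) -> None:
        self.nodes = [Node() for _ in range(size)]

    def add(self, record_id: str, text: str) -> None:
        # Stable hash so the same id always maps to the same node.
        slot = zlib.crc32(record_id.encode()) % len(self.nodes)
        self.nodes[slot].add(record_id, text)

    def search(self, term: str) -> set:
        # Scatter the query to every node, then merge the partial results.
        hits = set()
        for node in self.nodes:
            hits |= node.search(term)
        return hits


cluster = Cluster(size=3)
cluster.add("r1", "quarterly compliance report")
cluster.add("r2", "litigation hold notice")
cluster.add("r3", "Compliance audit notes")
matches = cluster.search("compliance")
```

Each node indexes only what it stores, so adding nodes adds index capacity at the same time as storage capacity.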

All these challenges must and will be met, because the purpose of archiving records is so that they can be accessed later. Ingestion and preservation are auxiliary functions. Providing search technology and clear finding aids to facilitate access to records is vital to the modern archive. Records that cannot be found are essentially lost, and thus have not really been preserved. The mainstays of traditional archives, provenance and original order, while an indispensable foundation, are only a beginning. For the evolving and expanding needs of the digital environment, we must integrate the lessons of traditional archives and libraries, and continue to develop new technologies for retrieving preserved data in new and inventive ways.

Summary

This document deals with the question of how to create a true modern archive that can be trusted to preserve the last copy of a critical digital record. It addresses the unique demands of the digital archive, while examining traditional archival practice as a foundation for today's systems. How records are brought into the archive and how those records are organized, securely preserved and accessed are all carefully considered. The challenge of building a modern archive goes beyond the need to comply with government regulations. The full extent of the challenge is to create a platform that is flexible and can be used in many different environments over a period of time, and yet rigid in requiring that vital information, particularly about the original context of the material, be preserved, so that these preserved records can be accessed, authenticated and understood.

Andres Rodriguez is founder and CTO of Archivas (Waltham, MA).

www.archivas.com
COPYRIGHT 2005 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:SPOTLIGHT: ILM
Author:Rodriguez, Andres
Publication:Computer Technology Review
Geographic Code:1USA
Date:Mar 1, 2005
Words:4435