8 tips for raising data from the dead: retaining the original software and hardware used to create electronic data cannot guarantee its survival. Those who have neglected to properly convert and migrate their important information to ensure its availability will find solace and solutions in this article's suggestions for its recovery.
Despite the myriad problems that can beset records and information management (RIM), "all" is not necessarily lost when primary backups and data conversion duties are found to be wanting. So I discovered in an attempt to recover my own files from 20-30 years ago--a cautionary tale since the results were mixed. By a very rough estimate, as explained in the Recovery Scorecard chart, I was able to recover only about 78% of the "ancient" archive.
Although this article arises from my trials and tribulations, it also speaks to issues facing society in general. It is well known that with digital records and documents now the norm, much of contemporary history will be lost to succeeding generations as standards change, computer products come and go, and storage media become corrupt. These issues have proliferated due to:
* Planned obsolescence
* Merger, acquisition, and failure of companies
* Improvements in operating systems
* Technological innovation
No one knows what the future will hold, but the past can be taken as a reasonable proxy of the compatibility and longevity problems likely to emerge in coming years. A convincing case cannot be made that backwards compatibility with previous generations of products is improving--except on the web, which is a relatively recent phenomenon.
How much of the past is prologue to the future will vary according to organization and job position, but the very nature of RIM is that a portion of the institutional and personal memory will always remain.
My journey back to the origin of widespread personal computer deployment in the early 1980s, when the archives I worked to recover began, instilled eight primary lessons.
Lesson 1: Think Software Obsolescence, Not Media Longevity
The tapes and 3.5-inch diskettes in this project were at the high end of the 20- to 30-year estimate of how long magnetic media remains reliable, according to the South Carolina Department of Archives & History "Electronic Records Management Guidelines for Digital Media." Other estimates of media longevity have ranged from three to 30 years and are too disparate to be trustworthy.
Use, manufacturing, and environmental factors (e.g., heat and humidity) are better predictors of media longevity than abstract rules, wrote Michael W. Gilbert in "Digital Media Life Expectancy and Care," produced by the University of Massachusetts.
But, since software obsolescence can be expected to outpace the most parsimonious estimates of media longevity, migration and conversion considerations should prevent "bit rot," or data decay, from ever growing; software needed to read legacy data will change much sooner than storage media degrades.
On the other hand, waiting too long to start the conversion process can exceed the media's life expectancy. In my case, 5.25-inch diskettes succumbed to this failing and, yielding hardly any data, were the most corrupted of all the media. A double failing was the misapprehension that the same material was available on tape, as few tapes survived.
Lesson 2: Expect Troublesome Version Issues
Had the material been encoded with Standard Generalized Markup Language (SGML) to make structure and contents (e.g., paragraphs and sections) identifiable regardless of their format, recovery would have been facilitated. But SGML has always been a minuscule part of the industry. And Extensible Markup Language (XML), the now-popular subset of SGML that makes it easy to structure, store, and transport data, was not in existence until 1998, long after the 20- to 30-year-old files were created.
Accordingly, the archive in question retained the native format of each software application. With luck, current software is able to open legacy data with only minor irregularities like font changes. Yet, file structure and formatting conventions change over time, and current software also can produce gibberish--if the file opens at all.
Therefore, users often are forced to open and resave each file with successive versions (major releases) of the authoring software until the file is either rid of glitches or left with an acceptable amount of editing. Sometimes, there are known shortcuts (e.g., a third-party plug-in that allows a page layout program to import all files created since 1997 in a competing page layout program).
Floppy, digital audio tape (DAT), and small computer system interface (SCSI) drives needed to read the old media brought similar problems because the associated drivers were dependent on obsolete operating systems and processors.
For example, a leading platform abandoned floppy drives in 1998 and SCSI support the following year. For simplicity, I acquired used computers for different operating systems--three "ancient" machines for one platform and two machines for the other platform, which included a SCSI host adapter, large-capacity cartridge drive, or 5.25-inch floppy drive. Each was acquired through online ads at a price greatly reduced from the original.
This arrangement had the advantage of permitting experimentation without jeopardizing operating systems and applications on the other computers.
Lesson 3: Bank on Your Own Expertise
Recovering old files can be a lonely process because vendors no longer sell or even support out-of-date products. So, users may be thrust into a new job without training or with only dim memories of the obsolete technology.
In comparison, data recovery experts (not used in my project) simply return the readable portions of a hard disk or tape, and updating, converting, or making sense of severely corrupted files become the responsibility of customers. In this spirit, one can apply the same first aid that data recovery experts often attempt: replacing the hard drive's controller with a working controller that exactly matches the vendor, model, and size of the defective disk.
For two of my hard drives that would no longer spin, I went to a large online auction site, found exactly matching replacement drives, and swapped the controllers, after which the drives booted perfectly. On the other hand, this method always involves risk, which the non-expert should avoid when handling files that must be recovered, as opposed to files that merely would be nice to recover.
Lesson 4: Draw on Less Obvious Resources
Even when vendors no longer provide support for obsolete products, they may sponsor online user communities, which can be an invaluable source of knowledge. News-oriented sites and professional societies also sponsor a variety of online forums that anyone can join and query.
Another resource can be online auctions of used equipment. Generally, sellers are willing to answer questions about products even before the auction ends. On the other hand, helpful information is unlikely to emanate from a seller who, buying in bulk from a liquidator, has never used the product.
Posting online "want-to-buy" ads can be yet another resource and was the only way I was able to acquire missing versions of key software.
Lesson 5: Concentrate on Expedience, Not Elegance
Results are what count in data recovery, not the elegance of the process. This principle applies to the networking together of the multiple computers in this project. Networking stability in my case would have been difficult to maintain with reinstallation, at different times, of three operating systems on each of two platforms.
As an alternative to a full network, a network-attached storage (NAS) drive would have sufficed except for fatal incompatibilities with the only NAS drive on hand (a recent model from one of the largest drive manufacturers). Employing "sneakernets"--using a combination of cartridge and USB drives to physically move data--was a suitable alternative. Mainly for convenience, I supplemented this method with compressed (zipped) files and file transfer over the Internet.
Lesson 6: Preserve Original File Elements
There is a useful role for photo editing, presentation, or page layout software, which can import multiple data types into a composite image or document. However, if the various files that come together are flattened into a new format, they all can be lost at once if the flattened file becomes corrupt; thus, preserving the individual elements as separate files helps ensure reliability.
Put differently, merge files when appropriate, but retain before and after copies as part of the permanent record. Especially for repurposing, individual components (e.g., images or word processing files) may be more adaptable than the flattened document.
Lesson 7: Tap Unused Space on Removable Media
Corporate policy may forbid copying important files to removable media, or the files may be too vast. But, if feasible, this type of redundancy can be a lifesaver. When DAT tapes on which I depended for the oldest data turned out to be incomplete, I was lucky to have had a series of partial copies on other removable media. These copies were generated, not as systematic backups, but simply as a hedge against unexpected system failure during specific projects. Wherever there was extra space on this media, it typically was filled with the most important documents at the time.
However, this collection spawned the related problem of files in no order apart from the occasional use of folders. It was difficult to determine what was what among the many duplicate, renamed, and modified files with changed dates. Fortunately, there is a variety of free or inexpensive software for finding and eliminating duplicates, though examining the list of duplicates was a monumental task. Although the software provides for automatic deletion of duplicates, this procedure would have been risky because of the profusion of dates.
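As a minimal sketch of the review-first approach described above, assuming Python 3.9 or later, the following groups files by a hash of their contents, so that renamed copies and copies with changed dates still land in the same group. The function names (`file_digest`, `find_duplicates`, `report`) are hypothetical and do not come from any particular duplicate-finding product. The sketch only lists the groups; it deliberately deletes nothing, leaving the decision to a human reviewer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    File names and modification dates are ignored on purpose, since both
    may have changed as copies were renamed or moved between media.
    """
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def report(root: Path) -> None:
    """Print each duplicate group for manual review; nothing is deleted."""
    for group in find_duplicates(root):
        print("Identical contents:")
        for path in group:
            print(f"  {path}")
```

Calling `report(Path("recovered_archive"))`, where the folder name is only a placeholder, prints one block per set of identical files. Files that were modified after copying will not match and still need to be compared by hand.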
Lesson 8: Avoid Cute Nomenclature
Given the danger that one's successors might be presented with a dense outcropping of unknown files, avoid cute file names in favor of descriptive names that will make sense to others later. For example, "Accounting 1982-83" is more informative than "Pillowman" or "Doghouse2." Metadata obviously aids classification when users are willing to take the time to add tags, but as in my case, hindsight is always 20/20.
Plan for the Future
This project, though failing to attain the goal of recovering files that could be published on the web, reinforces the need for diligent attention to ARMA International's Generally Accepted Recordkeeping Principles® (GARP®). (For more information, visit www.arma.org/garp/index.cfm.) This project emphasized that:
* Secure, offsite storage has considerable merit, but it is only as good for migration and conversion as the last backup placed into storage.
* Where available, paper copies of documents tended to be more reliable than the electronic copies.
* Optical character recognition (OCR) of the paper copies, combined with intensive study of what was worth preserving in the first place, probably would have been no more laborious or time-consuming than my massive recovery efforts.
* While graphic data was not suitable for OCR, this data needed to be reworked anyway; editable graphics turned out to be less important than originally assumed.
* There is a hard way and an easy way to learn GARP®: undue delay leads to the former.
While user behavior is in perennial need of improvement to prevent the loss of data and documents over time, the computer industry is hardly blameless. Dan Bricklin, who developed the first spreadsheet with Bob Frankston, was correct when he wrote in "Software That Lasts 200 Years" that software should be built like bridges, dams, and sewers--meant to last for future generations.
The seed of this project germinated in my unintended violation of the GARP® Principle of Availability, "An organization shall maintain records in a manner that ensures timely, efficient, and accurate retrieval of needed information."
This principle's annotation applies as well: "Electronic information needs to be routinely backed up to ensure that it can be restored if there is a disaster, a system malfunction, or the data becomes corrupted. It also needs to be constantly migrated to currently supported hardware and software to sustain its ongoing accessibility." (See www.arma.org/garp/metrics-availability.cfm.)
These words carry great import and should be followed religiously. But the flesh is weaker than the spirit. The real world is often one of competing priorities, procrastination, unfulfilled ambitions, underfunded projects, and a false sense of security. Although I had a variety of backups, I was negligent about migration and conversion of the principal archive.
Erroneously, I believed that as long as I kept the software and original computers that created the data, at some halcyon moment I could bring the ancient data back to life; from there, the intention was to distill portable document format files as an archival format. Yet, because of travel and moving, most of the archives languished until a few years ago, by which time the oldest files were more than 30 years old.
When the recovery finally began, the two primary computers on which I was counting failed at different times. To fast forward, the initial repair was to no avail, and I ultimately decided to connect the hard drives from both machines to replacement computers.
Thus began an odyssey of:
* Acquiring these computers
* Reconstituting operating systems for different periods of the archive
* Installing SCSI, DAT, and 5.25-inch drives (CD-ROM and 3.5-inch drives were a given)
* Running data recovery software
* Searching online for working copies of corrupted authoring software
In the end, the surviving data was migrated to a new computer running the latest software.
Lawrence Kingsley, Ph.D., is a Boston-based writer and consultant. He is the founder and general editor of Telepublishing EBooks, which offers analytic reports on IT theory and practice, consumer trends and developments, and best-of-breed products. Kingsley recently edited QoS: Myths and Hype. (An extended excerpt is found at www.scribd.com/doc/47300283/QoS-Myths-and-Hype.) He received his doctorate from the University of Wisconsin and master's from Boston University. Kingsley can be contacted at email@example.com.
Recovery Scorecard

Media Type        Medium Size   Quantity   Quantity Corrupted    Recovery
                  (Each)                   in Whole or Part      (GB)
Hard disk         4 GB            1          0                    2.89
Hard disk         4 GB            1          0                    3.010
Hard disk         2 GB            1          1                    0.818
Hard disk         535 MB          1          1                    0.264
Hard disk         330 MB          1          1                    0.000
Hard disk         1 GB            1          0                    0.429
Hard disk         40 GB           1          1                   18.810
DVD               4.7 GB          2          0                    4.417
CD-ROM            650 MB         24          5                    9.720
DAT tape (8mm)    4 GB            7          5                   20.960
3.5" floppy       1.44 MB       311          4                    0.336
5.25" floppy      1.2 MB         70         70                    0.003
Total             N/A           421         88                   61.657

The archive in question is listed in the first column; the last column lists the amount of data recovered from each device. The maximum storage space was only about 105 GB. Since the original amount of data on each storage device was not recorded, each was estimated to be 75% full, for an aggregate of 79 GB. The recovered 61.7 GB represents a 78% recovery, or 22% data loss.
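The arithmetic behind the scorecard's summary can be retraced with a short sketch. The rows below are transcribed from the scorecard. The 75% fill level is the estimate stated in the note above, not a measured value, and the conversion of megabytes to gigabytes in decimal units (1 GB = 1,000 MB) is an assumption of this sketch. The function name `summarize` is hypothetical.

```python
# Each row: (media type, size of one unit in GB, quantity, GB recovered).
# Sizes given in MB are converted assuming decimal units (1 GB = 1,000 MB).
SCORECARD = [
    ("Hard disk", 4.0, 1, 2.89),
    ("Hard disk", 4.0, 1, 3.010),
    ("Hard disk", 2.0, 1, 0.818),
    ("Hard disk", 0.535, 1, 0.264),
    ("Hard disk", 0.330, 1, 0.000),
    ("Hard disk", 1.0, 1, 0.429),
    ("Hard disk", 40.0, 1, 18.810),
    ("DVD", 4.7, 2, 4.417),
    ("CD-ROM", 0.650, 24, 9.720),
    ("DAT tape", 4.0, 7, 20.960),
    ('3.5" floppy', 0.00144, 311, 0.336),
    ('5.25" floppy', 0.0012, 70, 0.003),
]

ASSUMED_FILL = 0.75  # the article's estimate; original usage was not recorded


def summarize(rows, fill=ASSUMED_FILL):
    """Return capacity, estimated original data, recovered data, and rate."""
    capacity_gb = sum(size_gb * quantity for _m, size_gb, quantity, _r in rows)
    estimated_gb = capacity_gb * fill
    recovered_gb = sum(recovered for _m, _s, _q, recovered in rows)
    return {
        "capacity_gb": capacity_gb,
        "estimated_gb": estimated_gb,
        "recovered_gb": recovered_gb,
        "recovery_rate": recovered_gb / estimated_gb,
    }


if __name__ == "__main__":
    result = summarize(SCORECARD)
    print(f"Maximum capacity: {result['capacity_gb']:.1f} GB")
    print(f"Estimated data:   {result['estimated_gb']:.1f} GB")
    print(f"Recovered:        {result['recovered_gb']:.3f} GB")
    print(f"Recovery rate:    {result['recovery_rate']:.0%}")
```

Because the fill level is a guess, the recovery rate is only as reliable as that guess. Passing a different `fill` value to `summarize` shows how sensitive the 78% figure is to the assumption.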
Publication: Information Management Journal
Date: Jul 1, 2012