How will you manage the infinite archive? It may include a return to old-fashioned film.
Businesses are now more clearly focused on data and its value than ever before. However, most of this focus is on digital data. Numerous legal issues, the impact of compliance, and the possibility that data may someday have significant value that can't presently be seen make nearly all data a candidate for archival status. (Spam is a notable exception.) What is meant by the term archival? Archival storage means different things to different storage managers. Some think archival refers to media that must remain readable 10, 20, even 50 or more years from now. Others see it as preserving digital data indefinitely, recognizing that the physical media will change many times during the lifetime of the data. Some healthcare providers indicate that they will archive medical records for a person's lifetime plus seven years.
The question of how much digital data exists in the world is best addressed by the second University of California, Berkeley, study on digital data creation. This comprehensive study defines magnetic, optical, print and film as the four types of physical media where data is stored. Per the study, magnetic disk and tape (digital) accounted for 92% of the total amount of data stored; film (non-digital) represented 7% of the total; with paper (non-digital) and optical (digital) media storing the remainder (www.sims.berkeley.edu/research/projects/how-much-info-2003/). Archival data is normally referred to as fixed content, meaning that it is rarely modified. Archival data represents a much lower I/O activity level than most other applications. These properties create many unique opportunities and challenges for archival storage device suppliers.
Magnetic tape is the most commonly used data center archival technology. The most commonly quoted figure for the archival life of current-generation magnetic tape cartridges ranges from 15 to 30 years in ideal environmental conditions. Even in an era of significant emphasis on compliance and records retention, that is long enough to make storage administrators comfortable that the media will last. But even if the digital media can be read many years from now, the rate of change in storage technologies makes the media obsolete in less than 10 years. The difficulty of finding replacement parts, trained maintenance personnel, diagnostics, and operating systems that support old devices now mandates conversion to a new archival technology well before the media's rated useful life is over. The significant improvements in magnetic tape media life since 2000 now enable tape media to outlast the practical life of the tape drives themselves. Remote electronic tape libraries used as vaults and true offline tape storage remain useful for archiving records that are seldom accessed, offering additional geographic protection against disasters.
Organizations have traditionally relied upon optical media and Write-Once, Read-Many (WORM) media to comply with regulatory requirements for "non-erasable" and "non-rewriteable" storage media. Optical media thrives in the entertainment storage business, but interchange and standards issues can make media management more time consuming for business applications. In addition, optical disks have failed to keep pace with magnetic technologies in either capacity or data rate. Optical DVDs offer 4.7 gigabytes of capacity when writing a single layer of data on a single side of the media, and even the latest Blu-ray technologies (providing up to 30 gigabytes) pale compared to the half-terabyte capacities of current tape cartridges and disk drives. The explosion in regulated data storage is pushing storage requirements well into the terabyte range, past the performance and capacity limits of optical media.
WORM Disk Arrives
To solve this dilemma, popular economy magnetic disk arrays combined with user-selectable WORM functionality deliver a non-alterable storage solution that is ideal for archival and regulated data storage. WORM disks use disk arrays (typically, economical SATA drives) to create large second-tier online storage arrays. SATA-based storage arrays deliver terabytes of online capacity at prices that bring disk storage closer to automated tape than ever before. Prices for SATA-based storage arrays are commonly in the $3-$15 per gigabyte range, compared to automated tape libraries that range from $3 to less than $0.25 per gigabyte. The anticipated progress of tape cartridge capacity with data compression implies that automated tape will maintain its price advantage over disk for the foreseeable future.
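To make the price gap concrete, the per-gigabyte ranges quoted above can be turned into a rough purchase-cost estimate. The following is only a sketch: the function name, the use of decimal terabytes, and the choice of $0.25 as the low end for tape (the article says "less than") are assumptions made for illustration.

```python
GB_PER_TB = 1000  # decimal terabytes; an assumption for this sketch

# Per-gigabyte price ranges quoted in the article (2004 US dollars).
# The tape floor of 0.25 is an assumption; the article says "less than $0.25".
PRICE_PER_GB = {
    "sata_array": (3.00, 15.00),
    "automated_tape": (0.25, 3.00),
}


def archive_cost_range(capacity_tb, tier):
    """Return (low, high) purchase cost in dollars for capacity_tb terabytes."""
    low, high = PRICE_PER_GB[tier]
    gigabytes = capacity_tb * GB_PER_TB
    return (gigabytes * low, gigabytes * high)


if __name__ == "__main__":
    for tier in PRICE_PER_GB:
        low, high = archive_cost_range(10, tier)
        print(f"{tier}: ${low:,.0f} to ${high:,.0f} for 10 TB")
```

Under these assumptions a 10-terabyte archive would cost roughly $30,000 to $150,000 on SATA arrays and roughly $2,500 to $30,000 on automated tape, so the ranges overlap only at the expensive end of tape and the cheap end of disk.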
Evolving in parallel within the SATA movement is the new concept of MAID (Massive Arrays of Idle Disks) storage. MAID is similar to the RAID concept except that in a MAID storage array, not all disks (currently SATA disks) are spinning all the time. With a MAID subsystem, disks remain dormant (powered off) until requested. Powering up a SATA disk takes about 10 seconds. MAID is aimed at enabling SATA storage to handle an additional level of storage requirements that are partially being addressed by automated tape libraries or are not being cost-effectively addressed by disk or optical storage. This concept is somewhat analogous to an automated tape library with the exception that disks are substituted for tape cartridges. MAID can be viewed as a library of disks.
Reducing the number of disks that are concurrently active simplifies the controller and can significantly lower overall storage subsystem costs. The financial savings increase as storage environments get larger. MAID provides traditional levels of RAID data protection, which is important for SATA drives, to enable availability similar to that of current disk arrays. The most effective usage of MAID storage will be application driven. MAID isn't suitable for all applications, but it is poised to address mid-term archival, lower-activity fixed-content data, and backup and recovery more cost effectively than existing disk solutions. MAID products are just now appearing in the market.
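The MAID behavior described above (disks powered off until requested, with a spin-up delay of about 10 seconds) can be illustrated with a toy model. This is a sketch, not a description of any shipping product: the class name, the cap on concurrently active disks, and the least-recently-used power-down policy are all assumptions made for illustration.

```python
SPIN_UP_SECONDS = 10  # approximate SATA power-up time quoted in the article


class MaidArray:
    """Toy model of a MAID shelf: disks stay powered off until a read needs them."""

    def __init__(self, disk_count, max_active):
        self.disk_count = disk_count
        self.max_active = max_active  # hypothetical cap on spinning disks
        self.active = []  # powered-on disk ids, least recently used first

    def read(self, disk_id):
        """Return the simulated spin-up delay in seconds for a read from disk_id."""
        if not 0 <= disk_id < self.disk_count:
            raise ValueError("no such disk")
        if disk_id in self.active:
            # Disk is already spinning: refresh its position, no delay.
            self.active.remove(disk_id)
            self.active.append(disk_id)
            return 0
        if len(self.active) >= self.max_active:
            # Assumed policy: power down the least recently used disk.
            self.active.pop(0)
        self.active.append(disk_id)
        return SPIN_UP_SECONDS


if __name__ == "__main__":
    shelf = MaidArray(disk_count=100, max_active=2)
    for disk in (5, 5, 6, 7, 5):
        print(f"read disk {disk}: waited {shelf.read(disk)} s")
```

The model shows why MAID suits low-activity fixed content: repeated reads from a spinning disk are immediate, while a read that lands on a dormant disk pays the spin-up penalty once.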
Last Ditch Data Recovery: The Return to Analog?
Compared to analog data, digital storage has an additional requirement for access: electricity. What if there is an electrical outage for several days? Unfortunately, events of the recent past indicate this has become a possibility, though still a remote one. Hurricanes, fires, floods, and terrorism raise the odds that a business will one day have to cope without electricity for several days. None of the technology selections above will help in case of a blackout or sustained electrical failure.
The only reliable read capability remaining in a prolonged blackout is the human eye. The human eye can read analog data. In this absolute "last ditch data recovery" scenario, moving the most critical, potentially life-saving data back to analog film has value. In a life-or-death situation, film can provide the information needed for survival when electricity-based storage has become non-functional. Paper would be another option, except that the physical space required (compared to film) is most likely prohibitive. Film is immune to viruses and is non-alterable, making it the last stop in archival storage management. As an informational note, Microbox in Germany (www.microbox.de/english/frameset_longterm.htm) has developed an advanced, state-of-the-art laser film recording technique, currently writing data at 12,000 dots per inch with a rated media life of 500 years. Soon CAD-CAM drawings and x-rays with no artifacts can be written on this film and read with magnification by the human eye. Device obsolescence, the lack of parts, operating system support, and worms and viruses are issues that don't exist in the case of film.
A final consideration for archival storage is the requirement to periodically move the data from older technologies to newer ones. The difficulty of these conversions and the time they require increase every day as the amount of data steadily accumulates in each business. Moving terabytes (and soon petabytes) of data through servers becomes impractical and degrades the overall system. Device-to-device data transfer offers the most promise, though a workable implementation remains distant. This issue has no clear-cut solution today. Most progressive data centers continuously move 15-25% of their data to newer technologies each year to minimize the disruption, making this possibly the only viable option.
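The 15-25% figure implies a full refresh cycle that can be worked out directly. The sketch below assumes a fixed data volume (no growth during the cycle) and a steady yearly rate; the function name is hypothetical and used only for illustration.

```python
def years_to_migrate_all(percent_per_year):
    """Whole years needed to move 100% of the data at a steady yearly rate.

    Assumes the total data volume stays fixed while the migration runs.
    """
    if not 0 < percent_per_year <= 100:
        raise ValueError("percent_per_year must be in (0, 100]")
    return -(-100 // percent_per_year)  # ceiling division on integers


if __name__ == "__main__":
    for rate in (15, 25):
        print(f"{rate}% per year: {years_to_migrate_all(rate)} years per full cycle")
```

At those rates a complete cycle takes about four to seven years, which fits inside the roughly 10-year window before storage devices become obsolete. Data growth, which this sketch ignores, would lengthen the cycle.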
Whether or not the complete journey from analog and digital data creation to long-term archival storage involves a return to analog data, managing archival storage is without a doubt a pressing issue. The archival choices we make today will make archival data retrieval possible tomorrow.
|Title Annotation:||storage management|
|Publication:||Computer Technology Review|
|Date:||Sep 1, 2004|