
Data lifecycle management: hard drives are not enough.

There are a number of pieces to the compliance puzzle: identifying what data needs to be saved, how fast that data is growing, how quickly it needs to be accessed, how long it should be retained, which federal, state, and local regulations govern it, and whether it should be disposed of once it has reached the end of its life. Today, no catchall application can satisfy all compliance questions and requirements; however, fixed-disk compliance solutions can go a long way toward meeting the initial requirements of the compliance problem.

Fixed disk does have limitations for long-term data. Long-term data is data that needs to be accessible for more than three years, yet has settled into its final version and is unlikely to change further. For compliance data that is still changing, or for compliance data that needs to be accessed often, fixed disk makes a lot of sense. However, data migration should not stop there. There are a number of reasons to support a tiered architecture that extends beyond fixed-disk solutions.

Media life: Most fixed-disk compliance solutions use low-cost ATA disks, which have a usable life of roughly three to five years. All of these solutions use RAID technology to protect the data from failing disks; however, that cost can be mitigated with longer-lived removable media. Long-life media limits the number of times data must be migrated from old media to new media of the same type over its life cycle. Fewer migrations reduce the chance of data loss or corruption and lower the cost of managing the data over time.
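To make the migration argument concrete, the short sketch below counts how many media-to-media copies are needed to keep data readable for a whole retention period. The function name `migrations_needed`, the 30-year retention period, and the 30-year rating used for removable media are illustrative assumptions, not figures from the article. The 3-year and 5-year cases bracket the ATA disk life quoted above.

```python
import math


def migrations_needed(retention_years: float, media_life_years: float) -> int:
    """Count media-to-media migrations needed to keep data readable for the
    whole retention period. The initial write is not counted."""
    if retention_years <= 0 or media_life_years <= 0:
        raise ValueError("years must be positive")
    # Each media generation lasts media_life_years; every generation after
    # the first requires copying the data forward once.
    generations = math.ceil(retention_years / media_life_years)
    return generations - 1


# Hypothetical 30-year retention period. 3 and 5 years bracket ATA disk life;
# 30 years is an assumed rating for long-life removable media.
for life in (3, 5, 30):
    print(f"media life {life:>2} years: {migrations_needed(30, life)} migrations")
```

Under these assumptions, a 30-year record on 3-year disks must be copied forward nine times. On media rated for the full period, no migration is needed.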

Volume: As stated above, fixed-disk solutions must have redundancy to protect against disk failure. Redundancy means that each object, or at least part of it, must be stored in more than one location. This is not new to primary storage; however, it is a requirement for fixed-disk solutions in a critical environment. Redundancy drives up the volume of stored objects and therefore increases the cost of fixed-disk Content Addressable Storage (CAS). Vendors of disk-based storage all quote capacity in terms of "raw" storage and "usable" storage. Raw storage is the total amount of storage a user must buy as part of the system. Usable storage is the amount the customer can actually use after redundancy and overhead. Cost per megabyte should be calculated on usable storage to get an accurate cost of ownership.
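The raw-versus-usable distinction is easy to quantify. The sketch below is illustrative, and several of its inputs are assumptions. It uses a simple RAID 5 layout, which gives up one disk per group to parity. The system price is hypothetical. The 25% overhead figure is the upper bound the article cites for compliance API overhead. The class and function names (`DiskArchive`, `raid5_data_fraction`) are made up for the example.

```python
from dataclasses import dataclass


def raid5_data_fraction(disks_per_group: int) -> float:
    """RAID 5 gives up one disk per group to parity."""
    if disks_per_group < 3:
        raise ValueError("RAID 5 needs at least 3 disks per group")
    return (disks_per_group - 1) / disks_per_group


@dataclass
class DiskArchive:
    raw_gb: float                   # capacity the customer pays for
    price: float                    # total system price (hypothetical)
    data_fraction: float            # share of raw capacity left after redundancy
    overhead_fraction: float = 0.0  # share of the remainder used by software/metadata

    @property
    def usable_gb(self) -> float:
        return self.raw_gb * self.data_fraction * (1 - self.overhead_fraction)

    def cost_per_raw_gb(self) -> float:
        return self.price / self.raw_gb

    def cost_per_usable_gb(self) -> float:
        return self.price / self.usable_gb


# Hypothetical system: 10 TB raw, 5-disk RAID 5 groups, 25% software overhead.
system = DiskArchive(raw_gb=10_000, price=50_000,
                     data_fraction=raid5_data_fraction(5), overhead_fraction=0.25)
print(f"usable: {system.usable_gb:,.0f} GB")
print(f"per raw GB: ${system.cost_per_raw_gb():.2f}, "
      f"per usable GB: ${system.cost_per_usable_gb():.2f}")
```

In this made-up example, the price per usable gigabyte is two-thirds higher than the price per raw gigabyte. That gap is why the comparison should be made on usable capacity.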

Overhead: To meet write-once requirements on magnetic disk, the data must pass through an Application Programming Interface (API). The API forms a protection layer that prevents the data from being erased while it is under management in the compliance system. The API can also perform other vendor-specific tasks, such as eliminating redundant objects or encrypting data for security. These tasks vary greatly; however, one of them is to assign a retention period and lock the data in a non-alterable state for its life, based on specific policies. The API layer also consumes disk space, as much as 25% of usable storage, and can cause a significant performance hit. In our experience, data transfer rates for fixed-disk devices are about the same as those of removable optical devices (2-4 MB/sec), depending on object size and volume. So, there is no performance advantage to using disk-based solutions over removable storage solutions.
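The retention-lock behavior described above can be modeled in a few lines. The sketch below is a hypothetical in-memory stand-in for such a software layer; it is not any vendor's actual API. It shows the core rules: each object may be written only once, and it cannot be deleted until its retention period has expired. The names `RetentionStore` and `RetentionError`, and the seven-year retention period, are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


class RetentionError(Exception):
    """Raised when an operation would alter or remove locked data."""


@dataclass(frozen=True)
class LockedObject:
    data: bytes
    retain_until: datetime


class RetentionStore:
    """Hypothetical in-memory model of a software write-once (retention) layer."""

    def __init__(self) -> None:
        self._objects: Dict[str, LockedObject] = {}

    def put(self, object_id: str, data: bytes, retention: timedelta, now: datetime) -> None:
        # Write-once: an existing object can never be overwritten.
        if object_id in self._objects:
            raise RetentionError(f"{object_id} already written; objects are write-once")
        self._objects[object_id] = LockedObject(bytes(data), now + retention)

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def delete(self, object_id: str, now: datetime) -> None:
        # Deletion is refused until the retention period has expired.
        obj = self._objects[object_id]
        if now < obj.retain_until:
            raise RetentionError(f"{object_id} is locked until {obj.retain_until:%Y-%m-%d}")
        del self._objects[object_id]


store = RetentionStore()
t0 = datetime(2005, 2, 1)
store.put("invoice-0001", b"...", timedelta(days=7 * 365), now=t0)  # assumed 7-year policy
try:
    store.delete("invoice-0001", now=t0 + timedelta(days=30))
except RetentionError as err:
    print(err)
store.delete("invoice-0001", now=t0 + timedelta(days=7 * 365))  # allowed once expired
```

Because this enforcement lives in software, every read, write, and delete passes through these checks. This is the overhead the paragraph above describes.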

Backup: All hard drives fail more often than we would like, which is the reason for redundancy. However, redundancy alone is not enough to protect critical data on magnetic disk, so fixed-disk solutions should be backed up. Whether the system is backed up to disk or to tape matters little; what does matter is the cost and management burden of those backups. Backups are not archives and should not be treated as such. (I will leave this theme for another article.) Backups by definition create a second set of data. This creates a compliance issue: if data is disposed of on the compliance CAS system, how is the copy in the backup set disposed of? Either the administrator or the application must dispose of the data in the backup set, and most compliance applications have no means of dealing with backup sets.
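One way to picture the disposal problem is a catalog that records every place a copy of each object lives. The sketch below is hypothetical; the names `ComplianceCatalog` and `BackupSet` and the `selectively_erasable` flag are illustrative assumptions. When an object expires, the catalog removes it from primary storage and from any backup set that can be edited in place. It then returns the backup sets that still hold a copy, which an administrator must deal with manually. A full tape backup often falls into that category.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BackupSet:
    name: str
    object_ids: Set[str] = field(default_factory=set)
    selectively_erasable: bool = False  # e.g. a full tape often cannot be edited in place


@dataclass
class ComplianceCatalog:
    primary: Set[str] = field(default_factory=set)
    backups: List[BackupSet] = field(default_factory=list)

    def dispose(self, object_id: str) -> List[str]:
        """Remove an expired object everywhere the catalog can reach and
        return the backup sets that still hold a copy and need manual action."""
        self.primary.discard(object_id)
        outstanding = []
        for bset in self.backups:
            if object_id not in bset.object_ids:
                continue
            if bset.selectively_erasable:
                bset.object_ids.discard(object_id)
            else:
                outstanding.append(bset.name)
        return outstanding


catalog = ComplianceCatalog(
    primary={"a", "b"},
    backups=[BackupSet("disk-2005-01", {"a", "b"}, True),
             BackupSet("tape-2005-01", {"a"}, False)],
)
pending = catalog.dispose("a")
print(pending)  # ['tape-2005-01']
```

Without a catalog like this, the copy on tape simply outlives its disposal date, which is the compliance gap described above.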

Data Growth: Compliance data is growing at an alarming rate. Analysts' figures vary, but all agree that archives are growing by more than 100% a year. On a fixed-disk compliance system, the only way to add volume is to add disks. This growth is generally measured in terabytes or in blades of storage. Blades take up rack space, rack space takes up floor space, and blades cost money. Cost therefore rises in direct proportion to your future storage needs. This will become a problem in the coming months and years as data centers outgrow the fixed-disk systems that hold their CAS data. The options will be to buy more disks in the form of blades, upgrade to larger disks where possible, or move the data to a next-tier solution that is removable.
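To see what 100% annual growth means for a disk-only archive, the sketch below projects archive size and the number of storage blades required year by year. The starting size of 2 TB and the blade capacity of 4 TB are hypothetical. The growth rate is the 100% figure cited above. Because hardware cost tracks blade count, cost follows the same doubling curve.

```python
import math
from typing import List, Tuple


def project_blades(start_tb: float, annual_growth: float, blade_tb: float,
                   years: int) -> List[Tuple[int, float, int]]:
    """Return (year, archive size in TB, blades required) for year 0..years."""
    rows = []
    size = start_tb
    for year in range(years + 1):
        rows.append((year, size, math.ceil(size / blade_tb)))
        size *= 1 + annual_growth  # compound growth, e.g. 1.0 == 100% per year
    return rows


# Hypothetical archive: 2 TB today, 4 TB blades, 100% growth per year.
for year, size, blades in project_blades(start_tb=2, annual_growth=1.0, blade_tb=4, years=4):
    print(f"year {year}: {size:6.1f} TB -> {blades} blades")
```

Under these assumptions, a single blade becomes eight within four years.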

Compliance First: We feel that compliance needs should be met first by storing regulated data on compliance media that is removable, write-once, and has a long usable life. Once the regulated data is secure on compliance media, it can be migrated back to faster-access storage as needed.

This approach reduces the need to back up large amounts of data on fixed-disk devices, which shortens the backup window and reduces the amount of storage hardware to allocate and manage. Current hardware can be used more efficiently, because archive data is written off to removable archive media. That media better satisfies archive and compliance needs over time and, if desired, can be moved off line to further reduce archive storage costs. Once the data is stored on compliance media, it can be deleted from fixed-disk storage devices and, if necessary, migrated back when needed.
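The compliance-first workflow can be sketched as a short sequence of steps. First, write the data to write-once media. Next, verify the recorded copy. Then free the disk copy, and recall it later if needed. In the sketch below, `WormVolume` is a hypothetical in-memory stand-in for a removable write-once volume, and fixed disk is modeled as a plain dictionary. The SHA-256 verification step is an assumption added for illustration; real drives and archive software have their own verification mechanisms.

```python
import hashlib
from typing import Dict


class WormVolume:
    """Hypothetical stand-in for a removable write-once volume."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._files: Dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        if name in self._files:
            raise PermissionError(f"{name} already recorded on {self.label}")
        self._files[name] = bytes(data)

    def read(self, name: str) -> bytes:
        return self._files[name]


def archive_compliance_first(disk: Dict[str, bytes], name: str, volume: WormVolume,
                             keep_on_disk: bool = False) -> str:
    """Write to write-once media first, verify, then optionally free the disk copy."""
    data = disk[name]
    volume.write(name, data)
    digest = hashlib.sha256(data).hexdigest()
    if hashlib.sha256(volume.read(name)).hexdigest() != digest:
        raise OSError(f"verification failed for {name}")
    if not keep_on_disk:
        del disk[name]  # the compliant copy now lives on removable media
    return digest


def recall(disk: Dict[str, bytes], name: str, volume: WormVolume) -> bytes:
    """Migrate data back to faster disk storage when it is needed."""
    disk[name] = volume.read(name)
    return disk[name]


disk = {"trade-log-2004.csv": b"date,amount\n"}
vol = WormVolume("UDO-0001")
archive_compliance_first(disk, "trade-log-2004.csv", vol)
print("on disk after archiving:", "trade-log-2004.csv" in disk)
recall(disk, "trade-log-2004.csv", vol)
print("on disk after recall:", "trade-log-2004.csv" in disk)
```

The key ordering choice is that the disk copy is released only after the write-once copy has been verified. The data therefore never exists solely on unprotected disk, and it is never lost between tiers.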

Archive is not Backup: Backups and archives are not the same thing, nor is one a substitute for the other. Backups are usually incremental: files are backed up only if they have changed since the last backup. Archiving is not incremental. When a file is migrated to an archive, it stays on the target media until the end of the data lifecycle. Also, once a file has been archived, it is usually deleted from its previous location, which is not the case with a backup. It is replaced with a metadata pointer so the file can still be retrieved from its original location. Backups are intended to protect data for a relatively short period, while archives are intended for the long-term management of data. Backups are usually automated, as are archives. Archives must be indexable and searchable to be truly useful, while backups are generally larger blocks of data that are restored as a snapshot in time. Rotational media lends itself to more efficient indexing and searching than linear-access storage such as tape. In most cases, this makes the transfer rates of tape devices a useless basis for comparing archive performance.
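The difference between the two operations shows up clearly in a small model. The sketch below is hypothetical: `FileEntry`, `select_incremental`, `archive`, and `open_file` are illustrative names, and the "volume:path" stub format is an assumption. An incremental backup selects only resident files changed since the last run and leaves the originals in place. Archiving moves a file once to long-term media and leaves a stub behind, so reads of the original path still succeed. Once a file is archived, it also drops out of future backups.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileEntry:
    path: str
    mtime: float                  # last-modified time, seconds since the epoch
    data: Optional[bytes] = None  # None once the file has been archived
    stub: Optional[str] = None    # pointer to the archived copy, if any


def select_incremental(files: List[FileEntry], last_backup_time: float) -> List[str]:
    """Incremental backup: copy only resident files changed since the last run.
    The originals stay where they are."""
    return [f.path for f in files if f.data is not None and f.mtime > last_backup_time]


def archive(entry: FileEntry, store: Dict[str, bytes], volume_label: str) -> str:
    """Archive: move the file to long-term media once and leave a metadata
    stub at the original location."""
    if entry.data is None:
        raise ValueError(f"{entry.path} is already archived")
    location = f"{volume_label}:{entry.path}"
    store[location] = entry.data
    entry.data, entry.stub = None, location
    return location


def open_file(entry: FileEntry, store: Dict[str, bytes]) -> bytes:
    """Reads follow the stub, so users still open the file by its original path."""
    if entry.stub is None:
        return entry.data
    return store[entry.stub]


files = [FileEntry("/mail/2003.pst", 100.0, b"old mail"),
         FileEntry("/mail/2005.pst", 500.0, b"new mail")]
print(select_incremental(files, last_backup_time=300.0))  # ['/mail/2005.pst']
archive_store: Dict[str, bytes] = {}
archive(files[0], archive_store, "VOL-0042")
print(open_file(files[0], archive_store))                 # b'old mail'
print(select_incremental(files, last_backup_time=0.0))    # ['/mail/2005.pst']
```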

Lower Archive Storage Costs: A tiered archive solution is also more cost-effective going forward, because data in a removable solution can grow beyond the device itself simply by managing volumes off line. The cost of expansion is therefore the cost of new media, not more hard drives. Today that cost is approximately $2/GB with Plasmon's UDO or Sony's Professional Disc for Data. It is even lower with WORM tape solutions; however, with tape the user loses the random-access benefit of rotational media.

Data on removable media does not need the same redundancy protection as data on magnetic disk, because drive failures are independent of the media. A failed drive can be taken off line and replaced without copying or restoring any data. Some users do copy data stored on removable media, but the copy is generally stored off site as part of a disaster recovery procedure. The cost of that copy is only the media itself. No additional hardware is required for data growth. Nearline storage libraries are redundant by design and cost less per MB than fixed-disk solutions. Limiting redundant copies of data greatly reduces the amount of storage needed for archives and helps drive archive costs down. These savings will increase over time as your archives grow.
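The expansion-cost argument reduces to simple arithmetic. The sketch below assumes that library slots and drives are already in place, and that growth is absorbed by moving full volumes off line. Under those assumptions, adding capacity costs only media, at the roughly $2/GB cited above. An off-site disaster-recovery copy costs one more set of media. The function name `media_cost` and its parameters are illustrative.

```python
def media_cost(added_gb: float, price_per_gb: float = 2.0, offsite_copies: int = 0) -> float:
    """Cost of growing a removable archive: media only, no new drives.

    Assumes existing drives/library slots suffice because full volumes are
    moved off line. Each off-site copy costs another full set of media.
    """
    if added_gb < 0 or offsite_copies < 0:
        raise ValueError("added_gb and offsite_copies must be non-negative")
    return added_gb * price_per_gb * (1 + offsite_copies)


print(media_cost(1000))                    # 2000.0
print(media_cost(1000, offsite_copies=1))  # 4000.0
```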

With removable write-once media, data protection takes place in the recording structure of the media or in the drive firmware. The system does not need to rely on an API to ensure that the data is non-alterable, because the device itself manages the write-once nature of the media. The gold standard in write-once protection is a physical change at the recording layer, such as phase-change optical recording. Other removable write-once technologies, such as tape, rely on electronic write-protect firmware to provide write protection at the media. This electronic write protection likewise does not depend on an API and adds no performance overhead.


Since there is no such thing as catchall media, a tiered approach is likely the prudent course for any company looking to improve its compliance policies. A mix of disk-based compliant storage and removable media for long-term archives is the best solution from both a performance and a cost-management perspective.


We all know that industries, governments, organizations, and individuals are producing more digital content (documents, images, photos, voice recordings, e-mail ...) than ever before. We also know that industries, governments, and organizations are mandating that much of this content be archived for fixed periods of time in a secure and verifiable format. They do so not only for legal reasons, but also because compliance has become part of "best practice" operating procedure. Beyond mitigating legal issues, these digital records can be mined to capitalize on the business environment and maximize market opportunities. They can also help reduce overall storage management costs and operational overhead. This phenomenon has driven the boom in archive storage over the last couple of years. In that time, a number of new products have emerged to capture this opportunity, many of them based on magnetic hard drives. We at Pegasus believe that hard-drive-based archives are costly and provide little performance value to the user, but there is no question that they have taken off. This is because they provide a solution that meets today's needs with today's technology. Unfortunately, these solutions leave little room for growth without further investment in hardware and software, as archive data continues to grow at a rate of over 100% a year.

Jim Wheeler is director of market development for Pegasus Disk Technologies (San Ramon, CA).
COPYRIGHT 2005 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Disaster Recovery & Backup/Restore
Author:Wheeler, Jim
Publication:Computer Technology Review
Geographic Code:1USA
Date:Feb 1, 2005