Understanding the new generation of data protection solutions.
By examining key differentiating factors--including retention period, speed and cost--we can segment the data protection market into three areas: Staging solutions, designed for data kept briefly on the way to longer-term media; Backup/Restore solutions, designed for data kept in the medium term, when a restore will likely be required; and Archiving solutions, designed for data kept long term for legal or corporate requirements. Further, breaking down your data protection requirements by segment makes it easier to set evaluation criteria and identify the right components for a complete solution.
The cost of Staging solutions tends to be much higher than that of traditional tape; however, cost is a secondary concern to speeding backups. Because of their cost, Staging devices are often used to keep only the most recent backup data, typically for a few days at most. Most technologies in this category enable data to be moved to tape at any time without disturbing business processes. Disk-based solutions in the Staging category include ATA arrays, Virtual Tape Libraries, Snapshot Technologies and Continuous Data Protection.
[FIGURE 1 OMITTED]
ATA Arrays are used as target data repositories for backup operations. Most modern backup software from vendors such as Veritas or EMC Legato supports backup-to-disk options in addition to the more traditional backup-to-tape options. Backup to standard ATA arrays tends to be extremely fast and easy to implement, but also extremely expensive. The cost per gigabyte of ATA-array-based storage in 2004 was between $3 and $15, whereas tape storage was one-sixth that cost.
Virtual Tape Libraries are increasingly common; today no major tape library vendor remains without one in its product portfolio. VTL solutions add to the standard ATA array by providing an interface that mimics a tape library, allowing seamless integration into existing backup infrastructures with little or no process change. VTL solutions tend to run faster than even ATA arrays, as no host operating system or file system is required. Most VTL solutions today also allow users to manage an attached tape library, backing up or restoring virtual tape images to real physical tape. Since VTL solutions are intended to speed backups, it's essential to examine the backup environment in its entirety. Bottlenecks in application servers, media managers and software compression can often reduce throughput substantially; in some instances it may be sufficient or more cost-effective to use a traditional array.
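The point about bottlenecks can be made concrete with a rough model: a backup stream moves only as fast as its slowest stage, so a faster target helps only if the target was the limiting stage. The Python sketch below is illustrative only; the stage names and throughput figures are hypothetical assumptions, not measurements of any product.

```python
# Hypothetical sustained throughput of each stage in a backup path, in MB/s.
# All names and figures are illustrative assumptions, not measurements.
def effective_throughput(stages):
    """Return (bottleneck_stage, rate): a pipeline runs at its slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]

pipeline = {
    "application_server_read": 40,
    "software_compression": 25,
    "media_manager": 60,
    "vtl_ingest": 200,
}

stage, rate = effective_throughput(pipeline)
print(f"Bottleneck: {stage} at {rate} MB/s")
```

In this made-up environment the VTL could ingest data far faster than software compression can supply it, which is the situation in which a traditional array may be sufficient.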
Most modern storage systems support snapshot technologies in one form or another. Snapshot technologies make a quick point-in-time image of the block-level storage or file system, which is kept available to restore data even when that data subsequently changes. Snapshot images can often be made end-user accessible, allowing users to locate previous versions of data or recover accidentally deleted files. As the data is in a read-only format, the backup of a snapshot to tape is extremely simple. Most snapshot technologies tend to be reasonably efficient in their use of disk space, storing only changes to stored data rather than whole copies of changed files.
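One common way to get this space efficiency is copy-on-write: the snapshot starts out empty and saves a block's old contents only when that block is first overwritten. The Python sketch below is a toy model under that assumption; the `Volume` class and its methods are hypothetical and do not represent any vendor's implementation.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.snapshots = []  # each snapshot: {block_index: original_contents}

    def snapshot(self):
        """Create a point-in-time image; initially it stores nothing."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Preserve the old contents once per snapshot, before overwriting.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it was when the snapshot was taken."""
        return self.snapshots[snap_id].get(index, self.blocks[index])


vol = Volume(4)
vol.write(0, b"v1")
snap = vol.snapshot()
vol.write(0, b"v2")                  # triggers a copy of the old block
print(vol.read_snapshot(snap, 0))    # contents as of the snapshot
print(len(vol.snapshots[snap]))      # number of blocks the snapshot holds
```

Only the one block that changed after the snapshot is retained; unchanged blocks are read straight from the live volume.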
Continuous Data Protection
Continuous Data Protection (CDP) is an emerging technology designed to provide the same level of protection as snapshots but without the atomic point in time. Using CDP, storage administrators can 'roll back' data storage systems to any previous point in time (within certain limits) rather than a predefined snapshot point. Questions remain regarding CDP technology's integration with heterogeneous application environments and the real tangible benefit it offers over traditional snapshots.
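The difference from snapshots can be sketched as a write journal: if every write is recorded with its time, the state at any moment can be rebuilt by replaying the journal up to that moment. The Python below is a minimal illustration under that assumption; `JournaledStore` is a hypothetical name, and real CDP products work at the block or application level with far more machinery.

```python
class JournaledStore:
    """Toy CDP: every write is journaled with a timestamp (illustrative only)."""

    def __init__(self):
        # (timestamp, key, value) tuples; assumed to arrive in time order.
        self.journal = []

    def write(self, timestamp, key, value):
        self.journal.append((timestamp, key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it stood at any chosen moment by replaying writes."""
        state = {}
        for ts, key, value in self.journal:
            if ts > timestamp:
                break
            state[key] = value
        return state


store = JournaledStore()
store.write(10, "a", "draft")
store.write(20, "a", "final")
store.write(25, "b", "new")
print(store.state_at(15))
print(store.state_at(22))
```

The "certain limits" on rollback correspond, in this model, to how much of the journal is retained.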
When evaluating Staging solutions, it's important to weigh the following criteria and consider:
* What level of performance does it offer in meeting enterprise backup windows?
* How easy is it to perform data recovery--for administrators and users?
* How well does it integrate with a disk library or tape library for longer-term retention?
* How well does it integrate with your existing backup software infrastructure?
Enterprises typically keep up to six months of data for restoration in the event of data loss. Backup and restore products are typically at the core of a data protection solution. Products in this segment tend to be a little slower than those in the Staging segment; buying criteria focus on reducing long-term data storage costs and enabling offsite data migration. Tape libraries have been entrenched in enterprise backup storage for many years. However, exponential growth in enterprise data, shrinking backup windows and high-profile incidents of tape loss in transit have exposed the limitations of tape. Further, studies show that one in three recoveries from tape fails--a rate of failure most organizations cannot tolerate. Disk-based products in the Backup/Restore category include Disk-Based Libraries and Capacity Optimized Storage.
Disk-Based Libraries (DBL) are the biggest challenger to tape library solutions. These devices typically come in two formats--removable and fixed. Removable disk libraries have cartridges that allow for the physical removal of disks or disk cartridges. However, the cartridges are often heavy, fragile and difficult to move to a remote storage facility. Another issue with removable disk cartridges is that manual interaction and tracking are still involved; as with tape libraries, this remains the biggest source of potential recovery error. Fixed disk libraries rely on modern replication technologies to move data offsite for disaster recovery; they only allow for disk removal to replace defective disks. Well-designed DBLs often provide both file system and virtual tape interfaces for maximum flexibility. Additionally, some disk libraries implement MAID (Massive Array of Idle Disks) technology to power disks up and down, lowering power costs over the lifetime of the system and extending the life of the disks within it.
Capacity Optimized Storage
Capacity Optimized Storage (COS) represents a new evolution in disk-based libraries. COS solutions not only allow for long-term disk-based storage, but also provide dramatic levels of compression (20:1 is typical), a feature that makes disk drives cost-competitive with tape for enterprise backup and restore. Some COS solutions also utilize the compression technology for wide-area replication, allowing low-cost site-to-site connections to be used for offsite data protection and obviating the need for manual processes around the movement of disk or tape.
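A rough calculation shows why a 20:1 ratio changes the comparison with tape. The Python sketch below uses only figures quoted in this article (ATA disk at $3 to $15 per gigabyte in 2004, tape at one-sixth of that, 20:1 compression). It rests on two simplifying assumptions that are not from the article: that COS capacity costs the same per raw gigabyte as an ATA array, and that tape stores data uncompressed.

```python
# Figures quoted in this article: ATA disk at $3 to $15 per GB (2004),
# tape at one-sixth of that, and a typical COS compression ratio of 20:1.
# Assumptions (not from the article): COS raw capacity is priced like an
# ATA array, and tape holds the data uncompressed.
DISK_COST_PER_GB = (3.0, 15.0)
TAPE_FRACTION = 1 / 6
COS_RATIO = 20

def effective_cost(raw_cost_per_gb, ratio):
    """Cost per GB of backup data when `ratio` GB of data fits in 1 GB of disk."""
    return raw_cost_per_gb / ratio

for disk in DISK_COST_PER_GB:
    tape = disk * TAPE_FRACTION
    cos = effective_cost(disk, COS_RATIO)
    print(f"disk ${disk:.2f}  tape ${tape:.2f}  COS-effective ${cos:.2f} per GB")
```

Under these assumptions the effective disk cost per gigabyte of backup data falls below the tape figure at both ends of the quoted range; a real comparison would also have to account for hardware premiums and for compression on tape.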
When evaluating Backup/Restore solutions, it's important to weigh the following criteria and consider:
* What is the cost of long-term data storage on disk vs. tape?
* What is the cost of ongoing maintenance?
* Does the solution provide mechanisms to check data integrity to ensure data recovery?
* How easy is it to move data offsite for disaster recovery protection?
* How well does the product integrate with standard backup management software?
Archiving solutions are designed to retain data over the long term--typically more than six months and up to six years for most financial data. (Legally, some kinds of data may need to be retained for up to ninety-nine years.) New disk-based products in this segment have difficulty competing with the sheer low cost of a shelved tape cartridge. However, a company may be liable for the reproduction of historical data, and tape offers no certainty of data restorability or integrity. Modern optical storage systems, such as "jukeboxes," use DVD media to store data, but long-term storage can often be a challenge due to the sheer amount of media generated--more than 120 DVDs are required for every terabyte archived--and locating the correct DVD for recovery can be difficult. New legislation such as Sarbanes-Oxley has prompted vendors to create solutions that meet stringent retention requirements. Disk-based solutions in the Archiving segment include Content Addressable Storage solutions.
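The media count is simple to check. The article does not say which disc capacity it assumes, so the Python sketch below tries both nominal DVD capacities and, as a further assumption, counts a terabyte as 1,024 GB.

```python
import math

# Nominal DVD capacities in GB. The article does not say which it assumes.
DVD_CAPACITY_GB = {"single-layer": 4.7, "dual-layer": 8.5}
TERABYTE_GB = 1024  # assumption: 1 TB counted as 1,024 GB

def discs_needed(archive_gb, disc_gb):
    """Whole discs required to hold archive_gb of data."""
    return math.ceil(archive_gb / disc_gb)

for kind, capacity in DVD_CAPACITY_GB.items():
    print(kind, discs_needed(TERABYTE_GB, capacity))
```

The dual-layer result is consistent with a figure of "more than 120"; with single-layer discs the count is well over 200.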
Content Addressable Storage
Content Addressable Storage (CAS) solutions are disk-based storage solutions developed specifically for the long-term archiving and retrieval of data, with a focus on ensuring the retrieved data is identical to that which was stored. Data passed to a CAS system are examined and a 'hash' is created that uniquely identifies the data. The hash (a series of hexadecimal digits) is generated by a 'hashing' algorithm that processes the data to produce a fingerprint of its contents. This algorithm is chosen to minimize the chance of any two pieces of non-identical data generating the same hash. Once the data are stored, the hash is returned to the originator of the data to be used for retrieval--much like a coat check. The hashing approach ensures that the same piece of data is never stored twice, as duplicate files can be identified through a comparison of hashes. While CAS systems work well, most require a separate, software-based control program that stores and retrieves data from the CAS system. Many programs exist for the most popular CAS solutions, including e-mail collection and archiving, Instant Messaging collection and archiving and file system front-ends for generic data storage.
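The coat-check behavior described above can be sketched in a few lines of Python. This is a toy in-memory model, not any vendor's design: `ContentStore` is a hypothetical name, and SHA-256 is chosen here only as an example, since the article does not name a hashing algorithm.

```python
import hashlib

class ContentStore:
    """Toy content-addressable store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        """Store data and return its hash, the 'coat check' ticket."""
        digest = hashlib.sha256(data).hexdigest()
        # Identical content yields the same digest, so it is stored only once.
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Retrieve data and verify it still matches its hash."""
        data = self._objects[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored data no longer matches its hash")
        return data


store = ContentStore()
ticket_1 = store.put(b"quarterly report")
ticket_2 = store.put(b"quarterly report")   # duplicate submission
print(ticket_1 == ticket_2)                 # same content, same ticket
print(len(store._objects))                  # objects actually stored
print(store.get(ticket_1))
```

Recomputing the hash on retrieval is what lets the store confirm that the data returned is identical to the data that was stored.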
When evaluating Archiving solutions, it's important to weigh the following criteria and consider:
* What are your legal requirements for long-term corporate electronic record maintenance?
* What are the potential cost efficiencies you will realize in both storage and retrieval?
* What software support is available for archiving and retrieval of required data types?
* What solution support will be available over the lifespan of your archiving requirements?
Breaking down your data protection requirements by segment--Staging, Backup/Restore and Archiving--is an important first step in building a solution, making it much more straightforward to evaluate and select the appropriate products. In my next article, I will demonstrate how to use this approach in designing and implementing a complete data protection solution.
Geoff Barrall is CEO of Trusted Data and serves on the board of advisors for Data Domain.
Title Annotation: Disaster Recovery & Backup/Restore
Publication: Computer Technology Review
Date: Jun 1, 2005