Are the days of traditional backup and recovery numbered? Changing the rules for backup and recovery.
[FIGURE 1 OMITTED]
Traditional backup and recovery
Backup and recovery were the very first data protection and storage management applications, surfacing in the mid-1950s when disk and tape appeared, but their roles have become far more critical today. It is important to understand the RTO (Recovery Time Objective), the RPO (Recovery Point Objective), and the DPW (Data Protection Window), the maximum amount of time a system can be paused for a data protection operation without impacting business operations. Because most recoveries occur within a short period after a failure, low-cost SATA disk systems can now address more of the backup and recovery requirements than in the past. In general, after about a month, keeping backup data on disk is no longer economical, and the storage administrator must decide whether to delete the backup data, continue to keep it on disk, or move it to archival status. Determining the optimal data protection strategy for specific data classifications can be complicated and can consume large amounts of storage resources.
Data protection has become a more complicated discipline, but it is not impossible to comprehend. Mirroring is implemented as a block-for-block replica of a file, a logical unit, or a physical disk volume, normally using disks for all copies. Once the mirror is established by copying the original data element, it is maintained by replicating every write operation in two (or more) places, keeping the copies identical. Mirroring eliminates the backup window but doubles the amount of disk storage required, adding expense. Storage administrators must choose between asynchronous and synchronous mirroring; tradeoffs exist for each, and both increase the total storage required for the backup and recovery application.
In synchronous mirroring, both the source and the target devices must acknowledge that a write has completed before the next write can occur. This degrades application performance but keeps the mirrored elements as true mirror images of each other. In asynchronous mirroring, the source and target devices do not synchronize their writes; the secondary write occurs independently. Asynchronous mirroring is therefore faster than synchronous mirroring, but the secondary copies lag slightly behind the primary copy; this is sometimes referred to as a fuzzy copy. Asynchronous mirroring is often used with an IP storage protocol to replicate data to locations hundreds of miles away. In practice, the secondary data element is usually no more than a minute behind the primary copy, but even that lag can be a significant exposure for write-intensive applications.
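The synchronous/asynchronous tradeoff can be sketched in a few lines. This is a toy model, not a real mirroring product: the class name and the in-memory dictionaries standing in for disk volumes are illustrative assumptions.

```python
from collections import deque

class MirroredVolume:
    """Toy mirrored pair: every write goes to both copies.

    Synchronous mode completes the secondary write before returning,
    so the copies are always identical. Asynchronous mode queues the
    secondary write for later, so the secondary may lag (a "fuzzy" copy).
    """

    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.primary = {}        # block number -> data
        self.secondary = {}
        self.pending = deque()   # queued writes for the async secondary

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            self.secondary[block] = data        # wait for the second ack
        else:
            self.pending.append((block, data))  # replicate later

    def drain(self):
        """Apply queued writes, bringing the secondary back in sync."""
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data

vol = MirroredVolume(synchronous=False)
vol.write(0, b"payroll v2")
# Until the queue drains, the secondary lags the primary.
assert vol.secondary.get(0) != vol.primary[0]
vol.drain()
assert vol.secondary[0] == vol.primary[0]
```

The queue is exactly the exposure described above: any writes still in `pending` when the primary fails are lost.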
Mirroring is used for many mission-critical applications, and it is the fastest way to recover from a hardware device or subsystem failure since a restore amounts to switching to a mirrored copy, which takes no more than a few seconds. Mirroring does not protect against data corruption (from a hacker, worm, virus, or human or software error), as it simply produces two or more copies of the corrupted data. As a best practice, mirroring should always be accompanied by point-in-time copies so that a near-instantaneous recovery can be made from clean data that existed before the corruption occurred. Mirroring is also commonly referred to as RAID 1, which (at a minimum) doubles the amount of disk storage required.
PIT (Point-In-Time) Copy
A PIT copy provides an executable image of data, meaning that no recovery action is needed at all. Like a series of still images, PIT copies are complete data images taken at points in time defined by the RPO (Recovery Point Objective). PIT copies enable an administrator to go back in time and access data from a non-corrupted state prior to the corruption or other disruption. This is the most complete protection against human error, software problems, hardware problems, viruses, and intrusion, and it should accompany any mirroring implementation. Again, tradeoffs exist: the more frequently PIT copies are taken, the more storage is required and the more time it takes to determine which copy is the correct one to reinstate.
Snapshot copy presents a consistent point-in-time view of changing data and is gaining popularity; many variations exist. When a write occurs on a volume with an active snapshot, the old value of the affected area or block is first saved in a separate disk area or partition reserved for snapshot activity. The saved blocks can be reinstated if the new blocks are corrupted, or used to present a fuzzy data image for a non-disruptive backup. Storage administrators must manage the number and currency of snapshots. Snapshots protect against intrusion and data corruption, but not against a failure of the device containing the source data.
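The save-the-old-block-before-overwriting mechanic is a copy-on-write scheme, and can be sketched as follows. The class and the dictionary standing in for a block device are illustrative assumptions, not any vendor's implementation.

```python
class SnapshotVolume:
    """Toy copy-on-write snapshot: before a block is overwritten, its
    old value is saved in a reserved snapshot area, so a point-in-time
    view can be reconstructed without ever making a full copy."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # live data: block number -> data
        self.snapshots = []          # one {block: old value} map per snapshot

    def take_snapshot(self):
        self.snapshots.append({})    # a new snapshot starts empty and cheap

    def write(self, block, data):
        for snap in self.snapshots:
            if block not in snap:    # first overwrite since this snapshot
                snap[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, index, block):
        """Return the block as it looked when snapshot `index` was taken."""
        snap = self.snapshots[index]
        return snap[block] if block in snap else self.blocks.get(block)

vol = SnapshotVolume({0: "clean", 1: "clean"})
vol.take_snapshot()
vol.write(0, "corrupted")                    # old value is preserved first
assert vol.read_snapshot(0, 0) == "clean"    # point-in-time view
assert vol.blocks[0] == "corrupted"          # live view
```

Note that a fresh snapshot costs almost nothing; storage is consumed only as blocks change, which is why administrators must watch the number and age of outstanding snapshots.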
CDP (Continuous Data Protection)
CDP, or journaling, is another method of enabling quick data recovery: every write and update operation is continuously recorded on another device, which may or may not be the same as the primary device. With CDP, the secondary copy is a time-stamped continuum of data points providing a highly granular sequential history of write events. All write operations are queued to the secondary, or journal, device, which may be disk or tape. Journals are typically kept as a continuous history for a few days, covering the period when a data recovery action is most likely to be needed. Journals are often used for databases and are especially good for protecting against intrusion and data corruption, since restores can go back to a point precisely before the corruption occurred.
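A journal of time-stamped writes, replayed up to a chosen moment, is the essence of CDP. The sketch below is a minimal illustration under assumed names; real CDP products add checkpoints and consistency markers.

```python
class Journal:
    """Toy CDP journal: every write is appended with a timestamp,
    giving a granular history that can be replayed to any point in time."""

    def __init__(self):
        self.entries = []   # (timestamp, block, data), in time order

    def record(self, timestamp, block, data):
        self.entries.append((timestamp, block, data))

    def restore_as_of(self, timestamp):
        """Rebuild the volume as it existed at `timestamp` by replaying
        every write up to and including that moment."""
        volume = {}
        for ts, block, data in self.entries:
            if ts > timestamp:
                break
            volume[block] = data
        return volume

j = Journal()
j.record(100, 0, "v1")
j.record(200, 0, "v2")
j.record(300, 0, "corrupted")
# Roll back to just before the corruption at t=300.
assert j.restore_as_of(299) == {0: "v2"}
```

The choice of `timestamp` is what lets a restore land precisely before a corruption event, rather than at the nearest scheduled backup.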
The newest emerging approach to backup and recovery uses a patented technology called commonality factoring, which greatly reduces the amount of data being backed up and therefore reduces the recovery time. Commonality factoring looks at changed data, as differential backup does, but sends only the changed sub-file content rather than the entire file. Backup and recovery consume memory, bandwidth, and storage; commonality factoring reduces all three, and early estimates for certain applications indicate savings as high as a 90% reduction in the amount of data moved. Commonality factoring has the potential to fundamentally change the rules of the game for backup and recovery.
Data protection has become a critical IT discipline, yet after three years of downsizing and cutbacks, businesses often choose the simplest approach. The simplest approach, however, may not provide the highest availability, and severe business impact or loss may occur if recovery times are long. Today's IT environments demand a more comprehensive strategy for data protection, security, and high availability than ever before. Data protection solutions are improving and can now deliver ultra-high availability, increasing the probability that a business will survive most types of outages. This is critical, since most businesses will not survive for long without IT. Businesses are trying to become resilient to machine and human imperfections such as intrusions, mistakes, accidents, and cyber-terrorism, because the price of not implementing data protection can be fatal. There are many options, and all are better than no recovery strategy. The rules for backup and recovery are changing. Are you prepared?
Fred Moore is President of Horison, Inc. (Boulder, CO).
Title Annotation: Disaster Recovery & Backup/Restore
Publication: Computer Technology Review
Date: Oct 1, 2005