How to evaluate a recovery management solution.

Last year's hurricane season started a national discussion about how prepared the nation is to cope with a major disaster. Businesses along the Gulf Coast were devastated, and in boardrooms across America senior executives have tasked IT professionals with creating and implementing solutions that will ensure mission-critical data is continuously protected. IT professionals across all industries have come to realize that, even with thorough planning, restoring data and bringing systems back online quickly with zero data loss can be an overwhelming task.

The complexity and cost of solving data protection and recovery issues today are rooted in the fact that it takes multiple tools to deliver a solution, and the result still doesn't meet the new requirements of today's data center. This leaves IT professionals spending countless hours trying to integrate disparate tools and manually recovering data in an attempt to build a real-time infrastructure to support their enterprise. Because there is a variety of protection and recovery tools to choose from, it is crucial to arrive at core metrics that enable IT management to choose the best recovery management solution for their environment.

Recovery management is defined as the act, manner, or practice of managing a return to normal conditions. In the IT industry the definition is more specific--it describes how organizations return systems, applications, and data to "normal" conditions. When unexpected failures occur, the goal is to bring IT systems back to their most recent consistent state and to restore business operations within minutes, reducing downtime and preventing significant financial loss.

The Evaluation Metrics

To evaluate a recovery management solution, one must have properly defined metrics. Data recovery service level agreements (SLAs) are traditionally measured by recovery time objectives (RTO) and recovery point objectives (RPO). RTO defines the time required to recover a set unit of missing data, and RPO defines the potential data loss--the time gap between the most recent application-consistent recovery point and the physical failure point. RTO and RPO are good objectives for setting SLAs with regard to data recovery, but they are not sufficient for measuring a recovery management solution. For example, a snapshot tool may recover a server's data in minutes; however, a snapshot tool does not have the ability to recover a granular object. When one needs to locate a lost object from snapshots, the process is manual and the RTO could be many hours. In this case, RTO has nothing to do with the tool per se, inasmuch as it is entirely dependent on the manual process. Similarly, while a data replication tool is capable of delivering zero or near-zero RPO when a server fails, it cannot recover business data if the data is corrupted and the corruption is replicated.
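The RPO definition above is simple arithmetic on timestamps, and a short sketch may make it concrete. The function name `achieved_rpo` and the sample dates are illustrative assumptions, not part of any product; the sketch assumes every listed recovery point is application consistent.

```python
from datetime import datetime, timedelta

def achieved_rpo(recovery_points, failure_time):
    """Return the gap between a physical failure and the most recent
    recovery point taken at or before it (the data-loss window)."""
    usable = [p for p in recovery_points if p <= failure_time]
    if not usable:
        raise ValueError("no recovery point precedes the failure")
    return failure_time - max(usable)

# Hypothetical nightly backups taken at midnight.
nightly = [datetime(2006, 3, 1, 0, 0), datetime(2006, 3, 2, 0, 0)]
failure = datetime(2006, 3, 2, 15, 30)

print(achieved_rpo(nightly, failure))  # 15:30:00
```

With nightly recovery points, a mid-afternoon failure loses more than fifteen hours of changes, which is why a 24-hour schedule cannot support a near-zero RPO.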

As a result of examples like these, IT requires more comprehensive metrics to properly evaluate a recovery management solution. There are ten core metrics that fall into three categories--Recovery Time Characteristics, Recovered Data Characteristics, and Recovery Scalability Characteristics. The following list explores these metrics in detail.


Recovery Time Characteristics

Recovery Time Objective (RTO). RTO defines how fast the solution is capable of recovering the data and application it is designed to protect. The RTO of most recovery solutions depends on whether or not a data verification process is needed during the recovery, and on the size of the data set to be recovered. A solution that provides instant recovery regardless of data set size greatly reduces or eliminates business downtime.

Recovery Time Granularity (RTG). RTG determines the time spacing for selecting a recovery point; this is an important parameter for recovering from logical failures. Unlike RPO, which determines the last recovery point prior to a physical failure, RTG defines recovery point selection options prior to the most recent recovery point.
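To illustrate why RTG matters for logical failures, the following sketch compares how much work is rolled back when recovery points are spaced a day apart versus five minutes apart. The helper names, the spacing values, and the corruption time are all hypothetical; the sketch assumes the corruption time is known and that every point before it is clean.

```python
from datetime import datetime, timedelta

def recovery_points(start, end, spacing):
    """Generate evenly spaced recovery points from start to end inclusive."""
    points = []
    t = start
    while t <= end:
        points.append(t)
        t += spacing
    return points

def rollback_loss(points, corruption_time):
    """Return how much work is lost by rolling back to the latest
    recovery point taken strictly before the corruption occurred."""
    clean = [p for p in points if p < corruption_time]
    if not clean:
        raise ValueError("no clean recovery point available")
    return corruption_time - max(clean)

start = datetime(2006, 3, 1, 0, 0)
end = datetime(2006, 3, 3, 0, 0)
corruption = datetime(2006, 3, 2, 15, 34)  # hypothetical logical failure

daily = recovery_points(start, end, timedelta(hours=24))
five_minute = recovery_points(start, end, timedelta(minutes=5))

print(rollback_loss(daily, corruption))        # 15:34:00
print(rollback_loss(five_minute, corruption))  # 0:04:00
```

Note that the most recent recovery point is not the one chosen here: points taken after the corruption already contain the damage, so finer spacing before that moment is what limits the loss.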

Recovered Data Characteristics

Recovery Point Objective (RPO). RPO defines the minimum time gap between a failure and the most recent point in time to which data can be recovered. The smaller the gap, the less data is lost.

Recovery Object Granularity (ROG). ROG measures the level of objects that a solution is capable of recovering. For instance, object granularity may be a storage volume, a file system, a database table, a transaction, a mailbox, an email message, etc.

Recovery Event Granularity (REG). REG measures the capability of a recovery management solution to track events and to recover a failed application or missing data to a specific event.

Recovery Consistency Characteristics (RCC). RCC defines the usability of recovered data by the associated application. RCC of a recovery management solution depends not only on how data is captured and stored, but also on the data type being protected.

Recovery Scalability Characteristics

Recovery Location Scope (RLS). RLS defines where the protected data must be stored when recovery takes place. Most data protection solutions are designed such that the protected data is stored locally. Robust recovery management solutions can protect and recover data over LAN and WAN.

Recovery Service Scalability (RSS). RSS is measured by service (number of applications or data sets the solution is capable of protecting) and capacity (the maximum size of the data it can store).

Recovery Service Resiliency (RSR). RSR defines how well a recovery management solution tolerates failures. This includes system and data failures as well as data security authorization. For instance, if a system component fails, can the solution continue such that an application would be continuously protected? And can it also self-recover from any internal failures?

Recovery Management Cost (RMC). RMC defines the cost efficiency of a recovery management solution. Data services such as backup, snapshots, replication, policy management, and others are traditionally separate tools with very different architectures. For better RMC, find a consolidated recovery management platform that simplifies IT administration by reducing the number of tools necessary to manage data. For further efficiency, use a solution that reduces the storage and network resources necessary to protect and recover data.

Recovery Management Scorecard

As we discussed earlier, there is a myriad of protection and recovery tools to choose from, so it makes sense to arrive at the core metrics necessary to enable IT management to evaluate which solutions would best fit their environment. Now that we have an understanding of the "Top Ten" metrics necessary to evaluate a recovery management solution, let's apply these metrics to solutions that exist in the market today (see Table 1). Practical application of the metrics provides not only a more solid understanding of the metrics, but also a better comprehension of available solutions and how they compare.
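One way to apply the scorecard is to filter the solution classes against an organization's own minimum requirements. The sketch below transcribes only the RSS, RSR, and cost ratings from Table 1; the `shortlist` function and the thresholds passed to it are hypothetical, and the label of the first solution column, which is truncated in Table 1, is shortened here to "Recovery Management".

```python
# Ordinal values for the qualitative ratings used in Table 1.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

# (RSS, RSR, cost) ratings transcribed from Table 1.
SCORECARD = {
    "Recovery Management": ("High", "High", "$"),
    "Traditional Backup": ("Medium", "Medium", "$$$"),
    "Block-level Replication": ("Low", "Low", "$$$$$"),
    "Block-level Continuous Data Protection": ("Medium", "Medium", "$$$"),
    "File-level Continuous Data Protection": ("Low", "Medium", "$$"),
}

def shortlist(min_scalability, min_resiliency, max_cost):
    """Return the solution classes that meet every stated requirement."""
    result = []
    for name, (rss, rsr, cost) in SCORECARD.items():
        if (LEVEL[rss] >= LEVEL[min_scalability]
                and LEVEL[rsr] >= LEVEL[min_resiliency]
                and len(cost) <= len(max_cost)):  # fewer "$" means cheaper
            result.append(name)
    return result

# Hypothetical requirements: at least Medium on both, cost no more than $$$.
print(shortlist("Medium", "Medium", "$$$"))
```

A filter is used here rather than a weighted score because the article does not assign weights to the metrics; any weighting would be an assumption on the evaluator's part.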


In most industries today, the service level agreements for data protection and recovery have moved to a point where there is no time for backup windows, no tolerance for data loss, and very little margin for recovery downtime. Add to that the increased business demands for disaster recovery of mission-critical and business-critical data, along with new compliance requirements, and you can quickly determine that the legacy tools of data protection and recovery are ill-equipped to handle today's requirements. The ten metrics of recovery management above enable IT management to weigh their own internal business requirements against the products they are evaluating.

Marty Ward is vice president of marketing and products at Asempra Technologies in Sunnyvale, Calif.

Recovery Management           Metric  level Recovery            Traditional                Block-level
Requirement                           Management                Backup                     Replication

Recovery Time Objective       RTO     Sec to Minutes            Hours to Days              Min to Hours
Recovery Time Granularity     RTG     Seconds                   24 hours                   None
Recovery Point Objective      RPO     Near Zero                 24 hours                   Near Zero or Minutes
Recovery Object Granularity   ROG     Transaction               File                       Storage blocks
Recovery Event Granularity    REG     Fine-grained-consistency  Coarse-manual checkpoints  Coarse-manual checkpoints
Recovery Consistency          RCC     Strong Consistency        Strong Consistency         Crash consistent only
Recovery Location Scope       RLS     LAN & WAN                 LAN-only                   LAN & WAN
Recovery Service Scalability  RSS     High                      Medium                     Low
Recovery Service Resiliency   RSR     High                      Medium                     Low
Recovery Management TCO       RMC     $                         $$$                        $$$$$

Recovery Management           Metric  Block-level Continuous      File-level Continuous
Requirement                           Data Protection             Data Protection

Recovery Time Objective       RTO     Min to Hours                Min to Hours
Recovery Time Granularity     RTG     Sec to Hours                Sec to Hours
Recovery Point Objective      RPO     Near Zero                   Near Zero
Recovery Object Granularity   ROG     Storage blocks              File
Recovery Event Granularity    REG     Coarse-manual checkpoints   Coarse-manual checkpoints
Recovery Consistency          RCC     Crash consistent only       Crash consistent only
Recovery Location Scope       RLS     LAN-only                    LAN-only
Recovery Service Scalability  RSS     Medium                      Low
Recovery Service Resiliency   RSR     Medium                      Medium
Recovery Management TCO       RMC     $$$                         $$

Table 1
COPYRIGHT 2006 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Disaster Recovery & Backup/Restore
Author:Ward, Marty
Publication:Computer Technology Review
Geographic Code:1USA
Date:Mar 1, 2006