How to evaluate a recovery management solution.
Last year's hurricane season was devastating, and in boardrooms across America senior executives have tasked IT professionals with creating and implementing solutions that will ensure mission-critical data is continuously protected. IT professionals across all industries have come to realize that, even with thorough planning, restoring data and bringing systems back online quickly with zero data loss can be an overwhelming task.
The complexity and cost of solving data protection and recovery issues today are rooted in the fact that it takes multiple tools to deliver a solution that still doesn't meet the new requirements of today's data center. This leaves IT professionals spending countless hours trying to integrate disparate tools and manually recovering data in an attempt to build a real-time infrastructure to support their enterprise. Because there are a variety of protection and recovery tools to choose from, it is crucial to arrive at core metrics that enable IT management to choose the best recovery management solution for their environment.
Recovery management is defined as the act, manner, or practice of managing a return to normal conditions.
The Evaluation Metrics
In order to evaluate a recovery management solution, one must have properly defined metrics. Data recovery service level agreements (SLAs) are traditionally measured by recovery time objectives (RTO) and recovery point objectives (RPO). RTO defines the time required to recover a set unit of missing data, and RPO defines the potential data loss, that is, the time gap between the most recent application-consistent recovery point and the physical failure point. RTO and RPO are good objectives for setting SLAs with regard to data recovery, but they are not sufficient for measuring a recovery management solution. For example, a snapshot tool may recover a server's data in minutes; however, a snapshot tool does not have the ability to recover a granular object. When one needs to locate a lost object from snapshots, the process is manual and the RTO could be many hours. In this case, RTO has nothing to do with the tool per se, inasmuch as it is entirely dependent on the manual process. Similarly, while a data replication tool is capable of delivering zero or near-zero RPO when a server fails, it is not capable of recovering business data if the data is corrupted and the corrupted data is replicated.
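The two traditional measures can be illustrated with a little timestamp arithmetic. The following is a minimal sketch in Python, not part of any product: the function names, the nightly backup schedule, and the failure and restore times are all hypothetical, and RTO is approximated here as the elapsed time from failure to restored service.

```python
from datetime import datetime, timedelta

def achieved_rpo(recovery_points, failure_time):
    """Data-loss window: gap between the most recent recovery point
    taken at or before the failure and the failure itself."""
    usable = [p for p in recovery_points if p <= failure_time]
    if not usable:
        raise ValueError("no recovery point precedes the failure")
    return failure_time - max(usable)

def achieved_rto(failure_time, service_restored_time):
    """Downtime: elapsed time from the failure until service is restored
    (an assumed, simplified reading of RTO)."""
    return service_restored_time - failure_time

# Hypothetical scenario: nightly backups at 01:00, failure at 15:30,
# service restored four hours later.
points = [datetime(2006, 3, 1, 1, 0), datetime(2006, 3, 2, 1, 0)]
failure = datetime(2006, 3, 2, 15, 30)
restored = datetime(2006, 3, 2, 19, 30)

rpo = achieved_rpo(points, failure)
rto = achieved_rto(failure, restored)
print("data-loss window:", rpo)
print("downtime:", rto)
```

In this made-up scenario the nightly schedule leaves a data-loss window of fourteen and a half hours, even though the restore itself takes only four. Neither number says anything about whether a single lost object could have been found, or whether the restored data was already corrupted.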
As a result of examples like these, IT requires more comprehensive metrics to properly evaluate a recovery management solution. There are ten core metrics that fall into three categories--Recovery Time Characteristics, Recovered Data Characteristics, and Recovery Scalability Characteristics. The following chart explores these metrics in detail.
Recovery Time Characteristics
Recovery Time Objective (RTO). RTO defines how fast the solution is capable of recovering the data and application it is designed to protect. The RTO of most recovery solutions depends on whether or not a data verification process is needed during the recovery, and on the size of the data set to be recovered. A solution that provides instant recovery regardless of data set size greatly reduces or eliminates business downtime.
Recovery Time Granularity (RTG). RTG determines the time spacing for selecting a recovery point; this is an important parameter for recovering from logical failures. Unlike RPO, which determines the last recovery point prior to a physical failure, RTG defines recovery point selection options prior to the most recent recovery point.
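To see why RTG matters for logical failures, consider rolling back to the last recovery point taken before a corruption occurred. The sketch below is illustrative only: the helper names, the dates, and the two spacings (24 hours and one minute) are assumptions chosen for the example, not figures from any specific tool.

```python
from datetime import datetime, timedelta

def recovery_points(start, end, granularity):
    """Evenly spaced recovery points from start to end (inclusive)."""
    points = []
    t = start
    while t <= end:
        points.append(t)
        t += granularity
    return points

def last_clean_point(points, corruption_time):
    """Latest recovery point strictly before the logical corruption,
    or None if every point was taken at or after it."""
    clean = [p for p in points if p < corruption_time]
    return max(clean) if clean else None

# Hypothetical timeline: protection starts on March 1, a logical
# corruption happens at 10:17 the next day.
start = datetime(2006, 3, 1, 0, 0)
end = datetime(2006, 3, 2, 12, 0)
corruption = datetime(2006, 3, 2, 10, 17)

daily = last_clean_point(recovery_points(start, end, timedelta(hours=24)), corruption)
per_minute = last_clean_point(recovery_points(start, end, timedelta(minutes=1)), corruption)

lost_daily = corruption - daily        # good work discarded with 24-hour spacing
lost_minute = corruption - per_minute  # good work discarded with 1-minute spacing
print(lost_daily, lost_minute)
```

With 24-hour spacing, the rollback discards more than ten hours of valid work; with one-minute spacing it discards one minute. The most recent recovery point is of no help in either case, because it was taken after the corruption.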
Recovered Data Characteristics
Recovery Point Objective (RPO). RPO defines the minimum time gap between the last failure and the point in time to which data can be recovered. The smaller the gap, the less data is lost.
Recovery Object Granularity (ROG). ROG measures the level of objects that a solution is capable of recovering. For instance, object granularity may be a storage volume, a file system, a database table, a transaction, a mailbox, an email message, etc.
Recovery Event Granularity (REG). REG measures the capability of a recovery management solution to track events and to recover a failed application or missing data to a specific event.
Recovery Consistency Characteristics (RCC). RCC defines the usability of recovered data by the associated application. RCC of a recovery management solution depends not only on how data is captured and stored, but also on the data type being protected.
Recovery Scalability Characteristics
Recovery Location Scope (RLS). RLS defines where the protected data must be stored when recovery takes place. Most data protection solutions are designed such that the protected data is stored locally. Robust recovery management solutions can protect and recover data over LAN and WAN.
Recovery Service Scalability (RSS). RSS is measured by service (the number of applications or data sets the solution is capable of protecting) and capacity (the maximum size of the data it can store).
Recovery Service Resiliency (RSR). RSR defines how well a recovery management solution tolerates failures. This includes system and data failures as well as data security authorization. For instance, if a system component fails, can the solution continue such that an application would be continuously protected? And can it also self-recover from any internal failures?
Recovery Management Cost (RMC). RMC defines the cost efficiency of a recovery management solution. Data services such as backup, snapshots, replication, policy management, and others are traditionally separate tools with very different architectures. For better RMC, find a consolidated recovery management platform that simplifies IT administration by reducing the number of tools necessary to manage data. For further efficiency, use a solution that reduces the storage and network resources necessary to protect and recover data.
Recovery Management Scorecard
As we discussed earlier, there are a myriad of protection and recovery tools to choose from, so it made sense to come up with the core metrics necessary to enable IT management to evaluate which solutions would best fit their environment. Now that we have an understanding of the "Top Ten" metrics necessary to evaluate a recovery management solution, let's apply these metrics to solutions that exist in the market today. Practical application of the metrics enables not only a solidified understanding of the metrics, but also a better comprehension of available solutions and how they compare.
In most industries today, the service level agreements for data protection and recovery have moved to a point where there is no time for backup windows, no tolerance for data loss, and very little margin for recovery downtime. Add to that the increased business demands for disaster recovery of mission- and business-critical data, along with new compliance requirements, and you can quickly determine that the legacy tools of data protection and recovery are ill-equipped to handle today's requirements. The ten metrics of recovery management above enable IT management to weigh their own internal business requirements against the products they are evaluating.
Marty Ward is vice president of marketing and products at Asempra Technologies in Sunnyvale, Calif.
Table 1

Recovery Management                         Transaction-level                 Traditional                 Block-level
Requirement                                 Recovery Management               Backup                      Replication

Recovery Time Objective (RTO)               Sec to Minutes                    Hours to Days               Min to Hours
Recovery Time Granularity (RTG)             Seconds                           24 hours                    None
Recovery Point Objective (RPO)              Near Zero                         24 hours                    Near Zero or Minutes
Recovery Object Granularity (ROG)           Transaction                       File                        Storage blocks
Recovery Event Granularity (REG)            Fine-grained consistency events   Coarse-manual checkpoints   Coarse-manual checkpoints
Recovery Consistency Characteristic (RCC)   Strong Consistency                Strong Consistency          Crash consistent only
Recovery Location Scope (RLS)               LAN & WAN                         LAN-only                    LAN & WAN
Recovery Service Scalability (RSS)          High                              Medium                      Low
Recovery Service Resiliency (RSR)           High                              Medium                      Low
Recovery Management TCO (RMC)               $                                 $$$                         $$$$$

Recovery Management                         Block-level Continuous      File-level Continuous
Requirement                                 Data Protection             Data Protection

Recovery Time Objective (RTO)               Min to Hours                Min to Hours
Recovery Time Granularity (RTG)             Sec to Hours                Sec to Hours
Recovery Point Objective (RPO)              Near Zero                   Near Zero
Recovery Object Granularity (ROG)           Storage blocks              File
Recovery Event Granularity (REG)            Coarse-manual checkpoints   Coarse-manual checkpoints
Recovery Consistency Characteristic (RCC)   Crash consistent only       Crash consistent only
Recovery Location Scope (RLS)               LAN-only                    LAN-only
Recovery Service Scalability (RSS)          Medium                      Low
Recovery Service Resiliency (RSR)           Medium                      Medium
Recovery Management TCO (RMC)               $$$                         $$
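One way to put the scorecard to work is to encode it as data and filter it against internal requirements. The sketch below is a hypothetical illustration: the dictionary copies four of the ten rows from Table 1 as published, while the `shortlist` function, its parameters, and the numeric ranking of Low, Medium, and High are assumptions introduced for the example.

```python
# Four rows of Table 1, keyed by solution category (values as published).
SCORECARD = {
    "Transaction-level Recovery Management": {
        "ROG": "Transaction", "RLS": "LAN & WAN", "RSS": "High", "RSR": "High"},
    "Traditional Backup": {
        "ROG": "File", "RLS": "LAN-only", "RSS": "Medium", "RSR": "Medium"},
    "Block-level Replication": {
        "ROG": "Storage blocks", "RLS": "LAN & WAN", "RSS": "Low", "RSR": "Low"},
    "Block-level Continuous Data Protection": {
        "ROG": "Storage blocks", "RLS": "LAN-only", "RSS": "Medium", "RSR": "Medium"},
    "File-level Continuous Data Protection": {
        "ROG": "File", "RLS": "LAN-only", "RSS": "Low", "RSR": "Medium"},
}

# Assumed ordering of the qualitative ratings used in the table.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def shortlist(scorecard, needs_wan=False, min_scalability="Low", min_resiliency="Low"):
    """Return the solution categories that satisfy every stated requirement."""
    matches = []
    for name, row in scorecard.items():
        if needs_wan and "WAN" not in row["RLS"]:
            continue
        if LEVELS[row["RSS"]] < LEVELS[min_scalability]:
            continue
        if LEVELS[row["RSR"]] < LEVELS[min_resiliency]:
            continue
        matches.append(name)
    return matches

wan_capable = shortlist(SCORECARD, needs_wan=True)
robust = shortlist(SCORECARD, min_scalability="Medium", min_resiliency="Medium")
print(wan_capable)
print(robust)
```

A filter like this only narrows the field. The remaining metrics, and the cost column in particular, still have to be weighed against the organization's own SLAs.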