Assessing the foundation of long distance disaster recovery.


In theory, enterprise data protection strategies should be geared towards enabling the rapid recovery of business operations in the event of a wide range of failures. The continuum of failure types spans the range from a simple hardware component failure to a widespread disaster that may take out one or more data centers. Current recovery strategies such as RAID or other redundant components, I/O multi-pathing and local clustering for high availability do a fairly good job of supporting rapid and, in some cases, even transparent recovery from simple local failures. But more widespread types of failures (due to acts of terrorism, forces of nature such as tornadoes or earthquakes, or events such as multi-state power outages) raise the bar significantly for operational recovery.

All recovery options must start with the data. The obvious solution to the problem of widespread disasters has been to locate one or more copies of the operational data at one or more sites located far enough apart so as not to be affected by any single disaster. Although a variety of other issues must be addressed to ensure rapid and reliable recovery (such as human resources, proven processes, communications and access to appropriate equipment), the foundation for viable recovery starts with the data. Following is what is required to lay the foundation for enterprise data recoverability.

Current Long Distance Disaster Recovery Solutions

Traditionally, there have been two approaches to this problem. Tape-based backup solutions store a point-in-time copy of operational data on inexpensive media that can be moved to a remote location and safely stored. If recovery is required, data can be restored from tape to disk at a recovery site and then used to restart critical applications. The other approach has been to use asynchronous replication to continuously maintain a relatively up-to-date copy of operational data on disk at a remote site. Any changes to operational data at the local site are sent across a network of some type to also be applied at the remote site. If recovery is required, servers and applications at the remote site can be brought up using this copy.

Each of these solutions introduces major operational trade-offs, however. Tape-based backup solutions suffer from three basic problems: backup windows, data loss and data integrity/recoverability.

[FIGURE 1 OMITTED]

The backup window is the amount of time an application must be offline for a backup to be performed. In today's globalized environment, 7x24 availability is a strong requirement, leaving literally no time when critical applications can be down, even for just a minute or two. Even the use of snapshot technology still requires the application to be brought down, causing operational impact.

Tape-based backups are also point-in-time copies, current to the point when the backup was taken. Given the backup window problem, backups are not taken very frequently. Most environments are backed up only once a day at the most, and many are only backed up every several days or on a weekly basis. If recovery is required, any changes to the operational data made since the last backup are lost.
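
The data-loss exposure described above can be put into rough numbers. The sketch below is illustrative only: the function names, the timestamps and the daily and weekly schedules are assumptions chosen to mirror the text, not figures from any survey or product.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return the span of changes lost when restoring from the last backup.

    A point-in-time backup is current only to the moment it was taken, so
    everything written between that moment and the failure is lost.
    """
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def worst_case_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the failure happens just before the next backup runs."""
    if backup_interval <= timedelta(0):
        raise ValueError("backup_interval must be positive")
    return backup_interval


# Hypothetical example: nightly backup at 02:00, failure at 23:00 the same day.
lost = data_loss_window(datetime(2004, 5, 1, 2, 0), datetime(2004, 5, 1, 23, 0))
print(f"Changes lost: {lost}")

# Hypothetical schedules matching the daily and weekly cases in the text.
for name, interval in {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}.items():
    hours = worst_case_loss(interval).total_seconds() / 3600
    print(f"{name} backup: up to {hours:.0f} hours of changes at risk")
```

The point of the arithmetic is simply that the exposure scales with the backup interval, which is why infrequent backups translate directly into larger potential data loss.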

Finally, tape media integrity can introduce non-deterministic recovery problems. There is some risk in the data conversion that takes place when data is backed up from disk to tape, and then restored from tape back to disk. Tapes must be handled manually at some point in the off-site storage process, and this introduces the possibility of human error (tapes could be misplaced or lost). Tapes also wear out more quickly than disks, particularly if they are being used over and over for backup purposes. Research done by Gartner Group in the second half of 2003 indicates that one in four tapes has unrecoverable files. The unfortunate aspect of this is that it is not clear whether a file is unrecoverable until recovery is attempted, and at that point it is too late to do anything about it.

Replication does a good job of addressing these issues. Once installed, replication can run continuously in the background, effectively removing the backup window issue. Data loss is minimized relative to tape since writes are continuously being sent across the network to the remote site. In almost all cases, replication will allow recovery from much more current data than tape. And because data at the remote site is stored on disk in native disk format (not converted to a tape format for storage on tape), there is a much higher chance of data being recoverable when it is required.

Historically, replication has been significantly more expensive than tape due to the price differential between disk and tape hardware. But newer disk technologies such as ATA are closing the cost gap while at the same time providing more reliable solutions with faster recovery. The disk-to-disk data protection trend is definitely taking hold. In December 2003, an Enterprise Storage Group survey showed that 83% of enterprise users and 59% of mid-tier users have either already deployed or state that they will purchase some form of disk-based data protection technology within the next 24 months. Replication is one form of disk-to-disk data protection that will benefit from this trend.

The first form of replication available was synchronous replication. In synchronous replication, widely deployed for sites within 30-50 miles of each other, writes at both sites must complete before a write acknowledgement is sent back to the critical application(s). The write latency between sites then becomes a constraint limiting the viable distance of synchronous replication configurations. To address the concerns over widespread disasters in this post-9/11 era, synchronous replication is not sufficient, in most cases, because of distance limitations.
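
To see why distance constrains synchronous replication, a back-of-the-envelope estimate of the propagation delay added to every write can help. This is a minimal sketch under stated assumptions: signals in optical fiber are taken to travel at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum), each write is assumed to need one network round trip, and switching, protocol and remote disk delays are ignored. The function name and sample distances are illustrative.

```python
MILES_TO_KM = 1.609344
# Assumption: propagation speed in fiber, roughly two-thirds of c.
FIBER_KM_PER_SEC = 200_000.0


def sync_write_penalty_ms(distance_miles: float, round_trips: int = 1) -> float:
    """Estimate propagation delay (ms) added to each synchronous write.

    The application cannot be acknowledged until the remote site confirms
    the write, so every write waits for at least one full round trip.
    """
    if distance_miles < 0:
        raise ValueError("distance_miles must be non-negative")
    if round_trips < 1:
        raise ValueError("round_trips must be at least 1")
    one_way_km = distance_miles * MILES_TO_KM
    seconds = 2 * one_way_km * round_trips / FIBER_KM_PER_SEC
    return seconds * 1000.0


# Illustrative distances: a metro-area pair versus a cross-country pair.
for miles in (50, 2000):
    print(f"{miles} miles: about {sync_write_penalty_ms(miles):.2f} ms added per write")
```

Real deployments add equipment and protocol overhead on top of this floor, so the estimate is a lower bound. Even so, it shows the penalty growing linearly with distance for every single write.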

By its very nature, asynchronous replication is designed for long-distance configurations. In asynchronous replication, the write to the remote site is decoupled from the write to the local site using a local queue of some sort. This means that critical application performance is unfettered by the distance between sites, opening up the ability to support configurations spanning literally thousands of miles. One key point to understand about asynchronous replication is that the asynchronous nature of the write does mean that there can be some lag time between writes being applied at the local and remote sites, depending on network bandwidth and distance. This lag time is generally measured on the order of seconds or minutes, however, as opposed to tape where the "lag" time (the last backup point) is often measured in days.
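
The decoupling described above can be sketched with a simple first-in, first-out queue. This is a toy model, not any vendor's implementation: the class name `AsyncReplicator`, its methods and the in-memory dictionaries standing in for the local and remote disks are all hypothetical, and a real product would persist the queue and ship writes over a network.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class AsyncReplicator:
    """Toy model: local writes return at once; remote writes drain later."""

    def __init__(self) -> None:
        self.local: Dict[str, bytes] = {}    # stands in for the local disk
        self.remote: Dict[str, bytes] = {}   # stands in for the remote disk
        self._queue: Deque[Tuple[str, bytes]] = deque()

    def write(self, key: str, value: bytes) -> None:
        """Apply the write locally and queue it; returning is the acknowledgement."""
        self.local[key] = value
        self._queue.append((key, value))

    def drain(self, max_writes: Optional[int] = None) -> int:
        """Apply queued writes to the remote copy in order; return how many."""
        sent = 0
        while self._queue and (max_writes is None or sent < max_writes):
            key, value = self._queue.popleft()
            self.remote[key] = value
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of writes applied locally but not yet at the remote site."""
        return len(self._queue)


replicator = AsyncReplicator()
replicator.write("block-1", b"alpha")
replicator.write("block-2", b"beta")
print("pending after two writes:", replicator.pending())
replicator.drain(max_writes=1)   # simulate a slow link shipping one write
print("pending after partial drain:", replicator.pending())
replicator.drain()               # link catches up
print("remote matches local:", replicator.remote == replicator.local)
```

The count returned by `pending()` corresponds to the lag discussed in the text: writes in the queue are exactly the changes that would be missing at the remote site if the local site failed at that moment. Draining in FIFO order keeps the remote copy consistent with the order in which the application issued its writes.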

Available Implementations

Asynchronous replication is available in a variety of forms. Storage-based solutions enable asynchronous replication between enterprise arrays from the same vendor and can support heterogeneous servers, but generally force vendor lock-in at the array level, driving high cost. Server-based solutions support mirroring between servers running the same operating system and can support heterogeneous arrays, but can also impose vendor lock-in at the volume manager or file system level. Depending on the vendor, architectural limitations may also impose performance and scalability issues. Appliance-based solutions support both heterogeneous servers and storage, but force new hardware purchases (appliances) and can introduce performance bottlenecks that limit scalability at the appliance level. Generally, all of these types of solutions support the ability to replicate data over IP-based networks. If long distance replication is required, it is important to understand the features and limitations of each approach to determine if it is appropriate for your environment.

There are vendors currently developing network device-based solutions that are slated for availability in 2005. These solutions are expected to offer an operating system-agnostic solution with no requirement for any host-based components. The availability of switch-based options will offer users increased choice in how they deploy replication.

Key considerations when evaluating an asynchronous replication solution include:

* Does it support heterogeneous solutions that meet your definition of "heterogeneous"? Are ATA and other cost-effective disks supported?

* Does it have the flexibility to support multiple storage architectures (DAS, SAN and NAS), preserve existing investments and leave future investment options open?

* Does it impose any form of vendor lock-in at the operating system, volume manager, file system, application, server or storage level?

* What are the data integrity and scalability limitations (if any) of the architectural implementation?

* Does the vendor offer some way to evaluate the performance of their solution in your real environment prior to purchase?

* What are the operational implications, including cost, of installing the solution in your existing environment?

Asynchronous replication can be an integral part of an overall disaster recovery solution that needs to solve the "widespread disaster" problem.

Understanding the capabilities of asynchronous replication will ensure it can be most effectively applied in a given environment, providing the best solution for the money.

Eric Burgener is vice president of marketing at Topio (Santa Clara, CA)

www.topio.com
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Regulatory Compliance
Author:Burgener, Eric
Publication:Computer Technology Review
Geographic Code:1USA
Date:May 1, 2004
Words:1453