The paradox of distance in business continuance.


In most enterprises, mission-critical applications are fundamental to the core business. Failure of those applications can be disastrous to the business and, in some cases, terminal. This is why protecting high-value data and delivering 24X7X365 business continuity is by all counts the top objective of any IT organization. Meeting this goal, however, presents a number of challenges. Traditional data protection strategies are complex, expensive and hard to manage, and they require extensive additional infrastructure. Many data protection technologies also limit the distance between primary and secondary sites, because replicating over longer distances can degrade the performance of the primary application. The need to protect against regional disasters and power outages, together with numerous regulations, is driving the demand for disaster protection over extended distances.

In September 2002, the Federal Reserve, the Securities and Exchange Commission (SEC), and the Office of the Comptroller of the Currency (OCC) jointly published the "Draft Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System," in direct response to the terrorist attacks of September 11, 2001 (www.sec.gov/rules/concept/34-46432.htm). The paper outlined "preliminary conclusions with respect to the factors affecting the resilience of critical markets and activities in the U.S. financial system; sound practices to strengthen financial system resilience; and an appropriate timetable for implementing these sound practices." The agencies solicited comments on the draft white paper and received many letters from leaders of financial firms, industry associations, technology companies, and others (www.sec.gov/rules/concept/s73202.shtml).


Probably the most controversial aspect of this interagency white paper was the suggestion that financial institutions that "play significant roles in critical financial markets" must have fully operational recovery sites located at least 200-300 miles away from the primary data center. Besides protection against regional disasters, there can be other reasons for deploying systems across multiple data centers in different geographic locations, such as providing "local access" to users spread across a wide geographic area, or taking advantage of the IT skills and infrastructure already present in a company's geographically dispersed data centers.
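Some simple arithmetic makes the distance problem concrete. The sketch below assumes that light travels through optical fiber at roughly 200,000 km/s (about 5 microseconds per kilometer, one way) over a straight fiber run. Real routes are longer and add equipment delay, so these figures are only a lower bound on what a synchronous write must wait for a remote acknowledgement.

```python
# Minimal sketch: the light-in-fiber latency floor that a synchronous write
# pays for each round trip to a remote site. Assumes ~200,000 km/s signal
# speed in optical fiber and a straight fiber run; real routes are longer
# and add switching and protocol delays, so these are lower bounds only.

KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumed: 200,000 km/s = 200 km per millisecond


def min_round_trip_ms(distance_miles: float) -> float:
    """Lower bound on one request/acknowledgement round trip, in ms."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_KM_PER_MS


def max_sync_writes_per_sec(distance_miles: float, round_trips_per_write: int = 1) -> float:
    """Upper bound on strictly serialized synchronous writes per second,
    considering propagation delay alone (no disk or protocol overhead)."""
    per_write_ms = min_round_trip_ms(distance_miles) * round_trips_per_write
    return 1000.0 / per_write_ms


if __name__ == "__main__":
    for miles in (10, 100, 300, 3000):
        print(f"{miles:>5} miles: >= {min_round_trip_ms(miles):6.2f} ms per round trip, "
              f"<= {max_sync_writes_per_sec(miles):8.0f} serialized writes/s")
```

Under these assumptions, 300 miles adds at least about 4.8 ms to every synchronous write and caps a stream of strictly serialized writes at roughly 200 per second before any disk or protocol overhead. This is the tension between keeping sites close for performance and moving them far apart for disaster protection.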

Is Only the Mainframe Data Critical?

Windows server operating systems have become accepted in high-end, mission-critical applications, and as a result, disaster tolerance and business continuance for these systems are becoming more and more important. Microsoft offers a robust clustering technology, Microsoft Cluster Server (MSCS), as part of the Windows operating system; however, it is generally deployed for failure protection within a campus or data center environment.

The goal is to ensure that there is no single point of failure. In other words, neither the loss of a single component nor the failure of a complete site can cause applications to become unavailable. In extreme cases, a complete site can fail, either due to a total loss of power or through a natural or man-made disaster. More and more businesses are recognizing the value of deploying mission-critical solutions across multiple geographically dispersed sites.

A new data protection approach, using an intelligent replication appliance together with Windows clustering technology, can create a highly resilient infrastructure across data centers that are thousands of miles apart, protecting applications automatically against all types of failures as well as local or regional disasters. In addition, this network-based data protection architecture can provide features such as bandwidth optimization and support for heterogeneous storage and server environments.

Replication Requirements Unique to Clusters

A cluster is a group of two or more computer systems that together provide a highly available and highly scalable platform for hosting applications. MSCS clusters use failover to achieve high availability: when a node fails, the applications it was hosting are moved to another node. The failover mechanism is automatic, and a multi-site configuration ensures that the loss of one site does not cause the loss of the application.

To make a multi-site MSCS configuration work, the replication infrastructure has to solve several specific issues:

* Making sure that multiple sites have independent copies of the same data

* Making sure that each site has its own copy of the data so that if one site is lost, the applications can continue

* Ensuring that changes to the data at one site are replicated consistently to the other sites, so that if the first site fails, the changes are available at the second site and the applications continue to run uninterrupted

* Ensuring that the data between two sites stays consistent at all times

* Replicating data across sites in both directions to support both failover and failback

Geographically dispersed cluster configurations must be implemented carefully, especially the storage and data replication components of the solution. The system should ensure that failures do not result in data corruption and that cluster integrity is always maintained.

The most difficult challenge, specifically with geographically dispersed clusters, is distinguishing between a communication failure between sites, where the other site is still alive, and a site failure, where the other site is no longer available to run applications.

The MSCS architecture handles this issue by using a single quorum resource in the cluster as the tie-breaker to avoid split-brain scenarios. A split-brain scenario can occur in the case above, when all of the network communication links between two or more cluster nodes fail. The cluster may then be split into two or more partitions that cannot communicate with each other, each of which could otherwise try to run the same applications against the same data.
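The tie-breaking idea can be illustrated with a toy model. This is a minimal sketch, not the actual MSCS arbitration protocol: the names QuorumResource and Partition are hypothetical, and a non-blocking lock stands in for the shared quorum resource that each partition tries to reserve after losing contact with the other.

```python
# Minimal sketch of quorum-based tie-breaking: when partitions cannot talk
# to each other, each tries to take ownership of one shared quorum
# resource, and only the winner keeps running applications.
# QuorumResource and Partition are hypothetical illustration names; this is
# not the MSCS arbitration protocol itself.
import threading


class QuorumResource:
    """A single arbitration point that at most one partition can own."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.owner = None

    def try_reserve(self, partition_name: str) -> bool:
        # Non-blocking acquire: the first partition to arrive wins.
        if self._lock.acquire(blocking=False):
            self.owner = partition_name
            return True
        return False


class Partition:
    def __init__(self, name: str) -> None:
        self.name = name
        self.running_apps = True

    def on_lost_contact(self, quorum: QuorumResource) -> None:
        # A partition that cannot reserve the quorum must assume the other
        # side is alive and stop its applications to avoid split-brain.
        if not quorum.try_reserve(self.name):
            self.running_apps = False


quorum = QuorumResource()
site_a, site_b = Partition("site-A"), Partition("site-B")
site_a.on_lost_contact(quorum)  # site A reaches the quorum first
site_b.on_lost_contact(quorum)
print(quorum.owner, site_a.running_apps, site_b.running_apps)  # site-A True False
```

If the other site has truly failed, it never contends for the quorum, so the survivor simply takes over; if only the links failed, exactly one side continues. That single-winner property is what prevents split-brain.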

Using an Intelligent Replication Appliance to Set up a Geographically Dispersed Cluster

A geographically dispersed cluster generally deploys multiple storage arrays, with at least one at each site. The replication system is configured to replicate the application data in both directions so that, if a site fails, the application data is preserved and the failover servers can continue to provide the services and applications. In addition, the quorum volume must be kept consistent through synchronous replication to guarantee the operation of the MSCS cluster regardless of the type of failure.

Synchronous replication is used for the quorum volume, which means that any write issued by MSCS from a node at one site does not complete until the change has also been made at the other site.
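The synchronous behavior described for the quorum volume can be sketched as follows. The classes are hypothetical, and in-memory dictionaries stand in for block devices and the WAN link; the point is only the ordering: the caller's write returns after the remote site has acknowledged it.

```python
# Minimal sketch of synchronous replication: write() returns only after the
# remote copy has acknowledged the change. RemoteSite and
# SyncReplicatedVolume are hypothetical; dicts stand in for block devices.
from typing import Dict


class RemoteSite:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def apply(self, block: int, data: bytes) -> bool:
        self.blocks[block] = data
        return True  # acknowledgement returned to the primary site


class SyncReplicatedVolume:
    def __init__(self, remote: RemoteSite) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.remote = remote

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data
        # Wait for the remote acknowledgement before reporting completion,
        # so both copies match whenever write() returns normally. (A real
        # system must also decide what to do with the local copy if the
        # remote site cannot be reached; this sketch simply raises.)
        if not self.remote.apply(block, data):
            raise OSError("remote site did not acknowledge; write not complete")


remote = RemoteSite()
quorum_volume = SyncReplicatedVolume(remote)
quorum_volume.write(0, b"cluster-config-v2")
print(quorum_volume.blocks == remote.blocks)  # True
```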

Asynchronous replication is used for the data volumes, which means that a change made to the data at one site completes locally and is then replicated to the second site. It is important, however, that the consistency of the data volumes is maintained: the replication system must guarantee write-order fidelity so that the remote data is always consistent. This matters because most applications can recover from a crash-consistent state, but very few (if any) can recover from out-of-order I/O sequences, which may leave the application totally unusable.
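Write-order fidelity can be sketched with sequence numbers. In this hypothetical model, the primary completes each write immediately and queues it with a sequence number; the replica may receive updates out of order but applies them strictly in sequence, holding back anything that arrives after a gap. The remote image therefore always matches the primary's state at some earlier moment, which is a crash-consistent state.

```python
# Minimal sketch of asynchronous replication with write-order fidelity.
# All class and method names are hypothetical illustrations.
import heapq
from typing import Dict, List, Tuple


class OrderedReplica:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.next_seq = 0
        self._pending: List[Tuple[int, int, bytes]] = []  # min-heap keyed by sequence number

    def receive(self, seq: int, block: int, data: bytes) -> None:
        heapq.heappush(self._pending, (seq, block, data))
        # Apply only the contiguous run starting at next_seq; anything after
        # a gap waits, so no later write becomes visible before an earlier one.
        while self._pending and self._pending[0][0] == self.next_seq:
            _, blk, payload = heapq.heappop(self._pending)
            self.blocks[blk] = payload
            self.next_seq += 1


class AsyncPrimary:
    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.outbox: List[Tuple[int, int, bytes]] = []
        self._seq = 0

    def write(self, block: int, data: bytes) -> None:
        # Completes locally at once; replication happens later from the outbox.
        self.blocks[block] = data
        self.outbox.append((self._seq, block, data))
        self._seq += 1


primary, replica = AsyncPrimary(), OrderedReplica()
primary.write(7, b"log record")  # seq 0: e.g. a database log entry
primary.write(3, b"data page")   # seq 1: the page that log entry describes
replica.receive(*primary.outbox[1])  # arrives first, so it is held back
print(replica.blocks)  # {}
replica.receive(*primary.outbox[0])
print(replica.blocks == primary.blocks)  # True
```

In the example, the data page arrives before the log record it depends on and is held back until the log record has been applied, so the replica never shows the page without its log entry.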

A Cost-Effective, Less Complex Alternative

Intelligent replication appliances, coupled with clustering configurations, effectively address the cost, complexity and management issues that have limited traditional data protection solutions, simplifying infrastructures while extending application protection efficiently over long distances. The ability to use a network-based appliance to enable low-cost connectivity, bandwidth optimization and long distance replication services allows IT organizations to deliver 24X7X365 availability of business information (with dramatic savings in operational costs) and ensures that information will be immediately available in the event of a complete or partial site failure.

Bandwidth Optimization: Using intelligent bandwidth reduction technologies, appliances can greatly reduce the bandwidth that replication requires. This enables the system to dramatically reduce WAN costs, particularly over long distances.
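The article does not specify which bandwidth reduction techniques the appliance uses. As an illustration only, the sketch below combines two common ones: deduplication, where a chunk the remote site has already received is replaced by a short fingerprint, and compression of the chunks that must be sent. The 4 KB chunk size and the 32-byte cost of a fingerprint reference are assumptions.

```python
# Minimal sketch of two common WAN bandwidth-reduction techniques:
# (1) deduplication: send a short fingerprint instead of a chunk the remote
#     site already holds, and (2) compression of chunks that must be sent.
# Chunk size and reference cost are illustrative assumptions.
import hashlib
import zlib
from typing import Iterable, Set

CHUNK_SIZE = 4096
REF_BYTES = 32  # assumed cost of sending a SHA-256 fingerprint instead of a chunk


def wan_bytes(writes: Iterable[bytes], seen: Set[bytes]) -> int:
    """Return how many bytes would cross the WAN for the given write payloads.
    `seen` holds fingerprints of chunks the remote site already has."""
    total = 0
    for payload in writes:
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            if digest in seen:
                total += REF_BYTES  # remote already holds this chunk
            else:
                seen.add(digest)
                total += len(zlib.compress(chunk))
    return total


if __name__ == "__main__":
    page = b"customer-record;" * 256  # 4096 bytes of repetitive data
    raw = [page] * 100                # the same page written 100 times
    sent = wan_bytes(raw, set())
    print(f"raw: {sum(map(len, raw))} bytes, sent: {sent} bytes")
```

Real savings depend heavily on how repetitive and compressible the production data is; this synthetic workload is deliberately favorable.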

Bi-Directional Replication Over Existing Infrastructures: The appliance can replicate in both directions across heterogeneous server and storage platforms, guaranteeing data consistency across multiple servers and storage platforms in the event of any failure or disaster.

Protection for the Entire Data Center: Using an intelligent replication appliance, users can protect the transactional data, multi-tiered applications and other business-critical information in a data center (including operating systems, working files and e-mail), providing point-in-time protection for all applications and immediate end-to-end recovery in case of a failure.
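Point-in-time protection is commonly built on a journal of replicated writes. This minimal sketch, using a hypothetical WriteJournal class, keeps each write with its timestamp and rebuilds the volume as it stood at any chosen moment, for example just before a corruption was written.

```python
# Minimal sketch of point-in-time recovery from a write journal.
# WriteJournal is a hypothetical name; a real journal would be bounded in
# size and persisted to disk rather than held in memory.
import bisect
from typing import Dict, List, Tuple


class WriteJournal:
    def __init__(self, baseline: Dict[int, bytes]) -> None:
        self.baseline = dict(baseline)
        self.entries: List[Tuple[float, int, bytes]] = []  # kept sorted by time

    def record(self, ts: float, block: int, data: bytes) -> None:
        # Assumes writes arrive in timestamp order, as they would from an
        # order-preserving replication stream.
        if self.entries and ts < self.entries[-1][0]:
            raise ValueError("journal entries must be appended in time order")
        self.entries.append((ts, block, data))

    def image_at(self, ts: float) -> Dict[int, bytes]:
        """Rebuild the volume as of time ts by replaying writes up to ts."""
        image = dict(self.baseline)
        cutoff = bisect.bisect_right([e[0] for e in self.entries], ts)
        for _, block, data in self.entries[:cutoff]:
            image[block] = data
        return image


journal = WriteJournal({0: b"v0"})
journal.record(10.0, 0, b"v1")
journal.record(20.0, 0, b"corrupted")
print(journal.image_at(15.0))  # {0: b'v1'}
```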

We have entered a time when intelligent replication solutions can protect data centers located thousands of miles apart, automatically and cost-effectively.

Mehran Hadipour is vice president of marketing for Kashya, Inc. (San Jose, CA).

www.kashya.com
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.


Article Details
Title Annotation:Data Protection
Author:Hadipour, Mehran
Publication:Computer Technology Review
Geographic Code:1USA
Date:Nov 1, 2004
Words:1359