Planning for backup and recovery: the key elements for a successful backup and recovery strategy.

Backup and recovery are point-source solutions on the continuum of availability and resiliency. Availability describes the operational behavior of a system under adverse conditions (e.g., failure of one-half of a clustered system). Resiliency describes the operational behavior of system restoration after service degradation due to an unplanned event. In their most extreme examples, both availability and resiliency are provided by backup and recovery systems.

Backup and recovery systems traditionally address two specific requirements: error correction (e.g., inadvertent deletion of a necessary file or files) and disaster recovery. Many firms developed and currently operate backup and recovery strategies primarily designed to address the operational issue of error recovery. The backup media generated by these error recovery strategies are subsequently taken off site and extended to meet disaster recovery requirements mandated by regulatory bodies, or to support limited recovery for only a few weeks or long enough to close the business. Most of these strategies focused on the risks associated with a site- or institution-specific outage or disaster.

Re-Evaluating B&R Strategies

Enterprises of all sizes, from small businesses to large financial services firms, are re-evaluating backup and recovery strategies. This is driven by several highly correlated factors. The terrorist attacks of September 11, 2001, and the Northeast power grid failure of August 14, 2003, demonstrated, firstly, that regional disasters must be considered as likely as localized institutional disasters, which calls into question the validity of over-subscription in the first-come, first-served shared services disaster recovery market served by providers such as IBM, EDS, and SunGard. Secondly, these events demonstrated that in the age of e-business, same-day settlement, and real-time Internet transactions, computer systems are no longer business facilitation tools, but rather fundamental components of the ability to perform business at all. This calls into question the pervasive belief that long-term recovery strategies can rely on the ability to close the business through access to sufficient books and records.

These factors are evident in the Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System, which addresses business resiliency in the financial services sector and was co-authored by the Federal Reserve Board, the Office of the Comptroller of the Currency, and the Securities and Exchange Commission.
Internal providers of data center services are increasingly challenged to address a diverse set of changing requirements and subjective interpretations. The key to success in this endeavor, however, is to ensure that the solutions are well mapped to the business requirements. Inasmuch as business requirements are potentially subject to significant change, as indicated above, the first step in the architecture of a functional availability and resiliency model is to perform a Business Impact Analysis, or BIA.

Business Impact Analysis

The BIA identifies the critical business processes required to operate the business. These business processes are then mapped to the underlying applications, which, in turn, allow identification of the systems, databases, and files required to operate the business. Within the context of the BIA, business processes and applications seldom share a common level of criticality, which requires the imposition of tiers of availability; for example, some systems must be operational within 30 minutes of an event, and other systems must be operational within two business days.

An important factor for the enterprise to consider during the BIA is the dollar value of the business to be recovered. Backup and recovery systems mitigate risk of loss, but only at the cost of implementation. A well-understood cost model of the business to be protected is critical for valid interpretation of the costs to deploy a robust recovery system.

RTO, RPO and MPO

The definition of these tiers of availability and resiliency provides the foundation for a functional architecture. The key data points that differentiate the tiers of availability are the RTO, RPO, and MPO. The RTO, or Recovery Time Objective, is the maximum time between an event and the time at which a system must be returned to operation. The RPO, or Recovery Point Objective, is the maximum allowable data loss. The MPO, or Maintenance Point Objective, is an additional but less commonly articulated metric that describes the maximum allowable window for the performance of system maintenance.

The RTO, RPO and MPO are critical data points to consider during architecture of an availability and resiliency model. As previously discussed, tape-based backup and recovery systems can be considered as point-source solutions on this continuum. Tape-based backup systems, however, do not easily accommodate an RTO of less than four hours, due to the time required to physically retrieve the data from tape, nor do they easily accommodate an RPO requirement of less than 24 hours, for similar reasons. A multi-terabyte system with a low RTO (less than 10 minutes) and a large RPO (greater than 72 hours) may nevertheless be accommodated by tape, as the restoration process may take place at preemptive set intervals to prepare for rapid recovery, but such a scenario may fail if generating the backup data sets exceeds the allowable MPO.
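The tape thresholds discussed above can be expressed as a rough screening check. The following is a minimal sketch, not a definitive sizing method: the class and function names are hypothetical, the four-hour and 24-hour limits are the rules of thumb quoted in the text, and the check deliberately ignores the pre-staged restore scenario described for multi-terabyte systems.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class AvailabilityTier:
    """One tier of availability from the BIA (hypothetical structure)."""
    name: str
    rto: timedelta  # maximum time from event to return to operation
    rpo: timedelta  # maximum allowable data loss
    mpo: timedelta  # maximum allowable maintenance window


# Rules of thumb from the text: tape does not easily accommodate
# an RTO under four hours or an RPO under 24 hours.
TAPE_MIN_RTO = timedelta(hours=4)
TAPE_MIN_RPO = timedelta(hours=24)


def tape_alone_is_plausible(tier: AvailabilityTier) -> bool:
    """Rough screen only; does not model pre-staged restores."""
    return tier.rto >= TAPE_MIN_RTO and tier.rpo >= TAPE_MIN_RPO


def backup_fits_mpo(tier: AvailabilityTier, backup_duration: timedelta) -> bool:
    """True when generating the backup set fits inside the maintenance window."""
    return backup_duration <= tier.mpo


# Illustrative tiers: the RTO values echo the 30-minute and two-day
# examples in the text; the RPO and MPO values are assumed.
fast_tier = AvailabilityTier("30-minute tier", timedelta(minutes=30),
                             timedelta(0), timedelta(hours=3))
slow_tier = AvailabilityTier("two-day tier", timedelta(days=2),
                             timedelta(hours=24), timedelta(hours=3))

for tier in (fast_tier, slow_tier):
    print(tier.name, "tape alone plausible:", tape_alone_is_plausible(tier))
```

A tier that passes the first check can still fail the second: a backup that takes longer than the MPO is not viable, however relaxed the RTO and RPO are.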
The RTO, RPO, MPO, and identification of necessary data and compute elements within the enterprise are the primary results of the BIA and provide the business requirements for a backup and recovery solution. Backup and recovery solutions that cannot clearly articulate the provisioned objectives or the source of their definition are seldom viewed as successful implementations.

Technical Impact Analysis

Following the BIA is the Technical Impact Analysis, or TIA. The TIA is the process by which the business requirements are mapped to technical requirements. The TIA identifies additional technical considerations, such as an application-specific requirement to suspend active transaction processing during backup or replication, and sensitivity to network architecture and latency.

A critical component to be considered during the TIA is the arbitrage of diversely distributed elements of enterprise data. For example, a transaction may be captured by a primary system, which maintains state information on the transaction while simultaneously updating another database. Without properly correlating the backup and recovery processes, each database may maintain independent referential integrity, but the correlation between the two may be lost.

The collection, collation, and aggregation of these technology-specific elements develop into the technical requirements for a backup and recovery solution. Differing technologies (such as mass storage array-based synchronous or asynchronous replication, operating system replication, "snap" or BCV copies, tape generation and cloning, and multi-phase dual-site commit databases) all provide points on the continuum of availability and resiliency, and can be combined to offer the necessary RTO, RPO, MPO, economic performance and operational viability required by the enterprise.

Armed with the business and technical requirements, in conjunction with the capabilities of the various technologies, the system administrator has the information required to engineer and develop the required infrastructure for backup and recovery. It is important to note that backup and recovery can no longer be considered to be simply the tape backup environment, but rather the whole range of potential solutions for the provisioning of availability and resiliency.

Once the TIA is complete and firm technical requirements with potential solutions have been identified, the system administrator must perform a financial analysis of the viable solutions. Initial capital expense, increases in operating costs associated with operational personnel or recurring monthly network access fees, media costs, maintenance costs, and the GAAP accounting treatment of these expenses should be taken into consideration. The financial analysis ensures that the projected solution set will provide the appropriate economic performance for the enterprise.

A Viable Solution

With the completion of the BIA, TIA, technical architecture and associated financial analysis, a viable solution may include a variety of technologies similar to those in the Table, all of which should be viewed as components of the backup and recovery strategy.

As seen in the Table, the enterprise solution comprises a variety of technical implementations that meet the business-defined performance requirements. These solutions must be rigorously tested during initial implementation. Additionally, because these systems are used only sporadically, system administrators must devise and implement testing regimens to ensure their continued proper functioning through data validity checking, media checking, log management, and other standard system administration management techniques.

Conclusion

In sum, the key elements for successful development of a backup and recovery strategy are to ensure that the business requirements have been properly captured and properly valued. The analysis of these business requirements yields the technical requirements. Armed with the business requirements, technical requirements, capabilities of the technology set, and the desired economic performance characteristics, the system administrator is well positioned to develop a successful and valid backup and recovery strategy.

System              RTO           RPO          MPO
Mainframe           60 minutes    0 minutes    8 hours/week
Database Systems    5 minutes     0 minutes    3 hours/night
Web Systems         1 minute      24 hours     1 hour/week
File Servers        30 minutes    24 hours     3 hours/night

System              Solution Set
Mainframe           Mass storage array with synchronous replication, weekly tape backup
Database Systems    High availability cluster server on mass storage array with synchronous replication
Web Systems         Multiple distributed nodes with tape backup
File Servers        Mass storage array with asynchronous replication, weekly tape backup
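
The two tables above can be kept as structured data so that requirements and solutions are reviewed together. The sketch below is illustrative only: the record layout and function name are hypothetical, and the rule it checks (a zero RPO should be paired with synchronous replication) is an assumption inferred from the table rows, not a rule stated by the author.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class SystemPlan:
    """One row of the Table: requirements plus the chosen solution set."""
    system: str
    rto: timedelta
    rpo: timedelta
    mpo: str  # kept as text because the windows mix per-week and per-night
    solution: Tuple[str, ...]


PLANS = [
    SystemPlan("Mainframe", timedelta(minutes=60), timedelta(0), "8 hours/week",
               ("mass storage array", "synchronous replication",
                "weekly tape backup")),
    SystemPlan("Database Systems", timedelta(minutes=5), timedelta(0),
               "3 hours/night",
               ("high availability cluster server", "mass storage array",
                "synchronous replication")),
    SystemPlan("Web Systems", timedelta(minutes=1), timedelta(hours=24),
               "1 hour/week",
               ("multiple distributed nodes", "tape backup")),
    SystemPlan("File Servers", timedelta(minutes=30), timedelta(hours=24),
               "3 hours/night",
               ("mass storage array", "asynchronous replication",
                "weekly tape backup")),
]


def zero_rpo_without_sync(plans: List[SystemPlan]) -> List[str]:
    """Return systems that allow no data loss but lack synchronous replication.

    Components are compared as whole strings, so "asynchronous replication"
    is not mistaken for "synchronous replication".
    """
    return [p.system for p in plans
            if p.rpo == timedelta(0)
            and "synchronous replication" not in p.solution]


print("Zero-RPO systems lacking synchronous replication:",
      zero_rpo_without_sync(PLANS))
```

A check like this could run whenever the requirements or the solution set change, which fits the testing regimen the article recommends.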

James Dow is director of the Technical Architecture Group at CS Technology (New York, NY). www.cstechnology.com