Continuity guaranteed.
For almost three decades, there has been a single, fundamental model of business continuity - duplication. On one level, organisations of all sizes have relied on backing up their data with the assumption that they can roll back to a previous state in the event of a failure. But the larger of them - banks, airlines and other companies that need to ensure little or no interruption to their transaction services - have taken that duplication a stage further by mirroring the core elements of their IT environments either on- or off-site. Although such duplication of systems and facilities involves considerable cost, and is therefore usually an option only for high-end users, it does mean that such organisations can attain service levels approaching or even surpassing the hallowed 'five nines' - the goal of 99.999% uptime. A variation on that theme that gained considerable traction in the late 1980s and early 1990s, especially among banks and telecoms companies, was to duplicate within the same box. So-called fault-tolerant or non-stop servers - from companies such as Tandem (now part of Hewlett-Packard) and Stratus - relied on cloning components, providing dual power supplies, standby processors, twin disk systems and so on. The system could switch to a back-up unit internally whenever a primary one failed. Again, those were the workhorses for organisations which could not tolerate downtime - no matter the expense.
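To put the 'five nines' target mentioned above into concrete terms, the short sketch below converts an uptime percentage into the downtime it permits over a year. It is an illustration only: the function name and the choice of a 365.25-day year are assumptions made here, not figures from the organisations discussed.

```python
# Convert an availability percentage into the downtime it permits per year.
# Assumption for this illustration: a year is 365.25 days long.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, for an uptime target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime a year")
```

On these assumptions, 99.999% uptime leaves a little over five minutes of downtime a year, against more than three and a half days at 99%.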
However, over the last decade, systems vendors have been on a quest to bring those levels of business continuity to a much wider audience - at a much lower cost - by spreading the processing load across larger numbers of systems and thereby diluting or even eliminating the impact of any one failing. And that represents a fundamental shift in how availability is attained: from a model built on duplication to one based on redundancy. Today, that manifests itself in cluster systems but, increasingly, that approach is giving way to the notion of systems networked together in a grid architecture. Although the notion of hooking together the processing power of multiple servers has attractions for scalability, the primary reason for clustering servers together is to obtain a high degree of availability. Jean Bozman, an analyst at industry watcher IDC, outlines the situation: "In the absence of 100% non-stop or fully fault-tolerant systems, clustered servers allow mission-critical applications to remain highly available, even if one of the individual server nodes should fail. Typically, two or more servers are connected together by shared access to storage and by an interconnect link that provides a 'heartbeat'. If the heartbeat is not detected from one of the attached servers, then clustering software initiates a failover of workload from the affected server to an alternative server."
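The heartbeat-and-failover mechanism Bozman describes can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular clustering product works: the `Node` class, the three-second timeout and the "least-loaded survivor" policy are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time, in seconds, the last heartbeat was seen
    workloads: list = field(default_factory=list)


def detect_and_fail_over(nodes, now, timeout=3.0):
    """Move workloads off any node whose heartbeat is older than `timeout`.

    Returns the names of the nodes judged to have failed.
    """
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    failed = [n for n in nodes if now - n.last_heartbeat > timeout]
    if failed and not alive:
        raise RuntimeError("no surviving node to fail over to")
    for dead in failed:
        while dead.workloads:
            # Hypothetical policy: hand each workload to the least-loaded survivor.
            target = min(alive, key=lambda n: len(n.workloads))
            target.workloads.append(dead.workloads.pop(0))
    return [n.name for n in failed]


if __name__ == "__main__":
    cluster = [
        Node("node-a", last_heartbeat=10.0, workloads=["orders-db"]),
        Node("node-b", last_heartbeat=4.0, workloads=["billing-db", "reports"]),
    ]
    print("failed:", detect_and_fail_over(cluster, now=10.5))
    for node in cluster:
        print(node.name, node.workloads)
```

In this toy run, node-b has been silent for longer than the timeout, so its two workloads move to node-a. A real cluster would also have to deal with shared-storage access and with a node that is slow rather than dead, which the sketch ignores.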
In a survey of 325 IT executives, IDC found that respondents were pursuing high availability by deploying failover clusters or workload-balancing that allowed end users to access servers with little or no interruption even in the event of failure of hardware or software components within the cluster. The research, released in 2004, showed that 70% were using clusters primarily to leverage their high-availability characteristics. "This reflects the importance of achieving high availability for data and applications in a networked world that values 'anytime' access and that has little tolerance for downtime - planned or unplanned," says Bozman. But, in practice, clustering has presented some major challenges - not least the need to re-balance database and applications workload across the remaining processors in the event of a failure. Systems software companies - IBM, Oracle, Microsoft and others - have sought to address this, with varying degrees of success, by parallelising the workload and ensuring it can be balanced across all processing units. In the IDC survey, among those using 'parallel' clustering, one database system dominated. Oracle, with the Real Application Clusters (RAC) version of its database, was used in 69% of sites. No other database scored above 10%.
RAC, available since June 2001, is a major aid to continuity at companies such as Deutsche Post. As part of a push for efficiency and systems reliability, the German postal service wanted to bring together the 84 separate servers that powered its letter coding system into a single cluster. Underpinning that cluster with RAC "meant we could ensure the system remained available 24x7," says Robert Leaman, director of systems architecture at Deutsche Post ITSolutions, the organisation's IT services provider. "Any systems failure would mean returning to hand sorting, which would have had a dramatic impact on the bottom line and therefore could not be allowed to happen." Like other organisations, Deutsche Post is looking to take clustering to the next level, driven by a need to cut costs and overcome some of the limitations of clustering. Depending on the architecture, there are limits to the number of servers that can be clustered together, often forcing larger organisations to use high-end, expensive machines.
Moreover, the act of clustering processing power establishes a single point of failure that might be vulnerable to events such as a power outage. Grid computing promises to address such issues. "Grid computing involves networking together lots of commodity, low-cost servers acting like, and looking to the user like, one machine," says Chuck Phillips, president of Oracle. "The workload is balanced across machines as applications' needs dictate." In the company's 10g database product, Oracle has augmented its RAC software with grid control modules designed to do just that. In many settings, grid will remove the need for duplicate hardware. If one node is unplugged or becomes unavailable for any other reason, the workload is simply spread automatically across remaining nodes.
The interest is particularly high for database grid deployments. At the halfway point of 2004, a survey by analyst group Evans Data found that 37% of database developers were implementing or planning grid computing architectures. "While we're still in the infancy of grid computing, the technology looks very promising for database sites struggling with capacity issues," says Joe McKendrick, an analyst at Evans Data. "We just can't keep throwing more hardware at the problem." And that thinking is driving early user demand. The UK's national mapping agency, Ordnance Survey, for example, cites a need for lower costs and higher resilience as reasons behind its adoption of Oracle 10g. It is a similar story at OGMA, the Portuguese aircraft maintenance company. As more companies prove the model, grid will play an increasingly large role in the evolution of business continuity.