Mirroring Your Way To A Fault-Tolerant Storage System Beyond RAID 5

Most network and system managers prefer RAID disk arrays because they provide a measure of protection against drive failures. However, those who must keep the shop running without interruption typically require servers and storage that meet a higher standard of fault tolerance, or "no single point of failure." Unfortunately, standard RAID systems simply do not provide this measure of protection. Further, these enterprises typically run business-critical database applications that are seldom, if ever, closed, even for backup, and that often run 24 hours a day, seven days a week.

Mechanical failure is inevitable. Disks, power supplies, fans, and computers all fail. It is the network manager's job to anticipate the costs of these predictable failures and compare them to the costs of prevention. How can you add further protection to network server environments where the cost of data loss and downtime is very high? In these situations, the cost of extra equipment redundancy is low compared to the anticipated costs of downtime or data loss. We all know equipment will fail; we want it to fail gracefully and not take the enterprise or our data with it.

Becoming Better Than No Single Point Of Failure

To reduce downtime due to component failures, there are three different methods of mirroring RAID 5 storage systems, each a cost-effective way to protect critical environments. Each method protects your data even if an entire RAID array fails.
To this end, these methods offer "no single point of failure" at the storage array and, optionally, at the server.

Level 1--Storage Redundancy--Drives Can Fail.

The first method is a storage architecture designed to provide full fault tolerance through component redundancy. It avoids the single points of failure commonly found in typical storage arrays by using a pair of RAID 5 systems, each connected to a server that supports host-based mirroring. Operating systems that perform mirroring include NetWare, Windows NT, Solaris, HP-UX, AIX, OpenVMS (Volume Shadowing), Digital Unix, SGI Irix, and others. Fig 1 shows the setup.
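The redundancy of a mirrored pair of RAID 5 arrays can be illustrated with a short Python sketch. It is a simplified model, not vendor software: it assumes each RAID 5 array stays up with at most one failed drive, that the host-based mirror stays online while either array is up, and it ignores hot spares and rebuilds. The function names and the five-drive array size are hypothetical choices made for the example.

```python
from itertools import combinations


def raid5_up(failed_in_array: int) -> bool:
    """Assumption: a RAID 5 array keeps serving data with at most one failed drive."""
    return failed_in_array <= 1


def mirror_up(failed_a: int, failed_b: int) -> bool:
    """Assumption: a host-based mirror stays online while either array is up."""
    return raid5_up(failed_a) or raid5_up(failed_b)


def survives_all(drives_per_array: int, failures: int) -> bool:
    """Check every way of placing `failures` failed drives across both arrays."""
    drives = [("A", i) for i in range(drives_per_array)]
    drives += [("B", i) for i in range(drives_per_array)]
    for failed in combinations(drives, failures):
        failed_a = sum(1 for array, _ in failed if array == "A")
        failed_b = failures - failed_a
        if not mirror_up(failed_a, failed_b):
            return False  # found a placement that takes both arrays down
    return True


if __name__ == "__main__":
    # Hypothetical array size of five drives each, no hot spares.
    for n in range(1, 5):
        print(f"{n} failed drive(s): survives every placement = {survives_all(5, n)}")
```

Under these assumptions the pair survives every placement of one, two, or three failed drives, because at least one array is always left with no more than one failure. Four failures can bring the mirror down only when they fall two per array.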
Unlike a standard RAID 5 array, this method can withstand at least three simultaneous drive failures and still continue to run properly, all transparently to system users. Each RAID 5 array can sustain a drive failure and continue operating with parity information. If a hot spare is present or a replacement drive is inserted, the system continues running with one or even both data rebuilds in progress simultaneously. During this critical time, another drive can fail, taking down an entire RAID 5 array. This would normally disable the server and all its users and risk data loss. However, the Level 1 system keeps running: one RAID 5 array with a failed drive is still sufficient to run the server and keep data continuously accessible to users. This provides ample time to fix the disabled RAID 5 array and avoid potential data loss. With optional "hot-spare" drives installed, the Level 1 array can withstand the subsequent failure of up to two additional drives.

This Level 1 array can also withstand the failure of multiple fans and multiple power supplies. Each RAID 5 array is typically connected to a separate UPS and has redundant AC connections. An AC power line to each array can fail, a UPS can fail during a power outage, or even an entire RAID 5 array can fail completely, and the mirrored pair will continue to operate.

Level 2--Server Redundancy--Server Can Fail.

The multi-host Level 2 configuration (Fig 2) takes advantage of a RAID array's multi-hosting capabilities. Each server is connected to both RAID 5 arrays. In most cases, the second server is a standby server, ready to take over if the first server fails.
This configuration also withstands the failure of a bus, since the two RAID arrays are connected to each server via separate buses. Bus hang-ups do occur. While rebooting can easily reset them, this practice may prove unacceptable in environments where users expect the system to be in full operation. Thus, the environment continues operating through the loss of a server, bus, or storage element.

Level 3--Clustering--Hosts Share Data.

Sharing data between multiple servers enables network managers to distribute workloads across servers without having to decide arbitrarily how to split up the data. Combined with RAID 5 storage, this Level 3 capability also offers no single point of failure in the entire environment, and it eliminates the idleness of the backup server. The following operating systems support this environment: Digital Unix, OpenVMS, HP-UX, AIX, and Solaris.

This architecture creates a server environment with true no-single-point-of-failure fault tolerance, maximizes the utility of all installed components, and promotes ease of access to shared data.

Joel Leider is the CEO of Winchester Systems (Woburn, MA).