Insuring The Reliability Of Fibre Channel RAID Storage

A major benefit of Storage Area Networks is fast "any to any" server or client access to RAID storage. In a mission-critical environment, this places emphasis on ensuring high availability of not only the data access paths, but also the RAID storage system itself. Fortunately, standardized Fibre Channel layers define media and interface characteristics, as well as specifying highly reliable transmission protocols with low bit-error rates. SAN fabrics have evolved to include redundancies among switches and access paths, providing failover insurance against hardware problems.

From a hardware perspective, RAID systems typically include such high-availability features as redundancies, hot-swappability, and thermal management to dissipate heat build-up. Fibre Channel RAID systems with dual-loop architectures even provide protection against internal disk channel failures. Alarm systems and remote management capabilities further contribute to the reliability of today's RAID storage systems.

The storage industry has embraced traditional RAID levels (1, 3, 5) and variations thereof (0+1, 1+5, 6, etc.) as means of protecting critical information against the likelihood of disk drive failures. Typically, however, this protection is limited to a single drive failure (RAID 3 or 5). At most, protection against three concurrent inoperable drives is achieved, but at the cost of expensive mirroring. Even exotic arrays of this nature have limitations on the conditions under which drive failures can be sustained.

LAND-5 has developed patented algorithms that allow a disk RAID array consisting of "N" drives to sustain operations even in the event of "M" drive failures, where 1 <= M < N. Called "eRAID," this breakthrough technology can be implemented with far fewer disk drives than mirroring while also yielding higher performance and enhanced reliability.

INTRODUCTION

With the growth of mission-critical information requiring twenty-four hour access, the reliability of storage systems is paramount. Downtime is extremely costly. Customers, vendors, employees, and prospects can no longer conduct essential business or critical operations. There is a "lost opportunity" cost to storage failures, as well, in terms of business lost to competitors. Well-documented studies place the cost of downtime in the tens of thousands (or even millions) of dollars per hour.

Consider the recent problems with eBay, a major online auction Website with 2 million customers that suffered extended equipment crashes. The company, which saw its stock value slide by almost 20 percent, lost significant revenue over the three-day period--eBay warned that the latest 22-hour outage would knock between $3 million and $5 million off Q2 sales. However, the greater damage could be to eBay's reputation, especially if it continues to be plagued by outages.
In a recent survey of consumers, Jupiter Communications found that 46 percent of online consumers leave a preferred site if they experience technical or performance problems.

The need for large amounts of reliable online storage is fueling demand for fault-tolerant technology. According to International Data Corporation, the market for disk storage systems last year grew by 12 percent, topping $27 billion. More telling than that figure, however, is the growth in capacity being shipped, which grew 103 percent in 1998. Much of this explosive growth can be attributed to the space-eating demands of endeavors such as year 2000 testing, installation of data-heavy enterprise resource planning applications, and the deployment of widespread Internet access.

The rising tide of Storage Area Networks (SAN) is fueled by the prospect of providing "any to any" high-performance access by networked servers and clients to critical information on a continuous basis. RAID storage is the underlying foundation of SAN technology, necessary to ensure that mission-critical data is available when needed. Access to online storage on a 24x7 basis is essential to most SAN configurations.
Thus, the reliability of Fibre Channel RAID storage is, in a sense, the Achilles heel of a SAN fabric. In examining SAN storage, attention is quickly focused on three elements that are essential to reliability:

* The error checking scheme inherent in the transmission protocol
* The reliability of the RAID storage unit itself
* The ability of the RAID storage system to withstand multiple drive failures

This article discusses each topic in turn. Industry answers exist for the first two subjects, but the storage community is still applying expensive "band-aids" in an attempt to overcome the inevitability of disk drive failures in large storage arrays.

TRANSMISSION PROTOCOLS

Fibre Channel has five layers: FC-0 through FC-4. The FC-0 layer defines the media and interface characteristics of full-duplex serial links between points. It lets Fibre Channel scale its signaling rates and define conforming cabling and connectors without affecting upper level protocols. As such, the FC-0 layer facilitates high-performance access to Fibre Channel storage systems.

The FC-1 layer defines transmission protocols. It defines how FC-0 signals are patterned to carry data and how port-to-port links are initialized and, if necessary, recovered from error conditions. Within a Fibre Channel network, the transmitter keeps track of the running disparity between the number of binary 0s and 1s it sends. Likewise, the receiver tracks the running disparity of 0s and 1s to detect any errors. Fibre Channel also uses a control character to synchronize word boundaries. With a specified bit-error rate of less than one bit error in 10^12 bits, the FC-1 layer provides low-cost, reliable transmit-and-receive circuits and a transmission protocol that is independent of media, distance, or data rate.
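
The running-disparity check described above can be illustrated with a short sketch. It assumes the 8b/10b transmission code used at FC-1, in which every 10-bit transmission character is either balanced or carries two more 1s (or 0s) than the opposite bit, and in which unbalanced characters must alternate in sign. The function names and the sample bit patterns are illustrative only, and a real decoder would also validate each character against the code table, which this sketch does not attempt.

```python
# Simplified running-disparity check at the 10-bit word level.
# Assumption: words are plain integers holding 10 bits; no code-table lookup is done.

def word_disparity(word: int) -> int:
    """Return (number of 1s) - (number of 0s) for a 10-bit transmission character."""
    ones = bin(word & 0x3FF).count("1")
    return ones - (10 - ones)


def check_stream(words, start_rd=-1):
    """Return the index of the first word that violates running disparity, or None.

    start_rd is the running disparity before the first word (-1 or +1).
    """
    rd = start_rd
    for i, w in enumerate(words):
        d = word_disparity(w)
        if d == 0:
            continue  # balanced word: running disparity is unchanged
        # An unbalanced word must be +2 or -2 and must oppose the current disparity.
        if d not in (-2, 2) or d == 2 * rd:
            return i
        rd = -rd  # a valid unbalanced word flips the running disparity
    return None


# Illustrative patterns: six 1s (disparity +2) and four 1s (disparity -2).
PLUS2 = 0b0011111010
MINUS2 = 0b1100000101

print(check_stream([PLUS2, MINUS2, PLUS2]))  # alternating signs: no violation
print(check_stream([PLUS2, PLUS2]))          # two +2 words in a row: violation at index 1
```

A single flipped bit changes a word's disparity by two, so it either produces an illegal value or upsets the alternation, which is how the receiver notices most transmission errors without any extra check bits.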
Together, the FC-0 and FC-1 layers provide a solid foundation for reliable, high-performance access to Fibre Channel storage systems configured in a switched fabric network. Along with the other layers, they also present a standard, open architecture for interfacing Fibre Channel storage systems, supporting a competitive atmosphere that benefits the consumer.

RAID SYSTEM RELIABILITY

Most enterprise-level storage is "mission critical" these days. Corporate intranets are the lifeblood of employees, vendors, and contractors. Presenting an appealing Web site to customers and prospects on a 24x7 basis is essential to competitive survival. Online databases and consumer activities require storage systems that are impervious to normal fatigue or thermal failures.

Reliable storage systems are crucial for SANs. "High availability" is implemented through redundancy of critical components, hot-swappability in the event of component failure, and management of heat build-up.

* Thermal Management

Heat, or thermal energy, is transferred from one body to another by virtue of a temperature differential. In short, heat flows from a high-temperature area to a lower-temperature area. If there is no means of removing heat, then a steady state condition will eventually be reached wherein the internal temperature of a system enclosure equals that of its hottest element.
In general, there are three methods, or modes, of heat transfer: conduction (transfer of heat through a solid caused by molecular oscillations), convection (transfer of heat from the surface of a solid to the surrounding air), and radiation. LAND-5's PolAIRis, a thermal management system, focuses on removing this heat by using strategically placed conduits and fans to direct an optimized volume of airflow through its system enclosures.

Fast and highly integrated circuits generate large amounts of heat. Although a typical ECL gate dissipates less than 10 milliwatts, 10,000 of these gates integrated onto a chip can easily bring total power consumption up to 20-30W. At high temperatures, corrosion mechanisms accelerate and stresses are generated at the material interfaces because of different expansion coefficients. As a result, solder and wire bonds fail. In addition, CMOS switching speed degrades as the temperature increases. To eliminate negative temperature effects, heat must be removed rapidly from semiconductor devices.

In computer equipment, disk drives, processors, ASICs, and power supplies tend to be the hottest components. Disk drives operate at high revolutions per minute (RPM) and quickly begin to generate considerable heat, the leading cause of disk drive failure. High-performance CPUs typically have a dedicated fan and a heat sink to dissipate heat build-up. However, most dedicated I/O subsystems now contain powerful processors, as well (such as Intel's i960), and these generate considerable heat that must be discharged by the enclosure's thermal management. Likewise, most ASICs become local hot spots within an enclosure, endangering surrounding components unless their heat is rapidly dissipated. Power supplies, critical to continuous system operation, also quickly fail without adequate cooling.

Thus, it is clear that disk drive failures are often related to heat build-up within an enclosure. Generally speaking, disk drive reliability drops sharply as internal enclosure temperature rises above 45°C (113°F).
A reduction in temperature of five degrees Centigrade can improve disk drive reliability by 15% to 40%, depending on the actual inside cabinet air temperatures.

To address the requirement for extreme system uptime, better RAID storage systems implement sophisticated thermal management using a reverse air cooling process, multiple fans, and a chassis design that creates a "wind tunnel" effect, drawing cool air across heat-generating components. Airflow is controlled to conform to one direction, thereby maximizing the cooling effect of multiple fans. Heat dissipation is further aided by designing the system to reduce heat sources. As a final measure, temperature monitoring, along with visual and audio alarms, is required.

* System Redundancies

Even with excellent thermal management, hardware failures are inevitable in any storage system.
Hence, it is essential to design a high-availability RAID system with redundancies in order to ensure that storage access is not interrupted whenever a failure occurs.

The most common redundancy is dual power supplies. If one power supply fails, the remaining power supply should be sufficient to allow continued system operation for an indefinite period. Added safety is achieved by designing the power system to include automatic load balancing, thereby prolonging the life cycle of each power supply. Having separate power cords allows each power supply to be plugged into a separate circuit, enhancing protection against the failure of an electrical system within the building. Adding a UPS buys time in the event of a complete power outage.

As discussed, aggressive system cooling is essential to continuous operation. Hence, redundant fans are critical. Many RAID systems have unfortunately not learned this lesson and their users suffer accordingly.

Redundant RAID controllers have two benefits. They provide a fail-over capability in case one fails. Moreover, in an "active-active" mode, the controllers can share the workload, thereby enhancing system performance.

Channel failures do occur.
A truly mission-critical RAID system compensates for this possibility by having built-in redundancies in the form of A-B loops for each internal channel. If one loop fails, the remaining loop kicks in to ensure continued operations. In the future, RAID controllers will be able to take advantage of dual loop architectures to significantly increase transfer rates through "active-active" operation. For example, the new LAND-5 ICEbox FC 2500 RAID storage system has three disk channels, each supported by independent A-B loop access. Now providing up to an aggregate 300MB/sec transfer rate, the potential exists to double performance to 600MB/sec when controllers that support simultaneous dual-loop data access become available.

Having at least a global hot-spare disk drive is a universal requirement for a mission-critical RAID system. More sophisticated systems also support local hot spares.

* Hot Swappability for Critical Components

Mission-critical storage systems demand the ability to perform repairs without interrupting operations. Thus, major system components that are the most likely to fail over time must be "hot swappable." Local personnel must be able to access and swap out a failed disk drive, fan, or power supply with minimal effort. Better systems with redundant RAID controllers also support replacement of a failed controller "on the fly."

* RAID System Architecture Considerations

Two backplane architectures are available in commercial RAID systems--active and passive. Both support an A-B loop architecture.

A passive backplane allows hot-swappability of controller and channel interface boards. However, its architecture increases the design complexity and cost.

Active backplanes allow channel segmentation, a performance boost. They are also less costly to design and build. The downside is that if a channel fails, then the entire backplane must be replaced.

PROTECTION AGAINST MULTIPLE DISK DRIVE FAILURE

RAID storage configurations have proven to be the best hedge against the possibility of a single drive failure within an array. Each RAID level, however, has its pluses and minuses:

* While RAID 0 delivers high performance, it cannot sustain even a single drive failure because there is no parity information or data redundancy.
* Although the most costly, mirroring data on separate drives (RAID 1) means that if one drive fails, critical information can still be accessed from the mirrored drive. Typically, RAID 1 involves replicating all data on two separate "stacks" of disk drives on separate SCSI channels, incurring the cost of twice as many disk drives. There is a performance impact, as well, since data must be written twice, consuming both RAID system and possibly server resources.
* RAID 3 and RAID 5 allow continued (albeit degraded) operation by reconstructing lost information "on the fly" through parity checksum calculations. Adding a global hot spare provides the ability to perform a background rebuild of lost data.

With the exception of costly RAID 1 (or combinations of RAID 1 with RAID 0 or RAID 5) configurations, there has been no solution for recovering from a multiple drive failure within a RAID storage system. Even the exceptions sustain multiple drive failures only under very limited circumstances. For example, a RAID 1 configuration can obviously lose multiple (or all) drives in one mirrored stack, as long as not more than one disk fails in its mirrored partner. Combining striping and parity within mirrored stacks buys some additional capabilities, but is still subject to these drive-failure limitations.

Why would a system need protection against more than one drive failure at a time? Isn't the reliability of today's disk drives so high that the chances of a multiple drive failure are remote?

Disk drive manufacturers publish Mean Time Between Failure (MTBF) figures as high as 800,000 hours (91 years). Yet, as one examines these claims, disk drive manufacturers readily admit that such claims are unrealistic. In fact, the practical life of a disk drive is five to seven years of continuous use. Information Technology managers can painfully testify that disk drives fail with great frequency.
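
As a rough sketch of the arithmetic behind these figures, the snippet below converts an MTBF in hours to years and estimates how often some drive in an array can be expected to fail. It assumes independent drives with a constant failure rate, so the array-level interval is simply the per-drive figure divided by the number of drives. The function names are illustrative, and the choice of a seven-year practical life is an assumption taken from the upper end of the range quoted above, not a derivation stated in the article.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def mtbf_years(mtbf_hours: float) -> float:
    """Convert a per-drive MTBF from hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def array_days_between_failures(drive_life_hours: float, n_drives: int) -> float:
    """Expected days between drive failures somewhere in the array.

    Assumption: drives fail independently at a constant rate, so the
    array-level interval is the per-drive interval divided by n_drives.
    """
    return drive_life_hours / n_drives / 24


print(round(mtbf_years(800_000)))                                   # published MTBF, in years
print(round(array_days_between_failures(7 * HOURS_PER_YEAR, 58)))   # 58 drives, 7-year practical life
print(round(array_days_between_failures(800_000, 58)))              # 58 drives, published MTBF
```

Under these assumptions a seven-year practical life gives roughly one failure every 44 days in a 58-drive array, whereas the published 800,000-hour MTBF would suggest one only about every 575 days, which shows how much the choice of per-drive figure matters.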
That's why all companies place emphasis on storage backup and there is such a large market for tape systems.

It is clear that the likelihood of a drive failure increases as more drives are added to a disk RAID storage system. For example, a terabyte of RAID 5 storage consisting of fifty-eight 18GB disk drives can expect a drive to fail every 44 days! Moreover, when one drive fails, the statistical odds of a second drive failing increase dramatically and if two drives fail, the odds of a third failure jump again. In short, the more drives configured in a RAID storage system, the greater is its potential for suffering multiple drive failures.

Also, disk drives configured within a RAID storage system can be of different ages, including a mixture of new and older drives. This profile increases the odds of a multiple drive failure.

The consequences of a multiple-drive failure can be devastating. Typically, if more than one drive fails, or a service person accidentally removes the wrong drive when attempting to replace a failed drive, the entire RAID storage system is out of commission. Access to critical information is not possible until the RAID system is re-configured, tested, and a backup copy restored. Transactions and information written since the last backup may be lost forever.

Extensive research and development by LAND-5 has resulted in a set of software and hardware algorithms that augments RAID storage by performing automatic, transparent recovery from multiple drive failures without interrupting ongoing operations.
Called "eRAID," these patented algorithms allow users to select the degree of disk-loss insurance desired. Continued operations are possible even in the event of N-1 drive failures. Moreover, because these algorithms have exceptionally fast computational speeds, storage transfer rate performance actually increases under eRAID while adding virtually unlimited data protection.

eRAID consists of a series of software matrix array formulas. It involves breakthrough algorithms for accomplishing XOR calculations (which are the basis of RAID 5). eRAID dramatically alters the reliability of RAID storage by circumventing previous limitations on the number of permissible drive failures. With eRAID, all but one drive can fail (assuming sufficient capacity) and users will still have access to critical information.

HOW DOES ERAID DIFFER FROM TRADITIONAL RAID?

Today, the ultimate protection for critical information is accomplished through RAID 1 (mirroring), overlaying RAID 5 (striping with parity), and then adding a global hot spare. For example, if user data consumes four disk drives, then reliability is improved by replicating this data on a second "stack" of four drives. Within each stack, however, losing just one drive would make the whole database useless.

To further enhance reliability, each mirrored stack can be configured as an individual RAID 5 system. Since implementing parity requires an additional drive, user data and parity information are now striped across five drives within each stack. This provides protection against the loss of a single drive within each stack.
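
The single-drive protection that XOR parity provides within one stack can be sketched in a few lines. This is a minimal illustration of conventional RAID 5 style parity only, not of the eRAID algorithms, which the article does not describe. The block contents and function names are made up for the example, and a dedicated parity block is used for simplicity rather than rotating parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks, one per data drive in the stack (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]

# The parity block stored on the fifth drive-equivalent.
parity = xor_blocks(data)

# Simulate losing the third drive: XOR of the survivors and the parity restores it.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[2])  # True
```

Because one parity block yields only one equation per stripe, a second missing block in the same stack cannot be recovered this way, which is the limitation the rest of this section turns on.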
So, from an original database that required just four drives, this RAID configuration has grown to include:

* Four drives for the original data
* Four drives for the mirrored data
* One parity-drive (equivalent) for each stack (two total)
* One global hot spare (standby drive on which data can be rebuilt if a drive fails)

This architecture now requires a total of eleven disk drives (Fig 1). Thus, seven drives have been added to protect data on the four (original) drives.

This configuration can recover from a failed drive in either stack. Even if all the drives in one stack failed, the remaining drives in the surviving stack would still provide access to critical data. However, in this case, only one drive failure in the remaining stack could be tolerated. Overall, if multiple drive failures occur within each stack, access to the database is lost. Barring a total stack failure, its maximum protection is against the failure of three drives, but in a limited fashion (maximum of two failures in any one stack).

Looking at the same example using eRAID to achieve equal protection against multiple drive failure (Fig 2), protection against three-drive failure is achieved at less cost and overhead:

* Requires only eight disk drives compared to 11 for traditional RAID
* Requires less administrative overhead

Hence, if these disk drives cost $1,000 each, the eRAID solution saves $3,000 while providing better insurance, since any three random drives can fail and the system will continue to function properly.

Many databases rely strictly upon RAID 5 with striping and parity for protection against drive failure because RAID 1 solutions are so costly. However, RAID 5 supports continued operation only in the event of a single inoperable drive at any one moment. Losing two or more drives under RAID 5 brings operations quickly to a halt. For the cost of adding just one more drive, eRAID mitigates the risk of data loss by providing the means to sustain up to two drive failures.
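
The drive tally in the four-drive example can be checked with a small sketch. The helper name and the idea of parameterizing the stack size are illustrative. The eight-drive eRAID figure and the $1,000 price are taken directly from the text rather than derived, since the article does not break down how the eight drives are allocated.

```python
def traditional_drive_count(data_drives: int, hot_spares: int = 1) -> int:
    """Drives needed for two mirrored RAID 5 stacks plus global hot spares."""
    stack = data_drives + 1  # data drives plus one parity-drive equivalent
    return 2 * stack + hot_spares


DATA_DRIVES = 4
ERAID_DRIVES = 8         # figure quoted in the article for three-drive protection
PRICE_PER_DRIVE = 1_000  # dollars, as assumed in the article

traditional = traditional_drive_count(DATA_DRIVES)
print(traditional)                                   # total drives, traditional layout
print(traditional - DATA_DRIVES)                     # drives added purely for protection
print((traditional - ERAID_DRIVES) * PRICE_PER_DRIVE)  # savings in dollars
```

Run as written, this reproduces the eleven-drive total, the seven protection drives, and the $3,000 saving cited above.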
LAND-5 eRAID, however, can support continuous operation even in the event several drives fail. Thus far, LAND-5 has successfully tested recovery when 50 percent of the disk drives fail. With eRAID, network administrators can manually assign the level of desired drive-failure protection. In short, eRAID allows the user the flexibility of selecting the level of drive-failure protection to fit specific needs.

The tangible cost of eRAID is that an additional parity drive equivalent is consumed for each incremental protection level. For instance, if a user desires to protect a 100-drive storage system against the possibility of two concurrent drive failures, then the equivalent of two disk drive capacities will be allocated for eRAID parity-related data. Thus, while users can still read from 100 drives, they can write to only 98 drives, reducing usable storage capacity by two percent. Hence, protection from (say) five concurrent drive failures reduces data storage capacity by only five percent. As any Information Technology manager will testify, this is a small price to pay for dramatically enhanced storage reliability.

Aside from protection against multiple drive failures, some significant benefits of eRAID are:

* eRAID supports continued operations even in the event of a total SCSI channel failure, whereas this would be catastrophic under traditional RAID 3 or 5.
* In a traditional RAID 1 (or 0+1, 5+1, RAID 6, etc.) storage configuration, with (say) data mirrored on two independent SCSI channels, all data could be lost in one channel and operation would continue. However, if more than one drive failure concurrently occurs in both mirrored channels, then the entire storage system becomes inoperable.
With eRAID, on the other hand, random multiple drive failures are sustainable.

Kris Land is the chief technical officer of LAND-5 Corporation (San Diego, CA).