Improving data availability with RAID 6.
What is RAID 6?
Although no formal standard for RAID 6 has been published to date, the term generally refers to a disk array configuration that can protect against the failure of more than one hard disk. Recently, the Storage Networking Industry Association (SNIA) defined RAID 6, in its Common RAID Disk Data Format (DDF) specification, as a method of protecting against the failure of two hard disks. Although the specification is still only a draft, most storage device manufacturers have accepted this definition of RAID 6.
The rise of RAID 6 is closely connected with the widespread adoption of disk array technologies since 2000. Vigorous market demand has stimulated improvements in storage technology, and the shortcomings of the common RAID levels 0, 1 and 5 are being brought to light. Newer hard disks and RAID 6 accelerator chips are laying the foundation for RAID 6 technologies to deliver the next stage in data protection.
[FIGURE 1 OMITTED]
A description of RAID 6 based on Reed-Solomon coding, which may soon enjoy wide acceptance, was published in 1997. However, it attracted little attention until recently. The protection method can be explained briefly with these two mathematical expressions:
P = D1 + D2 + D3 + D4
Q = 1*D1 + 2*D2 + 3*D3 + 4*D4
These are simultaneous equations: if any two of the values P, Q, D1, D2, D3 and D4 become unknown (lost), the expressions reduce to two independent linear equations in two unknowns, and the two lost values can be regained simply by solving them. This is what makes data recovery possible.
In terms of a disk array, let D1, D2, D3 and D4 stand for user data stored on different hard disks, and let P and Q represent the parity data used to protect the user data. See Figure 1.
From the above expressions, we know that even if two values are lost (two hard disks damaged), we can work out their exact values (data recovery) from the remaining data (the undamaged hard disks).
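As a minimal sketch of the two expressions above, the following Python code computes P and Q for four data values and then recovers two lost data values by solving the resulting pair of equations. It uses ordinary integer arithmetic, exactly as the simplified expressions do; practical Reed-Solomon implementations work in a Galois field instead. The function names `parity` and `recover_two` are illustrative assumptions, not part of any product.

```python
def parity(data):
    """Compute P and Q for data values D1..Dn using plain integer arithmetic."""
    p = sum(data)
    q = sum(k * d for k, d in enumerate(data, start=1))
    return p, q


def recover_two(data, p, q, i, j):
    """Recover lost values D_i and D_j (1-based positions, i != j).

    `data` holds the surviving values; entries at positions i and j are ignored.
    """
    known = [(k, d) for k, d in enumerate(data, start=1) if k not in (i, j)]
    s_p = p - sum(d for _, d in known)      # equals D_i + D_j
    s_q = q - sum(k * d for k, d in known)  # equals i*D_i + j*D_j
    d_j = (s_q - i * s_p) // (j - i)        # eliminate D_i, solve for D_j
    d_i = s_p - d_j
    return d_i, d_j


original = [7, 3, 9, 4]            # D1..D4
p, q = parity(original)            # P = 23, Q = 56
damaged = [7, None, 9, None]       # D2 and D4 are lost
print(recover_two(damaged, p, q, 2, 4))  # (3, 4)
```

The same elimination works for any pair of lost data values because the coefficients in Q differ for every disk, which keeps the two equations independent.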
New Technologies and Demands Stimulating RAID 6 Development
Although Fibre Channel and Serial Attached SCSI (SAS) meet the demand for large storage capacity, with support for over 125 devices, the failure rate of a storage system consisting of a large number of hard disks rises significantly as the number of hard disks increases. The challenge is to build a large-capacity storage system that is both highly reliable and cost effective.
As far as cost is concerned, SAS specifications allow users the flexibility to create a large-scale disk array by using SAS or SATA hard disks at comparatively lower cost.
A disk array of no more than 8 hard disks with RAID 5 single-parity protection appears to be highly reliable. However, increasing the number of hard disks in an array raises the probability that a first hard disk will fail, and also the probability that a second hard disk will fail before the system has finished recovering onto a spare. A disk array with more hard disks therefore needs further protection to keep data accessible when two hard disks are damaged at the same time. RAID 5 cannot meet this goal, but RAID 6 can.
We believe that demand for RAID 6 will grow much stronger in step with increasing storage capacity. In addition to providing better overall protection when more hard disks are connected, RAID 6 solutions with multiple parity can resolve a perennial problem that is unavoidable with traditional single-redundancy array configurations (such as RAID 1, 5 and 10).
The Inevitable Vulnerability of RAID 5 Is Solved by RAID 6
In a disk array configuration that uses multiple hard disks with single parity, when one hard disk is damaged the array can replace it with a spare hard disk and restore the system's normal operating mode, a process known as "disk array recovery". However, a disk array with single parity is vulnerable during the recovery. Whenever an error occurs in this process, data corruption is usually the result.
In RAID 5, data is cut into fixed-size strips that are stored on the hard disks in turn. A parity block is created in each horizontal stripe to protect the data blocks in that stripe, but it can handle damage to only a single block. The following figure illustrates the distribution of data blocks and parity (redundancy) blocks in a RAID 5 array composed of 4 hard disks (see Figure 2).
[FIGURE 2 OMITTED]
Damage to one hard disk can be regarded as one damaged block in every horizontal stripe: the disk can no longer read or store data. However, the remaining data blocks and redundancy blocks can be used to rebuild the damaged blocks (the data on the damaged hard disk). The rebuilt data is written to another hard disk used as the spare, and after all the data has been copied, the spare hard disk takes the place of the damaged hard disk.
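The rebuild step can be sketched in a few lines of Python, assuming the usual RAID 5 scheme in which the parity block is the byte-wise XOR of the data blocks in the same stripe. The block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One horizontal stripe on a 4-disk array: three data blocks plus parity.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity_block = xor_blocks(data_blocks)

# The disk holding the second data block fails; XOR the survivors to rebuild it.
survivors = [data_blocks[0], data_blocks[2], parity_block]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])  # True
```

Because XOR is symmetric, the same operation rebuilds a lost block whether it held data or parity.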
Data recovery is the most important part of disk array recovery. During data recovery, the array needs to read every block from the undamaged hard disks. If any small section of data cannot be read, data recovery will be incomplete, and a corresponding "shadow section" will be left on the spare hard disk after the data is transferred. In fact, the corresponding sector on the spare may be sound, i.e. it can still be read and written, but the recovered data in that section is incorrect.
The following figure illustrates how a shadow section is created during disk array recovery in a RAID 5 array composed of 4 hard disks:
Generally speaking, disk array recovery reads all sectors of the remaining hard disks from top to bottom, so every bad sector on those disks will be found and will create a corresponding "shadow section".
These shadow sections reveal an important fact: if any bad sectors appear during RAID 5 recovery, the recovered RAID 5 data will be incomplete. Depending on where the shadow sections fall, the result may be some damaged data or severe damage to the system.
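The following Python sketch illustrates how a shadow section arises, again assuming XOR parity. Each surviving disk is modeled as a list of blocks, with `None` standing for a sector that cannot be read; the function name `rebuild_disk` and the sample values are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(surviving_disks):
    """Rebuild a failed disk stripe by stripe from the surviving disks.

    A stripe with an unreadable block (None) cannot be rebuilt with
    single parity, so it is recorded as None: a shadow section.
    """
    rebuilt = []
    for stripe in zip(*surviving_disks):
        if any(block is None for block in stripe):
            rebuilt.append(None)
        else:
            rebuilt.append(xor_blocks(stripe))
    return rebuilt


disk_a = [b"\x01", b"\x02", b"\x03"]
disk_b = [b"\x10", None, b"\x30"]      # bad sector found in the second stripe
disk_c = [b"\x05", b"\x06", b"\x07"]

print(rebuild_disk([disk_a, disk_b, disk_c]))
# [b'\x14', None, b'4']  (b'4' is the byte 0x34)
```

The first and third stripes are rebuilt correctly, while the second stripe on the spare carries no valid data even though the spare's own sector is physically sound.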
Because RAID 6 can recover lost data even when two hard disks are damaged, the recovery of a damaged hard disk is not affected by a bad sector on another hard disk. However, if more than two blocks in the same horizontal stripe are lost, RAID 6 meets the same problem as RAID 5, although the likelihood of this is extremely low. Some types of RAID 6 may provide more advanced protection against damage to 3, 4 or more hard disks, which may reduce the failure rate even further.
Characteristics and Efficiencies/Functions of RAID 6
Having understood the working principle of RAID 6, many people may ask: if the system has to maintain two sets of parity, will performance suffer? For current RAID 6 implementations, random write performance is about 70% of that of RAID 5; in other respects, such as random read, sequential write and sequential read, RAID 5 and RAID 6 perform equivalently.
RAID 6 provides users with unprecedented reliability for disk array systems. With dedicated I/O processors from Intel that include RAID 6 accelerators, disk array manufacturers are responding to market pressure and beginning to enable RAID 6 in their products, helping to accelerate the acceptance and availability of RAID 6. The new technical specification not only raises the level of competitiveness in the storage device market, but also poses a new challenge.
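One common back-of-the-envelope explanation for the random write figure, offered here as an assumption rather than something the authors state, counts the disk operations behind a small write. With a read-modify-write update, RAID 5 reads and rewrites the data block and one parity block (4 operations), while RAID 6 does the same for the data block, P and Q (6 operations). The sketch below works out the resulting ratio; the function name is illustrative.

```python
def small_write_ios(parity_blocks):
    """Disk operations for one read-modify-write update:
    one read and one write for the data block and for each parity block."""
    return 2 * (1 + parity_blocks)


raid5 = small_write_ios(1)   # 4 operations
raid6 = small_write_ios(2)   # 6 operations
ratio = raid5 / raid6        # RAID 6 throughput relative to RAID 5

print(f"{ratio:.0%}")  # 67%
```

The estimate of roughly two thirds is in line with the figure of about 70% quoted above, though real results depend on caching and controller design.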
Chi-Chen Yu is vice president of engineering and chief technology officer, and Hammer Chien is engineering manager, at Promise Technology, Milpitas, CA.
|Title Annotation:||Storage Management|
|Publication:||Computer Technology Review|
|Date:||Jun 1, 2005|