Cost-optimizing RAID systems: comparing the availability, performance and cost of 36GB-drive striped parity (RAID-5) to 146GB-drive mirrored arrays.
The advent of drives with individual capacities above 100GB will drive a migration away from RAID-5 implementations toward RAID-1+0 implementations in order to maintain current levels of striped-parity array availability, increase current levels of performance, and decrease hardware acquisition costs. This article compares the availability, performance, and cost characteristics of 36GB-drive striped-parity (RAID-5) arrays to 146GB-drive mirrored arrays.
RAID Level Comparison
In RAID configurations, drives are arranged into striped or mirrored array groups. RAID levels vary according to how they lay out data, how they handle redundancy, and how many drives they require.
RAID-1 (JBOD mirroring): Requires drives in pairs for mirroring. Files reside on separate disks, and two copies of the data are kept, one copy per disk.
RAID-5 (striped parity): Requires three or more drives. Data and parity are rotated such that they are distributed evenly across all disks in the array.
RAID-1+0 (mirrored striping): Requires three or more drives. Data is striped and mirrored on adjacent drives in the array.
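The layouts above reduce to simple index arithmetic. A minimal sketch in Python; the function names and the left-symmetric parity rotation are illustrative assumptions, not details from the article:

```python
def raid5_parity_drive(stripe_index, n_drives):
    """RAID-5: parity rotates one drive per stripe so it is spread
    evenly across the array (left-symmetric rotation, a common choice)."""
    return (n_drives - 1 - stripe_index) % n_drives

def raid10_mirror_of(drive, n_drives):
    """RAID-1+0 in the article's model: each drive's data is mirrored
    on the adjacent drive, with the last drive wrapping to the first."""
    return (drive + 1) % n_drives
```

Over four stripes in a 4-drive RAID-5 set, `raid5_parity_drive` visits every drive exactly once, which is what "distributed evenly across all disks" means in practice.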
Data Availability Comparison
RAID levels vary by the way they handle data redundancy, which is the method used to provide data availability.
RAID-1 (JBOD mirroring): If one disk in the mirror fails, no data is lost; however, simultaneous loss of both mirrored disks results in data loss.
RAID-5 (striped parity): A single disk failure does not result in data loss; however, if a second disk fails before replacement of the first failed drive and prior to stripe reconstruction, all the data in the stripe is lost.
RAID-1+0 (mirrored striping): Losing one drive does not cause data loss. Losing any two adjacent drives causes data loss. Losing any two non-adjacent drives does not cause data loss.
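The RAID-1+0 adjacency rule above can be checked mechanically. A small sketch, assuming the ring layout of adjacent mirrors the article describes (the function name is hypothetical):

```python
def raid10_data_lost(failed_drives, n_drives):
    """RAID-1+0 with mirrors on adjacent drives (wrapping at the end):
    data is lost only when a drive and its neighbor both fail."""
    failed = set(failed_drives)
    return any((d + 1) % n_drives in failed for d in failed)
```

This reproduces all three of the article's cases: a single failure is safe, an adjacent pair loses data, and a non-adjacent pair is safe.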
RAID levels also vary by the way they handle data movement, which is the method employed to retrieve and store data. Some RAID levels perform latency-driven tasks well, while others perform bandwidth-driven tasks well.
RAID-1 (JBOD mirroring): Each file has to be written twice, once to each disk in the mirror. Caching RAID controllers help by requiring only one write from the host (but they still must perform two writes to the disks). Read performance is enhanced because the RAID controller can read from either disk in the mirror, so if one disk is busy, the data can be retrieved from its mirrored counterpart.
JBOD mirroring is generally used with randomly accessed small files, typically those under 8KB in size. It is also used for mirroring host operating systems and/or host applications because, although they are large in size, they are infrequently accessed or updated. JBOD mirroring is particularly well suited to highly random, write-heavy workloads: with no parity generation required, response time stays low.
RAID-5 (striped parity): This RAID level incurs a "write penalty" on each storage request because new parity must be generated for every write, which requires reading back the other pieces of the stripe plus performing two writes (new data and new parity). Also, since the controller must wait until both the data and parity I/O operations have completed before it can confirm that a write has executed properly, additional overhead is incurred.
The greater the number of drives, the greater the number of parity calculations, so keeping the array group small increases performance but requires more drives for parity. The larger the stripe size, the more data is sent per I/O and thus the more parity calculation per write, so the longer it takes to complete a write operation.
The smaller the stripe size, the higher the frequency of parity calculations, since more writes occur; this is why large record writes are mated with large stripe sizes, to transfer more data per write operation.
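The write penalty can be counted in I/O operations per small write. This sketch follows the article's read-back-the-stripe description; note that many controllers instead use a four-I/O read-modify-write shortcut (read old data, read old parity, write both) regardless of stripe width:

```python
def raid5_small_write_ios(n_drives):
    """One small RAID-5 write, per the article's description: read every
    other data chunk in the stripe, recompute parity, then write the new
    data chunk and the new parity chunk."""
    reads = n_drives - 2      # all data chunks except the one being replaced
    writes = 2                # new data + new parity
    return reads + writes

def mirrored_write_ios():
    """A mirrored (RAID-1 or RAID-1+0) write is simply two writes."""
    return 2
```

Under this model, a 14-drive stripe turns one host write into 14 disk I/Os, while a mirror always costs two; that is the latency gap the article is describing.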
Read operations are not impacted by parity, so data retrieval is as fast as with RAID-1 or RAID-1+0; in some cases, as when a large file is spread out across drives instead of residing in its entirety on one drive, it is faster than RAID-1.
RAID-1+0 (mirrored striping): Like RAID-1, all writes have to be written twice and data can be retrieved from either mirror. Like RAID-5, large files can be distributed across multiple drives, which takes advantage of parallelism wherein many drives do the work of one.
Due to the combination of striping (for large files) and mirroring (for small files), plus its lack of parity (for frequent writes), RAID-1+0 often exhibits higher throughput than RAID-1 and lower response times than RAID-5.
RAID levels vary by the way they handle failed-drive replacement. Resiliency is how fast an array can return to full operation after a failed drive has been replaced, and the degree to which performance is degraded during the data recovery period.
RAID-1 (JBOD mirroring): During a single disk failure, data is at risk of loss because there is no redundancy. After a failed drive has been replaced, data is copied from the surviving drive to the new drive (the process of recovering data to a mirror is called "resilvering"). This operation is dependent on individual drive capacity and on formatted transfer rate, but generally will take no longer than an hour, even for the upcoming 10Krpm 146GB drive.
RAID-5 (striped parity): During a single disk failure, data is at risk of loss because there is no redundancy. After a failed drive has been replaced, data is recalculated from parity and copied onto the new drive in the stripe (this process of recovering data from parity is called "reconstruction"). This operation is dependent on individual drive capacity, formatted transfer rate, and stripe width (the number of drives in the stripe) and can take many hours to perform.
Reconstruction takes longer the larger the stripe and/or the higher the drive capacity; moreover, during the reconstruction period, the array group exhibits a significant decrease in performance. Reconstruction time can be kept to a minimum by using small-capacity drives and small stripe widths of three to four drives, but the former requires more drives for data and the latter requires more drives for parity. Alternatively, mirroring two RAID-5 arrays together allows one mirror to take all I/O activity while the other undergoes parity rebuild. Deploying a mirrored array (RAID-1 or RAID-1+0) eliminates the reconstruction issue entirely.
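The resilver-versus-reconstruction gap can be estimated from drive capacity and sustained transfer rate. The 50MB/s rate and the 4x reconstruction overhead factor below are illustrative assumptions, not figures from the article:

```python
def resilver_hours(capacity_gb, rate_mb_s=50):
    """Mirror rebuild: one sequential copy from the surviving mirror.
    The 50MB/s sustained rate is an assumed figure."""
    return capacity_gb * 1000 / rate_mb_s / 3600

def reconstruct_hours(capacity_gb, rate_mb_s=50, overhead=4.0):
    """Parity rebuild: every surviving drive in the stripe must be read
    in full and XORed together. The overhead multiplier (assumed) stands
    in for parity computation and contention with host I/O."""
    return resilver_hours(capacity_gb, rate_mb_s) * overhead
```

Even under these rough assumptions, a 146GB mirror resilvers in under an hour, consistent with the article's claim, while a parity reconstruction of the same drive runs several times longer.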
RAID-1+0 (mirrored striping): During a single disk failure, data is at risk of loss because there is no longer redundancy between the failed drive and its adjacent drives. After a failed drive has been replaced, data is copied from the two adjacent drives onto the new drive (this process is also called "resilvering"); other drives in the array are not affected. Just as with RAID-1, this operation is dependent on individual drive capacity and on formatted transfer rate, but generally will take no longer than an hour, even for the upcoming 10Krpm 146GB drive.
Hardware Acquisition Cost Comparison
Back when drive sizes were smaller, the incremental difference in capacity from one size to the next was relatively small. With today's drives, however, the difference is great enough that it significantly affects the performance and cost of a RAID implementation. For example, the proportional difference between a 9GB drive and an 18GB drive is 1-to-2 and the absolute difference is only 9GB, whereas the proportional difference between a 73GB drive and a 146GB drive is still 1-to-2, but the absolute difference is eight times larger, at 73GB.
With 146GB drives, writes to parity arrays will take longer, and reconstruction after a failed drive will take significantly longer, especially when the stripe width exceeds four drives. It makes sense, then, to analyze the relative cost differences between two arrays: a RAID-5 configuration using 36GB 15Krpm drives with the stripe width limited to a maximum of 14 drives (13+1), and a RAID-1+0 configuration using 146GB 10Krpm drives with the mirror limited to a maximum of 14 drives.

For cost, $1,000 is used as a base reference for a 36GB 15Krpm drive, and 2.2X is used as the multiplier for the cost ($2,200) of a 146GB 10Krpm drive. Only relative drive costs are used; controller costs, power supplies, cabinets, cables, host bus adapters, and all other storage system components are excluded from this analysis. It is important to note that when non-drive hardware is taken into account, the storage system that uses fewer drives will also use fewer non-drive components, so it will be even less expensive than indicated below. But, again, what is being analyzed is the relative difference between a RAID-5 array using small-capacity drives and a RAID-1+0 array using large-capacity drives.
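The drive counts and prices above can be turned into a small cost model. Only the $1,000 and $2,200 per-drive prices and the 14-drive group limit come from the article; the helper names and the round-up-to-whole-groups assumption are mine:

```python
import math

# (capacity in GB, price in $) per the article's cost references
DRIVES = {"36GB_15K": (36, 1000), "146GB_10K": (146, 2200)}

def drives_needed(raw_gb, per_drive_gb, group_size=14):
    """Drives to supply `raw_gb` of raw capacity, rounded up to whole
    14-drive array groups (13+1 for RAID-5, 7+7 for RAID-1+0)."""
    groups = math.ceil(raw_gb / (per_drive_gb * group_size))
    return groups * group_size

def config_cost(raw_gb, drive_key):
    """Drive-only cost of a configuration at a given raw capacity."""
    per_drive_gb, price = DRIVES[drive_key]
    return drives_needed(raw_gb, per_drive_gb) * price
```

At 2TB raw, this model yields 56 of the 36GB drives (four 13+1 groups, $56,000) against 14 of the 146GB drives (one 7+7 group, $30,800), which is the four-to-one drive-count gap the article builds its case on.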
The first relevant comparison is the total number of disk drives required to deliver a given raw capacity (note that raw capacity is held constant for each comparison that follows). Raw capacity is usable capacity plus redundancy capacity, which takes the form of parity for RAID-5 and a mirror for RAID-1 and RAID-1+0. At lower capacities, the difference between the number of drives required by the two RAID level/drive capacity configurations is not significant, but starting around 2TB, the difference becomes significant and grows from there.
The next comparison is the total usable capacity generated from a given raw capacity: total raw capacity minus the capacity devoted to redundancy (again, mirroring for RAID-1 and RAID-1+0, which consumes seven drives of a 14-drive array; parity for RAID-5, which consumes one drive of a 14-drive array). At lower capacities, the difference in usable capacity, the capacity actually used to store primary data, between the two configurations is not significant, but starting around 2TB, the difference becomes significant and grows from there.
Next, when the costs of each configuration are compared, the 2TB raw capacity point is again where the differences start to become significant. At 2TB, four times as many 36GB drives (in four 13+1 RAID-5 array groups) are required to generate the raw capacity as 146GB drives (in a single 7+7 RAID-1+0 array group).
Cost per raw capacity is one measurement of RAID cost because it reflects the price of the entire RAID system, redundancy included. At every raw capacity point, a RAID-1+0 array with 146GB drives is less expensive than a RAID-5 array with 36GB drives.
Usable capacity is the amount of storage space that is actually utilized to store primary data, which makes it a relevant measurement of a storage system's cost as it relates to the portion that is actually used to produce productive work. With the exception of a usable capacity below 1TB, the cost per usable GB is about equal for a RAID-1+0 array with 146GB drives to a RAID-5 array with 36GB drives.
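That near-parity in cost per usable GB falls out of the article's own numbers. A quick check, assuming one full 14-drive group of each configuration at the stated $1,000 and $2,200 drive prices:

```python
def usable_gb(n_drives, per_drive_gb, raid):
    """RAID-5 gives up one drive per 14-drive group to parity;
    mirroring gives up half the drives."""
    if raid == "raid5":
        return n_drives * per_drive_gb * 13 / 14
    return n_drives * per_drive_gb / 2   # RAID-1 / RAID-1+0

# Cost per usable GB for one 14-drive group of each configuration:
raid5_cost_per_gb = 14 * 1000 / usable_gb(14, 36, "raid5")     # 36GB RAID-5
raid10_cost_per_gb = 14 * 2200 / usable_gb(14, 146, "raid10")  # 146GB RAID-1+0
```

The two figures land within about one percent of each other (roughly $30 per usable GB either way), which is why the cost-per-usable-GB curves track so closely despite the very different drive counts.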
The new 146GB large-capacity drives will be here soon. It is time to rethink RAID and the traditional notion that RAID-5 is less expensive than RAID-1 or RAID-1+0. Beyond 2TB, using 146GB drives in a RAID-1+0 implementation will prove more cost effective while providing enhanced data protection, read/write performance, and resiliency.
Richard Sims is network storage product marketing manager at Sun Microsystems (Santa Clara, Calif.)
Publication: Computer Technology Review, Feb 1, 2003