XOR SUPPORT IN THE DISK DRIVES
Many of today's RAID controllers implement some form of hardware XOR engine for parity calculations. These engines typically consist of a high-speed memory buffer and an XOR calculator/sequencer. RAID controllers that use this method typically sustain RAID write rates of 15MB/sec to 40MB/sec. The main bottleneck in such designs is the speed of the memory chips the XOR engine uses for data buffering (Fig 1).
To perform a write operation to a RAID 5 disk array, the controller must carry out what is normally referred to as a "Read-Modify-Writeback" operation. It consists of the following steps:
1. Read the "old" data from the data drive.
2. Read the "old" parity data from the parity drive.
3. XOR the old data from the data drive with the old parity data from the parity drive.
4. Write the new data to the data drive.
5. XOR the new data with the result of step 3 and write the result, the new parity, to the parity drive.
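The parity arithmetic behind these steps can be sketched in a few lines of Python. This is a minimal illustration assuming equal-sized, in-memory blocks; the function names are hypothetical, and a real controller performs these operations in hardware buffers rather than in software. It shows that XORing out the old data and XORing in the new data yields the same parity as recomputing it across the whole stripe.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_new_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity update for one block of a RAID 5 Read-Modify-Writeback.

    old_data and old_parity stand in for the results of steps 1 and 2.
    """
    # Step 3: XOR the old data with the old parity, removing the old
    # data's contribution to the parity.
    intermediate = xor_bytes(old_data, old_parity)
    # Step 5: XOR in the new data; this is what goes to the parity drive.
    return xor_bytes(intermediate, new_data)


# Toy stripe: three data blocks of 4 bytes each plus their parity.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
d2 = b"\xaa\xbb\xcc\xdd"
parity = xor_bytes(xor_bytes(d0, d1), d2)

# Overwrite d1 (step 4 would write new_d1 to the data drive).
new_d1 = b"\xff\x00\xff\x00"
new_parity = rmw_new_parity(d1, parity, new_d1)

# The incremental update equals parity recomputed over the whole stripe.
assert new_parity == xor_bytes(xor_bytes(d0, new_d1), d2)
```

Only the block being written and the parity block are touched; the other data drives in the stripe are never read, which is what makes the four-transfer sequence sufficient.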
For each RAID 5 write, it is therefore necessary to perform up to four data transfers (two reads and two writes) before the write command has fully completed and the parity information on the parity drive has been updated. Clever caching can reduce the number of required data transfers, but in most cases at least two data transfers are still required for each RAID 5 write. To sustain a 40MB/sec RAID 5 write data stream, the XOR engine must therefore sustain at least four times that bandwidth, in this case 160MB/sec. If the overhead of actually calculating the XOR data is added, the required XOR engine memory bandwidth rises to the 200MB/sec-250MB/sec range or even more.
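The bandwidth estimate above is simple multiplication, and a small helper makes the model explicit. This sketch follows the paragraph's simplified assumption that every transfer of a Read-Modify-Writeback passes through the XOR engine's buffer once; the function name and the overhead_factor parameter are hypothetical, with overhead_factor standing in for the unquantified cost of the XOR calculation itself.

```python
def xor_engine_bandwidth(write_mb_s: float,
                         transfers_per_write: int = 4,
                         overhead_factor: float = 1.0) -> float:
    """Buffer bandwidth (MB/sec) an XOR engine needs for a RAID 5 write stream.

    Simplified model: every data transfer of a Read-Modify-Writeback
    (two reads, two writes) passes through the engine's memory buffer.
    overhead_factor is a hypothetical multiplier for the XOR work itself.
    """
    return write_mb_s * transfers_per_write * overhead_factor


print(xor_engine_bandwidth(40.0))                         # 160.0
print(xor_engine_bandwidth(40.0, transfers_per_write=2))  # 80.0
print(xor_engine_bandwidth(40.0, overhead_factor=1.25))   # 200.0
```

The second call shows the case where caching reduces a write to two transfers; the third shows how a 25% allowance for XOR overhead lands at the bottom of the 200MB/sec-250MB/sec range quoted above.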
Designing an XOR engine data buffer with 250MB/sec of bandwidth is not a trivial task. At 250MB/sec, a byte must move in or out of the XOR engine's memory buffer every 4ns (nanoseconds). If we assume that this memory buffer is 64 bits wide, a 64-bit word must be transferred every 32ns. This is achievable with today's very fast SRAM or synchronous DRAM.
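The timing figures follow directly from the bandwidth and the buffer width. The helper below is a hypothetical illustration that uses decimal units (1MB = 10^6 bytes), as the text does; since one second is 10^9 ns, the time per byte is 1000 divided by the bandwidth in MB/sec.

```python
def ns_per_transfer(bandwidth_mb_s: float, width_bits: int = 8) -> float:
    """Time budget in nanoseconds for each transfer of a memory buffer.

    Uses decimal units (1MB = 10**6 bytes). With 10**9 ns per second,
    ns per byte = 1000 / (bandwidth in MB/sec).
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * 1000.0 / bandwidth_mb_s


print(ns_per_transfer(250.0))                  # 4.0  (one byte)
print(ns_per_transfer(250.0, width_bits=64))   # 32.0 (one 64-bit word)
print(ns_per_transfer(1000.0, width_bits=64))  # 8.0  (64-bit word, buffer at 1GB/sec)
```

The last line shows how quickly the budget shrinks: a 64-bit buffer that itself runs at 1GB/sec has only 8ns per word, before applying any multiplier for the number of transfers per RAID 5 write.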
With Fibre Channel technology, however, it is now possible to reach transfer rates significantly higher than 40MB/sec: several hundred megabytes per second today and, in the near future, close to 1GB/sec. Designing a hardware XOR engine that can handle these performance levels is increasingly difficult with today's technology, so new approaches to XOR calculation for RAID controllers are needed.
The Solution: XOR Support In The Disk Drives
The solution to this problem is to move the responsibility for calculating XOR data from the RAID controller to the disk drives. Performing the XOR operation in the disk drives eliminates the need for the RAID controller to perform any XOR calculations, and each disk drive in an array can perform its XOR calculations in parallel with the other drives (Fig 2).
This parallelism makes it possible to build arrays with very high data transfer rates. It is the disk drives, not the RAID controller, that set the upper performance limit: as disk drives get faster, so will the RAID disk array. This scalability significantly extends the useful lifetime of the RAID controller, because there is no longer a need to upgrade it to keep up with increases in disk drive speed.
Most major disk drive manufacturers, as well as disk drive silicon providers, already have products that support this new XOR calculation standard. Seagate Technology's line of Fibre Channel disk drives was the first on the market with full XOR support. IBM recently released new drives with XOR support, and other major disk drive vendors are soon to follow with their own product releases. Silicon providers such as QLogic also offer full support for hardware-assisted XOR generation in their Fibre Channel disk drive controller chips.
Array Controller Supervised XOR Operations
A set of new SCSI commands has been defined to allow disk drives to perform XOR calculations: XDREAD, XDWRITE, XPWRITE, XDWRITE EXTENDED, REGENERATE, and REBUILD. A RAID disk array controller needs only three of these commands: XDREAD, XDWRITE, and XPWRITE. The remaining commands can be used by software RAID implementations, but because of the data corruption issues inherent in software RAID, they have seen little adoption. Using only XDREAD, XDWRITE, and XPWRITE, all required RAID functionality can readily be implemented.
A RAID 5 write operation will typically perform the following command sequence:
1. An XDWRITE command is sent to the data drive, transferring the new data into the drive's internal data buffer. The drive then reads the old data into its internal buffer, performs an XOR operation on the old data and the new data, keeps the result in its buffer, and writes the new data to the medium.
2. An XDREAD command is sent to the data drive, which returns the calculated XOR of the old data and the new data.
3. An XPWRITE command is then sent to the disk drive containing the parity information. This command transfers the XOR data (received with the previous XDREAD command) to the parity drive's data buffer. The parity drive reads its old parity data, XORs it with the received XOR data to produce the new parity, and writes the result to the medium. A full RAID 5 write has now been completed (Fig 3).
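The division of work among the three commands can be modeled in software to check that the drive-side steps produce correct parity. The sketch below is a toy Python model, not a SCSI implementation: the SimulatedDrive class, its method names, and the assumption that each drive holds its stripe block at the same LBA are all hypothetical, and no command descriptor blocks, buffer limits, or error handling are modeled.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class SimulatedDrive:
    """Hypothetical model of a drive that executes the XOR commands itself."""

    def __init__(self, blocks: dict):
        self.medium = dict(blocks)  # LBA -> block contents on the medium
        self.xor_buffer = {}        # LBA -> XOR result held in the drive buffer

    def xdwrite(self, lba: int, new_data: bytes) -> None:
        # Read the old data, XOR it with the new data, keep the result in
        # the buffer, and write the new data to the medium.
        old_data = self.medium[lba]
        self.xor_buffer[lba] = xor_bytes(old_data, new_data)
        self.medium[lba] = new_data

    def xdread(self, lba: int) -> bytes:
        # Return the XOR result kept by the preceding XDWRITE.
        # In this model the buffered result is discarded once read.
        return self.xor_buffer.pop(lba)

    def xpwrite(self, lba: int, xor_data: bytes) -> None:
        # Read the old parity, XOR it with the received data, and write
        # the new parity to the medium.
        self.medium[lba] = xor_bytes(self.medium[lba], xor_data)


def raid5_write(data_drive: SimulatedDrive, parity_drive: SimulatedDrive,
                lba: int, new_data: bytes) -> None:
    """Controller-side sequence: the controller itself never computes XOR."""
    data_drive.xdwrite(lba, new_data)    # step 1
    xor_data = data_drive.xdread(lba)    # step 2
    parity_drive.xpwrite(lba, xor_data)  # step 3


# Toy array: two data drives and one parity drive, one 2-byte block each.
drive0 = SimulatedDrive({0: b"\x01\x02"})
drive1 = SimulatedDrive({0: b"\x0f\xf0"})
parity = SimulatedDrive({0: xor_bytes(b"\x01\x02", b"\x0f\xf0")})

raid5_write(drive1, parity, 0, b"\xaa\x55")

# Parity on the parity drive matches the XOR of the current data blocks.
assert parity.medium[0] == xor_bytes(drive0.medium[0], drive1.medium[0])
```

In this model the controller only moves data between drives: the parity drive never receives the new data itself, only the XOR difference computed by the data drive.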
Drive-Based XOR Calculation Benefits
There are several benefits to using disk drive-based XOR calculations:
* Increased Performance
Since each disk drive in the array has its own built-in XOR capability, many XOR calculations can be performed in parallel across all drives in the array. This allows a RAID controller to keep multiple RAID 5 writes outstanding and executing at the same time. The RAID controller is no longer the limiting factor in RAID 5 arrays, and because many RAID 5 writes can execute in parallel, RAID 5 write performance can reach new levels.
Performance in arrays built on disk drive-based XOR is fully scalable. As disk drives get faster, so does the XOR function in each drive, and so do the disk arrays. The RAID controller no longer limits the achievable performance level, and its product lifetime is significantly extended.
It is now possible to build very large disk arrays and SANs based on a standardized method for calculating XOR in RAID environments. RAID software and hardware designers no longer need to invent their own methods for XOR calculation, and solutions from RAID developers and disk drive manufacturers can be combined in ways that were not previously possible.
The life cycle of a disk array whose RAID controller uses disk drive-based XOR generation is extended significantly. Since performance is no longer determined by the RAID controller's hardware implementation, upgrading to a faster RAID controller becomes less urgent. It is the disk drives, not the RAID controller, that determine the maximum performance level.
As illustrated in this article, device-based XOR calculation is the way of the future. As performance demands increase, it becomes necessary to implement new, improved methods for generating XOR in RAID disk arrays. Device-based XOR is the answer for several reasons.
Scalability is important in today's storage implementations. The current trend is toward larger Storage Area Network configurations with terabytes of RAID disk storage. Moving the responsibility for generating XOR parity data out to the peripherals allows for maximum scalability, as well as a significantly longer RAID controller life expectancy.
Johan Olstenius is the president of OneofUs (Taipei, Taiwan).