
Averting disaster with redundant hardware.

Hard disk failures can be catastrophic; they can necessitate three days of downtime while a replacement drive arrives and is installed and the data is restored from a backup tape. It can take as much as three weeks to recover data initially, and three months to recover it completely.

Initial recovery replaces damaged data or programs on a priority basis. In the remaining months, data or programs of minor value can be replaced.

While backups are traditionally the mainstay of recapturing and restoring lost data, they are not foolproof for restoring failed drives. The last day's data is generally missing, and any damaged portions of the drive that were faithfully backed up are of no value when restored.

RAID Subsystems

The crisis that results from a hard drive failure can be averted by purchasing a redundant array of inexpensive disks (RAID) subsystem. RAID eliminates a single point of failure by spreading the risk of hard disk failure over multiple disks instead of one. The redundancy offered by multiple disks offsets the lower reliability of a single disk. RAID can be either hardware-based or simulated in software. A hardware-based RAID system provides more robust features than its slower software-based alternative. RAID offers various mixes of performance, reliability and cost, which have been standardized into six levels known as RAID levels 0 through 5. (The Microsoft TechNet document entitled Reliability and Fault Tolerance, which describes RAID as implemented in the Microsoft Windows NT Server network operating system, can be found at http:// content/complete/boes/bo/winntas/ technote/reliabil.htm.)

RAID fault tolerance. RAID allows the information on a failed drive to be restored without a backup tape. In fact, when a failing drive is removed, the server continues to operate normally and no data is lost. This is possible because the RAID hard disk controller reconstructs the missing information from the information that resides on the remaining healthy, redundant disks.

Disk striping. A RAID controller keeps track of the different physical disks and allocates storage space across all of them in a particular manner called striping. Striping spreads the contents of a single data file or program over two or more disks. This can provide faster access to the data than if the file resided on a single drive. With striping alone, the failure of one drive would lose that drive's portion of the data. RAID level 5, for example, adds parity data (check digits) to disk striping, which allows the data on a failed drive to be reconstructed from the data remaining on the other drives.
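The following Python sketch illustrates striping with parity under simplifying assumptions. The chunk size, the zero padding of the last stripe and the helper names (`xor_blocks`, `stripe_with_parity`, `reconstruct`) are illustrative choices and do not describe how any particular controller works. For simplicity, parity always sits in the last position of each stripe rather than rotating as it does in real RAID level 5. The parity chunk is the bytewise XOR of the data chunks, so XORing the surviving chunks of a stripe regenerates whichever single chunk is missing.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_drives: int, chunk: int) -> list[list[bytes]]:
    """Split data into stripes of `data_drives` chunks plus one XOR parity chunk.

    Each returned stripe is [data_0, ..., data_{k-1}, parity]; element i of a
    stripe would be written to drive i. The data is zero-padded to whole stripes.
    """
    stripe_size = data_drives * chunk
    padded_len = -(-len(data) // stripe_size) * stripe_size  # round up to whole stripes
    data = data.ljust(padded_len, b"\x00")
    stripes = []
    for start in range(0, padded_len, stripe_size):
        chunks = [data[start + i * chunk : start + (i + 1) * chunk] for i in range(data_drives)]
        stripes.append(chunks + [xor_blocks(chunks)])
    return stripes


def reconstruct(stripe: list[bytes | None]) -> bytes:
    """Regenerate the one chunk marked None (a failed drive) from the survivors."""
    survivors = [c for c in stripe if c is not None]
    if len(survivors) != len(stripe) - 1:
        raise ValueError("XOR parity can rebuild exactly one missing chunk per stripe")
    return xor_blocks(survivors)


# Usage: stripe 11 bytes over two data drives plus a parity drive, then lose drive 1.
stripes = stripe_with_parity(b"TAX RETURNS", data_drives=2, chunk=4)
degraded = [stripes[0][0], None, stripes[0][2]]  # drive 1 has failed
print(reconstruct(degraded) == stripes[0][1])
```

Without the parity chunk, the `None` entry could not be recovered at all. That is the plain-striping case described above.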

Hot swapping. Some RAID subsystems allow a technician to remove a failing hard disk drive without shutting down the server. This ability is known as "hot swapping." Many older drives are not hot-swappable. Even when a RAID controller supports hot swapping, the feature works only with hot-swappable drives.

RAID hardware recovery. When a replacement drive is slipped into the server, it contains no data and is completely unformatted. This is when a RAID controller really shows its worth. It detects the new drive, determines that it is unformatted and proceeds to format it without anyone's intervention. The controller then adds the newly formatted drive to the existing array of drives, reconstructs the missing data and writes it to the new drive. A RAID controller can finish writing the missing data to the new drive in as little as 20 minutes, which completes the recovery process.
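The rebuild step can be pictured as a loop over every stripe. The sketch below is a hypothetical model and not controller firmware. It assumes `drives[d][s]` holds drive d's chunk for stripe s, and it skips the detect-and-format steps, which are controller-specific. The XOR of all chunks in a stripe is zero, so XORing the survivors regenerates the replaced drive's chunk whether that chunk held data or parity. The same loop therefore works when parity rotates between drives.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_drive(drives: list[list[bytes]], replaced: int) -> list[bytes]:
    """Regenerate every chunk for the blank drive in slot `replaced`.

    drives[d][s] is the chunk drive d holds for stripe s; whatever is in the
    replaced slot is ignored, since a new drive arrives empty.
    """
    survivors = [drive for d, drive in enumerate(drives) if d != replaced]
    stripe_count = len(survivors[0])
    # Walk the array one stripe at a time, XORing the surviving chunks.
    return [xor_blocks([drive[s] for drive in survivors]) for s in range(stripe_count)]


# Usage: a three-drive array whose parity sits on drive 2, then on drive 1.
d0 = [b"AB", b"EF"]
d1 = [b"CD", xor_blocks([b"EF", b"GH"])]
d2 = [xor_blocks([b"AB", b"CD"]), b"GH"]
new_d1 = rebuild_drive([d0, [], d2], replaced=1)  # [] stands for the blank replacement
print(new_d1 == d1)
```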

Low- vs. High-End RAID Subsystems

It takes a minimum of three identical drives to obtain RAID level-5 fault tolerance. To an operating system such as Windows NT, these three drives appear as one large drive that can then be partitioned into logical sections. Conceptually, two of the three drives hold actual data and the third contains check digits. In reality, the data is striped across all three drives, so only part of the data resides on any one drive. The check-digit (parity data) portion rotates from one drive to the next, so each drive contains some data and some check digits.
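A short sketch can show how the check digits rotate by printing which drive holds parity for each stripe. Actual controllers use several different rotation orders. The order used here is an assumed example: parity moves one drive to the left on each successive stripe, and the other drives are filled with data in order. `raid5_layout` is a hypothetical helper.

```python
from __future__ import annotations


def raid5_layout(stripes: int, drives: int = 3) -> list[list[str]]:
    """Return a grid where row = stripe, column = drive, and each cell is
    'P' (parity) or a data chunk label such as 'D0'."""
    if drives < 3:
        raise ValueError("RAID level 5 needs at least three drives")
    grid = []
    chunk = 0
    for s in range(stripes):
        parity_drive = drives - 1 - (s % drives)  # parity shifts one drive left per stripe
        row = []
        for d in range(drives):
            if d == parity_drive:
                row.append("P")
            else:
                row.append(f"D{chunk}")
                chunk += 1
        grid.append(row)
    return grid


for s, row in enumerate(raid5_layout(stripes=3)):
    print(f"stripe {s}: " + "  ".join(f"{cell:>2}" for cell in row))
```

Over any three consecutive stripes, each of the three drives holds exactly one parity chunk. Parity therefore occupies one-third of the raw space, which matches the one drive's worth of check digits described above.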

Striping with parity data across the three drives is what provides the fault tolerance and ensures that no data is lost if one entire drive is removed. If one of the three drives is removed, the RAID controller recreates the missing data in real time from the data remaining on the other two drives. Although the controller can reconstruct the missing data, the controller and the remaining two drives are no longer fault-tolerant. The missing drive must be replaced to regain fault tolerance. Should a second drive fail before the replacement is installed, all data would be lost.
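The effects of a first and a second drive failure can be summarized as a small state model. This is a hypothetical sketch, and the class and state names are invented for illustration. It counts failed drives and reports whether a RAID-5 array is fault-tolerant, running in degraded mode, or has lost its data. Single parity can cover only one missing drive at a time.

```python
from enum import Enum


class ArrayState(Enum):
    FAULT_TOLERANT = "fault tolerant"
    DEGRADED = "degraded (running, but not fault tolerant)"
    DATA_LOST = "data lost (restore from backup tape)"


class Raid5Array:
    """Tracks how many drives are currently failed in a hypothetical RAID-5 array."""

    def __init__(self, drives: int = 3) -> None:
        if drives < 3:
            raise ValueError("RAID level 5 needs at least three drives")
        self.drives = drives
        self.failed = 0

    @property
    def state(self) -> ArrayState:
        if self.failed == 0:
            return ArrayState.FAULT_TOLERANT
        if self.failed == 1:
            return ArrayState.DEGRADED  # reads are reconstructed from parity
        return ArrayState.DATA_LOST  # parity can cover only one missing drive

    def fail_drive(self) -> None:
        self.failed = min(self.failed + 1, self.drives)

    def replace_and_rebuild(self) -> None:
        if self.state is ArrayState.DATA_LOST:
            raise RuntimeError("cannot rebuild: more than one drive has failed")
        self.failed = max(self.failed - 1, 0)


array = Raid5Array()
array.fail_drive()  # first failure: the server keeps running
array.replace_and_rebuild()  # replacement installed in time
array.fail_drive()  # a later failure is again survivable
print(array.state)

unlucky = Raid5Array()
unlucky.fail_drive()
unlucky.fail_drive()  # second failure before the first drive was replaced
print(unlucky.state)
```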

Replacing and hot swapping drives. Low-end RAID controllers require the user to turn off the server when removing or adding drives. High-end RAID controllers allow drives to be removed and added while the server is running, which is the hot swapping described earlier. The system must have hot-swappable drives to use this feature. When a new drive replaces a bad one, low-end RAID controllers require a rebuild utility to be run before the operating system boots. High-end RAID controllers rebuild the new drive when the server is restarted. In fact, high-end RAID controllers with hot-swappable drives rebuild the new drive without the server ever being turned off.

Adding and increasing drive space. When drives are added, low-end RAID controllers require the user to reformat all the drives from scratch. The operating system and the backup software must then be reinstalled, and the rest of the programs and data restored from the backup tape. High-end RAID controllers allow drives to be added to the controller and dynamically incorporate them into the existing array. This increases the usable space without reformatting the entire drive subsystem. High-end RAID controllers with hot-swappable drives allow this while the server is still running.

Recovery costs. The cost of RAID level-5 fault tolerance is the premium paid for the RAID controller plus the cost of one extra hard drive. For example, three 9GB drives provide only 18GB of usable space, because the remaining 9GB is used for the parity data. Suppose an entry-level RAID controller costs $800, a standard controller costs $400 and 9GB drives cost $800 each. A RAID-5 disk drive subsystem would then cost $3,200 ($800 + 3 × $800). The additional cost of the RAID hardware would be $1,200 ($400 extra for the RAID controller plus $800 for the third drive), so users would be paying a 38% premium ($1,200 ÷ $3,200) for RAID-5 protection. Adding just one more 9GB drive, however, increases usable space from 18GB to 27GB and drops the premium to 30%. The additional cost remains $1,200, while the total cost becomes $4,000.
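The arithmetic above can be checked with a few lines of Python. The function names are hypothetical. As in the text, the premium is the extra RAID cost divided by the RAID subsystem's total cost. The non-RAID baseline is assumed to be a standard controller driving one fewer drive, which gives the same usable space.

```python
from __future__ import annotations


def usable_gb(drives: int, drive_gb: float) -> float:
    """RAID level 5 gives up one drive's worth of space to parity."""
    return (drives - 1) * drive_gb


def raid5_premium(drives: int, drive_cost: float, raid_controller: float,
                  standard_controller: float) -> tuple[float, float, float]:
    """Return (RAID total cost, extra cost over non-RAID, premium as share of RAID total)."""
    raid_total = raid_controller + drives * drive_cost
    # Baseline: same usable space without RAID, i.e. one fewer drive and a standard controller.
    baseline = standard_controller + (drives - 1) * drive_cost
    extra = raid_total - baseline
    return raid_total, extra, extra / raid_total


for n in (3, 4):
    total, extra, share = raid5_premium(n, drive_cost=800, raid_controller=800,
                                        standard_controller=400)
    print(f"{n} drives: {usable_gb(n, 9):g}GB usable, total ${total:,.0f}, "
          f"extra ${extra:,.0f}, premium {share:.1%}")
```

The choice of base matters. Measured against the non-RAID baselines of $2,000 and $2,800, the same $1,200 is a 60% and roughly 43% markup rather than 38% and 30%.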

The actual cost of RAID hardware varies with the features purchased: the more features, the higher the RAID premium. A fully loaded RAID controller can cost between $2,000 and $4,000. Hot-swappable 9GB drives with 5ms access times and spindle speeds of 10,000 rpm can cost upwards of $2,000. Of course, computer hardware prices drop every month. Drives purchased in January 1998 for $1,600, for example, can now be purchased for $800.

Backup Tapes

While RAID hardware provides fault tolerance and redundancy, it provides no recovery from a complete hardware meltdown. RAID technology removes a single point of failure and replaces it with multiple redundant drives. If any one drive fails, the failed drive can be replaced and the missing data can be reconstructed on the new drive from the data remaining on the other drives.

If more than one RAID drive fails simultaneously, however, the fault tolerance will not work. The only source of recovery from multiple-drive failures or complete hardware failure is a backup tape.

Disaster Plan

Hardware alone will not guard against a catastrophic disaster. Offsite storage of duplicate backup tapes is helpful, but a comprehensive disaster plan should also be designed and tested. The disaster plan should include a disk and data replacement plan, as well as a plan to supply the computer with stable, conditioned power. It should also consider the availability of spare parts and of the technical expertise needed to install and configure them. An IT professional who services and maintains the network should be able to help design an appropriate strategy.

Editor's note: Messrs. Maida and Brown are members of the AICPA Tax Division's Tax Technology Committee.

If you would like additional information about this article, contact Mr. Maida at (609) 882-6874 or Mr. Brown at (703) 848-2502.


Joseph C. Maida, Shareholder, Nicholas C. Maida C.P.A., Chartered, Princeton, NJ

Steven D. Brown, Managing Partner, Brown & Brown, PC, McLean, VA
COPYRIGHT 1999 American Institute of CPA's
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1999, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:computer hardware
Author:Brown, Steven D.
Publication:The Tax Adviser
Geographic Code:1USA
Date:Oct 1, 1999