Image backup & disaster recovery.
In day-to-day computer system management, a restore from backup media, such as tape, online disk, UNC path, FTP, CD-R, optical, etc., will more likely be needed to recover a strategic file that has become corrupted or accidentally deleted. Less frequently, an entire drive must be restored due to hardware failure or a corrupt operating system partition, which renders the system unbootable.
Normal file-by-file backups usually work well except when the failure occurs on an Intel-based machine's Windows NT/XP/2000/2003 operating system (OS) disk. This is because Intel machines only support one active boot disk. Here are a few common failures that can totally cripple a Windows server or workstation: registry file corruption, a bad device driver installation, deletion of strategic files and a failed OS hard drive. Any of these problems can make a system unbootable. If your system is unbootable, then a regular file-by-file backup cannot be used until at least a minimal OS has been reinstalled, disk partitions have been established where required, device drivers have been reinstalled and, if used, any third-party backup software product has been reinstalled. Only then can a previous file-by-file backup tape be read to recover lost files and/or restore the Registry of the failed machine. For those unfamiliar with Registry files, these are critical system files that contain key information about program groups, security, network connections and other important data that can be very difficult and time consuming to recreate manually, if lost.
A standard file-by-file system disk recovery can take anywhere from two hours to two days, depending on circumstances and the degree of difficulties encountered. Because of file-by-file recovery limitations, alternative methods of backup have been devised that circumvent key file-by-file recovery limitations when attempting to restore an OS partition or disk. While there are a number of names for this alternative method of backup, it is commonly called "Image Backup." Users of Digital Equipment Corporation's old VMS operating system used the term "Stand-alone Backup." Currently in vogue is the phrase "Snap Shot."
Image backup is a process that can completely back up a partition or an entire physical hard drive at a low level, on a bit-by-bit basis. The image backup process typically does not care what is on the hard drive or even what the hard drive is doing at the time of backup. The backup process simply starts at the first block on the drive and reads every cylinder, track and sector until every bit on the drive has been backed up. The crudest implementation of image backup technology does not see partitions, files or any other arbitrary file system structures--it doesn't know if it's backing up data or empty space. More sophisticated software allows partition-level backup, backup of only active disk clusters, software compression, media spanning, and a choice of tape or disk output media. Even more useful, from a recovery perspective, is software that allows backing up to and restoring from network UNC paths, FTP devices and, for large IBM users, Tivoli Storage Manager. Being able to back up to and recover from network storage locations enables disaster recovery protection for machines that do not have locally attached tape storage devices.
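As a rough illustration of the block-by-block process described above, the following Python sketch copies a source to an image without interpreting its contents. It is a minimal sketch and not how any particular product works: the function name `image_copy`, the 64 KB block size, and the use of in-memory file objects in place of a raw disk device are all assumptions made for illustration.

```python
import io

BLOCK_SIZE = 64 * 1024  # assumed block size; real tools tune this to the hardware


def image_copy(src, dst, block_size=BLOCK_SIZE):
    """Copy every byte from src to dst in fixed-size blocks.

    Hypothetical helper: src and dst are binary file-like objects. The loop
    never looks at partitions or files; it simply reads until the source ends.
    """
    total = 0
    while True:
        block = src.read(block_size)
        if not block:  # an empty read means the end of the source
            break
        dst.write(block)
        total += len(block)
    return total


if __name__ == "__main__":
    # Stand-in for a disk: 100,000 bytes of data followed by 50,000 bytes
    # of empty (zeroed) space. Both are copied, since the loop cannot tell
    # data from empty space.
    disk = io.BytesIO(b"DATA" * 25_000 + bytes(50_000))
    image = io.BytesIO()
    copied = image_copy(disk, image)
    print(copied, image.getvalue() == disk.getvalue())
```

Because the loop treats the source as an opaque stream of blocks, empty space costs as much to back up as data, which is the limitation that active-cluster backup and compression are meant to address.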
Performing an image backup is only half the story on how to recover an unbootable machine. The other half is how the disaster recovery part works. Remember, the machine died for some reason and won't boot after any necessary repairs. So, how do you boot the dead machine and "quickly" restore its operating system?
The alter ego to image backup is the ability to quickly reboot a failed system using nothing more than a universal boot CD containing a temporary OS with every device driver, and the image recovery software. The CD boots a "dead" machine, loads all device drivers, connects to the network and initiates the restore of the failed machine's OS partition. With sophisticated software, the recovery can be from local or remote tape, local disk, UNC path, FTP device or from Tivoli. When image technology is combined with network UNC path backups, there is no faster means of recovering a failed machine. With server down time costs ranging from hundreds to thousands of dollars a minute, even speeding up the recovery of a failed machine by a few minutes can pay big dividends. On a fast network, users can expect to restore image backups at speeds from 300-400MB per minute, while users of gigabit networks can see 600-800MB per-minute speeds (second-generation LTO drives also restore at this speed when tape is available).
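To put the quoted restore speeds in perspective, here is a small Python sketch of the arithmetic. The 12,000MB image size and the $500-per-minute downtime cost are hypothetical figures chosen only for illustration; the function names are likewise assumptions, and only the 300-800MB per minute rates come from the text above.

```python
def restore_minutes(image_mb, rate_mb_per_min):
    """Estimated restore time in minutes for an image of image_mb megabytes."""
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return image_mb / rate_mb_per_min


def downtime_cost(minutes, cost_per_minute):
    """Cost of an outage lasting the given number of minutes."""
    return minutes * cost_per_minute


if __name__ == "__main__":
    image_mb = 12_000        # hypothetical OS partition image size
    cost_per_minute = 500    # hypothetical downtime cost in dollars
    rates = [
        ("fast network, low end", 300),
        ("fast network, high end", 400),
        ("gigabit network, low end", 600),
        ("gigabit network, high end", 800),
    ]
    for label, rate in rates:
        minutes = restore_minutes(image_mb, rate)
        cost = downtime_cost(minutes, cost_per_minute)
        print(f"{label}: {minutes:.0f} min, ${cost:,.0f}")
```

Under these assumed figures, the restore takes 40 minutes at 300MB per minute and 20 minutes at 600MB per minute, so the faster path saves 20 minutes, or $10,000 of downtime at the hypothetical rate.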
Part of the beauty of image backup technology is that restores can be scripted so any authorized person can simply insert the boot CD into a dead machine and initiate a "Look Ma, No Hands!" recovery.
Image backup and disaster recovery can be discussed in general terms but when it comes to describing exactly how the process works, it's best to use a reference. The following describes how one backup package uses image technology for backup and disaster recovery of Microsoft Windows servers and workstations.
The software we'll reference (UltraBac for Windows NT/XP/2000/2003) is both a regular file-based backup program and an image backup product that allows either or both methods during a backup session. After the software has been installed, the backup administrator creates the required backup groups and sets. A backup group controls a secure, service-based, scheduled "lights out" backup and sets the time, day, device to use, and other relevant parameters, including which files to back up. A standard file-by-file backup set is a definition of files, which can be as small as a single file and as large as a logical partition. An image set defines one or more disk partitions (up to 32). Selecting a disk drive for inclusion in the image set includes any partitions that might be present, even if it is a multiple boot operating system disk. Therefore, a scheduled unattended or interactive "on demand" backup session can back up any disk in the network using standard file-by-file backup technology and can include an image backup of any of the local machine's hard drives. Up to 32 backup sessions can run concurrently. While image backup is designed to restore an entire partition as fast as possible, it also allows restoring single files, provided this option was enabled during the backup phase. The option must be enabled at backup time because it creates an index to the individual files within the image file stored on media. Building the index adds overhead and, therefore, the option is turned off by default at install.
Implementing the image option simply requires a backup regime in which an image backup of the system drive is made on a periodic basis (e.g., every night or once a week). Unlike a previous version's DOS-based recovery technology, the new WinPE-based CD requires no setup. After the first image backup, any operator can initiate a disaster recovery operation by using the recovery CD.
In most business environments, the controller and tape drive are SCSI-based. While having SCSI devices is not a requirement for image backup and recovery, they are probably more reliable than IDE/EIDE or floppy-based disk/tape drives. The major advantage of SCSI-based disks for image backup and recovery is that bad blocks on the original disk drive and/or on the restore disk are irrelevant. In other words, you do not have to worry about the cylinder, track or sector configuration of the disk drives. Another advantage is that the only restore limitation of a SCSI disk is that the target disk must be equal to or greater in size than the original. This makes image backup a simple way to upgrade a SCSI operating system disk to a larger size. Note, however, that upgrading to a significantly larger disk will probably require the administrator to use the Disk Administration utility to create a new partition before the extra space on the new drive can be accessed.
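The size rule above (the target disk must be equal to or greater in size than the original) is simple to express in code. The following Python sketch filters a list of candidate disks by that rule and lists the smallest suitable disk first. The function name, the disk names and the sizes in gigabytes are hypothetical, and ordering by size is only one plausible way to rank candidates, not a description of how any product chooses a target.

```python
def eligible_targets(original_size, disks):
    """Return (name, size) pairs large enough to hold the image, smallest first.

    Hypothetical helper: disks maps a disk name to its capacity, expressed in
    the same unit as original_size.
    """
    fits = [(name, size) for name, size in disks.items() if size >= original_size]
    return sorted(fits, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical online disks, sizes in GB.
    online = {"disk0": 18, "disk1": 36, "disk2": 73}
    print(eligible_targets(36, online))
```

With a 36GB original, the 18GB disk is excluded, the 36GB disk is an exact fit, and the 73GB disk would leave extra space that needs a new partition before it can be used.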
With the implementation of backups with image sets, everything necessary is in place to perform a disaster recovery of a failed OS partition or disk. If a failure occurs, the following steps provide a simple and quick recovery from a network UNC path:
* Power on the equipment
* Insert the boot CD
* Boot the machine
* If unscripted, key in the UNC path to restore the machine
* Reboot the machine after the network restore is complete.
The process described above will automatically initiate the recovery software to restore a failed OS partition from a network UNC path.
To restore an image backup from a tape device:
* Power on the equipment (including the tape drive, if a local restore)
* Insert the desired backup tape containing an image set
* Insert the boot CD
* Boot the machine
* Select the tape device, if more than one local or remote device is available
* Select the image set to restore
* Acknowledge the partition selection (closest match is highlighted)
* Initiate the restore
* Reboot the machine.
This process will automatically initiate the recovery software to read the tape and locate the first image set. From the list of all disks backed up by the set (there can be up to 32 disks in one image set), the operator simply selects the disk to recover and then, from a list of all available online disks, selects the target disk to initiate the physical restore.
Note that the target disk can be the original unbootable disk or a new disk right out of the box. No formatting other than a standard low-level disk format is required. Whatever was on the original disk, including multiple partitions with various operating systems, will be restored exactly as it was last backed up.
The restore process bypasses the normal operating system and therefore approaches the maximum speed that the hardware configuration is capable of sustaining (tape drive, controller, disk drive, network path, etc). After the restore process is complete, a reboot is all that is required to bring the operating system disk back to the state it was when backed up. This process also allows an administrator to go to another machine with the same or similar configuration and perform a recovery, quickly turning the alternative machine into the identity of the failed machine.
Image backup is relevant even with dynamic mirroring and clustering. Dynamic mirroring instantly mirrors corruption and file deletions. Mirroring a strategic failure results in every mirrored OS partition becoming equally unbootable. A failed cluster node must also be brought back online ASAP or the Cluster Server can become a single point of failure. Few products can restore to a failed node on an active Cluster Server.
Image backup is relevant even with file-by-file backups. While single files can be restored from image backups, the real power of image is quickly restoring a failed OS partition or disk. Since image backups can be made to UNC path, FTP and Tivoli, all machines can be protected regardless of whether tape is available or not. Image backups are basically everything in a partition or nothing. Image technology does not support file selection logic or incremental or differential backup capability. Image backups also typically bypass the file system and do not absolutely guarantee file integrity of open files. Therefore, they are a poor choice for backing up active databases like Microsoft Exchange and SQL.
Image backup is relevant even with RAID. Hardware-based RAID is transparent to image backup. The image software sees the total storage capacity of a RAID array as one physical disk. When creating an image set, one or more partitions can be selected for backup. Note that when the software performs an image backup, only active disk clusters are backed up and when the output media is a network UNC path, software compression is invoked to reduce the output media requirements. When the output media is local tape, either hardware or software compression may be used and if the media fills, the image backup can span multiple tapes. While RAID insulates a system against a single disk drive failure, it cannot guard against registry corruption, a deleted file, or a single point of hardware failure. Less expensive RAID devices do have single point of failure when constructed with a single controller, a single power supply, a single cable to multiple devices, and/or a disk channel card. Any of these listed problems could cause a RAID-based system to be unbootable. Even expensive RAID devices with redundant controllers and power supplies sometimes only have a single disk channel card supporting a "bank" of disks. If that single disk channel card fails, then all of its connected disks are lost as well!
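The combination of active-cluster backup and software compression mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's actual on-media format: the 4,096-byte cluster size, the list of booleans standing in for a volume's allocation bitmap, the 4-byte cluster index prefix and both function names are hypothetical. Only `zlib.compress` and `zlib.decompress` are real standard-library calls.

```python
import zlib

CLUSTER_SIZE = 4096  # assumed cluster size in bytes


def backup_active_clusters(disk, in_use):
    """Return a compressed image holding only the clusters flagged in in_use.

    disk is a bytes object whose length is a multiple of CLUSTER_SIZE, and
    in_use is a list of booleans, one per cluster. Both are hypothetical
    stand-ins for a real volume and its allocation bitmap.
    """
    payload = bytearray()
    for index, used in enumerate(in_use):
        if used:
            start = index * CLUSTER_SIZE
            payload += index.to_bytes(4, "big")  # record which cluster this is
            payload += disk[start:start + CLUSTER_SIZE]
    return zlib.compress(bytes(payload))


def restore_active_clusters(image, cluster_count):
    """Rebuild a disk of cluster_count clusters; skipped clusters come back zeroed."""
    payload = zlib.decompress(image)
    disk = bytearray(cluster_count * CLUSTER_SIZE)
    record = 4 + CLUSTER_SIZE
    for offset in range(0, len(payload), record):
        index = int.from_bytes(payload[offset:offset + 4], "big")
        start = index * CLUSTER_SIZE
        disk[start:start + CLUSTER_SIZE] = payload[offset + 4:offset + record]
    return bytes(disk)


if __name__ == "__main__":
    # Four clusters: two in use, two empty.
    disk = b"A" * CLUSTER_SIZE + bytes(CLUSTER_SIZE) + b"B" * CLUSTER_SIZE + bytes(CLUSTER_SIZE)
    image = backup_active_clusters(disk, [True, False, True, False])
    print(len(disk), len(image), restore_active_clusters(image, 4) == disk)
```

Note that in this sketch unused clusters are restored as zeros, so the restored disk matches the original byte for byte only where the skipped clusters were empty to begin with. That is acceptable for free space, since its previous contents are not part of any file.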
The combined attributes of file-by-file and image backup provide users with the safety and security of being able to completely control their backup process and, when required, perform the recovery process that best suits the situation. Users can combine both image and file-by-file backup technology, or use them independently. This allows administrators to perform network backups of critical OS partitions using image technology and regular file-by-file backups of all other partitions containing active database files like Exchange and SQL. This strategy provides for lightning-fast OS restores, guarantees the integrity of databases and allows full, incremental or differential backups to any media of choice.
Bottom line: image backup, coupled with a universal boot CD, represents the fastest and simplest method to recover from a failed operating system disk. With almost unbelievable figures quoted by industry trade journals for the hourly cost of server downtime, every minute counts when recovering a failed operating system. While your organization's downtime cost may be bearable, one quantifiable fact is that the image backup and boot recovery process eliminates the majority of the time and most of the frustration of recovering a failed system.
Until servers and workstations have single point of failure potential totally eliminated, using image technology with or without regular file-by-file backup is a low-cost option that can supplement even dynamic mirroring, clustering and RAID to further enhance system reliability and uptime.
Morgan Edwards is president and CEO of UltraBac Software (Bellevue, WA).
|Publication:||Computer Technology Review|
|Date:||Aug 1, 2003|