Backup & recovery using revolutionary MAID architecture: Part 2

Options

The first type of disk option, simply buying an inexpensive disk array and backing up to a filesystem, doesn't sound that expensive at first. However, the disk array alone will cost more than the average price of a similarly sized tape library, especially if you consider compression. Your backup software may also require a license to back up to a filesystem-type device, such as a disk array. Additionally, backing up to a filesystem is a little different from backing up to tape, and there will be a management cost associated with that difference. For example, while tape library systems can be dynamically shared across multiple backup servers, disk cannot. Therefore, you will need to create separate volumes for each backup server and manage those volumes as needs grow. Also, while most backup software products understand when a tape runs out of space, they don't like it very much when a filesystem runs out of space. A tape will get marked as full, while the disk will simply report an I/O error, and warnings will go off all over the place.
Some backup software products keep attempting to write to a filesystem, even after it's full. The differences between filesystem-based backups and tape/virtual tape-based backups may manifest themselves as anything from a minor inconvenience to an actual loss of data.

[FIGURE 1 OMITTED]

The second option, a disk-based virtual tape library (VTL) system, is easier to integrate into an existing backup and recovery system and gives you the benefit of both disk and tape, without the downsides of tape mentioned earlier in this paper. The challenge with most of today's VTL products is that they are priced based on their value relative to physical tape--typically 5-10 times the cost of a physical tape library. While their value may justify such a price, it does not allow for a cost-effective deployment of an all-disk-onsite solution.

Enter MAID

The recently introduced acronym MAID refers to a massive array of idle disks, so named because the disks in a MAID array power down when not being utilized. To fully understand the value of MAID, let's first examine the use of traditional RAID in an array designed for secondary storage needs, such as backups. While tapes were designed for backups, RAID wasn't. To explain, let's compare backups to a large database. An update to a single table will, at some point, cause the flushing of buffers, causing all data files to be written to disk. This write will write data to every disk in every volume on which the database resides.
A large query will also require reading the entire database into memory, reading from all disks simultaneously. Therefore, a large database will constantly read from and write to the entire disk array.

Backups are a Write Once Read Occasionally (WORO) application. That is, they write data once, but only occasionally access that data. Also, when one backup is being written to one disk, it is not writing to the other disks in the RAID array. The same holds true for restores. A restore reading from one disk does not read from the other disks in the array. WORO applications, therefore, do not require access to the entire array all the time.

That means that, at any given point in time, there are a lot of disks spinning in a RAID that are neither reading nor writing. Perhaps they're spinning because they're being used to write RAID-5 parity, which is written to all members of a RAID group. Perhaps they're spinning because a filesystem is actually striped across several disks. And perhaps they're spinning simply because that's what RAID arrays do. Of course, all of this unnecessary spinning of disks requires constant power, which generates constant heat; therefore, it requires constant cooling as well.

In a backup and recovery storage configuration, traditional RAID seems like overkill. Having all of the disks spinning at the same time seems like having all the tapes in a robot loaded in drives all the time. It's simply not necessary.
The high-performance cache and non-blocking interconnect schemes used in RAID arrays are also designed for applications with a lot of IOPS (input/output operations per second), and are not designed for high-bandwidth applications like backup.

What if there were a technology that brought the benefits of RAID to secondary storage, but didn't have the power and cooling overkill of RAID? What if you could have the benefits of parity protection without having to power on all the drives in a RAID set? What if you re-examined access requirements and minimized some of the expensive schemes used to maximize IOPS? This is where MAID comes in.

For the purposes of explanation, let's consider the first-ever disk storage system designed exclusively for WORO storage: COPAN Systems' Revolution 200T. Its innovative design has extended the concept of MAID and ultimately reduced purchase and maintenance costs, while simultaneously improving reliability and increasing the service life of the disk drives in the array.

To understand the MAID concept, you need to understand that a virtual tape library has a finite number of virtual tapes, based on the number of disk drives in the array. Each of these virtual tapes is represented by a contiguous section of disk belonging to a virtual disk volume protected by a parity disk. This sounds like RAID-4, but it's not. If this were a traditional RAID-4 volume, backing up to a virtual tape residing on this RAID-4 volume would cause a stripe of data to be written across all data members of the RAID-4 array, and parity would be written to the parity disk (see Figure).
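The striping behavior described above can be sketched in a few lines. This is only an illustration, assuming a simple round-robin layout with a stripe unit of one block; the function name and the block counts are hypothetical and are not taken from any particular RAID product.

```python
def striped_disks_touched(start_block, num_blocks, data_disks):
    """Return the set of data disks hit by a sequential write under striping.

    Assumes round-robin placement with a stripe unit of one block.
    """
    return {(start_block + i) % data_disks for i in range(num_blocks)}


# A short sequential backup write on a volume with five data disks.
touched = striped_disks_touched(start_block=0, num_blocks=8, data_disks=5)
print(sorted(touched))  # every data disk appears, so none of them can be idle
```

Any sequential write at least as long as the number of data disks touches all of them, which is why a striped volume cannot power down individual drives during a backup.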
COPAN Systems concatenates the volumes instead of striping them, but still calculates parity for them. In a six-disk MAID volume (five data disks and one parity disk), the system writes to and fills up disk 1 before writing to disk 2. Then it fills up disk 2 before writing to disk 3. That way, it can power off disks 2-5 while writing to disk 1, and power off disks 1, 3, 4 and 5 while writing to disk 2. While writing to disk 1, the system knows that disks 2-5 contain all nulls, so it doesn't need to power them on to calculate parity. Once it is writing to disk 2, it can combine the existing parity, which already reflects the contents of disk 1, with the known nulls on disks 3-5 to calculate the new parity without powering on disk 1. The result is that at any one time, only the parity drive and the active drive need to be powered on.

A few issues might come to mind when thinking about this plan. Perhaps you're thinking that the parity drive would get busy and worn out with time. Perhaps you're thinking that the disk at the end of the concatenation might never get used, and might therefore never get powered on. Or worse, there might be virtual tapes that never get used, and their drives stay powered off all the time.

The answers to these questions are found in COPAN Systems' Disk Aerobics. Its goal is to distribute the workload (and rest periods) among all drives, ensuring that no individual drive gets worked too much or too little. Data on a drive that is getting used too much or is reaching an error threshold is transparently moved to a more idle drive, removing any hot drive concerns. This also helps avoid RAID rebuilds. A drive that's been idle for a while is powered on and exercised, while its data is checked for integrity.

This unique design means that a COPAN Systems disk array has 25% of the power and cooling requirements of a typical array.
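The parity scheme described above, where only the active data disk and the parity disk are needed for a write, can be illustrated with a toy model. This is a sketch under stated assumptions, not COPAN Systems' implementation: it assumes plain XOR parity, models blocks as small integers, and uses hypothetical names (`ConcatenatedVolume`, `append`, `rebuild`). Disks are numbered from 0 here, so "disk 0" corresponds to disk 1 in the text.

```python
DATA_DISKS = 5   # five data disks + one parity disk = the six-disk volume in the text
DISK_BLOCKS = 4  # blocks per disk; tiny on purpose (illustrative assumption)


class ConcatenatedVolume:
    """Toy model: data disks are filled one after another, with XOR parity."""

    def __init__(self):
        # Unwritten blocks are known to be zero (the "nulls" in the text).
        self.data = [[0] * DISK_BLOCKS for _ in range(DATA_DISKS)]
        self.parity = [0] * DISK_BLOCKS
        self.next_block = 0
        self.spun_up = []  # which data disk was powered on for each write

    def append(self, value):
        """Write one block to the next free slot, touching one data disk."""
        if self.next_block >= DATA_DISKS * DISK_BLOCKS:
            raise IOError("volume full")
        disk, offset = divmod(self.next_block, DISK_BLOCKS)
        # The target slot held zero, so new parity = old parity XOR new value.
        # No other data disk has to be read.
        self.parity[offset] ^= value
        self.data[disk][offset] = value
        self.spun_up.append(disk)
        self.next_block += 1

    def rebuild(self, failed_disk):
        """Reconstruct one data disk from parity plus the surviving disks."""
        rebuilt = list(self.parity)
        for disk in range(DATA_DISKS):
            if disk != failed_disk:
                for offset in range(DISK_BLOCKS):
                    rebuilt[offset] ^= self.data[disk][offset]
        return rebuilt


vol = ConcatenatedVolume()
for block in [7, 1, 9, 4, 12, 3]:  # fills disk 0, then starts on disk 1
    vol.append(block)

print(vol.spun_up)     # the single data disk used by each write
print(vol.rebuild(1))  # contents of disk 1 recovered from parity and the others
```

The key step is in `append`: because the slot being written is known to hold zeros, the parity update needs only the old parity and the new data, which is what allows the other data disks to stay powered off during normal writes. A rebuild, by contrast, does need to read every surviving disk.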
A power and cooling footprint of 25% means a higher density than tape and less real estate cost. It also means 75% fewer power supplies and 75% fewer cooling systems. Instead of having a power supply in each disk shelf, the array has a single N+1 power rack that supplies DC power to all the shelves. Since MTBF (Mean Time Between Failures) is based on power-on time, it also means that the array's SATA disks last four times longer than disks in a traditional RAID array, putting them in the ballpark of SCSI disks, with 1.2 million MTBF hours.

COPAN Systems has released its first product as a virtual tape library in order to take advantage of the ease of integration with current environments. The result of the MAID architecture and the VTL personality is the first virtual tape library priced at the same level as tape. Where a fully populated tape library costs $1-$4/GB, a typical VTL solution can cost from $10-$22/GB. The Revolution 200T, expandable to 224TB and 720 MB/s, is priced at $3.50/GB. At that price, you can deploy a completely disk-based solution on site, and use tape only for off site.
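To put the per-gigabyte prices above side by side, here is a small worked calculation. It is a rough sketch under several assumptions that the article does not state: decimal units (1 TB = 1,000 GB), per-GB prices that scale linearly up to the full 224TB, and drive wear that tracks power-on time with drives powered on about 25% of the time. The function names are hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units; the article does not say


def system_cost(capacity_tb, price_per_gb):
    """Cost in dollars of a system priced per gigabyte."""
    return capacity_tb * GB_PER_TB * price_per_gb


def service_life_multiplier(powered_on_fraction):
    """Calendar-life gain if wear depends only on power-on time (assumption)."""
    return 1 / powered_on_fraction


CAPACITY_TB = 224  # maximum capacity quoted for the Revolution 200T
price_ranges = {
    "tape library": (1.00, 4.00),
    "typical VTL": (10.00, 22.00),
    "Revolution 200T": (3.50, 3.50),
}

for name, (low, high) in price_ranges.items():
    low_cost = system_cost(CAPACITY_TB, low)
    high_cost = system_cost(CAPACITY_TB, high)
    print(f"{name}: ${low_cost:,.0f} to ${high_cost:,.0f}")

print(service_life_multiplier(0.25))  # 4.0, matching "four times longer"
```

Under these assumptions, the quoted $3.50/GB falls inside the tape library range and well below the typical VTL range at the same capacity.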
This system gives you all the speed and manageability benefits of disk and RAID for both backups and restores, without spending an arm and a leg.

Part 1 of this article appeared in the August edition of CTR and discussed issues with tape--both pros and cons.

W. Curtis Preston is vice president of service development at GlassHouse Technologies, Inc. (Framingham, MA)

www.glasshouse.com