Maximizing Data Throughput.

Smoothing the road for high transfer rates

Tape drives have been improving not only in capacity but also in performance. Today's drives boast transfer rates of 10MB/sec or higher. While the prospect of backing up data at rates approaching 36GB an hour may sound extremely attractive to the data manager who has been wishing for such a high performance backup method, getting there can be a major problem.
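The arithmetic behind the 36GB-an-hour figure can be checked with a short sketch. It assumes decimal units (1GB = 1,000MB) and a drive that streams at its rated speed for the whole hour; the function names are illustrative only.

```python
def gb_per_hour(mb_per_sec):
    """Convert a sustained rate in MB/sec to GB/hour (1 GB = 1000 MB)."""
    return mb_per_sec * 3600 / 1000


def hours_to_back_up(data_gb, mb_per_sec):
    """Hours needed to move data_gb gigabytes at a sustained MB/sec rate."""
    return data_gb / gb_per_hour(mb_per_sec)


# Rates mentioned in the article: 3, 6, and 10 MB/sec.
for rate in (3, 6, 10):
    print(f"{rate} MB/sec is about {gb_per_hour(rate):.1f} GB/hour")
```

Under the same assumptions, hours_to_back_up(72, 10) gives the time needed to move 72GB at 10MB/sec.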

In many ways, the current generation of tape drives, ranging from the Ecrix VXA-1 with a native rate of 3MB/sec on up to Exabyte's Mammoth 2 and future LTO and Super DLT drives, which can approach 10MB/sec, is like a finely tuned race car. On an ideal, straight track, a Ferrari can hit full speed. Throw in some turns, add a lot of cars, and toss in some potholes, and the car can't hit its rated performance levels.

The equivalents of traffic, potholes, and other obstacles are common on today's computer systems. Attaching a high performance drive to a system that can't support the required data flow can be a serious disappointment and could, perhaps, even be job threatening. Putting a tape drive that can handle continuous data rates of 6MB/sec on a system that can only supply data at a maximum of 2MB/sec (or less) is like dropping a Ferrari onto a curvy dirt road. Without some serious repaving, the Ferrari won't be able to come anywhere near its maximum speed.
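The point can be put in numbers: a backup chain moves data no faster than its slowest stage. The sketch below is a simplified model, not a measurement tool; the stage names are assumptions, and the rates are the 2MB/sec and 6MB/sec figures from the example above.

```python
def effective_rate(stage_rates):
    """Return (slowest_stage_name, rate) for a dict of stage -> MB/sec.

    Simplified model: sustained throughput equals the slowest stage.
    """
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Rates from the article's example; the stage labels are illustrative.
pipeline = {"source system": 2.0, "tape drive": 6.0}
stage, rate = effective_rate(pipeline)
print(f"Bottleneck: {stage} at {rate} MB/sec")
```

In this model, replacing the tape drive with a faster one changes nothing until the source system can deliver more than 2MB/sec.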

For IT or data managers who assume that merely adding a high speed tape drive to a system will improve backup speeds, it may be extremely disappointing to find little or no improvement if the real bottleneck was not the drive at all. Understanding the system to which you plan to attach a high performance tape drive before getting the drive or library can help avoid such disappointment. Making some of the changes proposed here, which are based on advice from a number of tape drive manufacturers interviewed for this article, may help to improve transfer rate shortcomings that have already been noted by owners of high performance tape drives.

Choosing The OS

Not all operating systems are created equal. The engineers contacted for this article indicated that NT has many more layers in its I/O structure than Linux does. According to one source, the way that the I/O structure is architected has a major effect on data throughput.

"We had a drive streaming 12MB/sec, running off a 166 megahertz PC running Linux. If we put NT on the same machine, we couldn't get (a data rate) of 3MB/sec," this source noted.

It was not clear whether Windows 2000 improved matters, although it seems unlikely that the newest version of NT will have streamlined its I/O structure. "We're hoping that Microsoft has improved the I/O structure on Windows 2000," the source commented.

"Linux has really grown up from being a workstation OS run by geeks to a Web server OS running on high performance systems . . . On Linux, you can feed data faster. You will be able to move data through it at fantastic rates," the source noted.

System Configuration

System-related factors also have significant impact on data throughput. For example, "you must make sure that you have a disk array system that will provide data at a rate that the drive can receive. A single IDE or SCSI hard disk will not provide a data rate fast enough to keep the (tape) drive running," according to one of the engineers contacted for this article.

Additionally, if possible, the block size should be optimized to the tape drive. "In the old days, tape drives used to write to a 512 byte block. The absolute minimum today is 32K with 64K being better and 128K being best," an engineer at a drive maker said. The block size settings may not be easy to change; doing so can require modifying ASPI settings or making changes in software.
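To see why block size matters, consider how many separate transfers it takes to move the same data at each of the sizes the engineer mentions. The sketch below reads an in-memory buffer in fixed-size blocks and counts them. It only illustrates the arithmetic; read_blocks is a hypothetical helper, not part of any backup product, and it does not talk to a tape device.

```python
import io


def read_blocks(stream, block_size=64 * 1024):
    """Yield successive chunks of up to block_size bytes from a binary stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


# 1 MiB of sample data stands in for the files being backed up.
data = io.BytesIO(b"x" * (1024 * 1024))
for size in (512, 32 * 1024, 64 * 1024, 128 * 1024):
    data.seek(0)
    count = sum(1 for _ in read_blocks(data, size))
    print(f"block size {size:>6} bytes: {count} transfers")
```

Each transfer carries some fixed cost in the software and on the bus, so fewer, larger transfers leave more time for moving actual data.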

The experts polled agreed that putting the tape drive on a separate SCSI controller could help improve the flow of data to the tape drive. By separating the tape drive from other SCSI devices, delays that may occur due to data transfers being made by other devices on the bus can be avoided.

Further, optimizing (defragmenting) the drives being backed up can help reduce the seek times required to locate the data that will be backed up. A fully optimized drive will access the data more quickly and will be more likely to produce a continuous data stream, which in turn helps to increase data flow between the disk drives and the tape drive.

The flip side to the optimization question, however, is that it's recommended that drives be backed up before they are optimized in order to assure that the drives can be recreated in case errors occur during optimization. Still, once optimized, most drives require re-optimization infrequently and the re-optimization process is less time consuming than it would be for a highly fragmented drive.

Using A Backup Server

The issue of backing up multiple drives over a network calls into play such factors as network performance and drive characteristics. Copying data from a slow IDE disk on a workstation that is connected to the network through a 10Mbps Ethernet connection is asking for performance problems. Copying from multiple drives on multiple workstations over a slow network further compounds the problems and is an almost certain prescription for not matching the potential performance of tape drives with the flow of data.

Removing any of the above factors (slow drives, multiple workstations, or a slow network) will help to improve the likelihood of successfully providing the data flow required to get full performance from a tape drive. One solution would be to upgrade the network to 100MB/sec or higher--Gigabit Ethernet or Fibre Channel being the preferred technologies. This way, it can be fairly safely concluded that the bottleneck won't be the network architecture.
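Network speeds are usually quoted in megabits per second while tape drive rates are quoted in megabytes per second, so a quick conversion shows how far apart the two can be. The sketch below uses raw line rates and ignores protocol overhead, so real figures will be lower; the 6MB/sec drive rate is taken from the earlier example, and the names are illustrative.

```python
def mbps_to_mb_per_sec(megabits_per_sec):
    """Convert a raw link rate in megabits/sec to megabytes/sec (8 bits per byte)."""
    return megabits_per_sec / 8


# Raw line rates in Mbps; real throughput is lower once overhead is counted.
links = {"10Mbps Ethernet": 10, "Gigabit Ethernet": 1000}
drive_rate = 6.0  # MB/sec, the example drive rate used earlier in the article

for name, mbps in links.items():
    raw = mbps_to_mb_per_sec(mbps)
    verdict = "can" if raw >= drive_rate else "cannot"
    print(f"{name}: {raw} MB/sec raw, {verdict} keep a {drive_rate} MB/sec drive fed")
```

Even before overhead, a 10Mbps link tops out well below what the example drive can accept.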

Reducing the number of individual drives or drive arrays being backed up can also help to improve data throughput. Ideally, copying the workstation drives onto a separate array that is designed specifically as a backup or storage server may be the method of choice for optimum backup performance. A "poor man's SAN" can be attached to a network and exist only to function as the backup server. Data to be backed up can be streamed off the network onto this server and the server would have a single connection to the tape drive. With the device properly isolated from the basic network traffic (other than updating the data files to be backed up), it can perform its primary task, streaming data to the tape drive at maximum speed. The use of a drive array to store and read the data can provide optimal data access and throughput speeds.

Choosing The Software

Not all backup software is created equal. Not all software handles data on a drive in the same manner--some attempt to create drive images while others ship data directly off to the drive with little or no "intelligence" added to slow the process. Further, an application that allows the size of the data blocks to be modified can also speed up data transfers by increasing the amount of data sent with each block (and increasing the ratio between data and such overhead factors as error correction code). For today's systems, as a rule, the larger the block size, the faster the tape drive can work.
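The ratio between data and overhead can be illustrated with a simple model. The 64 bytes of per-block overhead used below is a made-up figure chosen only for illustration; actual overhead depends on the drive format and is not given in this article.

```python
def payload_efficiency(block_size, overhead_per_block):
    """Fraction of each block transfer that is user data rather than overhead."""
    return block_size / (block_size + overhead_per_block)


ASSUMED_OVERHEAD = 64  # bytes per block; hypothetical value for illustration only

for size in (512, 32 * 1024, 64 * 1024, 128 * 1024):
    eff = payload_efficiency(size, ASSUMED_OVERHEAD)
    print(f"block size {size:>6} bytes: {eff:.2%} of each transfer is data")
```

Whatever the real overhead figure is, the trend is the same: as the block grows, the fixed cost per block becomes a smaller share of the total.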

The way that a drive and software work with compression can also impact performance. Compression in software is unnecessary because nearly all tape drives, and all of the high-end drives, include hardware compression. However, some backup programs still include software compression and, in some cases, may even turn it on by default. Assuring that software compression is disabled will help to improve data throughput. As a rule, the less processing of data that is done on the software side, the higher the possible throughput will be because fewer clock cycles will be required to get the data ready to send through the system bus.

Freeing The Bus

It should be obvious that backup software consumes system resources while a backup is running, but it is still worth pointing out that the best throughput is achieved when the server is doing as few other tasks as possible. During a backup (or restore), the system should ideally be doing only one task: running the backup software application.

Attempting to run a backup session on a server that is also involved in other tasks is asking for trouble--not the least of which is reduced performance. This is one of the reasons why there's logic behind the idea of running backups after hours--because fewer people are on the network and because the server is (presumably) doing less work.

In these days, when data must be available around the clock 365 days a year (or 366 days during a leap year), the luxury of "free" time for backup has all but disappeared. Thus, the argument for a server that is dedicated to backup and only runs the one task at scheduled intervals is increasingly compelling. When it's not performing backup, this server can be updating data over the network, assuring that the data that will be backed up is the most current available.
COPYRIGHT 2000 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Technology Information; tape drives
Author:Brownstein, Mark
Publication:Computer Technology Review
Geographic Code:1USA
Date:Mar 1, 2000
Words:1580