Maximizing Data Throughput

Smoothing the road for high transfer rates

Tape drives have been improving not only in capacity but also in performance. Today's drives boast transfer rates of 10MB/sec or higher. While the prospect of backing up data at rates approaching 36GB an hour may sound extremely attractive to the data manager who has been wishing for such a high performance backup method, getting there can be a major problem.

In many ways, the current generation of tape drives, ranging from the Ecrix VXA-1 with a native rate of 3MB/sec on up to Exabyte's Mammoth 2 and future LTO (Linear Tape Open) and Super DLT (Digital Linear Tape) drives, which can approach 10MB/sec, are like finely tuned race cars. On an ideal, straight track, the Ferrari can hit full speed. Throw in some turns, add a lot of cars, and toss in some potholes, and the car can't hit its rated performance levels.

The equivalents of traffic, potholes, and other obstacles are common on today's computer systems. Attaching a high performance drive to a system that can't support the required data flow can be a serious disappointment and could, perhaps, even be job threatening. Putting a tape drive that can handle continuous data rates of 6MB/sec on a system that can only provide a maximum of 2MB/sec (or less) can seem equivalent to dropping a Ferrari onto a curvy dirt road.
Without some serious repaving, the Ferrari won't be able to come anywhere near its maximum speed. IT or data managers who assume that merely adding a high speed tape drive to a system will improve backup speeds may be extremely disappointed to find little or no improvement if the real bottleneck was not the drive at all. Understanding the system to which you plan to attach a high performance tape drive before getting the drive or library can help avoid such disappointment. Making some of the changes proposed in this article, which are based on advice from a number of tape drive manufacturers, may help to improve transfer rate shortcomings that have already been noted by owners of high performance tape drives.

Choosing The OS

Not all operating systems are created equal. The engineers contacted for this article indicated that NT has many more layers in its I/O structure than does Linux. According to a source contacted for this article, the way that the I/O structure is architected has a major effect on data throughput. "We had a drive streaming 12MB/sec, running off a 166 megahertz PC running Linux. If we put NT on the same machine, we couldn't get (a data rate) of 3MB/sec," this source noted.

It was not clear whether Windows 2000 improved matters, although it seems unlikely that the newest version of NT will have streamlined its I/O structure. "We're hoping that Microsoft has improved the I/O structure on Windows 2000," the source commented. "Linux has really grown up from being a workstation OS run by geeks to a Web server OS running on high performance systems . . . On Linux, you can feed data faster. You will be able to move data through it at fantastic rates," the source noted.

System Configuration

System-related factors also have a significant impact on data throughput. For example, "you must make sure that you have a disk array system that will provide data at a rate that the drive can receive. A single IDE or SCSI hard disk will not provide a data rate fast enough to keep the (tape) drive running," according to one of the engineers contacted for this article.

Additionally, if possible, the block size should be optimized to the tape drive. "In the old days, tape drives used to write to a 512 byte block. The absolute minimum today is 32K with 64K being better and 128K being best," an engineer at a drive maker said.
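
Whether a disk can supply data at the rate a drive expects can be checked before any hardware is bought. The following is a minimal sketch, not a procedure described by the engineers quoted here: it reads a file sequentially at the block sizes mentioned above and reports the rate in MB/sec. The function names `measure_read_rate` and `can_feed_drive` are hypothetical, and because the operating system may serve repeated reads from its cache, the results should be treated as optimistic.

```python
import time

# Block sizes mentioned in the text: 512 bytes, 32K, 64K and 128K.
BLOCK_SIZES = [512, 32 * 1024, 64 * 1024, 128 * 1024]


def measure_read_rate(path, block_size):
    """Read `path` sequentially in chunks of `block_size` bytes.

    Returns a tuple (bytes_read, megabytes_per_second).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, so each read()
    # call goes straight to the operating system.
    with open(path, "rb", buffering=0) as source:
        while True:
            chunk = source.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed / 1_000_000 if elapsed > 0 else float("inf")
    return total, rate


def can_feed_drive(source_mb_per_sec, drive_mb_per_sec):
    """True if the source is at least as fast as the tape drive."""
    return source_mb_per_sec >= drive_mb_per_sec
```

Calling `measure_read_rate` once per entry in `BLOCK_SIZES` on a large file from the volume to be backed up, and passing each result to `can_feed_drive` along with the drive's rated speed, gives a rough indication of whether the disk or the drive is likely to be the limiting factor.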
The block size settings may not be easy to change; doing so may require modifying ASPI (Advanced SCSI Programming Interface) settings or making changes in software.

The experts polled agreed that putting the tape drive on a separate SCSI controller could help improve the flow of data to the tape drive. Separating the tape drive from other SCSI devices avoids the delays that can occur when other devices on the bus are transferring data.

Further, optimizing (defragmenting) the drives being backed up can help reduce the seek times required to locate the data. A fully optimized drive will access the data more quickly and will be more likely to produce a continuous data stream, which increases the data flow between the disk drives and the tape drive. The flip side of the optimization question, however, is that drives should be backed up before they are optimized so that they can be recreated if errors occur during optimization. Still, once optimized, most drives require re-optimization infrequently, and the re-optimization process is less time consuming than it would be for a highly fragmented drive.

Using A Backup Server

The issue of backing up multiple drives over a network calls into play such factors as network performance and drive characteristics.
Copying data from a slow IDE disk on a workstation that is connected to the network through a 10Mbps Ethernet connection is asking for performance problems. Copying from multiple drives on multiple workstations over a slow network further compounds the problems and is an almost certain prescription for failing to match the flow of data to the potential performance of the tape drive. Removing any of these factors (slow drives, multiple workstations, or a slow network) will improve the likelihood of providing the data flow required to get full performance from a tape drive.

One solution would be to upgrade the network to 100MB/sec or higher, with Gigabit Ethernet or Fibre Channel being the preferred technologies. This way, it can be fairly safely concluded that the bottleneck won't be the network architecture. Reducing the number of individual drives or drive arrays being backed up can also help to improve data throughput.

Ideally, copying the workstation drives onto a separate array that is designed specifically as a backup or storage server may be the method of choice for optimum backup performance. A "poor man's SAN" can be attached to a network and exist only to function as the backup server. Data to be backed up can be streamed off the network onto this server, and the server would have a single connection to the tape drive. With the device properly isolated from the basic network traffic (other than updating the data files to be backed up), it can perform its primary task, streaming data to the tape drive at maximum speed.
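
The effect of a slow link can be estimated with simple arithmetic: the backup can run no faster than the slowest stage in the chain. The sketch below is an illustration under stated assumptions, not a sizing method from the vendors interviewed. The 3MB/sec workstation disk rate is a hypothetical figure, the 6MB/sec tape rate is the example used earlier in this article, and the network conversion ignores protocol overhead.

```python
def mbps_to_mb_per_sec(megabits_per_sec):
    """Convert a network speed in megabits/sec to megabytes/sec.

    Ignores protocol overhead, so real throughput will be lower.
    """
    return megabits_per_sec / 8


def slowest_stage(stage_rates):
    """Return (name, rate) of the slowest stage in MB/sec."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


def gb_per_hour(mb_per_sec):
    """Convert MB/sec to GB/hour, using 1GB = 1000MB."""
    return mb_per_sec * 3600 / 1000


# Hypothetical data path for a workstation backed up over the network.
stages = {
    "workstation disk": 3.0,                # assumed value
    "network": mbps_to_mb_per_sec(10),      # 10Mbps Ethernet
    "tape drive": 6.0,                      # example rate from the article
}

bottleneck, rate = slowest_stage(stages)
print(f"Bottleneck: {bottleneck} at {rate} MB/sec "
      f"({gb_per_hour(rate)} GB/hour)")
```

With these assumed figures the 10Mbps network limits the backup to 1.25MB/sec, or 4.5GB an hour, far below what the tape drive can accept. Replacing the network entry with a faster link moves the bottleneck to the next slowest stage.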
The use of a drive array to store and read the data can provide optimal data access and throughput speeds.

Choosing The Software

Not all backup software is created equal. Not all software handles data on a drive in the same manner--some attempt to create drive images while others ship data directly off to the drive with little or no "intelligence" added to slow the process. Further, an application that allows the size of the data blocks to be modified can also speed up data transfers by increasing the amount of data sent with each block (and increasing the ratio between data and such overhead factors as error correction code). For today's systems, as a rule, the larger the block size, the faster the tape drive can work.

The way that a drive and software work with compression can also impact performance. Compression in software is unnecessary because nearly all tape drives, and all of the high-end drives, include hardware compression. However, some backup programs still include software compression and, in some cases, may even turn it on by default. Ensuring that software compression is disabled will help to improve data throughput.
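
How much software compression slows a particular server can be gauged with a quick measurement. The sketch below is only a rough illustration: it uses Python's `zlib` module as a stand-in for whatever algorithm a backup program might apply (an assumption, since the article names no specific product), and the function name `software_compression_rate` is hypothetical.

```python
import time
import zlib


def software_compression_rate(data, level=6):
    """Compress `data` in memory with zlib.

    Returns a tuple (megabytes_per_second, compressed_size_in_bytes).
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    rate = len(data) / elapsed / 1_000_000 if elapsed > 0 else float("inf")
    return rate, len(compressed)


# Repetitive sample data; real backup data will compress differently.
sample = b"backup data block " * 50_000

rate, size = software_compression_rate(sample)
print(f"Compressed {len(sample)} bytes to {size} bytes "
      f"at {rate:.1f} MB/sec on this CPU")
```

If the measured rate on the backup server is not comfortably above the rated speed of the tape drive, software compression alone could keep the drive from streaming, in addition to consuming processor time that other tasks may need.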
As a rule, the less processing of data that is done on the software side, the higher the possible throughput will be because fewer clock cycles will be required to get the data ready to send through the system bus.

Freeing The Bus

Although it should be obvious that backup software is doing real work while a backup is being performed, it may still be useful to point out that the best throughput can be achieved when the server is doing as few other tasks as possible. During a backup (or restore), the system should ideally be doing only one task: running the backup software application. Attempting to run a backup session on a server that is also involved in other tasks is asking for trouble--not the least of which is reduced performance. This is one of the reasons for running backups after hours: fewer people are on the network and the server is (presumably) doing less work.

In these days, when data must be available around the clock 365 days a year (or 366 days during a leap year), the luxury of "free" time for backup has all but disappeared. Thus, the argument for a server that is dedicated to backup and only runs the one task at scheduled intervals is increasingly compelling. When it's not performing backup, this server can be updating data over the network, ensuring that the data that will be backed up is the most current available.