Getting disk into the backup process: adding the benefits of disk while supporting existing processes.
Integrating Disk Into Enterprise Backup
The first major challenge to implementing a disk backup strategy is integrating it into an existing environment. Enterprise backup is complex, heterogeneous, and can touch all of an enterprise's data in hundreds or thousands of separate jobs each night. Moreover, backup isn't limited to copying and recovering files. It also organizes data sets based on what the data pool looked like at many different points in time. It keeps track of different versions. And it manages the rotation and use of media for long-term disaster recovery and data retention compliance. With this kind of scale, users can't simply insert disk resources into the process without re-architecting procedures and using different versions of backup software.
The easiest way to let enterprise backup take advantage of the performance and fault tolerance of disk is to make a disk-based system look like tape to backup applications. The technology to do that--virtual tape technology--has not been generally available for open systems backup. When ADIC created its Pathlight VX disk-to-tape system, we developed a variation of virtual tape technology built specifically to support open systems backup and embedded it in a local controller. The result is a layer of disk-based storage, with its performance and fault tolerance gains, that fits directly into an existing backup system designed around tape, without requiring the user to change applications or basic processes (Figure 1). The supported process includes a path to real tape creation.
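The core idea of tape emulation can be sketched in miniature. The class below is purely illustrative (it is not ADIC's implementation, and all names are hypothetical); it shows what "looking like tape" means to a backup application: sequential, filemark-delimited writes and reads, backed by random-access storage.

```python
# Minimal sketch of tape semantics emulated on random-access storage.
# All class and method names are illustrative assumptions.

class VirtualTape:
    """Presents sequential, filemark-delimited tape semantics over a byte store."""

    FILEMARK = object()  # sentinel separating data sets, like a real filemark

    def __init__(self, barcode):
        self.barcode = barcode   # virtual cartridges keep tape-style barcodes
        self.records = []        # backing "disk" store: list of records/filemarks
        self.position = 0        # current logical tape position

    def write(self, block):
        # Tape is append-only from the current position; anything beyond
        # the write point is logically overwritten, as on a real cartridge.
        del self.records[self.position:]
        self.records.append(bytes(block))
        self.position += 1

    def write_filemark(self):
        del self.records[self.position:]
        self.records.append(self.FILEMARK)
        self.position += 1

    def rewind(self):
        self.position = 0

    def read(self):
        # Returns None at a filemark, mirroring how a drive reports one.
        rec = self.records[self.position]
        self.position += 1
        return None if rec is self.FILEMARK else rec
```

Because the backup application only ever sees this tape-style interface, it needs no knowledge that the "cartridge" is actually disk.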
The Path to Tape
For long-term storage, for removability, for disaster recovery, and for compliance with data retention regulations, a removable medium like tape remains an essential part of most organizations' comprehensive data protection systems. So before any disk-based storage is made part of a backup system, two key questions need to be asked: How will data get to tape? And what kind of format and management support will it have?
How Data Gets to Tape
For the first question--how data will get to tape--there are two basic answers (see Figure 2). The first architecture writes data to disk once, and then uses a second, on-line process to move the data to tape. The second architecture embeds the movement to tape, creating removable media off-line, in the background, and over isolated connections. Let's look at each option:
[FIGURE 1 OMITTED]
Two-Step Media Creation: This system writes data first to disk, and then uses a separate process to move the data a second time from the disk resource to a tape target, normally an automated tape library. The processors used to move the data in this kind of system are normally the same for both moves--the processors in the application servers or in the media servers. And the connections used to move the data are also the same--for most enterprise applications, it's the Fibre Channel SAN used for backup.
[FIGURE 2 OMITTED]
The advantages of the two-step process are mostly on the side of the disk and tape vendors--it is simpler to create a single device, a disk array or a tape library, and to leave the job of connecting them to the end user or to an integrator. The disadvantages are several.
The integration job for the end user is much more difficult; complex backup procedural changes are required on the software side, and installation costs are likely to be higher. Server operation and network performance can be degraded much more than with conventional backup since the same processors and same network are moving the data two times--effectively doubling whatever the negative effects of backup are on the system. Ongoing system management is also much more complex since staff retraining is required, and there may be two or three vendors supporting and servicing different parts of the same system. Finally, the actual backup performance--the performance of writing to tape--is likely to be slower than with conventional backup since the system has added another element to be integrated, managed, and tuned.
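The double movement described above is the crux of the two-step approach. The sketch below (job names and structure are illustrative assumptions, not any vendor's actual scheduler) shows it as the backup software sees it: two separate jobs, driven by the same media servers over the same network.

```python
# The two-step path as two scheduled jobs. Every byte is handled twice
# by the same media servers over the same shared SAN. Illustrative only.

def backup_to_disk(dataset):
    # Step 1: the media server reads from clients and writes the disk array.
    return {"dataset": dataset, "location": "disk", "moves": 1}

def duplicate_to_tape(job):
    # Step 2: the *same* media server re-reads the disk copy and writes tape,
    # moving every byte over the shared backup network a second time.
    job["location"] = "tape"
    job["moves"] += 1
    return job

job = duplicate_to_tape(backup_to_disk("finance-nightly"))
print(job["moves"])  # 2: each byte crossed the backup network twice
```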
Background Media Creation: The second path to tape, the one that ADIC has used in its Pathlight VX solution, first moves data over the public network to the disk resource, then moves the data to tape in the background, using dedicated, embedded data movers and isolated connections. The only real disadvantage to this system is that the vendor must assume more of the responsibility for creating an integrated system. There are several advantages for the end user:
The negative impact on the system is dramatically reduced--data is only moved once by the application servers over the SAN. The actual creation of media is moved completely outside the backup window and off the public network. Installation and ongoing support are simplified since the user only has to manage one system, the basic procedures do not change from a tape-only architecture, and the solution has a single point of vendor responsibility. And finally, the actual performance during tape creation can be significantly increased because the system is carefully tuned specifically for backup and is not competing for resources with other kinds of services.
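The traffic difference between the two architectures can be put in back-of-envelope terms. The figures below are illustrative assumptions (a hypothetical 4 TB nightly backup), not measurements.

```python
# Back-of-envelope comparison of shared-SAN traffic in the two architectures.
# Numbers are illustrative assumptions, not measured figures.

def san_bytes_two_step(backup_bytes):
    # Data crosses the shared SAN twice: servers -> disk, then disk -> tape,
    # with the same media servers performing both moves.
    return 2 * backup_bytes

def san_bytes_background(backup_bytes):
    # Data crosses the shared SAN once; tape creation then happens over
    # isolated, embedded connections inside the disk-to-tape system.
    return backup_bytes

nightly = 4 * 2**40  # assume a 4 TiB nightly backup
print(san_bytes_two_step(nightly) // 2**40, "TiB over the SAN (two-step)")
print(san_bytes_background(nightly) // 2**40, "TiB over the SAN (background)")
```

The background path does not eliminate the second data movement; it relocates it to dedicated movers and links, so the public network and application servers only ever pay for one pass.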
Tape Format and Media Management
The answer to the second question--the kind of media format and tape management system--is closely related to the issue of integrating the disk-based system into an existing backup process. The approach that we recommend creates tapes in the same format the backup application believes it wrote in the first place, and allows the backup application's media management system to be retained without change.
This is the approach that we have used in the Pathlight VX disk-to-tape solution. The tapes that the new system generates are identical to tapes created by the backup application writing to a system that uses conventional library architecture. A central advantage to this approach is that the tapes that are created can be loaded and read in any compatible drive anywhere--they do not require a disk-to-tape system for restores. And because the system creates a one-to-one link between the virtual media created at the time of backup and the real tapes that are created for export and long term retention, it also allows the established disaster recovery and data retention policies to be continued without change. The end result is a system that uses existing software for the backup and restore process, that exports and manages real media using existing policies and procedures, but that inserts a layer of disk in the process to provide both fault tolerance and higher performance.
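The one-to-one link between virtual media and exported tapes can be sketched as a simple mapping. The class below is a hypothetical illustration (not Pathlight VX's actual media manager); the point is that each virtual cartridge resolves to exactly one real tape holding an identical image, so the backup application's media database and retention policies stay valid unchanged.

```python
# Sketch of one-to-one virtual-to-physical media mapping. All names are
# hypothetical; the invariant is one real tape per virtual cartridge.

class MediaMapper:
    def __init__(self):
        self._map = {}  # virtual barcode -> physical barcode

    def export(self, virtual_barcode, physical_barcode):
        # Enforce the one-to-one invariant: a virtual medium is exported
        # to exactly one physical cartridge.
        if virtual_barcode in self._map:
            raise ValueError("virtual medium already exported")
        self._map[virtual_barcode] = physical_barcode

    def physical_for(self, virtual_barcode):
        # A restore or vaulting request against the virtual barcode resolves
        # to the one real cartridge holding the identical tape image.
        return self._map[virtual_barcode]
```

Because the exported tape is format-identical, the mapping is only needed for tracking; the tape itself restores in any compatible drive with no disk-to-tape system present.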
Keeping Performance Optimized

The third major issue facing users who integrate disk components into backup is keeping performance optimized. Although it isn't intuitive at first, careful system management is required to get consistent performance benefits from adding disk to backup. Tape is optimized for handling large data streams, while disk is optimized for high I/O performance. Any time the data flow has to stop and start, or jump between random blocks, disk-based storage will excel. But users who simply point a high-bandwidth backup stream at a RAID array are likely to see reduced performance from the disk system compared to tape.
The solution to getting increased performance across all backup operations is to manage the disk resource so that it can support streaming performance. The technique involves providing a combination of RAID volume management, controller balancing, contiguous space use, and the matching of block size with stripe groups. This configuration borrows technology from applications that have to get streaming performance from disk--large parallel compute operations and delivery of rich media content and digital entertainment.
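The matching of block size with stripe groups mentioned above can be made concrete. In a hypothetical RAID-5 layout, issuing writes in multiples of the full stripe lets the controller avoid read-modify-write cycles and sustain streaming rates; the geometry below is an assumption for illustration.

```python
# Matching backup I/O size to the RAID stripe so the stream achieves
# full-stripe writes. Geometry values are illustrative assumptions.

def full_stripe_size(stripe_unit_kib, data_disks):
    """Bytes per full stripe: one stripe unit on each data disk."""
    return stripe_unit_kib * 1024 * data_disks

# e.g. a hypothetical RAID-5 group of 8 data disks + 1 parity disk,
# with a 128 KiB stripe unit:
io_size = full_stripe_size(128, 8)
print(io_size)  # 1048576 bytes: writing in 1 MiB blocks fills whole stripes
```

A write smaller than the full stripe forces the controller to read old data and parity before writing, which is exactly the random-I/O behavior a streaming backup wants to avoid.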
When ADIC created its Pathlight VX disk-to-tape backup system, for example, it included the company's high-performance data management software (from ADIC's StorNext Management Suite) and configured the disk resources for high-performance writes and restores. This is the technology combination that delivers streaming disk performance for supercomputer environments and for broadcast and media delivery applications around the world. When applied to backup, it ensures that the disk can provide consistent performance gains, with sustained rates above a terabyte per hour.
The availability of serial ATA technology holds out the promise of increasing the performance of backup and restore and giving it the kind of fault tolerance we associate with RAID arrays. But users are discovering that these benefits do not come automatically from buying a SATA array. Getting real benefits from SATA disk depends on deploying it in carefully designed, system-based solutions: ones that fit easily into existing backup environments, that retain an integrated, off-line path to real tape creation, and that are designed to get true streaming performance from the disk systems.
RELATED ARTICLE: Intelligent Storage Networks
At the center of the new storage network architecture is the future envisioned for today's network switches, directors and routers. Sometimes referred to as the Storage Domain Director, an advanced, fault-tolerant storage switching architecture is unfolding, enabling centralized and outboard management of distributed computer resources finally to become a reality. A key goal for the Intelligent Storage Network is to significantly minimize the number of storage management touch points.
Many types of storage management applications now make sense for hosting on the fabric. These include controlling SAN traffic, centralized storage management, storage consolidation, SRM, HSM, backup/recovery, snapshot copy, replication, and outboard data movement between disk and tape subsystems (server-less functions). The capability to provide non-disruptive scalability and storage topology reconfiguration enables the long-awaited capability for proactive and people-transparent virtual storage management to become a reality.
In addition, the ability to perform block and file data storage operations in parallel and thus bridge the "number and name" worlds within the same storage subsystem is a strategic and valuable outcome of this advanced architecture. Emerging concepts such as IP Storage, advanced SRM, active HSM, and In-band (symmetric) or Out-of-band (asymmetric) virtualization appliances are providing valuable building blocks for advanced and intelligent storage architectures. Optimally, the virtualization and volume-management layer for simplified storage and network control will reside here.
Though a debate continues over where to locate the storage functionality, there is widespread agreement on the need to make storage management independent from the type of servers being deployed. Many companies are now delivering fundamental pieces and building blocks of the intelligent fabric, but the complete vision will take a few more years to arrive. Nonetheless, the path to the future for storage subsystems can be described using many of the components that are now becoming viable.
Excerpted from Storage: New Game New Rules by Fred Moore, Horison Information Strategies [c]2003. www.horison.com
Scott Hamilton is director of product management at ADIC (Redmond, WA)
Computer Technology Review (Storage Networking), Jan 1, 2004