The optimal backup solution: it's now within your reach.
Traditionally, each server that required backup had a tape drive or small library attached. This approach allowed for fast backups but was expensive to deploy and complex to manage. Over time, companies deployed large backup servers with large libraries attached and performed backups over the LAN. These solutions grow easily as more data is brought online; unfortunately, with that growth they become more expensive and more complex to manage.
Enter the SAN
Servers attached to a SAN can quickly and cost-effectively take advantage of it to back up data. By attaching backup servers to the SAN and enabling "IP-over-Fibre" for them, backup traffic that used to travel over the LAN can instead travel over the SAN using IP. Minor changes to the backup servers are needed. For companies still using 10/100 networks for backups, the performance increase will be dramatic. Companies using Gig-E for backups may see a smaller performance increase, but they will still see less traffic flowing over the production network during backups.
SAN and Tape
Adding tape libraries to a SAN lets users address backup issues in a variety of ways. Many companies find it difficult to use their tape libraries efficiently: some libraries are constantly busy while others sit idle. Putting the backup servers and tape libraries on a SAN gives administrators the flexibility to configure backups so that resources are used efficiently. Many backup servers can share a large library attached to the SAN instead of a large (expensive) backup server being dedicated directly to it, and many smaller libraries can easily be consolidated into a single large, SAN-attached library.
For many backup environments, the bottleneck is the network. Once a library is attached to a SAN, every server on the SAN has access to it. Most backup software will allow a SAN-attached server to back itself up directly to the SAN-attached library. These servers will back up at tape speed, since they no longer send their data over the network. Because servers attached to the SAN usually hold a large amount of data, their backups usually occupy a large share of the backup servers' time. By having the application servers back themselves up, not only do their backups finish more quickly, but resources on the backup servers are freed to back up the rest of the environment faster.
The next logical step in leveraging the SAN to solve backup problems is backing up production data to SAN-attached disk. This allows backups to finish very quickly, usually faster than backing up to tape, and, more importantly, makes data recovery very fast. The biggest drawback of disk-based backup is that there is no tape to store offsite; this is solved by copying the data to tape after the backups are complete. Beyond its raw transfer speed, disk is also faster than tape because a disk-based system does not have to physically find a tape in the library, load it, and advance it to the data. With disk-based backup systems, "tapes" load instantly and access to data anywhere on the "tape" is instantaneous. There are many ways to implement a disk-based backup solution.
Vendors are starting to introduce libraries that use inexpensive ATA disks behind a virtualization engine that emulates a tape library to the backup software running on the backup server. This type of disk-based library typically emulates a library with many tape drives and tape slots. Nothing special needs to be done to the backup software; it simply sees another multi-drive, multi-slot library that it can use. Attaching one of these virtual libraries to a SAN allows it to be used in the same manner as the SAN-attached tape libraries described above.
Some backup software vendors offer an option for backing up directly to disk. This delivers the same benefits as the hardware-based systems, but the software approach is more cost-effective and has the flexibility to use any available type of disk storage. While the hardware implementations limit how much backup data can be stored on a unit, the software version lets users allocate as much disk as they like, enabling them to keep huge amounts of backup data near-line. Using SAN-attached disk allows virtually unlimited capacity to be allocated for backups.
The ultimate in backup technology is the snapshot. A snapshot is, after all, just a copy of data, which is exactly what a backup is. Snapshots are considered a disk-based backup solution because they are stored on disk. During a backup, the applications on the servers being backed up are either in degraded mode (e.g., a database is put into hot-backup mode) or shut down entirely (a cold backup). With a library (tape or disk), the applications are degraded or down for a long time, frequently several hours. With snapshot technology, applications are impacted only for seconds or minutes; it usually takes longer for an application to shut down, or for a database to enter hot-backup mode, than it takes to perform the snapshot. Snapshots are powerful because a system can be backed up in seconds and the copy archived to tape at any time. A SAN-attached tape library improves backup performance by letting SAN-attached application servers back themselves up directly to the library; a snapshot goes further, since it can be mounted on a SAN-attached backup server and archived to tape with virtually no overhead or impact on the application server.
There are a variety of ways to perform snapshots:
Split-mirror snapshots create an instantaneous second copy of data that can be used later to recover it. This is accomplished by mirroring disk drives and then breaking the mirror. It is traditionally done within the disk subsystem or from the host using volume management software such as VERITAS Volume Manager; today, SAN appliances like FalconStor's IPStor also perform this function. The major issue with split-mirror snapshots is that there must be enough storage to accommodate not only the original copy but every split mirror as well. Assuming a snapshot is taken every hour and held for 24 hours, a database with one terabyte of disk space allocated to it would need 25 terabytes of usable disk space (the production copy plus 24 snapshots). The payoff for retaining multiple split-mirror snapshots is that recovering from one takes far less time than restoring the data from a backup system.
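The capacity arithmetic for split-mirror snapshots can be sketched in a few lines; the helper function below is hypothetical, but the 25 TB figure matches the article's example.

```python
def split_mirror_capacity_tb(production_tb, interval_hours, retention_hours):
    """Usable disk needed for split-mirror snapshots: the production copy
    plus one full mirror for every snapshot retained."""
    retained_snapshots = retention_hours // interval_hours
    return production_tb * (1 + retained_snapshots)

# The article's example: a 1 TB database, snapshots taken hourly, held 24 hours.
print(split_mirror_capacity_tb(1, 1, 24))  # -> 25 (terabytes of usable disk)
```

This is what makes split mirrors expensive at scale: the cost grows linearly with both database size and the number of snapshots retained.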
Snap-copy snapshots, like split-mirror snapshots, provide a complete copy of the data. The major difference is that when a split mirror is initiated, it creates an instantaneous copy of the data, so the data is available immediately. When a snap-copy is initiated, the data is copied to another area of storage, which may take from a few minutes to hours. As with a split mirror, each snap-copy snapshot requires enough storage to hold an exact copy of the original data and can be created on the host (Volume Manager), within a SAN appliance (IPStor) or on the disk subsystem using, for example, Engenio's SANtricity Storage Manager software.
Pointer-based snapshots are not exact copies of the data but a set of pointers into the original data. When a block of the original data is first written to, its old contents are copied to the snapshot reserve area and the snapshot's pointer is updated to reference that preserved copy; this process is called "copy on write." Subsequent writes to the same block are not copied to the reserve area, because the block's original contents have already been preserved. One of the most attractive aspects of a pointer-based snapshot is that the reserve area needs only a fraction of the original disk space, since only changed blocks are copied. Because they require so little space, pointer-based snapshots can be taken more frequently at low cost. They are the most robust of the snapshot technologies and can be created on the host (Volume Manager), within a SAN appliance (IPStor), or on the disk subsystem (SANtricity).
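The copy-on-write mechanism can be illustrated with a minimal sketch; this is a toy model for explanation, not any vendor's implementation.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a block-addressed volume."""

    def __init__(self, volume):
        self.volume = volume   # the live volume: a list of blocks
        self.reserve = {}      # block index -> preserved original contents

    def write(self, index, data):
        # "Copy on write": the first write to a block preserves its old
        # contents in the reserve area; later writes skip this step.
        if index not in self.reserve:
            self.reserve[index] = self.volume[index]
        self.volume[index] = data

    def read_snapshot(self, index):
        # Snapshot view: the preserved copy if the block changed, else the live block.
        return self.reserve.get(index, self.volume[index])

vol = ["a", "b", "c"]
snap = CowSnapshot(vol)
snap.write(1, "B1")
snap.write(1, "B2")                   # second write: no additional copy made
print(snap.read_snapshot(1), vol[1])  # -> b B2  (snapshot frozen, volume current)
print(len(snap.reserve))              # -> 1 (only the changed block uses reserve space)
```

Note that the reserve area grows only with the number of distinct blocks changed, which is why the snapshot consumes just a fraction of the original volume's space.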
The optimal solution, which addresses the majority of most companies' backup issues, is to implement a SAN with Fibre Channel disk and snapshot technology for production, ATA disk for disk-to-disk backup, and a tape library at a remote location attached via a stretched SAN. A backup would consist of a snapshot of the production data, which would be mounted on a backup server. The backup server would then back up the snapshot to the ATA disk, and that backup would in turn be archived to tape. The copy residing on the ATA disk would "age" under the same retention policies that were in place for the local tape copy, while the copy archived to tape can remain at the remote site and fulfill most companies' requirements for offsite tape storage.
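The end-to-end flow can be sketched as orchestration steps; every name and data structure here is a hypothetical stand-in for real backup-software operations, with snapshot and tape mechanics abstracted away.

```python
def run_backup(production_data, ata_disk, remote_tape, retention_days=30):
    # 1. Snapshot the production data (the application is quiesced only briefly).
    snapshot = dict(production_data)  # pointer/copy-on-write semantics abstracted

    # 2. A backup server mounts the snapshot and backs it up to ATA disk.
    ata_disk.append({"snapshot": snapshot, "age_days": 0})

    # 3. The disk copy is archived to the remote, SAN-attached tape library,
    #    which satisfies the offsite-tape requirement.
    remote_tape.append(snapshot)

    # 4. Disk copies age out under the same retention policy local tape used.
    ata_disk[:] = [b for b in ata_disk if b["age_days"] < retention_days]

ata, tape = [], []
run_backup({"db": "records"}, ata, tape)
print(len(ata), len(tape))  # -> 1 1
```

The production volume is never read twice: the snapshot feeds both the disk copy and the tape archive, so the application server is out of the loop after step 1.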
Historically, solving backup needs was an easy task: given the backup window, the amount of data to be backed up, and the speed of the backup drives, one could simply calculate how many drives were needed. For some, this is still the preferred method. For others, the backup window may be so small, or the environment so complex, that they will need to implement some sort of SAN-enabled technology. If the need is for fast backups and a lot of random restores, a SAN disk-based backup solution may be in order. Many companies may implement a variety of solutions (e.g., adding a disk-based backup unit or a snapshot solution that extends the useful life of an existing tape library). Whatever the need (backup, restore, or archive), there is a SAN-enabled solution on the market to address it.
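The traditional sizing arithmetic is simple enough to write down; the numbers below are illustrative assumptions, not figures from the article.

```python
import math

def tape_drives_needed(data_gb, window_hours, drive_mb_per_s):
    """Classic backup sizing: how many drives are needed to move
    data_gb through a window of window_hours at drive_mb_per_s each."""
    gb_per_drive = drive_mb_per_s * 3600 * window_hours / 1024
    return math.ceil(data_gb / gb_per_drive)

# Example: 4 TB in an 8-hour window with drives sustaining 30 MB/s.
print(tape_drives_needed(4096, 8, 30))  # -> 5
```

When the answer this formula returns is larger than the library can hold, or the window shrinks toward zero, that is exactly the point at which the SAN-enabled approaches above become attractive.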
Jim McKinstry is senior systems engineer at Engenio Information Technologies, Inc. (Milpitas, CA).
Title Annotation: SAN Trends
Publication: Computer Technology Review
Date: Oct 1, 2004