Disk-based backup: is LAN-based or SAN-based the fairest mirror of them all?
Although the enthusiasm for disk-based backup is growing, the implementation options are daunting. There are a variety of technologies coming to market, each with cost and functionality tradeoffs that make direct comparisons challenging. Surprisingly, among the options there is one important but often overlooked differentiation: does the replication data flow over the SAN or the LAN? An attribute as fundamental as this may seem too important to ignore, but the LAN vs. SAN decision often gets aggregated with a variety of other decisions that effectively settle the debate before it even gets started. That will soon be changing. Emerging technologies will create new SAN-based data replication options, making the SAN vs. LAN question one of the most critical of all.
The Case for Disk-Based Backup
The rising interest in disk-based backup is driven by the technique's two significant advantages over tape-only methods.
The first is the enhanced backup process itself. To achieve optimum performance, traditional backup processes require careful scheduling of server and tape resources, a tightly choreographed ballet that may run into overtime if any of the pieces fails. Disk-to-disk solutions eliminate that headache by maintaining a copy of the data on a second array. The mirrored data, rather than the primary data, can then be used to generate a tape-based backup, simplifying scheduling and improving overall process reliability.
The second and more significant advantage of disk-to-disk is faster data restore. Tape-based data restore often involves reading multiple pieces of media as both full and incremental backups are pieced together to re-create the original files, a process that may take hours or days. Disk-based solutions, by comparison, make files instantly available. Simply mount the desired volume and access the needed point-in-time image. In addition to being faster, disk-based solutions are less prone to error. Because the backup process is more reliable, the restore process ultimately becomes faster and more reliable as well.
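The difference between the two restore paths can be sketched in a few lines of Python. This is a simplified model rather than any vendor's restore logic: backups are represented as dictionaries mapping file names to contents, a value of None in an incremental marks a deleted file, and the function names are hypothetical.

```python
def restore_from_tape(full_backup, incrementals):
    """Rebuild the file set by replaying a full backup, then each incremental in order.

    Returns the restored files and the number of backup sets that had to be read.
    """
    files = dict(full_backup)
    sets_read = 1  # the full backup
    for incremental in incrementals:
        sets_read += 1
        for name, content in incremental.items():
            if content is None:
                files.pop(name, None)  # file was deleted in this interval
            else:
                files[name] = content  # file was added or changed
    return files, sets_read


def restore_from_disk(images, label):
    """Return a stored point-in-time image directly; only one image is read."""
    return dict(images[label]), 1


# Hypothetical example data: one full backup followed by two incrementals.
full = {"a.txt": "v1", "b.txt": "v1"}
incrementals = [{"a.txt": "v2"}, {"b.txt": None, "c.txt": "v1"}]
tape_files, tape_reads = restore_from_tape(full, incrementals)

# The same state held as a ready-made image on the mirror array.
images = {"tuesday": {"a.txt": "v2", "c.txt": "v1"}}
disk_files, disk_reads = restore_from_disk(images, "tuesday")
```

In this model the tape path reads one backup set per full or incremental backup, while the disk path reads a single image regardless of how many backup cycles preceded it.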
Mirroring Techniques Compared
LAN- and SAN-based replication solutions both address the main objectives of disk-to-disk backup: they replicate data from a primary array to a physically separate mirror array that may be instantly accessed when needed. The distinction between LAN and SAN solutions is how the data is moved and which topology is employed.
LAN-based solutions: These solutions replicate file-level data between servers connected by an Ethernet link that may be either dedicated or shared with other traffic. The servers may be either conventional servers or NAS devices, though both the primary and secondary servers must run instances of the associated data replication software.
LAN-based data replication is most commonly asynchronous, meaning replication is scheduled rather than continuous. Because replication occurs at intervals, the process provides point-in-time images of data, allowing users to "roll back in time," if needed. And for fail-over redundancy, users and applications can usually access data directly from either the primary or secondary device.
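A minimal sketch of scheduled replication with point-in-time rollback follows. It assumes each replication pass stores a complete copy of the primary data under an integer timestamp. Real products typically store only changes, and the class and method names here are hypothetical.

```python
import copy


class ScheduledMirror:
    """Toy model of a secondary device that keeps one image per replication pass."""

    def __init__(self):
        self.images = []  # list of (timestamp, files) tuples, in arrival order

    def replicate(self, timestamp, primary_files):
        """Record a point-in-time image of the primary data."""
        self.images.append((timestamp, copy.deepcopy(primary_files)))

    def roll_back(self, as_of):
        """Return the most recent image taken at or before the requested time."""
        candidates = [image for image in self.images if image[0] <= as_of]
        if not candidates:
            raise LookupError("no image at or before the requested time")
        _, files = max(candidates, key=lambda image: image[0])
        return copy.deepcopy(files)


# Hypothetical usage: a file is damaged between two scheduled passes.
mirror = ScheduledMirror()
primary = {"report.doc": "draft"}
mirror.replicate(1, primary)
primary["report.doc"] = "corrupted"
mirror.replicate(2, primary)
recovered = mirror.roll_back(as_of=1)  # the image from before the damage
```

Because replication runs at intervals, any change made between two passes is protected only once the next pass completes. The shorter the interval, the more current the mirror.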
SAN-based solutions: SAN-based replication provides benefits similar to LAN-based replication: it delivers fast access to replicated data and point-in-time images for data rollback. In addition, both solutions allow data to be accessed through multiple paths for redundancy.
Though there are similarities, the distinctions between LAN- and SAN-based replication are significant. With SAN-based solutions, data is mirrored between two storage arrays rather than between two servers. Block-level data is moved over a Fibre Channel SAN, with the data movement driven either by the arrays themselves (i.e., array-based replication), or by the servers (i.e., host-based replication). These differences have implications that ultimately affect the performance, availability and cost of the solution.
Performance: There are several ways to characterize the performance of a mirroring solution. First, the ideal solution would not impact ongoing file services. That is, users on the LAN should see no performance degradation while mirroring is in progress. Second, the replication process should complete as quickly as possible to provide the greatest scheduling flexibility. On both metrics, SAN-based mirroring offers advantages over LAN-based.
File serving performance: Mirroring in a LAN-based solution tends to be both processor and network intensive. When a user executes a data write, that file is first written to the primary server's disk and examined to see what was changed from any previous instances of that file. Those changes are then retrieved from disk and dispatched over the LAN to the secondary server. This requires data to move twice through the processor stack, and twice across the LAN as well (unless a dedicated link is used), consuming resources on both. The impact on file serving performance can be significant. By contrast, SAN-based mirroring preserves resources by moving block-level data--which consumes far less processor bandwidth--over the high-speed SAN. The result is that SAN solutions have trivial impact on file serving performance.
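To illustrate what working at the block level means, the sketch below compares two versions of a volume in fixed-size blocks and reports which block numbers differ. It is an illustration only: the 4096-byte block size and the function name are assumptions, and the code does not describe how any particular array or filer tracks changes.

```python
def changed_blocks(old, new, block_size=4096):
    """Return the indices of fixed-size blocks that differ between two byte strings."""
    length = max(len(old), len(new))
    changed = []
    for offset in range(0, length, block_size):
        # Slices past the end of the shorter input are simply shorter or empty,
        # so a grown or truncated volume shows up as changed trailing blocks.
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            changed.append(offset // block_size)
    return changed


# Hypothetical usage: a single byte is modified in the second block of an 8 KB volume.
before = bytes(8192)
after = bytearray(before)
after[5000] = 1
dirty = changed_blocks(before, bytes(after))
```

A block-level comparison of this kind needs no knowledge of files or directories, which is one reason it places less load on the processor than file-level examination.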
Mirroring completion time: Because SAN-based mirroring is inherently higher-speed, replication operations are completed more quickly and with less interference to ongoing file services. This gives the IT administrator greater flexibility to schedule mirroring operations when needed, and to increase the frequency of data replication to ensure a more up-to-date backup copy.
Availability: The ability to access replicated data at any time is a fundamental requirement of data mirroring solutions. Without guaranteed access to the secondary data set, the operation is pointless. While LAN-based solutions are robust, they lack the inherent fault tolerance of SAN-based mirroring. Because NAS devices and conventional servers use direct attached storage, replicated data is usually accessible through only one device. If that device is down, mirroring stops and the data is not available. SAN-based mirroring enhances availability by providing multiple paths to the mirrored data. Any device that can mount the mirrored volume can provide read/write access, greatly enhancing availability.
Backup to Tape Flexibility: Many IT managers will continue to maintain tape-based backup copies, even with disk-based data replication in place. In LAN-based solutions, the tape device is often directly attached to the secondary server. Generating backups from the mirrored data allows the process to proceed on a flexible schedule, but also introduces a point of vulnerability. If the secondary server is down, both the disk and tape-based backup processes are down as well. With a SAN-based solution, multiple servers can access the primary and secondary arrays, and can access the tape library as well. This flexibility gives the IT manager more options to either generate backups or to execute a data restore when resources are unavailable.
Cost: Software costs can be a significant factor in the overall cost of a mirroring solution. LAN-based solutions usually require both the primary and secondary servers to host replication software packages. Software and maintenance costs can grow quickly, particularly in large environments with multiple server pairs. SAN-based mirroring tends to require fewer software instances, particularly in the case of a SAN filer-based solution as opposed to a SAN array-based solution. Many SAN filer-based solutions will require only two software instances, saving both acquisition and maintenance expense.
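The software-count argument reduces to simple arithmetic, sketched below. The model assumes one replication software instance on each server of every LAN-based pair and a fixed two instances for a SAN filer-based solution, as described above. The per-instance price is a made-up figure for illustration only.

```python
def lan_software_instances(server_pairs):
    """Each primary/secondary server pair hosts two replication software instances."""
    return 2 * server_pairs


def san_filer_software_instances():
    """Assumes the two-instance case described in the text, regardless of scale."""
    return 2


def software_cost(instances, price_per_instance):
    """Total acquisition cost for a given number of software instances."""
    return instances * price_per_instance


# Hypothetical environment: 10 server pairs at an illustrative price per instance.
PRICE = 5000  # assumed figure, not taken from any vendor price list
lan_cost = software_cost(lan_software_instances(10), PRICE)
san_cost = software_cost(san_filer_software_instances(), PRICE)
```

Under these assumptions the LAN-based instance count grows linearly with the number of server pairs, while the SAN filer-based count stays constant.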
SAN Filer-based vs. SAN Array-based Replication
Within the category of SAN-based replication, there is a further distinction between array-based and SAN filer-based solutions. In array-based solutions, the storage arrays drive the replication process independent of server control. While this has the advantage of offloading servers from this task, it has a significant disadvantage: the primary and secondary arrays must be from the same vendor, and often are required to be the same type of array as well. For most users, this requirement makes array-based solutions cost prohibitive for backup.
The rising popularity of disk-to-disk backup is largely driven by the shrinking cost of arrays, particularly as new, lower-cost SATA arrays enter the market. In SAN filer-based solutions, the filer drives the replication process. Primary and secondary arrays may be of different types, allowing either existing or new arrays to be used as backup devices while higher-cost arrays are maintained as primary storage.
The search for improvements to the backup process is not new. Indeed, data backup was the original "killer app" for storage area networking. By facilitating the sharing of tape libraries, SANs both lowered costs and increased flexibility. Fortunately, SANs deliver similar benefits to disk-based data replication as well. SAN-based disk-to-disk backup conserves resources by increasing server availability, lowering software costs and allowing both server and array resources to be deployed as needed. Furthermore, SAN filer-based replication saves considerable capital resources by allowing data to be mirrored between arrays of different types and from different manufacturers.
The predecessors of the new SAN-based replication solutions are the high-end storage arrays found in the most demanding IT environments. These proprietary solutions provide block-based data replication, but are too costly for most applications. LAN-based solutions have been widely deployed in workgroup and departmental settings, but lack the scalability needed for many enterprise environments. Now, the combination of lower-cost disk arrays and new mirroring technologies, such as SAN filers from ONStor, will greatly reduce the expense and complexity of SAN-based replication and will bring its benefits to the open systems arena.
Jon Toor is director of marketing for ONStor (Los Gatos, CA)
Publication: Computer Technology Review
Date: Jul 1, 2003