Disk-based backup: is LAN-based or SAN-based the fairest mirror of them all?
Data backup management continues to be a pain point for IT professionals. Between time-consuming backup jobs, high resource demands, and poor reliability, this critical task has long been a source of frustration. Fortunately, new disk-based backup solutions are starting to provide options for real relief. By streamlining both the data backup and restore processes, disk-based solutions offer an alternative that Gartner predicts will become the dominant source for data restores by 2008. IT managers appear to agree. In a recent survey by a leading storage industry publication, 64% of managers polled said they planned to integrate disk-based backup solutions within the next six to twelve months.
Although the enthusiasm for disk-based backup is growing, the implementation options are daunting. There are a variety of technologies coming to market, each with cost and functionality tradeoffs that make direct comparisons challenging. Surprisingly, among the options there is one important but often overlooked differentiation: does the replication data flow over the SAN or the LAN? An attribute as fundamental as this may seem too important to ignore, but the LAN vs. SAN decision often gets aggregated with a variety of other decisions that effectively settle the debate before it even gets started. That will soon be changing. Emerging technologies will create new SAN-based data replication options, making the SAN vs. LAN question one of the most critical of all.
The Case for Disk-Based Backup
The rising interest in disk-based backup is driven by the technique's two significant advantages over tape-only methods.
The first is the enhanced backup process itself. To achieve optimum performance, traditional backup processes require careful scheduling of server and tape resources, a tightly choreographed ballet that may run into overtime if any of the pieces fails. Disk-to-disk solutions eliminate that headache by maintaining a copy of the data on a second array. The mirrored data, rather than the primary data, can then be used to generate a tape-based backup, simplifying scheduling and improving overall process reliability.
The second and more significant advantage of disk-to-disk is faster data restore. Tape-based data restore often involves reading multiple pieces of media as both full and incremental backups are pieced together to re-create the original files, a process that may take hours or days. Disk-based solutions, by comparison, make files instantly available. Simply mount the desired volume and access the needed point-in-time image. In addition to being faster, disk-based solutions are less prone to error. Because the backup process is more reliable, the restore process ultimately becomes faster and more reliable as well.
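The difference between the two restore paths can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not any vendor's implementation: the `Backup` type, the use of `None` to mark a deleted file, and the function names `restore_from_tape` and `restore_from_disk` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model: a backup maps file name -> contents.
# In an incremental, a value of None marks a file deleted since the prior backup.
Backup = Dict[str, Optional[str]]


def restore_from_tape(full: Backup, incrementals: List[Backup]) -> Dict[str, str]:
    """Rebuild the file set by replaying the full backup, then each incremental in order."""
    state: Dict[str, str] = {k: v for k, v in full.items() if v is not None}
    for inc in incrementals:  # every incremental must be read, oldest first
        for name, contents in inc.items():
            if contents is None:
                state.pop(name, None)  # file was deleted
            else:
                state[name] = contents  # file was added or changed
    return state


def restore_from_disk(images: Dict[str, Dict[str, str]], label: str) -> Dict[str, str]:
    """With a disk mirror, the desired point-in-time image is simply looked up."""
    return dict(images[label])
```

In this model the tape restore touches every backup in the chain, while the disk restore is a single lookup, which is the gap the paragraph above describes.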
Mirroring Techniques Compared
LAN- and SAN-based replication solutions both address the main objectives of disk-to-disk backup: they replicate data from a primary array to a physically separate mirror array that may be instantly accessed when needed. The distinction between LAN and SAN solutions is how the data is moved and which topology is employed.
LAN-based solutions: These solutions replicate file-level data between servers connected by an Ethernet link that may be either dedicated or shared with other traffic. The servers may be either conventional servers or NAS devices, though both the primary and secondary servers must run instances of the associated data replication software.
LAN-based data replication is most commonly asynchronous, meaning replication is scheduled rather than continuous. Because replication occurs at intervals, the process provides point-in-time images of data, allowing users to "roll back in time," if needed. And for fail-over redundancy, users and applications can usually access data directly from either the primary or secondary device.
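As a rough illustration of how scheduled replication yields point-in-time images, the sketch below records a copy of the primary data at each replication run and returns the most recent image at or before a requested time. The class name `ScheduledMirror`, its methods, and the use of integer timestamps are assumptions made for this example only.

```python
import bisect
from typing import Dict, List


class ScheduledMirror:
    """Toy model of asynchronous replication: one image is kept per scheduled run."""

    def __init__(self) -> None:
        self._times: List[int] = []               # replication timestamps, ascending
        self._images: List[Dict[str, str]] = []   # image captured at each timestamp

    def replicate(self, now: int, primary: Dict[str, str]) -> None:
        """Capture a point-in-time image of the primary data at time `now`."""
        if self._times and now < self._times[-1]:
            raise ValueError("replication runs must be recorded in time order")
        self._times.append(now)
        self._images.append(dict(primary))  # copy, so later writes do not alter the image

    def image_at(self, when: int) -> Dict[str, str]:
        """Return the latest image taken at or before `when` (the rollback target)."""
        idx = bisect.bisect_right(self._times, when) - 1
        if idx < 0:
            raise LookupError("no image exists at or before the requested time")
        return dict(self._images[idx])
```

Writes made to the primary between two runs are not visible on the mirror until the next run, which is the tradeoff that comes with scheduled rather than continuous replication.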
SAN-based solutions: SAN-based replication provides benefits similar to LAN-based replication: it delivers fast access to replicated data and point-in-time images for data rollback. In addition, both solutions allow data to be accessed through multiple paths for redundancy.
Though there are similarities, the distinctions between LAN- and SAN-based replication are significant. With SAN-based solutions, data is mirrored between two storage arrays rather than between two servers. Block-level data is moved over a Fibre Channel SAN, with the data movement driven either by the arrays themselves (i.e., array-based replication), or by the servers (i.e., host-based replication). These differences have implications that ultimately affect the performance, availability and cost of the solution.
Performance: There are several ways to characterize the performance of a mirroring solution. First, the ideal solution would not impact ongoing file services. That is, users on the LAN should see no performance degradation while mirroring is in progress. Second, the replication process should complete as quickly as possible to provide the greatest scheduling flexibility. On both metrics, SAN-based mirroring offers advantages over LAN-based.
File serving performance: Mirroring in a LAN-based solution tends to be both processor and network intensive. When a user executes a data write, that file is first written to the primary server's disk and examined to see what was changed from any previous instances of that file. Those changes are then retrieved from disk and dispatched over the LAN to the secondary server. This requires data to move twice through the processor stack, and twice across the LAN as well (unless a dedicated link is used), consuming resources on both. The impact on file serving performance can be significant. By contrast, SAN-based mirroring preserves resources by moving block-level data, which consumes far less processor bandwidth, over the high-speed SAN. The result is that SAN solutions have trivial impact on file serving performance.
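The "examine what changed, then send only the changes" step can be illustrated with a simple fixed-size chunk comparison. This is a minimal sketch under stated assumptions: real replication products use their own change-detection methods, and the functions `changed_chunks` and `apply_chunks`, along with the tiny default chunk size, are hypothetical and chosen only for readability.

```python
from typing import Dict


def changed_chunks(old: bytes, new: bytes, chunk_size: int = 4) -> Dict[int, bytes]:
    """Return {chunk_index: new_bytes} for every chunk of `new` that differs from `old`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    changes: Dict[int, bytes] = {}
    n_chunks = (len(new) + chunk_size - 1) // chunk_size  # ceiling division
    for i in range(n_chunks):
        start = i * chunk_size
        new_chunk = new[start:start + chunk_size]
        if old[start:start + chunk_size] != new_chunk:
            changes[i] = new_chunk  # only this chunk needs to cross the network
    return changes


def apply_chunks(old: bytes, changes: Dict[int, bytes], new_length: int,
                 chunk_size: int = 4) -> bytes:
    """Rebuild the new version on the secondary from its old copy plus the changes."""
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for i, chunk in changes.items():
        start = i * chunk_size
        buf[start:start + len(chunk)] = chunk
    return bytes(buf)
```

Even in this toy form, the primary has to read both versions and compare them before anything is sent, which hints at why the paragraph above calls file-level mirroring processor intensive.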
Mirroring completion time: Because SAN-based mirroring is inherently higher-speed, replication operations are completed more quickly and with less interference to ongoing file services. This gives the IT administrator greater flexibility to schedule mirroring operations when needed, and to increase the frequency of data replication to ensure a more up-to-date backup copy.
Availability: The ability to access replicated data at any time is a fundamental requirement of data mirroring solutions. Without guaranteed access to the secondary data set, the operation is pointless. While LAN-based solutions are robust, they lack the inherent fault tolerance of SAN-based mirroring. Because NAS devices and conventional servers use direct-attached storage, replicated data is usually accessible through only one device. If that device is down, mirroring stops and the data is not available. SAN-based mirroring enhances availability by providing multiple paths to the mirrored data. Any device that can mount the mirrored volume can provide read/write access, greatly enhancing availability.
Backup to Tape Flexibility: Many IT managers will continue to maintain tape-based backup copies, even with disk-based data replication in place. In LAN-based solutions, the tape device is often directly attached to the secondary server. Generating backups from the mirrored data allows the process to proceed on a flexible schedule, but also introduces a point of vulnerability. If the secondary server is down, both the disk and tape-based backup processes are down as well. With a SAN-based solution, multiple servers can access the primary and secondary arrays, and can access the tape library as well. This flexibility gives the IT manager more options to either generate backups or to execute a data restore when resources are unavailable.
Cost: Software costs can be a significant factor in the overall cost of a mirroring solution. LAN-based solutions usually require both the primary and secondary servers to host replication software packages. Software and maintenance costs can grow quickly, particularly in large environments with multiple server pairs. SAN-based mirroring tends to require fewer software instances, particularly in the case of a SAN filer-based solution as opposed to a SAN array-based solution. Many SAN filer-based solutions will require only two software instances, saving both acquisition and maintenance expense.
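The licensing argument comes down to simple arithmetic, sketched below. The figures are illustrative assumptions only: the sketch assumes one replication software instance per server in each LAN-based pair, a constant two instances for a SAN filer-based deployment (the article says many such solutions need only two), and a hypothetical flat `cost_per_instance`.

```python
def lan_software_instances(server_pairs: int) -> int:
    """LAN-based: assume one instance on the primary and one on the secondary of each pair."""
    if server_pairs < 0:
        raise ValueError("server_pairs must be non-negative")
    return 2 * server_pairs


def san_filer_software_instances(server_pairs: int) -> int:
    """SAN filer-based: assume two instances in total, however many servers are attached."""
    if server_pairs < 0:
        raise ValueError("server_pairs must be non-negative")
    return 2 if server_pairs > 0 else 0


def license_cost(instances: int, cost_per_instance: int) -> int:
    """Total licensing cost for a hypothetical flat per-instance price."""
    return instances * cost_per_instance
```

Under these assumptions, ten server pairs need twenty LAN-based instances but still only two SAN filer-based instances, so the gap widens as the environment grows.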
SAN Filer-based vs. SAN Array-based Replication
Within the category of SAN-based replication, there is a further distinction between array-based and SAN filer-based solutions. In array-based solutions, the storage arrays drive the replication process independent of server control. While this has the advantage of offloading servers from this task, it has a significant disadvantage: the primary and secondary arrays must be from the same vendor, and often are required to be the same type of array as well. For most users, this requirement makes array-based solutions cost prohibitive for backup.
The rising popularity of disk-to-disk backup is largely driven by the shrinking cost of arrays, particularly as new, lower-cost SATA arrays enter the market. In SAN filer-based solutions, the filer drives the replication process. Primary and secondary arrays may be of different types, allowing either existing or new arrays to be used as backup devices while higher-cost arrays are maintained as primary storage.
The search for improvements to the backup process is not new. Indeed, data backup was the original "killer app" for storage area networking. By facilitating the sharing of tape libraries, SANs both lowered costs and increased flexibility. Fortunately, SANs deliver similar benefits to disk-based data replication as well. SAN-based disk-to-disk backup conserves resources by increasing server availability, lowering software costs and allowing both server and array resources to be deployed as needed. Furthermore, SAN filer-based replication saves considerable capital resources by allowing data to be mirrored between arrays of different types and from different manufacturers.
The predecessors of the new SAN-based replication solutions are the high-end storage arrays found in the most demanding IT environments. These proprietary solutions provide block-based data replication, but are too costly for most applications. LAN-based solutions have been widely deployed in workgroup and departmental settings, but lack the scalability needed for many enterprise environments. Now, the combination of lower-cost disk arrays and new mirroring technologies, such as SAN filers from ONStor, will greatly reduce the expense and complexity of SAN-based replication and will bring its benefits to the open systems arena.
Jon Toor is director of marketing for ONStor (Los Gatos, CA).