Data protection and disaster recovery of local and remote file servers.
Traditionally, this task has been daunting and cumbersome because available technology generally fell far short of what was needed for efficient, centralized management of remote data. If an organization couldn't afford a private network between its locations, the time, bandwidth and extra security required for moving data over a public WAN made it an impractical, if not impossible, alternative. Now, however, new technologies are in place that make this task easy, efficient, accurate and automated.
One aspect of data backup and management that has particularly benefited from the latest technologies is data replication. To this point, "data replication" has most often been used to refer to real-time replication for disaster recovery and business continuity. Real-time, or immediate, replication can be either synchronous (data is written to both storage devices at the same time, and the write operation isn't complete until both devices have completed writing) or asynchronous (data is written to the primary storage device first, and the write operation is considered complete as soon as the primary has finished writing; the secondary device is brought up to date afterward).
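The difference between the two modes comes down to when the write is acknowledged. The sketch below illustrates the two acknowledgment rules with Python dicts standing in for storage devices; the class names and in-memory queue are illustrative only, not any particular product's implementation.

```python
import queue
import threading

class SynchronousMirror:
    """Synchronous: the write is acknowledged only after BOTH devices have written."""
    def __init__(self, primary, secondary):
        self.primary = primary      # dict standing in for a storage device
        self.secondary = secondary

    def write(self, key, data):
        self.primary[key] = data
        self.secondary[key] = data  # caller blocks until this completes too
        return True                 # acknowledged only when both copies exist

class AsynchronousMirror:
    """Asynchronous: acknowledged once the PRIMARY has written; secondary catches up."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write(self, key, data):
        self.primary[key] = data
        self.pending.put((key, data))  # queued; secondary is updated later
        return True                    # acknowledged before the secondary is current

    def _drain(self):
        while True:
            key, data = self.pending.get()
            self.secondary[key] = data
            self.pending.task_done()
```

The gap between acknowledgment and the secondary catching up is exactly what frees asynchronous replication from the latency constraints discussed below.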
[FIGURE 1 OMITTED]
Real-Time, Synchronous Data Replication
Real-time synchronous replication is the most effective type of replication for high-availability clusters and server failover in a LAN environment, since the data is identical at all times on the primary and secondary storage servers. Data is replicated (or mirrored) between a single source storage server and a single target storage server, and often the real-time replication is tied into a cluster application. However, the characteristics of real-time replication that make it great for high-availability applications are the same characteristics that make it a poor choice for remote data protection. There is no scalability with this solution, as synchronous replication is almost always a one-to-one configuration. Real-time synchronous replication works well in high-bandwidth network environments, such as a Fibre Channel SAN or MAN, but when the primary and secondary storage servers are separated by more than a few miles, network latency can cause serious performance problems for the storage application. Also, because data is written to both storage servers at the same time, there is no protection against data corruption or file deletion: corrupt data appears on both systems simultaneously. A traditional backup product is required in addition to synchronous replication to provide file-level data protection.
Real-Time, Asynchronous Data Replication
Like its synchronous counterpart, real-time asynchronous replication is used to protect organizations from the loss or unavailability of a primary storage device. However, since new or changed data does not need to reach the secondary device at the same time it is written to the primary storage, asynchronous replication is not bound by the same latency and distance constraints as synchronous replication. This allows for disaster recovery across a WAN and works well for long-distance disaster recovery, although not for high-availability failover: latency and security concerns prevent clustered servers from being spread across a WAN, ruling out long-distance automated failover.
Real-time asynchronous replication can scale to support one-to-many (data distribution) and many-to-one (data consolidation) configurations, but support for remote offices is still problematic due to the 'always on' nature of real-time replication, which consumes network resources on a constant basis. Data corruption remains a problem for real-time asynchronous replication, where corrupt data is immediately replicated to all of the storage servers in the replication configuration.
Point-in-Time, Asynchronous Data Replication
The preferred solution for remote data management is point-in-time, or scheduled, asynchronous replication. Point-in-time asynchronous replication offers an excellent compromise between system-level and file-level data protection, and provides the flexibility needed for remote office data protection and data management. Replication jobs run on a schedule, bringing the target storage server up to date at preset intervals, based on corporate policies that can be tailored to each type of data that needs to be protected.
In this sense, point-in-time replication acts more like a backup than a mirror, except that it stores the copied data in its original format so it can be accessed immediately without the need for recovery and rebuilding. Remote offices any distance away can be protected since point-in-time replication does not suffer from distance limitations, and data consolidation and distribution can be scheduled as needed to most efficiently use limited network resources. (Figure 1.)
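A single point-in-time pass can be sketched as a file-level copy that skips anything unchanged since the last run, leaving the replica in its original, directly usable format. This is a minimal illustration, assuming local paths and modification-time comparison; a real product would track state more robustly and move data over the network.

```python
import shutil
from pathlib import Path

def replicate_point_in_time(source: Path, target: Path) -> int:
    """One scheduled pass: copy files that are new or changed since the last
    run, kept in their original format so they're usable without a restore
    step. Returns the number of files copied."""
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        # Copy only when the replica is missing or older than the source.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves timestamps and metadata
            copied += 1
    return copied

# A scheduler (cron, Windows Task Scheduler, or a simple loop) runs the job
# at whatever interval policy dictates, e.g. every 15 minutes:
#   while True:
#       replicate_point_in_time(Path("/data"), Path("/replica"))
#       time.sleep(15 * 60)
```

Because `copy2` preserves timestamps, an unchanged file is skipped on the next pass, which is what keeps scheduled replication cheap between intervals.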
Remote Office Data Protection: A Recipe for Success
Effectively protecting remote storage servers and providing disaster recovery requires several elements: centralized management, efficient network utilization, flexible scheduling and job configuration, and robust security. Any replication solution or other data protection product that doesn't provide these features can leave remote data vulnerable to loss, corruption, or simple unavailability, resulting in lost time, money and productivity.
Remote offices generally aren't large enough to have dedicated IT resources on site, so data protection needs to be automated and centrally managed. Replicating remote data to a central office storage server for archiving to tape eliminates the need for tape backup in the remote offices, improves data protection reliability and recoverability, and can save thousands of dollars in hardware and administration costs. Replication software solutions that offer web-based management consoles provide the easiest way to manage and monitor data protection. Since any workstation on the network with a browser can be used to access, manage and monitor replication jobs, local IT professionals can still manage data movement whether they're in the main office, or on rounds to the remote offices. Centralized management also enables IT administrators to automate data movement not only for data consolidation, but also automated data distribution to assure everyone is working from the same data.
High-speed, dedicated network connections between corporate offices are usually only available between the main data centers of an enterprise, since usage and maintenance fees for these lines are quite expensive. Smaller remote offices access the corporate network through low-speed lines, or over the Internet on a secured network pipe, and users and applications compete for network bandwidth. Network-optimized replication uses bandwidth throttling, byte-level differential data replication and data compression to keep the amount of data transferred over the network to an absolute minimum. With byte-level differential replication, only the differences between the original and replicated file need to be transferred, dramatically reducing traffic on the network. Applying data compression algorithms to condense that data shrinks it further. Bandwidth throttling allows an administrator to set the maximum percentage of bandwidth that replication will use for data transfer, and if the application supports it, bandwidth throttling can also be set based on the time of day and day of the week that the replication takes place.
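The three techniques can be sketched together: a block-level (here, 4 KB) differential pass that picks out only changed regions, compression of each changed block, and a pacing loop that caps bytes per second. The function names and fixed block size are illustrative assumptions, not how any particular product implements it.

```python
import time
import zlib

BLOCK = 4096  # compare files in fixed-size blocks (an assumed granularity)

def block_delta(old: bytes, new: bytes):
    """Differential step: return only the blocks that differ, not the whole file."""
    delta = []
    for off in range(0, len(new), BLOCK):
        if new[off:off + BLOCK] != old[off:off + BLOCK]:
            delta.append((off, new[off:off + BLOCK]))
    return delta

def throttled_send(payload: bytes, max_bytes_per_sec: int, send):
    """Throttling step: pace the transfer so replication never exceeds its cap."""
    for off in range(0, len(payload), max_bytes_per_sec):
        send(payload[off:off + max_bytes_per_sec])
        if off + max_bytes_per_sec < len(payload):
            time.sleep(1.0)  # wait out the rest of the second before sending more

def replicate_changes(old: bytes, new: bytes, send, max_bytes_per_sec=64_000):
    """Delta, then compress, then throttle. A real protocol would also frame
    each block's offset so the target knows where to apply it."""
    for off, block in block_delta(old, new):
        compressed = zlib.compress(block)  # shrink what actually crosses the wire
        throttled_send(compressed, max_bytes_per_sec, send)
```

For a large file with one changed region, only that region's compressed bytes are transferred, which is the whole point of combining the three techniques.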
Not all data is created equal, so flexibility in scheduling data movement is vital for remote data management and protection. Automated scheduling for data replication lets an IT administrator "set it and forget it": once configured, all replication happens automatically without any manual intervention from IT staff. Data that has a very low tolerance for loss can be kept in sync with up-to-the-minute replication, while less critical data can be replicated far less frequently. Often both of these types of data are located on the same storage server, so configuring directory- or file-level data movement rather than file-system-level replication will serve the needs of all levels of data. If network performance is the most important consideration, data transfers can be scheduled "after hours" to eliminate impact on end users.
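A per-data-type policy table of this kind might be sketched as follows; the paths, intervals and transfer-window check are hypothetical illustrations of policy-driven scheduling, not any product's configuration format.

```python
from datetime import time as dtime

# Hypothetical policies: critical data syncs every minute around the clock,
# bulk data once a day and only after hours.
POLICIES = {
    "/data/orders":   {"interval_min": 1,    "window": None},  # 24x7
    "/data/archives": {"interval_min": 1440, "window": (dtime(22, 0), dtime(6, 0))},
}

def in_window(now: dtime, window) -> bool:
    """True if `now` falls inside the allowed transfer window (None = always)."""
    if window is None:
        return True
    start, end = window
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # window wraps past midnight
```

A scheduler would consult `in_window` before launching each job, so after-hours bulk transfers never compete with daytime users for bandwidth.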
With increasing news coverage about security breaches compromising business data, replicating data across private and public networks has to be highly secure. The first line of defense for most businesses is the corporate firewall, which is fine for replicating data within a local LAN, and data replication performance can be improved by not using additional security. But when data moves beyond the firewall to the remote office, additional layers of data protection are required. The minimum that most replication solutions offer is data encryption. Data is encrypted on the server using any one of several encryption schemas, and then sent over the network to its destination server, which decrypts the data before writing it to disk.

However, encryption alone doesn't guarantee that sensitive data isn't being intercepted by the wrong system. To prevent rogue servers from intercepting data in transit, the servers themselves should be digitally authenticated before any data is moved between them. Digital certificates are managed by a central Certificate Authority server, and before any data is moved, the certificates on the source and destination servers are validated by the Certificate Authority. Look for replication products that have this functionality built in to provide the additional security without additional management complexity. By adding extra layers of security for remote data protection, data can be moved over inexpensive Internet connections rather than requiring an expensive private network link, providing a significant cost savings for businesses with many remote offices.
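In Python's standard library, the combination of in-flight encryption and CA-validated mutual authentication described above maps onto an `ssl` context roughly as follows; the certificate file names in the usage comments are placeholders, and a shipping product would wrap this in its own management layer.

```python
import ssl

def mutual_tls_context(purpose, ca_file=None, cert_file=None, key_file=None):
    """Build an SSLContext that encrypts data in flight and validates the
    peer's certificate against the corporate Certificate Authority (CA)."""
    # Trust only certificates signed by the corporate CA.
    ctx = ssl.create_default_context(purpose, cafile=ca_file)
    if cert_file:
        # Present this server's own identity to the peer.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if purpose == ssl.Purpose.CLIENT_AUTH:
        # A replication target should demand a valid certificate from the
        # source, so a rogue server can't connect and siphon data.
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

# Source side (connects out):
#   mutual_tls_context(ssl.Purpose.SERVER_AUTH, "corp_ca.pem",
#                      "source.pem", "source.key")
# Target side (accepts connections):
#   mutual_tls_context(ssl.Purpose.CLIENT_AUTH, "corp_ca.pem",
#                      "target.pem", "target.key")
```

With both sides configured this way, the TLS handshake itself performs the certificate validation the article describes, before any replicated data moves.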
Point-in-time replication solutions, such as Adaptec Snap Enterprise Data Replicator (EDR) software, that encompass all of these features make remote data protection easy, efficient and cost-effective.
Julie Herd Goodman is product manager, NAS application software, at Adaptec, Inc., Milpitas, CA
Title Annotation: Storage Networking
Author: Goodman, Julie Herd
Publication: Computer Technology Review
Date: Aug 1, 2005