Open-file solutions: choosing the best package for your enterprise. (Storage Management).
Many integrators and resellers have concluded that an open-file solution--software that enables backup software to reliably back up data while files are in use--is essential to meeting their clients' mission-critical data-protection needs. What remains at issue is which open-file tool should be employed. The right option can deliver a greater degree of data security. The wrong one will leave system integrators falling well short of their customers' expectations.
How an Open-File Solution Works
A generic open-file solution works by monitoring the file system for read requests coming from the backup program. When the backup operation is initiated, it begins maintaining data in a pre-write cache for all open files on the system. As the backup progresses, any file-write operation from another application goes directly to the proper file, while a copy of the pre-write data (the data that will be overwritten) is placed by the open-file package into the pre-write cache. When the backup program reaches a part of a file that has changed during the backup, the open-file software substitutes the original pre-write data from the pre-write cache to fulfill the backup request.
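The pre-write cache mechanism can be sketched in a few lines of Python. This is an illustrative model only--the class and attribute names are hypothetical, not any vendor's API--with the file modeled as a list of fixed-size blocks:

```python
class OpenFileSnapshot:
    """Sketch of a pre-write (copy-on-write) cache for one open file."""

    def __init__(self, blocks):
        self.blocks = blocks          # live file data, mutated by writers
        self.pre_write = {}           # block index -> original contents

    def write(self, index, data):
        # Before the live block is overwritten, save the original copy
        # into the pre-write cache (only the first overwrite matters).
        if index not in self.pre_write:
            self.pre_write[index] = self.blocks[index]
        self.blocks[index] = data

    def backup_read(self, index):
        # The backup sees the pre-write copy if the block has changed,
        # otherwise the live block -- a consistent point-in-time view.
        return self.pre_write.get(index, self.blocks[index])


snap = OpenFileSnapshot(["AAA", "BBB", "CCC"])
snap.write(1, "XXX")                  # an application writes mid-backup
backup = [snap.backup_read(i) for i in range(3)]
# backup holds the original ["AAA", "BBB", "CCC"], while the live file
# is now ["AAA", "XXX", "CCC"].
```

The backup program itself is unchanged; the open-file layer simply intercepts its reads, which is what makes the generic approach application agnostic.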
Generic Vs. Specific Open-File Solutions
There are two conventional ways system integrators deal with the open-file dilemma. One solution is to install an agent that works with specific applications to solve the open-file dilemma. Another solution is to implement a generic open-file utility that provides the backup software with a "window" to the data in the open files.
Dedicated agents are available for a handful of database and email applications. They are typically designed by backup-software companies to support open-file backup of a specific application's data using their backup program. Because of this direct integration, they can provide powerful capabilities, such as object-level restores.
However, pricing typically runs from $695 to $2,995 per agent, which can be a costly investment when you consider that each dedicated agent only works with its one defined application. Additionally, agents require ongoing maintenance, as they often must be upgraded when the application or backup software is updated. With the current strain on already tight IT budgets, integrators may very well price themselves out of competition by using specific open-file solutions.
Unlike specific tools, generic open-file utilities are application agnostic. They give backup software access to open files across the board, regardless of originating software application.
Unfortunately, many generic open-file tools are not compatible with all types of backup software. Generic open-file tools that function regardless of the backup package being used are therefore at a distinct advantage, especially if an enterprise frequently changes versions or types of backup software, or if its primary backup package fails.
For example, consider a company that normally uses third-party software for backup purposes. If the company has deployed a generic open-file tool that works only with that particular package, it would be unable to support emergency alternatives such as the operating system's default backup utility. With a generic open-file tool that works with all programs, the company could conduct a reliable backup even if its third-party software suddenly failed.
In addition, customers with multiple platforms (such as those migrating between NetWare and NT servers) must often purchase a new generic agent license in order to protect open files on the new system. Because of this, companies must be wary about the type of generic open-file tool they choose. If they do not select wisely, they may have to obtain frequent updates to their generic open-file agents or be unable to obtain complete backups using alternate programs.
Application-specific open-file agents also have compatibility issues. Because they function only with specific applications, changes to the application or backup package might require a corresponding change in the agent. As a result, the customer is always playing catch-up, installing new software and then going through extensive testing to ensure that it is working and configured properly.
Additionally, because they work with specific versions of applications and only with designated backup software, changing an application or software version will generally require replacing the original agent.
Partial Vs. System-Wide Synchronization Methods
Another factor to consider when choosing an open-file solution is whether the utility is capable of providing system-wide synchronization, which ensures the most complete and reliable backup and restore. System-wide synchronization is at the root of transactional integrity. A "transaction" is a set of multiple operations that are logically inseparable: either all or none of the operations must occur for the system state to remain logically consistent.
If only some but not all of the changes that comprise a logical set are tracked, the result is a "partial transaction." Partial transactions can exist anywhere that relationships between data exist, such as within a single file, between files, within databases distributed across multiple volumes, between file content and file attributes, or within file system metadata.
A backup set has transactional integrity if it contains no partial transactions. Conversely, a backup set is corrupt if it does contain partial transactions.
As a simple example, consider a file that includes the text: "Our database is very valuable."
As the file is being backed up without an open-file solution, the text is replaced with: "Valuable data is not corrupt!!"
Since the text is modified as it is backed up, the resulting backup set may then contain: "Our database is very corrupt!!"
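This torn read is easy to reproduce in a short Python sketch. The split point--the backup's position in the file at the moment the application's write lands--is an assumption chosen to match the example above:

```python
original = "Our database is very valuable."
replacement = "Valuable data is not corrupt!!"

# Assume the backup had copied everything up to the word "valuable."
# when the application replaced the file's contents.
split = original.index("valuable.")

# The backup set contains the old first half and the new second half:
backed_up = original[:split] + replacement[split:]
# backed_up == "Our database is very corrupt!!"
```

Neither version of the file ever contained that sentence; the corruption exists only in the backup set, which is what makes torn reads so difficult to detect.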
Most open-file agents are limited in their capabilities. Instead of system-wide synchronization, the best they can offer is file-by-file or volume-by-volume synchronization. In some cases, administrators can overcome the shortcomings of these partial synchronization processes by manually grouping related files.
With file-by-file synchronization, a point-in-time view is initiated for each open file individually when the backup application first accesses it. The backup set will therefore contain many synchronization points, one for each open file backed up, rather than a single point-in-time view. This does not achieve transactional integrity for data that is logically related but physically distributed across multiple files, such as databases with separate table files, separate data and index files, or separate data files and transaction logs. Examples include virtually all enterprise database management systems (DBMSs) and email systems.
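A small Python sketch (with hypothetical file names and a deliberately simplified backup loop) shows how file-by-file synchronization can capture a partial transaction across a database's data file and its transaction log:

```python
# Live system state: a data file and its transaction log, logically linked.
files = {"data.db": "balance=100", "txn.log": "log@100"}

def backup_file_by_file(files, writes_between):
    """Freeze each file at the moment the backup first touches it."""
    backed_up = {}
    for name in list(files):
        backed_up[name] = files[name]  # this file's synchronization point
        writes_between(files)          # the application keeps running
    return backed_up

def commit(files):
    # A transaction updates both files atomically on the live system...
    files["data.db"] = "balance=50"
    files["txn.log"] = "log@50"

snapshot = backup_file_by_file(files, commit)
# snapshot == {"data.db": "balance=100", "txn.log": "log@50"}
# ...but the backup set mixes two points in time: the data file says the
# balance is 100 while the log records the commit that made it 50.
```

A restore from this backup set would present the DBMS with a log that does not match its data file, which is exactly the partial transaction described above.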
Volume-by-volume synchronization uses automatic grouping of files, where a set of files resides on the same volume. Different volumes are synchronized to different points in time. This addresses the impractical requirement of manually maintaining related groups, but does not address inter-application relationships between files. It also does not address intra-application relationships spanning volumes, which are very common.
A standard technique to optimize performance, particularly for DBMSs, is to distribute a file set across multiple volumes, where the volumes reside on separate physical devices. This takes advantage of the I/O parallelism (true simultaneous operation) possible across multiple physical devices. Volume-by-volume synchronization invalidates this technique, if coherent backups are to be achieved.
More generally, application data is commonly distributed across multiple volumes, even without any explicit intent to do so. Application-specific data can exist in the application's installation directory, in its data directory, and in the system directory, which may all reside on separate volumes.
Operating systems allow directory structures to span volumes. This feature permits data sets to grow beyond the limitations of their original volumes and beyond the administrator's projections of capacity requirements. The feature is supported by mount points, for example, which allow one volume's directory tree to seamlessly exist within another volume's directory structure. Volume-by-volume synchronization will provide two points in time in this case, even though individual application data sets may transparently span the volumes. Ultimately, volume-by-volume synchronization is appropriate and reliable only for systems that have just a single volume.
In some situations, administrators can overcome the limitations of partial synchronization processes by manually identifying groups of related files that need to be handled in a synchronized manner. This approach allows the system administrator to explicitly specify groups of files that are interrelated. The generic agent then synchronizes those files together, while still applying file-by-file synchronization to all other files on the system.
However, this approach has several shortcomings that result in corrupt backups:
* It relies on the administrator knowing precisely which files on the system are interrelated, which can be a substantial undertaking.
* It requires the administrator to constantly update group definitions, which is burdensome and often impossible.
* Relationships span applications. Any application that supports "links" to other applications' data can create relationships that the administrator cannot foresee and protect.
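For illustration, manual grouping might be configured along the following lines. The sync_groups structure and the file names are hypothetical, not any product's actual configuration format:

```python
# Administrator-maintained groups of interrelated files. Files in the
# same group are frozen together; anything ungrouped falls back to
# file-by-file synchronization.
sync_groups = [
    {"data.db", "index.db", "txn.log"},   # one DBMS instance
    {"mail.store", "mail.log"},           # an email system
]

def group_for(path):
    """Return the set of files that must be synchronized with `path`."""
    for group in sync_groups:
        if path in group:
            return group
    return {path}                          # ungrouped: synchronized alone
```

The weakness is visible in the code itself: the lists are static, so every new file, link, or cross-application relationship the administrator fails to anticipate lands in the file-by-file fallback.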
The Optimum Choice
In most cases, and especially for Windows and NetWare environments, the best solution for integrators is a generic open-file tool that allows a company's existing backup software to successfully capture files that are open and in use during the backup process. As a result, all data is backed up completely and without risk of corruption, even if the files are open and in use during the backup. In addition, system integrators need to ensure that the open-file product reliably performs system-wide synchronization, so that related files are backed up together and coherently.
The increased incidence of security breaches, combined with the parallel explosion in data storage, has created a huge potential drain on a business's bottom line. According to the Hurwitz Group, the average large company will store more than 150TB of data in 2003--the equivalent of the storage capacity of approximately 240,000 compact discs. The more data there is to lose and the more vulnerable a network is, the more important it is to implement reliable backup and recovery strategies. Without a backup solution that includes an effective open-file solution, organizations can face crippling data loss and downtime in the aftermath of a virus encounter or other disaster.
And there are more security risks today than ever before. According to the 2001 ICSA Labs Virus Prevalence Survey, 72% of the surveyed companies reported that overall virus problems were either somewhat worse or much worse than the previous year, with total costs of lost data estimated at between $100,000 and $1 million per company per year.
The quantity of mission-critical information organizations are storing, managing and maintaining on computers is growing at an exponential rate. The need for implementing an open-file package has never been greater. Integrators who go with an open-file tool that enables system-wide synchronization will find the implementation easier for both themselves and their customers.
April Nelson is Open File Manager product manager at St. Bernard Software (San Diego, Calif.)
Publication: Computer Technology Review
Article Type: Buyers Guide
Date: Feb 1, 2003