Distributed backup is the key to ILM.
The Case for ILM
Since enterprises are so dependent on information about their processes, products, customers and suppliers, data storage is a challenge for IT executives and storage administrators everywhere. Reliable and secure data storage is crucial to business continuity plans. Many industries, such as finance and health care, face new regulatory policies that mandate ever-increasing durations of data retention.
Because of the combination of more data and longer retention times, the cost of managing information throughout its lifecycle grows as much as 20% to 30% per year, according to some estimates.
Though definitions vary, for the purposes of this article Information Lifecycle Management (ILM) will be defined as a data archiving process that automatically moves data to the most cost-effective storage media, based on predetermined policies of accessibility, security and long-term storage. Data is transferred automatically, with no manual intervention required, reducing hardware and real estate costs. As a result, ILM vendors promise a significant Return on Investment (ROI).
Archiving Versus Backup
All of an enterprise's data can be placed into one of two categories. Critical information is that which is needed for day-to-day operations and resides in the system's primary storage for fast access. Important information is the historical, legal and regulatory information that can safely be archived to secondary storage--lower-cost disk or tapes stored offsite.
Critical data is typically accessed often. Over time, however, as a given file is accessed less and less frequently, it changes from critical to important. If, as a matter of policy, a file ceases to be critical and becomes important after ninety days of inactivity, an ILM solution automatically archives it to secondary storage at that point, without any intervention by IT personnel. ILM solutions create a pointer or placeholder for every file moved to secondary storage. Should a user ask for a file after it has been archived (that is, if the important information becomes critical again), this placeholder points to the new location, and the system can retrieve the file and move it back to primary storage.
Archiving data that is no longer needed for day-to-day operations by moving it to long-term storage is functionally distinct from backup operations, which protect operational, critical data before it can be archived.
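The archive-and-recall cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: primary and secondary storage are modeled as in-memory dictionaries, and the names StoredFile, Placeholder, archive_inactive and read_file are hypothetical. The ninety-day threshold is the example policy from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Example policy from the article: archive after ninety days of inactivity.
ARCHIVE_AFTER = timedelta(days=90)


@dataclass
class StoredFile:
    data: bytes
    last_access: datetime


@dataclass
class Placeholder:
    location: str  # key of the archived file in secondary storage (assumed scheme)


def archive_inactive(primary, secondary, now):
    """Move files idle for ARCHIVE_AFTER or longer to secondary storage,
    leaving a placeholder behind in primary storage."""
    for name, entry in list(primary.items()):
        if isinstance(entry, StoredFile) and now - entry.last_access >= ARCHIVE_AFTER:
            secondary[name] = entry
            primary[name] = Placeholder(location=name)


def read_file(primary, secondary, name, now):
    """Return a file's contents, recalling it from secondary storage
    if only a placeholder remains in primary storage."""
    entry = primary[name]
    if isinstance(entry, Placeholder):
        entry = secondary.pop(entry.location)  # important becomes critical again
        primary[name] = entry
    entry.last_access = now
    return entry.data
```

In this sketch a file untouched for 120 days is replaced by a placeholder on the next archive_inactive pass, and a later read_file call moves it back to primary storage without the caller needing to know where it was kept.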
One key failing of backup systems that are not ILM-aware is that they will continue to store backup files on tape or secondary disk, even though this data has been archived elsewhere. Since this secondary storage must still be managed, the overall return on the ILM investment will be considerably less than anticipated.
The Figure illustrates this process in a typical e-mail setup. This architecture includes a backup system that protects critical data on primary storage before it is archived to lower-cost disks or tape by an ILM solution. This traditional tape-based backup is the ILM solution's Achilles' heel when it comes to ROI.
The Problem With Backup
Typically, the backup saves files from primary storage to secondary storage on a daily basis. As long as a file remains critical (on primary storage) it will be backed up routinely--daily, in most enterprises. This means that the same file, often in multiple versions, is saved and stored many times, resulting in excessive hardware or media costs, administration time, and storage real estate, both onsite and offsite. A backup approach that is ILM-aware, and overcomes this problem, is Distributed Backup.
One advantage of using Distributed Backup in the ILM environment is that it eliminates the need for daily backups to tape, and the subsequent rotation, retrieval and storage of these tapes.
A Distributed Backup system collects the data to be backed up from LAN clients and sends it to offsite disk storage in a compressed and encrypted format. It also retrieves this data from offsite when it is needed for a restore. Because the process is fast and fully automated, backups can take place as often as desired.
ILM-aware Distributed Backup or, more simply, Backup Lifecycle Management (BLM), takes advantage of the ILM archive's placeholders to keep only one copy of the file on either backup or secondary storage--but not both. These placeholders help the backup system determine which files have already been archived. This allows it to automatically remove them from the backup disks, freeing up storage space and eliminating file duplication.
When BLM recognizes a placeholder in the backup data received from the client, it knows that the associated file has been transferred to secondary storage. It therefore searches the backup disk for the original file, deletes it, and saves only the placeholder.
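The placeholder handling just described might look roughly like the following Python sketch. It is a simplified model rather than an actual BLM product interface: the backup disk is assumed to be a dictionary mapping file names to either raw bytes (a full backup copy) or a Placeholder, and apply_backup is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    location: str  # where the ILM archive now keeps the file (assumed scheme)


def apply_backup(backup_disk, incoming):
    """Merge one client backup set into the backup disk.

    When the client sends a placeholder for a file whose full copy is
    still on the backup disk, the full copy is dropped and only the
    placeholder is kept. Returns the names of files whose copies were freed.
    """
    freed = []
    for name, item in incoming.items():
        if isinstance(item, Placeholder) and isinstance(backup_disk.get(name), bytes):
            freed.append(name)  # the original backup copy is about to be replaced
        backup_disk[name] = item  # store the new data, or just the placeholder
    return freed
```

Files that arrive as ordinary data are simply stored or updated, so only current primary-storage data occupies space on the backup disk.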
Thus, only current data in primary storage is backed up to the disk, keeping disk size and cost to a minimum. Compared to tape backup, hardware and storage costs are lowered dramatically, and day-to-day backup administration is virtually eliminated. Distributed Backup also results in faster, more frequent backups and simpler restore operations.
While there are many backup solutions on the market, not all are ILM-ready, even among those that back up to disk. Simply replacing tape with low-cost disk will not provide the advantages of a tested, purpose-built BLM architecture.
Information Lifecycle Management is a growing trend that promises substantial savings in hardware and administration, but not if the existing backup system is ILM-unfriendly. To achieve the expected ROI, most enterprises will find it well worth choosing Distributed Backup that replaces traditional tape backup and integrates with ILM's unique technology for the greatest reduction in cost and complexity.
Eran Farajun is senior executive vice president of Asigra, Inc. (Toronto, Canada)