ILM: the promises and the problems.

Information Lifecycle Management (ILM) is a new storage paradigm that has been embraced to some degree by almost every enterprise storage hardware, software and systems vendor. We believe that the time has come to take a critical look at ILM, and explore both the benefits and the challenges associated with it.

What is ILM? Where Did it Come From?

The concept behind Information Lifecycle Management is both simple and powerful. With ILM, information is always stored in the right place. In this definition, "always" recognizes that the value of information, and its access requirements, change at different points in its lifetime, from creation to eventual destruction. "Right place" means the least expensive storage resource available that meets the operational requirements for that piece of information, at that particular point in time. These operational requirements may include such variables as time-to-access, levels of protection and security, retention characteristics, etc.

ILM is strongly related to HSM (Hierarchical Storage Management). Where HSM was developed for direct-attached storage, first in the mainframe environment and then adapted for client/server systems, ILM applies to networked storage.

HSM is a 2-dimensional structure where data is automatically migrated from primary to secondary storage when certain policy parameters are met, such as the age of the data or the time since it was last accessed. Additional layers, or tiers, may be deployed to further take advantage of cost savings. For example, the primary storage may be a RAID array that stores all newly created or modified data files. After 60 days, the file would then be moved to an automated tape library that provides near-line access but is much less expensive than the RAID system. After a year, the data may be migrated to an off-line tape that is stored on a shelf, further reducing the costs of retaining this information.

The migration happens automatically and in the background, and is transparent to users and applications. In effect, the capacity of the primary storage resource is extended or expanded by the capacity of the secondary resource(s). This is done by virtualizing, or combining, all of these resources into a single file system. The HSM engine may keep track of where migrated files are physically located, or it may leave pointers at the original file location on the primary device to indicate where the file was moved. In either case, when access to a migrated file is requested, the HSM software retrieves it from the secondary storage and delivers it to the user or application, just as if it were stored in its original location. The only difference may be in the amount of time it takes to retrieve the file, since many secondary storage resources, such as tape and optical disk libraries, rely on robotics to move a cartridge from a shelf in the library and load it into a drive to read and write data. For this reason, the timeout values for applications that interface with HSM (and ILM) may need to be adjusted.

ILM is more of a 3-dimensional model. It pools all of the available storage resources on a storage network into a single, large virtual repository. These resources are then organized into storage classes, each with its own value proposition (e.g., cost vs. performance). Programmed policies then monitor all of the information stored in this pool, and when conditions change such that the requirements of a policy are met, the affected data is moved to the storage class specified in the policy. Instead of using the tiered approach of the HSM model, ILM can move data to and between any of the devices in the storage network. These devices may include enterprise-class disk arrays (with appropriate levels of mirroring and data protection), NAS and CAS filers, tape and optical disk libraries, and even off-site storage.

But in practice, almost all ILM policies will define an HSM-like tiered approach to data management. There are few applications that, as a rule, have data sets that need to be moved from secondary storage back to primary storage at predetermined times. For example, old medical records may need to be retrieved prior to a patient's visit, or an archived legal transcript may be needed when a new proceeding is started; but these are event-driven access requirements and cannot be programmed into a global policy.

Benefits of ILM

The introduction of storage networking allowed IT administrators to centralize the management of diverse storage resources, using common tools from a single console to handle tasks such as resource utilization, adding and removing capacity, provisioning and data protection. ILM extends that concept to centralizing the management of the data that is created and used by perhaps hundreds of applications and thousands of users.

But the big benefit of ILM, and really the only reason to deploy it, is cost savings. The cost of managing information is measurable, but varies greatly between different organizations (and even within organizations). Having policies that manage where data is stored will certainly reduce these costs, but a detailed study would need to be conducted to determine the exact ROI. Instead, the cost of implementing ILM can easily be justified by the cost savings in the storage devices themselves.
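As a rough illustration of how device cost savings of this kind might be estimated, the sketch below compares keeping all data on primary storage with spreading it across three storage classes. The class names, per-gigabyte prices and data fractions are hypothetical placeholders chosen for the example, not figures from any vendor or study.

```python
# Hypothetical cost per GB for each storage class (illustrative numbers only).
COST_PER_GB = {"primary_raid": 10.0, "nearline_tape": 2.0, "offline_tape": 0.5}


def tiered_cost(total_gb, fractions):
    """Cost of storing total_gb split across classes by the given fractions."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(total_gb * share * COST_PER_GB[cls]
               for cls, share in fractions.items())


def savings(total_gb, fractions):
    """Absolute savings versus keeping everything on primary storage."""
    all_primary = total_gb * COST_PER_GB["primary_raid"]
    return all_primary - tiered_cost(total_gb, fractions)


# Assumed mix: 20% active, 30% less active, 50% inactive data.
mix = {"primary_raid": 0.2, "nearline_tape": 0.3, "offline_tape": 0.5}
print(savings(10_000, mix))
```

An organization would substitute its own measured data mix and quoted device prices; the point of the sketch is only that the estimate depends on knowing what share of the data is active, less active and inactive.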
In general, the cost of storage is directly related to its data access and transfer performance: the faster the device, the more expensive it will be. In almost all cases, the capacity requirement for older data grows much faster than the need for capacity for new data. The exceptions are new and/or rapidly expanding businesses where active data is created in greater volumes than older data which becomes "less-active" or "inactive". Once the organization understands the dynamics of its data (what percentage is active, less active, and inactive) and how that will change over time, it can determine the cost savings it can achieve by automatically migrating data to the appropriate storage devices as the data ages. Growing the lower end of the storage spectrum instead of the high end, as overall capacity requirements increase, should save the organization an enormous amount of capital.

Additionally, by storing inactive, or "fixed" content on reliable secondary media, such as tape or optical disks, the costs of continually backing up that data can be eliminated. As compared to an environment where all data is stored on primary storage, backup windows and associated data protection hardware/software expenditures can be reduced by as much as 80%. Copies of this older data still need to be stored off-site for disaster recovery purposes, of course, but again at lower cost than critical, active data that will be needed to recover quickly and restart operations following a disaster event.

The Problems with ILM

There are three primary challenges that ILM vendors have yet to fully address.
The first of these challenges has been discussed at some length by others: effective ILM relies on the definition and management of data movement policies. Someone needs to examine every application, and every type of data created and/or used by that application, and make decisions on how that data is used by the organization and where it should be stored at various points in its lifecycle. In the complex environments that would most benefit from ILM, this analysis can be an overwhelming challenge. Often, workflow consultants will need to be engaged to help with this process. Policy management also becomes an on-going activity, as the addition of new applications and changing data access requirements will create the need for new policies.

The second issue deals with implementation. When taking control of the storage-networking environment, some ILM solutions do not recognize or assimilate existing storage partitions. This means that all existing data will need to be moved from the devices it currently resides on. The device can then be added to the ILM storage pool, and the data copied back. For organizations that are feeling enough data management pain to make them look at an ILM solution, this conversion process may be huge. Other ILM solutions will handle this assimilation in the background, eventually moving all data and resources into their domain. Customers should fully understand the process of adding ILM to their current data and storage environment, and how long it will take.

The third issue is more architectural.
One of the benefits of storage networks is the centralizing of device management, as noted above, while communication between application servers and their assigned storage resources remains decentralized, via a switched network. But by pooling all storage network resources into a single virtual repository, ILM creates the need for a virtual file system that controls the entire storage network and keeps track of where everything is stored. This means that the ILM controller becomes a single, central point for all data access requests. In large, complex systems, the ILM controller may become a major bottleneck. This may then limit the scale of an ILM implementation, and thereby limit its value proposition.

The ILM controller also becomes a potential single point of failure for the entire storage network. Data protection programs, such as nightly and weekly backup, will also be complicated by relying on the ILM controller to feed the backup application with the data that needs to be backed up. The backup program also needs to be made aware of what data is stored on secure secondary resources (i.e., tape) in order to exclude it from full backups.

Some ILM controllers sit in the data path, further exacerbating the bottleneck problem. In addition to handling file system lookup tasks from the myriad of users and applications, the data must move through the ILM controller. Additionally, data being migrated between devices, based on the programmed ILM policies, also must move through the ILM controller.

Other ILM systems use an out-of-band approach to monitor and manage data movement. These implementations therefore require distributed software agents and data movers to act on behalf of the ILM controller. All file system lookup activity is still handled by the ILM controller, as is backup and restore activity.

Lastly, some companies are proposing the placement of the ILM engine within the fabric of the storage network, using the capabilities of a new generation of intelligent switches. This approach holds promise, but so far it's just that.

Summary

Information Lifecycle Management offers the potential of extending the value of storage networks to data management. Data movement tasks now performed manually can be programmed to occur automatically at the times defined by the organization. Resource utilization can be improved when ILM pools like-systems into virtual storage classes, further reducing the need for new storage acquisitions. And new acquisitions should be geared to more cost-effective secondary storage systems.
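A minimal sketch of the kind of programmed, age-based data movement policy described in this article, using the 60-day and one-year thresholds from the earlier HSM example. The storage class names and the function are illustrative assumptions, not any product's policy language, and a real ILM policy could also weigh protection, security and retention requirements.

```python
from datetime import date, timedelta

# (minimum age, storage class), ordered from the oldest threshold down.
# Thresholds follow the HSM example in the text; class names are hypothetical.
POLICY = [
    (timedelta(days=365), "offline_tape"),
    (timedelta(days=60), "tape_library"),
    (timedelta(days=0), "primary_raid"),
]


def storage_class(last_modified: date, today: date) -> str:
    """Return the storage class a file belongs in, given its age."""
    age = today - last_modified
    for min_age, cls in POLICY:
        if age >= min_age:
            return cls
    # A modification date in the future matches no rule: keep on primary.
    return "primary_raid"


today = date(2024, 1, 1)
for modified in (date(2023, 12, 15), date(2023, 6, 1), date(2022, 1, 1)):
    print(modified, storage_class(modified, today))
```

Keeping the rules in a plain ordered table rather than in code reflects the point made above that policy management is an on-going activity: thresholds and classes can be revised without touching the logic that applies them.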
But before selecting and implementing an ILM solution, users need to fully understand the effects it will have on their organization and on their storage network. Data management activities will be replaced with policy management. There is a lot of up-front data analysis that needs to be performed. And the ILM architecture needs to be evaluated, designed and tuned to minimize the performance hit that will result from centralizing data access. The cost of the ILM solution needs to be balanced against the potential savings, while taking the scalability of the solution into account.

Richard Vining
rvining@signiant.com