Streamline data to support the ILM infrastructure.
Information Lifecycle Management (ILM) has become increasingly important as new regulations continue to mandate retention of more and more data for longer periods of time. But that's not the only reason that efficient ILM infrastructure components are needed.
The "online, all the time" business environment stemming from the Internet and the global business market--where it's always business hours somewhere--means that 24x7 data access now requires that massive amounts of data be available and immediately accessible--and not just fixed-content, but "live" data that is actively being accessed.
The problem then becomes, how can you afford to keep that much data online--with its need for massive amounts of primary and archival storage, enormous network capacity to move even quiescent data around and very high speed access to "live" data?
And finally, how do you back up--or even more difficult--restore these massive amounts of data, should that become necessary?
In The Beginning
To gain a clear perspective, let's go back to the beginning.
Managing information from "cradle to grave" is not a new concept--it's been around for over a quarter of a century. HSM (hierarchical storage management) and SMS (systems managed storage) software have addressed the issue of managing the file-level storage and migration of data. They moved data from primary to secondary storage (i.e., disk to tape), based primarily on the age of the files.
Evolving data management needs over the ensuing decades brought new solutions (or at least new acronyms) in the form of SRM, DLM and ILM.
During the 1990s, as shared storage and storage networks began to supplant much direct-attached storage (DAS), SRM (storage resource management) tools became popular. SRM, like HSM, can function as a data management tool, but it deals with network-based (NAS and SAN) storage rather than traditional DAS with network-based backup.
SRM provides storage provisioning and information about disk utilization and access. Depending on the implementation, it may be no more than the moral equivalent of a network management system (NMS) for storage. Or SRM may support data storage, backup/restore, as well as tools for scaling the storage infrastructure and, in some cases, virtualization capabilities.
Managing Data Over Time
Data Lifecycle Management (DLM) tools automate the process of dealing with data--how it's handled, where it resides, when it's moved, and how long it's retained--based on policies determined by the implementing enterprise.
DLM tools may be used to manage the tiered storage environment. Generally speaking, the newest (and most frequently accessed) data is stored closer to the point of most frequent access and on the most expensive storage equipment. Older or otherwise less important data is stored on less expensive, slower and more remote storage media.
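An age-based tiering policy of this kind can be sketched in a few lines. This is only an illustration: the tier names and age thresholds below are hypothetical, not taken from any particular DLM product, and a real policy would weigh more than last-access time.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and age thresholds, chosen only for illustration.
TIER_POLICY = [
    (timedelta(days=30), "primary-disk"),    # accessed within the last 30 days
    (timedelta(days=365), "nearline-disk"),  # accessed within the last year
]
ARCHIVE_TIER = "tape-archive"                # everything older


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the storage tier for a file based only on its last access time."""
    age = now - last_access
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


now = datetime(2005, 6, 1)
print(choose_tier(now - timedelta(days=5), now))    # recently used data
print(choose_tier(now - timedelta(days=800), now))  # long-idle data
```

Keeping the thresholds in a table rather than in code mirrors the idea that the policy is set by the implementing enterprise, not by the tool.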
DLM vs. ILM
DLM and ILM are sometimes used interchangeably, but most storage professionals differentiate them based on the way they handle the data. Data management tools are an important building block in an ILM infrastructure.
DLM products generally deal with data from outside the file, looking at externally visible file attributes, whereas ILM deals with the information contained within the file. ILM is able to access and use the content both in the conduct of business and to make decisions about the value of the data.
Classification, automation and evaluation of data are the critical tasks. While other tools, perhaps SRM, are used to classify the data, DLM deals with day-to-day data management and automates data movement. The ILM process goes the next step by augmenting these functions with policy-based management of the information itself, according to its value to the enterprise.
Why ILM? Because we need a better way to manage the massive amounts of data that enterprises generate, work with on a daily basis and need to retain for business and legal reasons. This need goes beyond the solutions developed for managing data and the storage media where it resides--straining the capabilities of traditional SRM, HSM and DLM products.
What is ILM? Well, one thing that it's not is a product or even a product suite. It's better described as a process that requires both business planning and technology infrastructures to support its realization.
What exactly must the technology infrastructure provide? A means for efficiently storing, continuously protecting and efficiently moving information, while managing content and restoring the data, as needed, throughout its life-cycle.
On the technology infrastructure side, ILM product building blocks will span tiered storage solutions, content storage, continuous data protection and DLM tools. But for any of them to become effective components of the ILM technology foundation, they'll need to master the data explosion by efficiently reducing the unnecessary data being stored every day.
We can call this data "unnecessary," because it's redundant--and we're not talking about backup copies, which then, in turn, replicate the redundancy of the original. That goes a long way toward explaining the "data explosion" that everyone is talking about.
Today's Approaches: Compression and Coalescence
Many vendors rely on tiered storage and the continually decreasing cost of physical media to reduce the costs associated with the explosively growing amount of online, near-line and offline data that enterprises need to retain. Some products seek to reduce the amount of physical disk needed by reducing the space required to store a given amount of data.
Today's solutions seek to reduce disk utilization by compressing the data or reducing unintended redundancy through coalescence at the file or block levels.
Compression has the advantage of reducing data to a fraction of its original size. The downside, of course, is that the entire compressed unit must be uncompressed to restore even a small portion of it. The result is that the restore, which is always more of a problem than the initial store or backup operation, is extremely inefficient.
Elimination of file-level redundancy is helpful, but doesn't save as much space as might be hoped. Take, for example, the case of a 2 MB PowerPoint presentation that is sent to ten salespeople. Each one modifies only the date on the presentation, so most of the information is identical, yet each file as a whole is different. As a result, 22 MB of data (the original plus ten copies) is stored when only a few bytes have changed from one copy to the next.
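The presentation example can be reproduced at small scale. The sketch below assumes file-level coalescence works by hashing whole files and keeping one copy per distinct hash, which is a common approach but not necessarily what any given product does; the 2,000-byte stand-in files and the function name are made up for the demonstration.

```python
import hashlib


def file_level_stored_bytes(files: list) -> int:
    """Bytes stored when identical files are kept only once (whole-file hashing)."""
    unique = {}
    for content in files:
        digest = hashlib.sha256(content).hexdigest()
        unique.setdefault(digest, len(content))
    return sum(unique.values())


# Stand-in for the presentation: 2,000 bytes instead of 2 MB.
original = b"DATE=00;" + b"x" * 1992
# Ten copies in which only the date field differs.
copies = [b"DATE=%02d;" % day + b"x" * 1992 for day in range(1, 11)]

exact_duplicates = file_level_stored_bytes([original] * 11)
edited_copies = file_level_stored_bytes([original] + copies)
print(exact_duplicates, edited_copies)
```

Eleven exact duplicates collapse to a single stored file, but changing two bytes in each copy makes every whole-file hash different, so all eleven files are stored in full.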
Redundancy elimination at the block level can be considerably more efficient, since blocks are smaller than files, often by orders of magnitude. Yet, because of fixed-length block boundaries, any small change to a single block can have a "domino" or cascading effect on subsequent blocks. This will tend to reduce the efficiency of the space compaction process.
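The cascading effect is easy to see in a small experiment. The following is a minimal sketch, assuming block-level coalescence hashes fixed-length blocks; the 64-byte block size and the helper names are illustrative choices only.

```python
import hashlib

BLOCK_SIZE = 64  # hypothetical block size, kept small for the demo


def fixed_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-length blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_block_count(old: bytes, new: bytes) -> int:
    """Count blocks of `new` whose hash already exists among the blocks of `old`."""
    known = {hashlib.sha256(blk).digest() for blk in fixed_blocks(old)}
    return sum(hashlib.sha256(blk).digest() in known for blk in fixed_blocks(new))


# Deterministic pseudo-random data: 200 SHA-256 digests = 6,400 bytes = 100 blocks.
base = b"".join(hashlib.sha256(str(n).encode()).digest() for n in range(200))

overwritten = bytes([base[0] ^ 0xFF]) + base[1:]  # one byte changed in place
inserted = b"X" + base                            # one byte inserted at the front

in_place_matches = shared_block_count(base, overwritten)
shifted_matches = shared_block_count(base, inserted)
print(in_place_matches, shifted_matches)
```

Changing a byte in place spoils only the block that contains it. Inserting a single byte shifts every later block boundary, so none of the following blocks line up with a stored block any more.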
Neither traditional compression nor current data coalescence techniques increase the storage efficiency of today's data management tools enough to support enterprise ILM initiatives. Redundancy reduction in varying degrees is not enough. Coalescence at a finer level of granularity--streamlining data--is needed to completely eliminate redundancy.
Only by dealing with data at the sub-block level can the level of data compaction demanded by ILM be achieved. And by adding support for variable-length "sub-blocks," the cascading effect associated with fixed-length blocks can also be avoided.
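One well-known way to obtain variable-length pieces is content-defined chunking, where a rolling hash over the last few bytes decides where each piece ends. The sketch below uses that technique purely as an illustration; the article does not say how any particular product forms its sub-blocks, and the window size, mask and hash constants are assumptions. Practical implementations usually also enforce minimum and maximum chunk sizes, which are omitted here.

```python
import hashlib

WINDOW = 16          # bytes examined when deciding where to cut (assumed value)
MASK = 0x3F          # cut when the low 6 bits are zero: roughly 64-byte chunks
BASE = 257
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def variable_chunks(data: bytes) -> list:
    """Split data at positions chosen by a rolling hash of the last WINDOW bytes."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW, MOD)             # weight of the byte leaving the window
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD          # slide the new byte in
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # slide the oldest byte out
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def unmatched_chunks(old: bytes, new: bytes) -> list:
    """Chunks of `new` that do not already exist among the chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in variable_chunks(old)}
    return [c for c in variable_chunks(new) if hashlib.sha256(c).digest() not in known]


# Deterministic pseudo-random test data (6,400 bytes) built from SHA-256 digests.
base = b"".join(hashlib.sha256(str(n).encode()).digest() for n in range(200))
inserted = b"X" + base                       # one byte inserted at the front

print(len(variable_chunks(base)), len(unmatched_chunks(base, inserted)))
```

Because each cut depends only on the bytes just before it, an inserted byte disturbs the chunk or two around the change and the boundaries further along fall in the same places relative to the content.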
Data needs to be streamlined if enterprise ILM is to become reality. And streamlining data means employing any and all of the space reduction techniques we've discussed. Further compressing data after the data has already been coalesced or compacted can achieve even greater storage efficiency.
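The combined effect can be illustrated with a toy store that first keeps one copy of each distinct block and then compresses what remains. This is a sketch under simple assumptions (fixed 256-byte blocks, per-block zlib compression, made-up file contents), not a description of any product.

```python
import hashlib
import zlib

BLOCK = 256  # hypothetical block size for the demo


def coalesce(files: list, size: int = BLOCK) -> dict:
    """Keep one copy of each distinct block across all files."""
    store = {}
    for content in files:
        for i in range(0, len(content), size):
            blk = content[i:i + size]
            store.setdefault(hashlib.sha256(blk).digest(), blk)
    return store


def stored_size(store: dict, compress: bool = False) -> int:
    """Total bytes needed for the stored blocks, optionally zlib-compressed."""
    return sum(len(zlib.compress(b)) if compress else len(b) for b in store.values())


# Eleven made-up files: a date header that differs, followed by an identical body.
body = b"".join(b"Quarterly sales figures for region %03d. " % n for n in range(100))
files = [(b"Presentation date: 2005-01-%02d" % d).ljust(BLOCK) + body
         for d in range(1, 12)]

store = coalesce(files)
raw = sum(len(f) for f in files)
coalesced = stored_size(store)
compressed = stored_size(store, compress=True)
print(raw, coalesced, compressed)
```

Coalescing removes the repeated body blocks, and compressing the surviving blocks then shrinks the text within each one, so the two techniques complement rather than replace each other.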
Transforming DLM Tools Into ILM Infrastructure Components
ILM initiatives will only have a supporting technology infrastructure if enterprises can streamline their data. What does streamlining the raw data mean for ILM? It allows data and content management tools to become much more efficient and become ILM-ready. And that means that businesses using these enhanced tools will be able to store all their data using less space and transfer smaller amounts of data less frequently, reducing both network traffic and overhead. And that's the most obvious way to get your ILM technology infrastructure "ready for primetime."
Todd Viegut is a senior vice president at Rocksoft (Annapolis, MD)