Streamline data to support the ILM infrastructure.
The "online, all the time" business environment created by the Internet and the global business market--where it's always business hours somewhere--means that 24x7 data access now requires massive amounts of data to be available and immediately accessible--and not just fixed-content data, but "live" data that is actively being accessed.
The problem then becomes: how can you afford to keep that much data online, with its need for massive amounts of primary and archival storage, enormous network capacity to move even quiescent data around, and very high-speed access to "live" data?
And finally, how do you back up--or, even more difficult, restore--these massive amounts of data, should that become necessary?
In The Beginning
To gain a clear perspective, let's go back to the roots of today's ILM movement, review current approaches and discuss what's still needed to make today's products ILM-ready.
Managing information from "cradle to grave" is not a new concept--it's been around for over a quarter of a century. HSM (hierarchical storage management) and SMS (systems managed storage) software have addressed the issue of managing the file-level storage and migration of data. They moved data from primary to secondary storage (i.e., disk to tape), based primarily on the age of the files.
Evolving data management needs over the ensuing decades brought new solutions (or at least new acronyms) in the form of SRM, DLM and ILM.
During the 1990s, as shared storage and storage networks began to supplant much direct-attached storage, SRM (storage resource management) tools became popular. SRM, like HSM, can function as a data management tool, but it deals with network-based (NAS and SAN) storage rather than traditional DAS with network-based backup.
SRM provides storage provisioning and information about disk utilization and access. Depending on the implementation, it may be no more than the moral equivalent of a network management system (NMS) for storage. Or SRM may support data storage, backup/restore, as well as tools for scaling the storage infrastructure and, in some cases, virtualization capabilities.
Managing Data Over Time
Data Lifecycle Management (DLM) tools automate the process of dealing with data--how it's handled, where it resides, when it's moved, and how long it's retained--based on policies determined by the implementing enterprise.
DLM tools may be used to manage the tiered storage environment. Generally speaking, the newest (and most frequently accessed) data is stored closer to the point of most frequent access and on the most expensive storage equipment. Older or otherwise less important data is stored on less expensive, slower and more remote storage media.
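As a sketch of the policy-driven movement described above, a DLM-style tiering rule can be as simple as mapping a file's last-access age to a storage tier. The tier names and age thresholds below are illustrative assumptions, not drawn from any particular product:

```python
from datetime import datetime, timedelta

# Hypothetical age-based tiering policy; thresholds and tier names are
# illustrative only.
TIER_POLICY = [
    (timedelta(days=30), "primary"),     # recently accessed -> fast, costly disk
    (timedelta(days=365), "near-line"),  # up to a year old -> cheaper, slower media
]
ARCHIVE_TIER = "archive"                 # older still -> tape or remote storage

def assign_tier(last_access: datetime, now: datetime) -> str:
    """Map a file's last-access time to a storage tier per the policy."""
    age = now - last_access
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER

now = datetime(2005, 1, 1)
print(assign_tier(datetime(2004, 12, 20), now))  # recently used -> "primary"
print(assign_tier(datetime(2002, 6, 1), now))    # stale -> "archive"
```

In a real DLM product the policy would also weigh file type, ownership and business value, but the automation pattern is the same: evaluate attributes, then move the data accordingly.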
DLM vs. ILM
DLM and ILM are sometimes used interchangeably, but most storage professionals differentiate them based on the way they handle the data. Data management tools are an important building block in an ILM infrastructure.
DLM products generally deal with data from outside the file, looking at externally visible file attributes, whereas ILM deals with the information contained within the file. ILM is able to access and use the content both in the conduct of business and to make decisions about the value of the data.
Classification, automation and evaluation of data are the critical tasks. While other tools, perhaps SRM, are used to classify the data, DLM deals with day-to-day data management and automates data movement. The ILM process goes the next step by augmenting these functions with policy-based management of the information itself, according to its value to the enterprise.
Why ILM? Because we need a better way to manage the massive amounts of data that enterprises generate, work with on a daily basis and need to retain for business and legal reasons. This need goes beyond the solutions developed for managing data and the storage media where it resides, straining the capabilities of traditional SRM, HSM and DLM products.
What is ILM? Well, one thing that it's not is a product or even a product suite. It's better described as a process that requires both business planning and technology infrastructures to support its realization.
What exactly must the technology infrastructure provide? A means for efficiently storing, continuously protecting and efficiently moving information, while managing content and restoring the data, as needed, throughout its life-cycle.
On the technology infrastructure side, ILM product building blocks will span tiered storage solutions, content storage, continuous data protection and DLM tools. But for any of them to become effective components of the ILM technology foundation, they'll need to master the data explosion by efficiently reducing the unnecessary data being stored every day.
We can call this data "unnecessary" because it's redundant--and that's before counting backup copies, which, in turn, replicate the redundancy of the originals. That goes a long way toward explaining the "data explosion" everyone is talking about.
Today's Approaches: Compression and Coalescence
Many vendors rely on tiered storage and the continually decreasing cost of physical media to reduce the costs associated with the explosively growing amount of online, near-line and offline data that enterprises need to retain. Some products seek to reduce the amount of physical disk needed by reducing the space required to store a given amount of data.
Today's solutions seek to reduce disk utilization by compressing the data or reducing unintended redundancy through coalescence at the file or block levels.
Compression has the advantage of reducing data to a substantial fraction of its original size. The downside, of course, is that the entire compressed unit must be decompressed to restore even a small portion. The result is that the restore--always more of a problem than the initial store or backup operation--is extremely inefficient.
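A minimal sketch of the trade-off, using zlib-style stream compression (the file contents are invented for illustration): the unit shrinks substantially, but reading back even a 100-byte slice forces a full decompress first:

```python
import zlib

# A ~1 MB repetitive "file" compresses very well.
original = b"quarterly sales figures\n" * 50_000
compressed = zlib.compress(original)
print(len(original), "->", len(compressed))  # substantial size reduction

# But to read just bytes 500_000..500_100, the entire stream must be
# inflated first -- there is no random access into the compressed unit.
restored = zlib.decompress(compressed)
slice_ = restored[500_000:500_100]
assert slice_ == original[500_000:500_100]
```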
Elimination of file-level redundancy is helpful, but doesn't save as much space as might be hoped. Take, for example, a 2MB PowerPoint presentation that is sent to ten salespeople. Each one modifies only the date on the presentation, so most of the information is identical--yet each file as a whole is different. As a result, 22MB of data is stored (the original plus ten copies) when only a few bytes have changed from one copy to the next.
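The scenario can be sketched with a hypothetical whole-file hash store (the document contents and dates are invented for illustration): because each near-identical copy hashes differently, nothing coalesces and no space is saved:

```python
import hashlib

# Hypothetical file-level coalescence: identical files share one stored
# copy, keyed by a whole-file hash. A one-line edit defeats it entirely.
def file_key(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

master = b"Q4 Sales Deck\nDate: 2004-12-01\n" + b"slide content " * 1000
# Ten copies, each differing from the master only in the date line.
copies = [master.replace(b"2004-12-01", b"2004-12-%02d" % d)
          for d in range(2, 12)]

store = {}
for doc in [master] + copies:
    store[file_key(doc)] = doc  # every near-identical copy gets its own key

print(len(store))  # 11 distinct entries -- no space saved despite ~99% overlap
```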
Redundancy elimination at the block level can be considerably more efficient, since blocks are smaller than files, often by orders of magnitude. Yet, because of fixed-length block boundaries, any small change to a single block can have a "domino" or cascading effect on subsequent blocks. This will tend to reduce the efficiency of the space compaction process.
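A short sketch of the cascading effect, using fixed 4KB blocks and an arbitrary two-byte insertion at the front of a file (the data is synthetic): every block boundary downstream shifts, so none of the edited file's blocks match the original's:

```python
import hashlib

BLOCK = 4096  # fixed-length block size (illustrative)

def block_hashes(data: bytes):
    """Hash each fixed-length block; equal hashes mean coalescible blocks."""
    return [hashlib.sha1(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

# 256 KB of deterministic pseudo-random content.
original = b"".join(hashlib.sha256(str(i).encode()).digest()
                    for i in range(8192))
edited = b"XX" + original  # insert just 2 bytes at the front

orig_hashes = block_hashes(original)
edit_hashes = block_hashes(edited)

# Every boundary after the insertion point has shifted, so no block of the
# edited file matches any block of the original: zero coalescence.
shared = set(orig_hashes) & set(edit_hashes)
print(len(shared))  # 0
```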
Neither traditional compression nor current data coalescence techniques make today's data management tools efficient enough to support enterprise ILM initiatives. Redundancy reduction in varying degrees is not enough. Coalescence at a finer level of granularity--streamlining data--is needed to eliminate redundancy completely.
Only by dealing with data at the sub-block level can the level of data compaction demanded by ILM be achieved. And by adding support for variable-length "sub-blocks," the cascading effect associated with fixed-length blocks can also be avoided.
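One way to sketch variable-length sub-blocks is content-defined chunking, where a cut point is declared wherever a hash of the trailing few bytes matches a bit pattern; the window size and mask below are illustrative assumptions. Because boundaries follow content rather than absolute offsets, they resynchronize shortly after an insertion, so only the region around the edit is stored again:

```python
import hashlib

WINDOW = 16             # bytes of trailing context examined at each position
MASK = (1 << 11) - 1    # boundary when low 11 hash bits are zero (~2 KB avg)

def chunks(data: bytes):
    """Cut wherever a hash of the trailing WINDOW bytes matches the mask,
    so boundaries depend on content, not on absolute offsets."""
    out, start = [], 0
    for i in range(WINDOW, len(data)):
        h = int.from_bytes(hashlib.sha1(data[i - WINDOW:i]).digest()[:4], "big")
        if (h & MASK) == 0:
            out.append(data[start:i])
            start = i
    out.append(data[start:])
    return out

# 128 KB of deterministic pseudo-random content, then a 2-byte insertion.
original = b"".join(hashlib.sha256(str(i).encode()).digest()
                    for i in range(4096))
edited = original[:50_000] + b"XX" + original[50_000:]

orig_set = {hashlib.sha1(c).hexdigest() for c in chunks(original)}
edit_set = {hashlib.sha1(c).hexdigest() for c in chunks(edited)}

# Boundaries resynchronize after the edit: only a chunk or two is new,
# instead of everything downstream as with fixed-length blocks.
print(len(edit_set - orig_set))
```

A production implementation would use a rolling hash so each position costs O(1) rather than a fresh hash of the window, but the boundary behavior is the same.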
Data needs to be streamlined if enterprise ILM is to become reality. And streamlining data means employing any and all of the space reduction techniques we've discussed. Further compressing data after the data has already been coalesced or compacted can achieve even greater storage efficiency.
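The combined pipeline can be sketched as: coalesce duplicate blocks first, then compress only the unique blocks that survive. The block size and data below are illustrative:

```python
import hashlib
import zlib

BLOCK = 4096
# 100 identical 4 KB "records" -- 400 KB of highly redundant data.
data = (b"customer record: " + b"A" * 4079) * 100

# Stage 1: coalescence -- keep one copy of each distinct block.
unique = {}
for i in range(0, len(data), BLOCK):
    blk = data[i:i + BLOCK]
    unique.setdefault(hashlib.sha1(blk).hexdigest(), blk)

coalesced = sum(len(b) for b in unique.values())

# Stage 2: compress the unique blocks that remain.
compressed = sum(len(zlib.compress(b)) for b in unique.values())

print(len(data), "->", coalesced, "->", compressed)
```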
Transforming DLM Tools Into ILM Infrastructure Components
ILM initiatives will only have a supporting technology infrastructure if enterprises can streamline their data. What does streamlining the raw data mean for ILM? It makes data and content management tools far more efficient--and ILM-ready. Businesses using these enhanced tools will be able to store all their data in less space and transfer smaller amounts of data less frequently, reducing both network traffic and overhead. And that's the most obvious way to get your ILM technology infrastructure "ready for primetime."
Todd Viegut is a senior vice president at Rocksoft (Annapolis, MD).
Title Annotation: Storage Management; Information Lifecycle Management
Publication: Computer Technology Review
Date: Jan 1, 2005