Unstructured data: the roadblock to effective ILM.


Information Lifecycle Management (ILM) promises many benefits to IT organizations across diverse industry sectors. ILM products have been on the market for some time, yet few companies are achieving the promised results. So, what's the holdup? As IT leaders take the first step towards ILM by classifying their data, they run into the biggest obstacle to effective ILM: how to deal with unstructured data.

Unlike so many IT trends that have come and gone without much impact, ILM is an important concept that is changing the way IT professionals think about managing corporate data. Storage systems vendors and software providers are jumping on the ILM bandwagon with new product offerings, and many IT managers have been very receptive. The simple reason is that ILM has the potential to deliver workable solutions for real-world IT problems.

An effective ILM strategy can address two high-priority problems.

First, the complexity of enterprise data storage management has increased due to the growth of office data contained in word processing files, spreadsheets and presentations. The high degree of variance in data types, as well as the rate of growth and value to the business, adds to this complexity. Moreover, extracting value from these larger stores of information has become increasingly dependent on the company's ability to manage information about the data (metadata), such as who created a file, when it was last accessed, and so forth. In short, organizations have to cope with a lot more data, and a wider diversity of data, than they did even a few years ago.
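
To make the idea of file metadata concrete, the following is a minimal Python sketch that reads a few of the attributes mentioned above from the file system. The function name `describe_file` and the choice of fields are illustrative assumptions, not part of any ILM product; the sketch relies only on the standard `os.stat` call.

```python
import os
from datetime import datetime, timezone


def describe_file(path):
    """Return basic file-system metadata for one file (illustrative fields only)."""
    info = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": info.st_size,
        # The owner is reported as a numeric user ID; on Windows this is 0.
        "owner_uid": info.st_uid,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Access times may be coarse or disabled, depending on how the volume is mounted.
        "last_accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }
```

Calling `describe_file` on each file in a share would give an inventory of who owns what and when it was last touched, which is the raw material any classification effort would need.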

Second, with the onslaught of stringent governmental compliance regulations (such as Sarbanes-Oxley, HIPAA and the Patriot Act), the stakes are much higher. Non-compliance with these statutes, not to mention existing intellectual property laws, exposes companies to greater legal liability than ever before.

These two trends have combined to create a "perfect storm" for IT organizations. To avoid falling victim to this storm, IT managers now need end-to-end information management solutions that allow them to manage their data across its full lifecycle, from creation to deletion. ILM addresses this need. Definitions vary, but in essence ILM is about implementing policies to ensure that corporate information is managed in accordance with business requirements. These requirements include resource and asset utilization, business productivity enhancement, regulatory compliance assurance and legal risk mitigation.

Not All Data is Created Equal

Data storage vendors, in their marketing materials and sales pitches, speak directly to these ILM issues. However, if you look closer at the nature of the information management challenge, you will see that solving all of these problems with a single device or software product is virtually impossible.

Why is that? The reason lies in the differences between structured, semi-structured and unstructured data.

There is no such thing as a "typical organization" when it comes to the types of data a business generates and manages. However, a few key industry statistics provide valuable insight into the common scenario within enterprises. Recent studies show that in many enterprise networks as much as 50% of total storage capacity is consumed by unstructured data, such as Microsoft Word and Excel files, PowerPoint presentations and PDFs. In general, 25-30% of enterprise storage consists of structured data in enterprise databases, and the remaining 15-20% is made up of semi-structured data in e-mail or Exchange servers.

Not surprisingly, the tools used to manage these types of data have traditionally been as diverse as the data themselves. Consider structured data. Since databases have been a mainstay within large businesses for decades, the management of information stored in databases has evolved into a fairly exact science.

Benefiting from years of continuous development, database management tools make it possible for IT administrators to automate the implementation of policies, including those that govern records retention, privacy protection and so on. In addition, while hundreds of users may access information stored in an enterprise database on any given day, the number of database administrators who actually have management rights over that database is, in most cases, very limited. As a result, achieving ILM goals in a structured data environment is a relatively straightforward exercise. Semi-structured data (specifically e-mail) presents a similar case.

As e-mail has been a primary focus in the corporate world since the mid-1990s, there are now a large number of readily available tools for bringing e-mail systems in line with ILM policies. While not as widely deployed as database management utilities, these semi-structured data management tools enable the implementation of broad policies concerning retention, content and use. Furthermore, while individual e-mails are controlled by the end user, the fact that e-mails are gathered into a central repository (e.g. Exchange) under the control of a single administrator makes the implementation of policy relatively easy to accomplish.

Unstructured data, on the other hand, is the Wild, Wild West of enterprise data management.

Removing the Unstructured Data Roadblock

A typical enterprise has tens of millions of files generated by thousands (and often tens of thousands) of users. These files are created by end users during the normal course of business and are primarily made up of office files, but can also include smaller databases (like Access or FoxPro), JPEGs, MPEGs and TIFFs. In general, these files are stored somewhere on networked storage. They could be on direct attached storage (DAS) systems running NT or Linux, network attached storage (NAS) appliances from NetApp or EMC, a storage area network fronted by some type of NAS head, or some combination of the three.

Management of these files has typically been the responsibility of the end user. In fact, aside from certain specialized industries such as legal firms or advertising agencies, most IT organizations take a hands-off approach to managing unstructured data. For lack of a better alternative, they rely on written corporate policies to control end-user files, govern storage use and manage data retention.

In effect, end users are responsible for managing their own files in compliance with these corporate policies. The problem with this approach is that end users will not follow these policies unless forced to do so.

Is there an organization anywhere with the IT resources needed to ensure that each and every employee follows the corporate rules? Not likely. Especially since, in most organizations, there is no single point of control to drive the implementation of policy. Even if a single control point existed, IT departments lack the tools to automate the enforcement of policies or the clean-up tasks that policy compliance demands.
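
As a rough illustration of what automating one such clean-up task might involve, here is a minimal Python sketch that walks a directory tree and reports files that have not been modified within a retention window. The function name `find_stale_files`, the use of modification time as the criterion, and the retention threshold are all assumptions made for the example; a real policy would be defined by the business.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Yield paths under root whose last-modified time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort the scan.
                continue
            if mtime < cutoff:
                yield path
```

The sketch only reports candidates. Deciding whether to archive, delete or leave each file is the policy question the article is concerned with, and it is deliberately left out of the code.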

So, with no effective data management in place, what is IT's response? The obvious solution to the problem: buy more storage. With the relatively low (and continually declining) cost of networked file storage, organizations have been more than happy to add more and more capacity to the network, rather than investing in processes and software to clean up the ever-expanding mess.

The important point is: not only has there been a distinct lack of effective management tools for unstructured data, there has been an even greater scarcity of strategic thinking on how to approach this growing problem.

Remember, the goal of ILM is to ensure that enterprise information is managed according to business requirements. And, for a growing number of organizations, the success of the business depends on easy access to the information stored in unstructured data files.

While structured databases contain factual transaction-based information, and e-mail contains general communications, unstructured files contain mostly business context, including current business, marketing, product and sales plans, as well as preliminary drafts of those plans that may or may not ever be implemented. This context can be of great value to the corporation in the form of collective knowledge that needs to be shared across the corporation. However, it also presents a tremendous risk in the form of litigation exposure, regulatory mandates and, ultimately, cost. In the event of a lawsuit, early versions of memos, speeches, press releases and product plans could be used against the corporation as evidence of intent.

To be effective, ILM strategies for unstructured data must take these factors into account. Ultimately, what organizations need is a single platform that can measure the business value of unstructured files based on their location, attributes, content and business semantics. This platform must deliver the ability to distribute the enforcement of file policies to the file owners, i.e., the end-users who create, share and use the files on a daily basis. And finally, it must minimize the work required by either IT or the end user to implement the solution.
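
The idea of judging a file by its location, attributes and content can be sketched in a few lines of Python. Everything specific here is hypothetical: the `FileFacts` record, the `classify` function, the term list, the directory names, the one-year threshold and the four labels are invented for illustration and do not describe any vendor's product. Business semantics, the fourth factor named above, is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class FileFacts:
    path: str               # location on networked storage
    days_since_access: int  # attribute taken from file-system metadata
    text: str               # extracted content; empty if not available


# Hypothetical rules, invented for illustration only.
SENSITIVE_TERMS = ("confidential", "draft", "patient")
HIGH_VALUE_DIRS = ("/finance/", "/legal/")


def classify(facts):
    """Return a coarse handling label based on location, attributes and content."""
    lowered = facts.text.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return "review"   # content suggests compliance or litigation exposure
    if any(d in facts.path for d in HIGH_VALUE_DIRS):
        return "retain"   # location suggests a business record
    if facts.days_since_access > 365:
        return "archive"  # attribute suggests the file is no longer in use
    return "keep"
```

The rules are checked in order of risk, so a stale draft in a home directory is flagged for review rather than silently archived. That ordering is a design choice of this sketch, not a recommendation from the article.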

The bottom line is: Only through a systematic approach to managing unstructured files as a corporate asset can organizations realize the full benefits of Information Lifecycle Management.

www.deepfile.com

Jeff Erramouspe is co-founder and chief marketing officer of Deepfile Corporation (Austin, TX)
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: Special ILM Issue; Information Lifecycle Management
Author: Erramouspe, Jeff
Publication: Computer Technology Review
Date: Aug 1, 2004
Words: 1463