Best practices for implementing data lifecycle management solutions.


What is the definition of high-value data? One can argue that all data is mission critical at some point in its life. The idea behind data lifecycle management is that data has different values at different moments in time and should be managed differently in different phases of its life cycle. For instance, a law firm would consider the legal documents required for discovery and litigation in an active case as very important before the final ruling is made, but afterwards these documents diminish in value over time. However, there are exceptions, such as the landmark case of Roe v. Wade, where the possibilities of re-use of the data are significantly higher.

Data lifecycle management solutions enable IT managers to assemble the appropriate combination of storage devices, media types, and network infrastructure to create a proper balance of performance, data accessibility, ease of retrieval, cost, and data reliability based on the relative value of data to the business. Consider how insurance companies store client forms on-line for two years to comply with government regulations and ensure responsive customer service; after the retention period has passed, this data is migrated to off-line tape storage.

There are several components of a data lifecycle management solution. First, you need an inventory of your existing data and storage resources. Next, you should classify your data and associate it with your business requirements and relative importance. Finally, you need data management policies that decide how best to match data with the appropriate storage resource. Should data be stored on-line, near-line, or off-line, and when should data be replicated, migrated, or deleted? The ability to intelligently and dynamically automate these decisions and actions is the heart of a data lifecycle management solution. The following are best practices for implementing data lifecycle management solutions.

Everything Old is New Again

It is tempting to say that data lifecycle management is simply Hierarchical Storage Management (HSM) born again. Why invent this new category called data lifecycle management? Vendors first developed HSM products in the mid-1990s in the mainframe environment for distributed computing implementations. In HSM implementations, data automatically moves from expensive hard disks to less expensive optical media or to tape, according to specific policies.

HSM technologies have their shortcomings, though, which have slowed adoption. For instance, HSM products lack integration of data intelligence and scalability in a network environment, increasing management complexity in large, disparate client/server environments. Before administrators can build HSM migration rules, they first need to understand the enterprise's data usage patterns and storage resource availability. What is your total capacity, available and used? What are your most active files, and which files consume the most storage space? This type of information is typically not readily available unless a storage assessment was recently completed. Consequently, administrators using HSM products would often create migration rules in a vacuum. A rule such as "migrate all *.doc files not accessed within the last 60 days" may be appropriate for one group of users, but may negatively impact the productivity of another group or department.
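The 60-day rule above can be sketched in a few lines of Python. This is only an illustration, not any vendor's rule syntax: the `FileInfo` record, the `owner_group` field, and the `select_for_migration` function are hypothetical names introduced here. The optional `groups` argument shows how the same rule could be scoped to the users for whom it is appropriate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical inventory record; a real product would read this from the file system.
    path: str
    owner_group: str
    last_access: datetime


def select_for_migration(files, pattern, idle_days, now, groups=None):
    """Return files matching `pattern` that were not accessed in the last `idle_days` days.

    If `groups` is given, the rule applies only to files owned by those groups.
    """
    cutoff = now - timedelta(days=idle_days)
    return [
        f for f in files
        if fnmatchcase(f.path.lower(), pattern.lower())  # case-insensitive match
        and f.last_access < cutoff
        and (groups is None or f.owner_group in groups)
    ]
```

Calling `select_for_migration(files, "*.doc", 60, now)` applies the rule to everyone, while passing `groups={"sales"}` limits it to one department and leaves the others untouched.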

Another issue stems from the fact that legacy HSM solutions were architected for static one-to-one relations between source and target volumes. Administrators were limited to manually configuring each dedicated HSM volume as a pre-defined source and/or destination device, and migration rules could not be shared between multiple HSM servers. This has made HSM technologies impractical to implement for large distributed client/server environments. Finally, HSM technologies lack a global view of all storage resources in the network and instead require the administrator to constantly change the migration rules to adapt to changes in the storage environment. For example, what if the original migration target is getting full? What if the administrator brings up a new storage device that can be used as a migration target?

Data Lifecycle Management: The Need

Clearly, data lifecycle management at minimum represents a significant evolution of HSM techniques. IT managers are rethinking their migration strategies, fearing HSM was far too simplistic an approach. Many companies now want real-time access to their data for longer time periods. Consider how this requirement affects credit card transactions. In the past, credit card transactions were generally completed within a 120-day cycle. Because customers now have Web access to their credit card accounts, they want the ability to review their transactions for the past year or perhaps longer. Even if customers don't use the data, it must be readily accessible or the value of the service is lost.

With the continuous reduction in the cost of DASD (Direct Access Storage Device), IT administrators are storing more and more data on-line in order to speed access time and take advantage of new low-cost technologies such as ATA disk-based RAID. Customers want to implement tiered-storage solutions to automate data migration to these new low-cost storage devices. They want to migrate duplicate or inactive data from primary storage to secondary storage devices to reduce backup and recovery windows.

The key question for IT managers is how do they determine which data should reside on one storage device vs. another (i.e., highest performing, most highly available but costly storage vs. low-cost storage)? In fact, one of the toughest challenges with data lifecycle management is profiling the relative value or criticality of data and storage resources to the business. Only then can administrators match the right data on the right storage at the right time. By properly placing data on the appropriate storage according to business needs, IT can more effectively distribute data across multiple resources, which leads to improved storage utilization and reduced storage acquisition costs. Automate this process and IT can benefit from improved productivity levels as well.

Data Discovery and Identification

The right starting point when implementing a data lifecycle management solution is to inventory your data. What are the different types of data files users create in your organization? What file types are most or least common? Do data types vary by user or department? How important are these files for the business? How much space does each data type consume on each volume? Can you predict their growth based on trends?

A data lifecycle solution should scan each server and volume in the network, along with its contents, and summarize the information in various reports. Moreover, the reports should provide the information at various levels, ranging from a global network perspective down to the individual volumes, allowing the administrator to drill down on specific volumes when necessary. This enables administrators to quickly and non-intrusively determine total capacity (available and used) by servers, directories, volumes and groups. Administrators are also able to report on the parameters of the data, such as space consumption by top N files and directories, file age distribution and data usage patterns, stale file analysis, and more.
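As a rough sketch of what such a scan computes, the Python below walks a directory tree and reports space by file extension, the top N largest files, and bytes grouped by time since last access. The function names, the report keys, and the 30/90/365-day buckets are assumptions made for this illustration; a real product would also aggregate per server, volume, and owner.

```python
import os
import time
from collections import Counter
from heapq import nlargest


def scan(root):
    """Yield (path, size_bytes, last_access_epoch) for every file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield path, st.st_size, st.st_atime


def summarize(records, now=None, top_n=3, age_buckets=(30, 90, 365)):
    """Summarize space by extension, the largest files, and bytes per idle-age bucket."""
    now = time.time() if now is None else now
    by_ext, by_age, files = Counter(), Counter(), []
    for path, size, atime in records:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        by_ext[ext] += size
        idle_days = (now - atime) / 86400
        # First bucket whose limit covers the idle time, else the overflow bucket.
        label = next((f"<={d}d" for d in age_buckets if idle_days <= d),
                     f">{age_buckets[-1]}d")
        by_age[label] += size
        files.append((size, path))
    return {
        "total_bytes": sum(by_ext.values()),
        "bytes_by_extension": dict(by_ext),
        "bytes_by_idle_age": dict(by_age),
        "top_files": [p for _size, p in nlargest(top_n, files)],
    }
```

A report for one volume is then `summarize(scan("/some/mount/point"))`. Note that last-access times are only as reliable as the file system that records them; some systems are mounted with access-time updates disabled.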

In the past, this data collection process was done manually using custom-developed scripts, pencil and paper, and custom spreadsheets that summarized the results. Traditional storage resource management (SRM) technologies require agents to collect such information. However, newer SRM tools follow an agent-less approach. This new approach utilizes industry-standard protocols such as CIFS (Common Internet File System) and NFS (Network File System), as well as other network services such as Microsoft Active Directory and NIS. This allows administrators to quickly assess their storage environment, especially in environments where storage servers are installed without IT's knowledge.

Data Classification and Valuation

The next step is to organize and classify the data and storage resources (across multiple storage volumes) into logical resource groups. A data lifecycle management solution should have the intelligence to automatically group together certain file types, such as image files (e.g., jpeg or tiff) or audio/video files (e.g., mp3 or wav). Administrators should also be able to create custom file groups using attributes such as file type, extension, age, size, last modified/accessed times, directory paths, or owners.
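One way to picture a file group is as a named filter over file attributes. The sketch below is a minimal illustration, assuming hypothetical `FileMeta` and `FileGroup` records and only four attributes (extension, idle days, directory prefix, owner); the `.dwg` and `.xls` extensions in the usage note are example values, not part of any product.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileMeta:
    path: str
    size: int
    idle_days: int
    owner: str


@dataclass(frozen=True)
class FileGroup:
    """A named filter; empty or None fields mean 'do not restrict on this attribute'."""
    name: str
    extensions: frozenset = frozenset()
    min_idle_days: int = 0
    path_prefix: str = ""
    owner: Optional[str] = None

    def matches(self, f: FileMeta) -> bool:
        ext = os.path.splitext(f.path)[1].lstrip(".").lower()
        return (
            (not self.extensions or ext in self.extensions)
            and f.idle_days >= self.min_idle_days
            and f.path.startswith(self.path_prefix)
            and (self.owner is None or f.owner == self.owner)
        )


def classify(files, groups):
    """Map each group name to the paths it contains; a file may fall into several groups."""
    return {g.name: [f.path for f in files if g.matches(f)] for g in groups}
```

For example, `FileGroup("stale-cad", extensions=frozenset({"dwg"}), min_idle_days=180, path_prefix="/proj/")` captures design files under the project folders that have been idle for 180 days, and `FileGroup("finance-sheets", extensions=frozenset({"xls"}), owner="finance")` captures one department's spreadsheets. Policies can then refer to the group name instead of to individual files.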

A major benefit of file resource groups is the administrators' ability to categorize and prioritize their data according to organization-specific business rules, simplifying their data management policies without worrying about identifying and managing individual files or directories. For instance, an engineering company may group together all the CAD/CAM files, located in several project folders, that have not been accessed within the last 180 days. Once the engineering team has completed the prototype of the new product design, the IT team may want to migrate this data from primary storage (e.g., SAN) to lower-cost storage (e.g., ATA-based RAID) in order to ensure that only their most active projects utilize their high-performance storage. Another example might involve the administrator setting up a different file group for all spreadsheets created by the Finance department and applying a more vigorous data replication strategy to this file group for business continuity purposes.

A data lifecycle management solution should also logically organize storage volumes into resource groups based on attributes such as storage type, cost per megabyte, capacity and make/model. These storage groups might logically bind together volumes that span multi-vendor storage systems, which greatly simplifies overall administration. For instance, an administrator could create a storage group for all ATA-RAID devices and designate it as a migration target. This reduces the need to modify any existing policies when a new ATA-RAID device is brought online.
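The reason a storage group avoids policy changes is that its membership is computed from volume attributes rather than listed by hand. The following sketch makes that concrete; the `Volume` fields, the `StorageGroup` class, and the "most free space" selection rule in `pick_target` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    storage_type: str   # e.g. "SAN" or "ATA-RAID"
    cost_per_mb: float  # illustrative figure supplied by the administrator
    capacity_mb: int
    used_mb: int


class StorageGroup:
    """A named group whose members are derived from volume attributes, not a fixed list."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def members(self, inventory):
        return [v for v in inventory if self.predicate(v)]


def pick_target(group, inventory, needed_mb):
    """Return the member volume with the most free space that can hold needed_mb, or None."""
    candidates = [v for v in group.members(inventory)
                  if v.capacity_mb - v.used_mb >= needed_mb]
    return max(candidates, key=lambda v: v.capacity_mb - v.used_mb, default=None)
```

With a group defined as `StorageGroup("low-cost", lambda v: v.storage_type == "ATA-RAID")`, adding a new ATA-RAID volume to the inventory makes it an eligible migration target immediately, and the policy that names the "low-cost" group does not change.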

Policy Automation Engine

At the heart of a data lifecycle management solution is the policy engine, which incorporates these global resource groups into central migration policies. Even though the files and volumes reside in many different storage devices on the network, the policy engine executes policy actions on the data to achieve different storage objectives. If the objective is to optimize cost, then the policy engine will make the best data placement decisions based on the different levels of cost within the storage network, making more cost-effective use of storage resources. As an illustration, the administrator would set a migration threshold (e.g., 70%), and if any volume within a high-cost storage group exceeds this threshold, the policy engine selects the less critical data and migrates it to another volume in the low-cost storage group, making more space available for additional high-value data.
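The threshold policy described above amounts to a simple selection loop. The sketch below assumes each file carries a numeric criticality score (lower means less critical), which is an invention for this example; how a real policy engine ranks data is product-specific.

```python
def plan_migration(capacity_mb, files, threshold=0.70):
    """Pick files to migrate off a volume until utilization is at or below `threshold`.

    files: list of (path, size_mb, criticality); lower criticality migrates first,
    and among equally critical files the largest goes first.
    """
    used = sum(size for _path, size, _crit in files)
    limit = capacity_mb * threshold
    plan = []
    for path, size, _crit in sorted(files, key=lambda f: (f[2], -f[1])):
        if used <= limit:
            break  # volume is back under the threshold
        plan.append(path)
        used -= size
    return plan
```

For a 1,000 MB volume holding 950 MB, a 70% threshold means freeing at least 250 MB, so the least critical files are chosen until the running total drops to 700 MB or less. A volume that is already under the threshold yields an empty plan.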

If the objective is to optimize utilization across multiple servers in a server farm, the policy engine selects and migrates data from over-utilized volumes to under-utilized volumes until capacity utilization is optimized across the network. Other policy actions might include replication of high-value data on primary storage for disaster recovery, movement of data to automate storage consolidation, or deletion of inappropriate files.
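As a simplified picture of utilization balancing, the sketch below works at the volume level only: it computes the fleet-wide utilization and pairs volumes above it with volumes below it. The function name and the greedy pairing are assumptions for illustration; a real policy engine would move whole files and weigh their value, so it could only approximate these amounts.

```python
def plan_rebalance(volumes):
    """Plan (src, dst, mb) moves that bring every volume to the fleet-wide utilization.

    volumes: dict mapping volume name -> (used_mb, capacity_mb).
    """
    total_used = sum(used for used, _cap in volumes.values())
    total_cap = sum(cap for _used, cap in volumes.values())
    target = total_used / total_cap
    # Positive delta = data to shed; negative delta = room to absorb.
    delta = {n: used - target * cap for n, (used, cap) in volumes.items()}
    donors = sorted((n for n in delta if delta[n] > 0), key=lambda n: -delta[n])
    takers = sorted((n for n in delta if delta[n] < 0), key=lambda n: delta[n])
    moves = []
    i = j = 0
    while i < len(donors) and j < len(takers):
        src, dst = donors[i], takers[j]
        mb = min(delta[src], -delta[dst])
        if round(mb) > 0:
            moves.append((src, dst, round(mb)))
        delta[src] -= mb
        delta[dst] += mb
        if delta[src] <= 1e-9:
            i += 1  # donor has shed its surplus
        if delta[dst] >= -1e-9:
            j += 1  # taker is filled to the target
    return moves
```

Given three equal 1,000 MB volumes at 90%, 10%, and 50%, the plan is a single 400 MB move from the first to the second, which leaves all three at 50%.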

Since no storage environment is static, a data lifecycle management solution cannot be static either, which is a limitation of most custom storage management scripts. For example, a data lifecycle management solution needs to handle new files created by users or new storage purchased by IT without requiring the administrator to reconfigure existing policies. The appropriate global resource groups should be automatically updated, and any existing migration policies should be dynamically changed to apply to the new data and/or storage resources.

User/Application Impact

Ideally, the data lifecycle management solution should be completely transparent to applications and users, who do not necessarily need to know where their data is stored as long as it is accessible. In a tiered-storage migration policy, data is typically moved from expensive hard disks to less expensive on-line storage, optical media, or tape. Administrators should not have to inform the users that their files are in a new location, nor should they have to go to the client systems and change the file access paths. The users should not even know that their data has migrated to less costly storage media.

A data lifecycle management solution needs to track where data is relocated to and must make the data available to the user and/or application as requested. One common technique is to separate the data from the file object. When the data is migrated, the file object in the local system still contains all the important attribute information about the file (e.g., file name, security information, etc.) but the data is now stored in another location. This file object now takes up much less space (one disk allocation unit) instead of the full size of the data, which may be multiple megabytes or more. When a user or application retrieves a file that has moved down the storage hierarchy, the data lifecycle management solution retrieves the data from the migration target location in the background and returns the data to the user application with minimal latency.
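The stub idea can be imitated in user space with a small placeholder file that records where the data went. Real products hook into the file system so that recall is invisible to applications; the sketch below only mimics the concept, and the JSON stub layout, the `STUB_MAGIC` marker, and the function names are all hypothetical.

```python
import json
import os
import shutil

STUB_MAGIC = "DLM-STUB-1"  # hypothetical marker identifying a stub file


def migrate(path, target_dir):
    """Move the file's data to target_dir and leave a small stub at the original path."""
    target = os.path.join(target_dir, os.path.basename(path))
    st = os.stat(path)
    shutil.move(path, target)
    with open(path, "w", encoding="utf-8") as stub:
        json.dump({"magic": STUB_MAGIC, "location": target, "size": st.st_size}, stub)
    os.utime(path, (st.st_atime, st.st_mtime))  # keep the original timestamps on the stub
    return target


def read_bytes(path):
    """Return file contents, following a stub to the migrated data when necessary."""
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        stub = json.loads(data)
    except ValueError:
        return data  # not JSON, so this is an ordinary file
    if isinstance(stub, dict) and stub.get("magic") == STUB_MAGIC:
        with open(stub["location"], "rb") as fh:
            return fh.read()
    return data
```

After `migrate(path, secondary_dir)`, the file at `path` shrinks to a few hundred bytes, yet `read_bytes(path)` still returns the original contents. Reading the whole file to detect a stub is wasteful for large files; it keeps the sketch short and is not how a production system would do it.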

Backup Optimization

IT departments continually face the problem of expanding backup windows. This problem worsens as the total amount of data stored online increases. Administrators know that they do not need to back up all the online data all the time, but they lack the tools to implement such a strategy. Data lifecycle management solutions migrate less frequently accessed data from primary storage to secondary storage, so the administrator can now implement full-system backup policies only on the critical and active data remaining on the primary storage. This reduces the backup window and simplifies any required recovery.

Optimize Storage Utilization

Even the best IT departments struggle to achieve greater than 30-40% utilization across their entire network. Usually there are a couple of servers that are always near capacity limit while the others are significantly under-utilized. Instead of all the down-time, cost and overhead involved in expanding the storage capacity of the over-utilized storage (and storage expansion may not be feasible at all in some situations, requiring a new server replacement project), data lifecycle management solutions automate migration of data from over-utilized volumes into under-utilized volumes across DAS/NAS/SAN environments, which extends the life of primary storage and increases the utilization of under-utilized storage.

Tiered Online Storage

New storage devices are available that are pushing below $.02/MB, and prices are still coming down. However, most IT departments cannot take advantage of these cost savings because the manual steps involved in selecting and moving data to these devices make the whole solution cost prohibitive. Data lifecycle management solutions enable these companies to automatically and transparently migrate less critical or inactive data to these lower-cost devices, reducing the overall cost-per-terabyte and deferring future expansion requirements of high-cost primary storage.

In summary, next-generation data lifecycle management solutions, which offer a new approach to storage resource and automated data management, have arrived. By understanding the value of data and storage resources, data lifecycle management solutions match data to the most appropriate storage resources in order to reduce costs, consolidate and optimize resources, and improve operational efficiencies. A data lifecycle management solution should not involve a collection of ad-hoc tools. Instead, it should integrate the various components that include monitoring of storage utilization and data usage patterns, classification and valuation of resources, and policy-based automated data management actions. In the end, the policy engine plays a vital role by intelligently and adaptively making the best decisions about where to place data on the storage network to ensure the right data is on the right storage at the right time.

www.arkivio.com

Albert Leung is CTO and Glenn Rhodes is director of product marketing at Arkivio, Inc. (Mountain View, CA)
COPYRIGHT 2003 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Tape/Disk/Optical Storage
Author:Rhodes, Glenn
Publication:Computer Technology Review
Date:Jul 1, 2003
Words:2523