New Holy Grail: information lifecycle management; Has it been found? Not yet.

Understanding what happens to data throughout its lifetime is becoming an increasingly important aspect of effective data management. What happens to data as it ages? Does usage decline as data ages? Does the value of data increase or decrease as it ages? Why are we keeping more data longer than ever before? What conditions indicate when data should be retired? Do storage management requirements change as data goes through its life cycle? If data is the most valuable asset of so many businesses, why do we know so little about it?

These important questions need answers so we can understand how data should be managed and where data should, ideally, reside during its existence. In particular, the probability of reuse of data has historically been one of the most meaningful metrics for understanding optimal data placement and remains a key metric for effective HSM (hierarchical storage management) implementation. For the majority of data types, the number of references to data declines significantly as the data ages. This basic observation serves as the basis for more cost-effective storage management, as it enables the movement of less active data to lower-cost levels of storage.
The lower frequency of access as data ages has been a fundamental storage management principle for over 25 years. We are now witnessing a new effect on the life cycle of data: the amount of data now increases as it ages. Unlike in the past, fixed content and archival storage have become the fastest growing areas of the storage industry. Storage demand grew at over 100% per year during the dot-com boom of the late 1990s. Today, the storage industry is generating new data at a rate of approximately 50-70% per year. In addition, some of the current demand for storage is being met from existing, unused capacity that is the result of the excessive buying habits of the past several years. Regardless of the growth rate, the continual increases in the amount of digital data have made storage management more difficult and, as a result, more data is being accumulated for longer periods of time. Much of this data lives digitally without effective storage management services. The percentage of digital data that is deleted once it has lost its value is quickly declining, as obsolete data is often "just kept around forever." In many cases, this approach is perceived to be easier than managing the data throughout its life cycle.

The probability of reusing data typically falls by 50% once the data is three days old. Thirty days after creation, the probability of reuse normally falls below a few percentage points. E-mail and medical imaging applications represent good examples of the data aging
profile described here. Keeping very low activity, archival and inactive data on spinning disks for long periods of time is not economical, for environmental reasons (electrical consumption) and security reasons, in addition to the tangible difference in price per unit of storage between disk and tape.

Data Retention Requirements Change

When the Nearline concept was becoming widely accepted in the 1990s, the common belief was that archival status was the last stop for data before deletion or end-of-life. At that time, one- to two-year data retention periods were viewed as a reasonable amount of time to keep digital data accessible. Fifteen years later, the game and rules are different. New government regulations, the Sarbanes-Oxley Act, and HIPAA requirements for transmission and retention of data have made us change the way we look at data as it ages. Several major health care providers are faced with generating and storing more than 500TB of data over the next few years that will need to be managed for a person's lifetime plus seven years, a time period that could exceed 100 years. SEC Rule 17a-4(f) mandates digital archiving requirements as they relate to storage. These include what type of storage format should be used, how long data must be retained, and where and how long duplicate copies of data must be stored.
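The aging profile and the retention requirements described above can be combined into a simple placement rule. The following is only a minimal sketch under stated assumptions: the tier names, the `Record` type, the functions `choose_tier` and `retention_expired`, and the 3-day and 30-day thresholds are all hypothetical, chosen to mirror the figures quoted in the article rather than taken from any product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical thresholds, loosely mirroring the aging profile in the text:
# reuse drops sharply after about 3 days and to a few percent by 30 days.
ACTIVE_DAYS = 3
NEARLINE_DAYS = 30


@dataclass
class Record:
    name: str
    created: date
    retain_until: Optional[date] = None  # None means no retention mandate


def choose_tier(record: Record, today: date) -> str:
    """Pick a storage tier from the record's age alone (assumed tier names)."""
    age_days = (today - record.created).days
    if age_days <= ACTIVE_DAYS:
        return "primary-disk"
    if age_days <= NEARLINE_DAYS:
        return "low-cost-disk"
    return "tape-archive"


def retention_expired(record: Record, today: date) -> bool:
    """True when no retention mandate applies or its end date has passed."""
    if record.retain_until is None:
        return True
    return today > record.retain_until


if __name__ == "__main__":
    today = date(2004, 1, 31)
    scan = Record("chest-xray", date(2003, 1, 1), retain_until=date(2090, 1, 1))
    print(choose_tier(scan, today), retention_expired(scan, today))
```

Note that the two decisions are kept separate: age drives where the data lives, while the retention date only answers whether the data may be considered for deletion at all. A real policy would also weigh the value of the data, which this sketch does not attempt to model.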
The back end of the data life cycle is swelling, not shrinking as was the case previously, and retention policies are now being based on data value and legal issues, not just reference activity. For lifetime data management, "it doesn't matter if the data is ever used; it does matter if the data is there." This change in the storage landscape calls for new management policies based on the value of data and requires a universal, standard classification scheme for data to emerge. All data is not created equal.

Life Cycle Management and Policies

How does someone actually implement an information life cycle strategy? Is managing data for its lifetime realistically possible? It won't become a reality without some major enhancements to the existing levels of data management capability. Data is growing faster than our ability to manage it. As storage networks and SAN deployment continue to evolve, optimal data placement and movement between various levels of the storage hierarchy will occur automatically without human involvement. As these functions move outboard of the application servers, they will be implemented as either an in-band or out-of-band function in the storage fabric itself. Either of these implementations will likely be delivered by using blades or appliances. Advanced policy-driven SRM
(Storage Resource Management) software will be required and should evolve to measure reference patterns and trigger management policies that result in moving data, in conjunction with HSM or a similar function, to the optimal storage location throughout its lifetime. In the future, SRM tools may become the optimal storage management function for assigning data values.

Data Life Cycle Management Needs a Solution

Ideally, the data life cycle management solution should be completely transparent to applications and to users, who do not necessarily need to know where their data is stored as long as it is accessible. In a tiered storage migration policy, data is typically moved from expensive hard disks either to less expensive online storage or to tape. Storage administrators should not have to inform users that their files are in a new location, nor should they have to go to the client systems and change the file location pointers. Ideally, the migration of data from one level of the storage hierarchy to another should be transparent. The users should not even know that their data has migrated to less costly storage media.

A data life cycle management solution needs to track the new location when data is relocated and must make the data available to the user and/or application as requested. One common technique is to separate the file's attributes from the actual data in the file. When the data is migrated, the file's attributes in the local system still contain all the important descriptive information about the file (new location, file name, security information, etc.), and the data is now stored in another, typically lower-cost, storage subsystem.
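The technique of keeping a file's attributes in place while its data moves can be illustrated with a toy in-memory model. This is a sketch only, not how any particular HSM product works: the class `TieredStore`, the `FileAttributes` record, and the tier names "disk" and "tape" are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileAttributes:
    name: str
    owner: str     # stands in for the file's security information
    tier: str      # which storage level currently holds the data
    location: str  # key of the data within that tier


class TieredStore:
    """Toy model: attributes stay in the catalog, data moves between tiers."""

    def __init__(self) -> None:
        self.tiers: Dict[str, Dict[str, bytes]] = {"disk": {}, "tape": {}}
        self.catalog: Dict[str, FileAttributes] = {}

    def write(self, name: str, owner: str, data: bytes) -> None:
        # New data always lands on the fastest tier in this sketch.
        self.tiers["disk"][name] = data
        self.catalog[name] = FileAttributes(name, owner, "disk", name)

    def migrate(self, name: str, target_tier: str) -> None:
        attrs = self.catalog[name]
        target = self.tiers[target_tier]  # fails early on an unknown tier
        if attrs.tier == target_tier:
            return
        # Move the data, then record the new location in the attributes.
        target[attrs.location] = self.tiers[attrs.tier].pop(attrs.location)
        attrs.tier = target_tier

    def read(self, name: str) -> bytes:
        # Callers use the same name before and after migration.
        attrs = self.catalog[name]
        return self.tiers[attrs.tier][attrs.location]


if __name__ == "__main__":
    store = TieredStore()
    store.write("report.txt", "alice", b"quarterly figures")
    store.migrate("report.txt", "tape")
    print(store.read("report.txt"), store.catalog["report.txt"].tier)
```

The point of the design is that `read` takes only the file name: the caller never learns, and never needs to learn, which tier served the request.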
When a user or application retrieves a file that has moved down the storage hierarchy, the management software retrieves that file from the new migrated target location.

Intelligent Storage Architecture

It isn't yet well understood whether the overall cost of the additional server I/O traffic required to move data within the hierarchy is more than the cost of just leaving the data to reside indefinitely on higher-priced disk and not moving it at all. What we do know is that the overhead, or I/O tax, is very high. A new trend called draining the server (moving storage management functions off the server and into the storage fabric to minimize host resource consumption and improve storage management speeds) is emerging as a primary direction for the storage industry. The initial application expected to move outboard has been referred to as server-less backup and recovery. Representing a fundamental change in the way large data centers operate, server-less backup will allow businesses to perform a variety of operations such as full backups, snapshots and incremental backups at any time without consuming computing and I/O bandwidth resources from application servers. With server-less backup, the server initiates the backup or recovery function but doesn't sit in the data movement path. The movement of data directly from disk arrays to and from automated libraries across a dedicated network for backup and recovery applications is now highly desirable.
For recovery, the data moves directly from tape storage back to disk. This capability further leverages the SAN infrastructure by providing significant management benefits for storage administrators. After outboard or server-less backup, look for HSM to become a primary candidate to move into a SAN appliance or blade, basically bringing HSM to life.

As stated earlier, server-less or outboard storage management technologies will eventually progress beyond backup and recovery techniques to include mirroring, replication, snapshot copy and a variety of virtualization functions. Advanced SRM products make possible proactive or anticipatory data movement that further optimizes the storage hierarchy. The capability of using one set of management tools and utility software through a single interface can enable storage administrators to effectively manage far more storage than ever before, finally shrinking the management gap between installed storage capacity and what can actually be managed.

As storage becomes cheaper to buy, it becomes harder to manage. In parallel, the value of data is increasing irrespective of economic and other pressing global issues. As the value of data now changes significantly as it ages, storage management has now become a lifetime activity.
The place where data is initially stored is not necessarily the same place where it will finally be stored. Everyone can state the problem of data life cycle management. Building and delivering a solution to this growing problem will take the best minds in the industry. Given the anticipated growth rates for digital storage, the time to begin has already passed.