Cradle-to-grave storage management now a reality: and not a moment too soon.

Understanding what happens to data throughout its lifetime is becoming an increasingly important aspect of effective data management. What happens to data as it ages? Does usage decline as data ages? Does the value of data increase or decrease as it ages? Why are we keeping more data longer than ever before? What conditions indicate when data should be retired? Do storage management requirements change as data goes through its lifecycle? If data is the most valuable asset of so many businesses, why do we know so little about it? These questions must be answered in order to understand where data should ideally reside during its existence. In particular, the probability of reuse of data is emerging as one of the more meaningful metrics for understanding optimal data placement, and it is a key metric for making HSM (hierarchical storage management) more effective. For almost all data types, the number of references to data declines significantly as it ages. This basic observation points to more cost-effective storage management because it enables the movement of less active data to lower-cost levels of storage. The lower frequency of access as data ages has been fundamental to HSM for over 25 years.
Storage demand grew at over 100% per year during the dotcom boom. The storage industry is presently generating new data at a rate of approximately 50% per year. Some analyst groups have recently pegged the growth rate at 25-30%, but they are counting unit shipments, not demand as measured in terabytes. Much of the current demand is being met from existing unused capacity resulting from the excessive buying habits of the past several years. The continual increases, at any growth rate, in the amount of digital storage have made storage management more difficult and, as a result, more data is being accumulated for longer periods of time without effective management services. The percentage of valueless digital data that is actually deleted is quickly declining, as obsolete data is often "just kept around forever." In too many cases this approach is perceived to be easier than managing the data throughout its lifecycle. The probability of reusing data typically falls by 50% once the data is three days old. Thirty days after creation, the probability of reuse normally falls to a few percent. Email and medical imaging applications are good examples of the data-aging profile described here. Keeping very low activity, archival and inactive
data on spinning disks for long periods of time is not economical, both for environmental reasons (electrical consumption) and because of the difference in price per unit of storage between disk and tape. Figure 2 provides additional insight into numerous storage consumption and usage patterns. Given the picture of the storage industry painted by Figure 2, who could ever think that managing data for its lifetime is even remotely possible? Most likely, it won't become possible without some major enhancements to existing levels of management capability. As we continue to observe, data is growing faster than our ability to manage it. As storage networks and, in particular, SAN deployments continue to evolve, optimal data placement among various storage technologies will begin to occur automatically, without human involvement, but will initially be host based. Later these functions will move outboard of the connected servers, implemented as either an in-band or out-of-band function in the storage network itself. Either of these implementations will most likely be delivered using blades or appliances. Advanced policy-driven SRM (storage resource management) software will begin to measure reference patterns and trigger management actions that move data to the optimal storage location throughout its lifetime.
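As a rough illustration of the policy-driven data movement described above, the following Python sketch chooses a storage tier from the time since a file was last referenced and lists the moves that would result. The tier names, the 3-day and 30-day thresholds (borrowed loosely from the aging profile mentioned earlier), and the `FileRecord` structure are all assumptions made for the example; they do not describe any particular HSM or SRM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and idle-time thresholds (assumptions for illustration):
# data idle for under 3 days stays on primary disk, under 30 days goes to
# lower-cost disk, and anything older goes to the tape library.
TIER_RULES = [
    (timedelta(days=3), "primary-disk"),
    (timedelta(days=30), "low-cost-disk"),
]
ARCHIVE_TIER = "tape-library"


@dataclass
class FileRecord:
    path: str
    last_access: datetime  # most recent reference to the file
    tier: str              # tier the file currently occupies


def target_tier(record, now):
    """Return the tier a file should occupy, given how long it has been idle."""
    idle = now - record.last_access
    for limit, tier in TIER_RULES:
        if idle < limit:
            return tier
    return ARCHIVE_TIER


def plan_migrations(records, now):
    """Return (path, current_tier, target_tier) for every file that should move."""
    moves = []
    for rec in records:
        wanted = target_tier(rec, now)
        if wanted != rec.tier:
            moves.append((rec.path, rec.tier, wanted))
    return moves
```

A real policy engine would also weigh file size, recall cost and business rules such as retention requirements; the sketch only shows the age-based core of the idea.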
Moving storage management functions off the server (called draining the server) and into the storage network, to minimize host resource consumption and improve storage management speeds, is emerging as a primary goal for the storage industry. The initial application targeted to move outboard has been server-less backup and recovery. Representing a fundamental change in the way large data centers operate, server-less backup will allow businesses to perform a variety of operations, such as full backups, snapshots and incremental backups, at any time without consuming computing and bandwidth resources from application servers. With server-less backup, the server initiates the backup or recovery function but doesn't sit in the data movement path. The movement of data directly between disk arrays and automated libraries across a dedicated network for backup and recovery applications is highly desirable. This capability further leverages the SAN infrastructure by providing significant operational benefits. Server cycles, I/O paths and memory are all consumed by existing backup techniques. In addition, many non-mainframe users complain that about half of their backups simply don't work for a variety of reasons, mainly human error.
Server-less backup enables IT departments to back up data anytime, creating a desirable "zero downtime backup" capability with a nearly negligible amount of overhead. To implement server-less backup, application software must expose APIs that let users activate the function. Though initial acceptance of server-less backup has been slow, the need for automated and dynamic server-less storage management is higher than most realize. After outboard or server-less backup, look for HSM to become a primary candidate to move into a SAN appliance or blade, basically bringing HSM to life. As stated earlier, server-less or outboard storage management technologies will eventually progress beyond backup and recovery techniques to include mirroring, replication, snapshot copy and a variety of virtualization functions. Advanced SRM products make possible proactive or anticipatory data movement that further optimizes the storage hierarchy.
Storage capable of using one set of management tools and utility software through a single interface can enable storage administrators to effectively manage far more storage than ever before, at last shrinking the management gap between installed storage capacity and what can actually be managed. As storage becomes cheaper to buy, it becomes harder to manage. In parallel, the value of data is increasing irrespective of economic and other pressing global issues. As the storage management requirements for data change over time, storage management has become a lifetime activity. The place where data is initially stored is not necessarily where it will finally be stored. Solving this growing problem will take the best minds in the industry. Projecting the historical growth rates for storage forward, the time to begin has already passed.

[FIGURE 1 OMITTED]

Storage Consumption Rules of Thumb, Observations and Estimates

Average annual storage demand rate (primary occurrence of data, all platforms): 50-60%
Amount of disk data stored on Unix, Win2K and Linux systems (est.): 85%
Average disk allocation levels for z/OS (eSeries mainframes using DFSMS): 60-80%
Average disk allocation levels for iSeries (AS/400 servers): 60-80%
Average disk allocation levels for Unix/Linux: 30-50%
Average disk allocation levels for Win2K/NT: 20-40%
Ratio of block data to file data: 1.5:1
Average annual disk drive capacity increase: 60%
Average annual disk drive performance improvement (seek, latency and data rate): <10%
Increase in disk drive capacity per actuator since the first disk drive in 1956: 36,260x
Increase in native tape cartridge capacity since the first tape cartridge in 1984: 1,250x
Average multi-user server utilization (% busy): 25-40%
Average tape cartridge utilization levels for virtual tape systems: 60-85%
Estimated range of disk data managed per administrator (distributed systems: Win2K, Unix, Linux): 400-750GB
Estimated amount of disk data managed per administrator (z/OS, mainframe): >30TB
Estimated range of automated tape data managed per administrator (all platforms): 40TB-1EB (varies widely based on library size)
Average CAGR of email message size: 90%
Annual growth rate of email spam: ~350%
Estimated percentage of SANs that are homogeneous (the same operating system): 75% (Unix and Win2k systems only)
Average size of email message and attachments in 2002: 50KB
Average size of email in 2007: 650KB
Number of emails sent daily in 2001: 12,000,000
Number of emails sent daily in 2005 (est.): 35,000,000
Percentage of all email traffic that is spam (also known as bandwidth burning): 62%
Annual growth in all Internet traffic: 80%
Percentage of digital data stored on single-user systems: 56%
Percentage of digital data stored on removable media (tape and optical): >80%
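
To make the management gap concrete, the short Python sketch below compounds an annual growth rate and estimates how many administrators a given capacity would need at a fixed amount of storage per administrator. The function names and the starting capacity in the usage note are hypothetical; only the growth rates and per-administrator ranges come from the figures above.

```python
import math


def projected_capacity(current_tb, annual_growth, years):
    """Capacity after compounding annual_growth (0.5 means 50%) for the given years."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def administrators_needed(capacity_tb, tb_per_admin):
    """Administrators required if each one can manage tb_per_admin terabytes."""
    return math.ceil(capacity_tb / tb_per_admin)
```

For example, at 50% annual growth capacity doubles in roughly 1.7 years, so an assumed 10TB distributed-systems installation, calling `projected_capacity(10, 0.5, 3)`, would pass 33TB within three years. At the upper estimate of 750GB per administrator, staffing would have to grow at the same rate unless management tools improve.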