PROFILING THE STORAGE HIERARCHY.
Estimates now indicate that half to two-thirds of the world's data is being "born digital," meaning it originates in a digital format. By the year 2004, it is projected that as much as 14% of the known data in the world will have been captured in a machine-readable digital format. Nearly 86% of the world's data will remain on paper, microfiche, charts, graphs, pictures, various films, or other non-machine-readable formats. Best estimates indicate that 10% of all digital data resides on disk or online storage, while roughly 90% resides on removable and mass storage technologies. The age-old rule that 80% of the activity goes to 20% of the data still holds.
Today's digital data storage technologies map nicely into a hierarchy consisting of both fixed and removable media products. The removable products are making mass storage, long-term data archiving, and electronic data vaulting affordable realities. The notion of a hierarchy has been used for nearly twenty-five years by the storage industry to relate the various tradeoffs of storage products and subsystems. Faster, more expensive products occupy the high end of the hierarchy; slower, higher-capacity, less expensive products occupy the lower levels.
Though many have predicted the hierarchy would ultimately evolve into a seamless, single level of storage, achieving this goal is seldom a reality. Three key parameters have normally been used for selecting the optimal level for data placement in the storage hierarchy: 1) the size of the file or application, 2) the performance requirements, and 3) the price of the subsystem. Now a fourth area has joined the primary selection criteria: availability coupled with quality of service. Broad ranges continue to exist within all four parameters, and some obvious access-time and cost gaps remain within the hierarchy. Anyone can decide to put all data on disk storage and just "buy more storage" when needed. This is the easiest, and possibly the most cost-effective, approach up to about the 200-250GB range at today's storage and personnel costs. Beyond that capacity range, however, choosing and managing the best combination of storage devices and technologies becomes more cost-effective than leaving storage unmanaged and simply adding hardware. Remember, the cost of people increases each year while the price of storage falls each year.
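The four selection criteria above can be sketched as a simple tier-selection routine. This is a minimal illustration, not any vendor's algorithm; the tier names, access times, costs, and availability figures are invented for the example.

```python
# Hypothetical sketch of the four data-placement criteria: size and price
# (criterion 1 and 3) drive the cost comparison, while performance and
# availability (criteria 2 and 4) act as hard requirements.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    max_access_ms: float   # worst-case initial access time
    cost_per_mb: float     # relative purchase price (illustrative)
    availability: float    # fraction of time data is reachable

TIERS = [
    Tier("solid-state disk",       0.1, 100.0, 0.99999),
    Tier("magnetic disk",         20.0,  10.0, 0.9999),
    Tier("nearline tape",     10_000.0,   1.0, 0.999),
    Tier("far-line shelf",   300_000.0,   0.1, 0.99),
]

def cheapest_tier(required_ms, required_availability):
    """Pick the least expensive tier that still meets the performance
    and availability requirements; None if no tier qualifies."""
    candidates = [t for t in TIERS
                  if t.max_access_ms <= required_ms
                  and t.availability >= required_availability]
    return min(candidates, key=lambda t: t.cost_per_mb) if candidates else None
```

With these assumed figures, a workload needing 50 ms access and "three nines" availability lands on magnetic disk, the cheapest tier that qualifies.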
Inside The Hierarchy
Memory products based on DRAM (solid-state) technology occupy the highest levels of the hierarchy and are the fastest and most expensive. Large OS/390-class enterprise servers also use the less expensive expanded storage level to hold ultra-high-access data tables and virtual pages, but expanded storage cannot execute programs. The comparable trend in other servers is simply to add more main memory, which can execute programs. Solid State Disks first appeared in 1978 for the mainframe market and provide the highest I/O performance of any storage device. The use of DRAM technology removes the latency and seek components from an I/O operation. This class of device resides on an I/O channel rather than on an internal memory bus, as expanded storage does. It appears to the operating system (Unix, Linux, NT, etc.) as one or more ultra-fast magnetic disk drives. Today, these devices approach 20GB capacities and offer fault-tolerant designs that effectively address availability concerns.
Magnetic disk storage holds an estimated 95% of the world's mission-critical data. Disks have clearly defined both a high-performance level and a high-capacity level in the hierarchy, with price per megabyte and access density being the most apparent differences between the two. Technological progress in magnetic storage on all fronts has been phenomenal by any measure, and there appears to be no near-term end in sight. Coupled with widespread caching capability, disk is typically the most versatile level of the hierarchy. Backup and, in particular, recovery of critical disk storage in a timely manner leave many challenges and present developers with many opportunities for improvement.
As data becomes less active or moves to an archival state, it becomes too costly to leave it on spinning magnetic disks 24x365, consuming electricity, generating heat, and occupying increasingly costly real estate. Migrating less-frequently used data to a lower-cost, lower-performance level of storage becomes more attractive. In just a few years, the optimal model will be the Intelligent SAN (ISAN): a storage network with embedded hierarchical storage capabilities driven by intelligent metadata. This will enable transparent data movement between the appropriate levels of the hierarchy, independent of any server. We will move a step closer to a single-level storage concept that lets us get the right data to the right place at the right time.
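The migration idea described above can be sketched as a toy demotion policy: data untouched for some period moves one level down the hierarchy. The file records, tier names, and 90-day threshold are assumptions for illustration, not a real HSM product's policy.

```python
# Toy hierarchical-migration sketch: demote idle data one level down.
# Timestamps are Unix seconds; the 90-day cutoff is an invented default.
HIERARCHY = ["disk", "nearline", "far-line"]

def migrate(files, now, idle_days=90):
    """Demote any file idle longer than `idle_days` one level down
    the hierarchy; files on the bottom level stay where they are."""
    cutoff = now - idle_days * 86_400
    for f in files:
        level = HIERARCHY.index(f["tier"])
        if f["last_access"] < cutoff and level < len(HIERARCHY) - 1:
            f["tier"] = HIERARCHY[level + 1]
    return files
```

A real implementation would also copy the data itself and update metadata so later reads are redirected transparently, which is exactly the server-independent movement the ISAN vision describes.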
Once a favorite technology for the future, optical disk storage is now centered on CD-ROM and DVD technology. Optical disk has been squeezed from the general storage hierarchy, as it has not kept pace with magnetic disk and tape developments in areal density, price, access time, and performance. WORM optical product shipments are now in significant decline. Future optical devices remain under development, as breakthroughs are usually "just around the corner." DVD offers some promise, though standards issues and low performance and transfer rates limit much of its potential to replacing CD-ROMs.
Nearline defines the level of storage between disk and far-line (shelved) storage; it uses robotics to store and retrieve media automatically, typically far quicker than human retrieval. Over thirty companies supply various forms of Nearline storage. The volumetric efficiency of Nearline exceeds all other technologies, and its purchase price per megabyte remains lower than disk's by a factor of ten or more. Today, Nearline storage consists of robotics that access either magnetic tape cartridges or optical disks. Other media types are under consideration, including small form-factor magnetic disks. This level will contain 85% or more of the world's digitally stored data for the foreseeable future.
Nearline has some remaining limitations, however. First, because robotics must mount the media on a read-write mechanism, reaching the first file or byte of data typically takes five to ten seconds. This leaves a major access-time gap between online disk, with access times in the range of ten to twenty milliseconds, and Nearline, with initial access times measured in seconds.
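Taking midpoints of the ranges just cited makes the size of that gap concrete; the 15 ms and 7.5 s figures below are simply the midpoints of the article's ranges, not measured values.

```python
# Rough magnitude of the disk-to-Nearline access-time gap.
disk_ms = 15          # online disk: midpoint of 10-20 ms
nearline_ms = 7_500   # Nearline robotic mount: midpoint of 5-10 s
gap = nearline_ms / disk_ms
print(f"Nearline first access is roughly {gap:.0f}x slower than disk")
# prints: Nearline first access is roughly 500x slower than disk
```

A gap of two to three orders of magnitude is why no intermediate product has comfortably filled this slot in the hierarchy.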
Second, Nearline tape storage is best suited for sequentially accessed data. It was originally hoped that optical disk would fill the removable-storage, random-access requirement, but its slow progress keeps it from addressing this requirement on any broad scale. Backup, recovery, batch processing, and archiving applications are primarily created and accessed sequentially. Virtual tape, in which a disk buffer appears as a tape drive to the attached server, is beginning to help address this issue, though its real promise is far from fulfilled.
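The virtual-tape idea can be sketched in a few lines: writes land in a fast disk buffer and are later streamed to the slower tape back end in one sequential pass. The class and method names here are illustrative, and the "disk buffer" is just an in-memory list standing in for real staging storage.

```python
# Minimal virtual-tape sketch: the host sees a tape drive, but writes
# complete at disk-buffer speed; physical tape is written later, sequentially.
class VirtualTape:
    def __init__(self, tape):
        self.buffer = []   # disk staging area (a list, for illustration)
        self.tape = tape   # the real sequential device (also a list here)

    def write(self, block):
        # Acknowledged immediately: the block is only staged on "disk."
        self.buffer.append(block)

    def flush(self):
        # Buffered blocks stream to physical tape in one sequential pass.
        self.tape.extend(self.buffer)
        self.buffer.clear()
```

Because the buffer absorbs the bursty, small writes, the physical drive sees only long sequential streams, which is where tape performs best.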
Finally, the overall cost of ownership for tape-based storage is often viewed as higher than disk's and more labor-intensive. These three parameters (initial access time, sequential-only access, and a perceived higher management cost) represent the next frontier for the Nearline and mass storage providers to address.
Far-line, or manually retrieved shelf storage, still represents the vast majority of the world's non-digital data. The path to digitization is only slowing, not halting, its rate of growth. Managing and exploiting the hierarchy to its maximum benefit remains a challenge on all computing platforms, but as storage grows, the payoff for implementing an effective storage hierarchy is enormous. Automated libraries using magnetic tape, possibly small form-factor magnetic disks, DVD, and other emerging storage media will become the foundation containing most of this mass-storage growth. New and emerging digital applications will continue to fuel a period of explosive growth for storage well into the next century, as terabyte-plus databases, data warehouses, and electronic voice and video mail systems all drive up requirements.
We have now made the creation and manipulation of data, our most valuable resource, the primary focus of the new millennium. Computer power and storage capacity have been the most potent technologies driving the Internet Age, and their continued advance for the foreseeable future is well in hand. This is about to change. Information is power, but if it is immobile and cannot readily be moved to the right place, we have created "the worldwide gridlock." Our future vision shifts attention toward communications capability as the next driving force following the digital era we are well into. When this happens, the longstanding storage hierarchy should become the concern of the few, not the many.
Title Annotation: Technology Information
Publication: Computer Technology Review
Date: Mar 1, 2000
Previous Article: CMD RAIDs The Market.
Next Article: Open Systems Connectivity To Mainframe Storage Networks.