The digital tsunami: a perspective on data storage: to meet demands, organizations will need to increase today's storage offerings 10 times. But how will such dramatic increases be addressed by technology and systems in the next few years?
* examines current issues and future possibilities in data storage
* compares alternative technologies for data storage
* identifies existing and emerging technologies for data storage
The rapid emergence of digital video is just one indicator of the growing markets for storage systems with much lower costs and much larger capacities. Other major demands come from the standard business model for corporate data storage and the growth of storage service providers (SSPs) that are accumulating data from small business and consumer markets.
According to "How Much Information," a study by the University of California Berkeley School of Information Management and Systems, to meet escalating requirements, each of these markets will need to improve today's storage offerings by 10 times. By 2010, a 100-fold increase will likely be necessary, the study shows. As records and information managers must deal with increasingly larger volumes of records, keeping up with storage media trends is becoming more critical.
According to International Data Corp., in fiscal year 2003, Fortune 500 companies spent $7 billion on approximately 1,200 terabytes (1 terabyte = 1,000 gigabytes) of data stored on magnetic tape. Additionally, Jim Porter of Disk/Trend Inc. reports that companies have spent $40 billion to store 1,400 million terabytes of data on magnetic disk. Even though storage requirements have now started to increase at more than 100 percent annually, technology is improving in cost/performance at only 35 percent each year.
This creates a significant gap in terms of the cost/performance of storage systems required to meet the growing need. To allow that data to be stored without causing large cost increases, data storage must move from relatively expensive disks to lower-cost media. In addition, the cost/performance of these lower-cost media must continue to improve at an annual rate of nearly 100 percent to meet the continuing data growth.
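The size of this gap can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the 100 percent annual demand growth and the 35 percent annual cost/performance improvement quoted above and assumes that both compound at a constant rate. The function name and the five-year horizon are assumptions made for the example, not figures from the cited studies.

```python
def relative_spending(years, demand_growth=1.00, tech_improvement=0.35):
    """Spending relative to year 0 (assumed model: constant compounding).

    demand_growth    -- annual growth in capacity required (1.00 = 100 percent)
    tech_improvement -- annual gain in cost/performance (0.35 = 35 percent)
    """
    demand = (1 + demand_growth) ** years
    cost_per_unit = 1 / (1 + tech_improvement) ** years
    return demand * cost_per_unit


# Hypothetical five-year horizon.
for year in range(6):
    print(year, round(relative_spending(year), 2))
```

Under these assumptions, spending rises by roughly 48 percent a year and is about seven times its starting level after five years. Setting tech_improvement to 1.00 keeps spending flat, which is why the lower-cost media would have to improve at nearly 100 percent annually.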
This growth in data can be felt in personal terms also. In the early 1990s, analysts were foretelling that each Fortune 500 company would have a terabyte under management in a few years. In 2000, experts such as Patti Tobin predicted that in 2003, thanks to the advent of digital movies on DVD, commercial music collections on CD-ROMs, and personal music or digital photographs stored on CD-Rs or CD-RWs, each computer-savvy home would have the equivalent of a terabyte of data under its roof. By 2006, as wireless and video technologies become ubiquitous, we should reasonably expect that individuals will move from having hundreds of gigabytes to a terabyte of personal storage on their desktop or laptop. The U.S. government--often a "canary in the mine" for new technology--has several data storage systems under development that will provide about 80 petabytes (80,000 terabytes) each.
This explosive growth of data, called the "Digital Tsunami" by Gary Ashton in his 1996 National Media Laboratory Report "Future Trends in Storage, Interconnectivity, and Data Transfer," is fundamentally driven by the conversion of analog systems to fully digital systems (e.g., film cameras to digital cameras) and a leveling of computer resources available (e.g., each home computer is now equivalent in performance to early supercomputers).
But how will these dramatic increases in data storage be addressed by technology and systems over the next several years?
Comparing the Media
The means of storing computer data has changed little over the past few decades and still consists of only about half a dozen different types of media and formats. These include silicon-based dynamic and static random access memory (RAM), magnetic hard disk drives, optical disks (including write once read many [WORM], magneto-optical, CD-ROM, and DVD disks), and magnetic tape.
In any specific storage application, the preferred medium is selected because of a particular characteristic that is superior to that offered by the other options. RAM offers high data-transfer rates and high-speed access even though it is the most expensive. Hard-disk drives provide high capacity (greater than 100 gigabytes) and moderate access times in an affordable package but, like RAM, are neither removable nor archival, even though in recent years the swap-out capability of drives and media has led to a change in thinking about the removability of hard disk systems. CD-ROM provides a removable disk medium of limited capacity (650 megabytes), and the DVD provides 5 gigabytes. Emerging formats of the WORM optical disk from Sony and Plasmon offer 30 gigabytes of archival storage in a 5.25-inch format.
The data-sharing function of the floppy is now facing obsolescence due to its slow data rate and low capacity. It is being replaced in sophisticated installations by portable universal serial bus (USB)-oriented drives and media. The most affordable means of storing digital data is provided by tape, and tape is usually selected for archiving large databases. The result is a mix of storage products from which the designer must choose to optimize system performance at the lowest cost. This results in a large variety of physical form factors and a need to interface among the various types of media.
An ideal solution would provide data storage with the following characteristics:
* Rapid access
* Fast read/write data rates
* Low cost per data byte
* Archivability (as an option)
* The option to be managed by an autoloader or jukebox
A storage medium that could offer the lowest cost and best performance--plus the specific set of the ideal characteristics--would replace most other types of storage in use today. A drive with removable, high-capacity, magnetic disk media potentially meets these criteria. For several reasons, however, it is not practical using today's data storage technologies. Simply put, magnetic storage is inherently temporary. Tapes, the highest-capacity removable media, are serial in nature and, therefore, slower to access. The areal density (amount of data per square inch) of optical storage is limited by the wavelength of light it uses.
Tape is (1) a more mature and stable technology (the first magnetic tape drives were introduced in the early 1950s) and (2) slower because of the serial nature of search and retrieval. These two features make it a less attractive product for many industry insiders. Various experts have stated that "tape is dead" over the past 20 years. While there is a kernel of truth in this statement, it deflects the larger issue that remains for the foreseeable future: There is no lower-cost alternative than tape for the safekeeping of large amounts of data.
Magnetic tape has been fighting several battles over the past 12 years. It has succeeded in offering the lowest cost/storage alternative for a variety of computer users in a wide range of markets, from workstations, to workgroups, to department-sized systems, to enterprise (campus/nationwide) systems. This advantage varies as new disk and tape products are introduced, but, in general, tape remains about five to 10 times less expensive than disk.
Tape suffers from several features that are endemic to its character. First, it is a removable medium, meaning that to access any file on a dismounted tape, extra time is required to find and load the tape in the system. Second, unlike disk or RAM, tape is inherently sequential and imposes delays on retrieving any file housed on a physical data set. Third, tape has a transfer rate about four times slower than disk and, therefore, must be buffered. In the past, these deficiencies were largely inconsequential compared to the more important issue of lower cost. Magnetic disk systems have continued to improve, and their cost has continued to fall.
Smaller individual users demand more convenience, and the lower cost of tape for smaller systems has become unimportant relative to other system factors such as software and media handling features. This loss of net benefit has caused a significant sales decrease in low-end tape systems. For big systems (e.g., department and enterprise), however, the cost differential remains important because the data volumes are so much larger. The average price for a multi-terabyte-disk storage system is currently about $40 per gigabyte, while the average cost for large-system tape is about $7 per gigabyte.
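To see why the differential matters at scale, the sketch below applies the two per-gigabyte prices quoted above to a single data set. The 50-terabyte size and the function name are hypothetical, chosen only for illustration, and 1 terabyte is taken as 1,000 gigabytes, as defined earlier in this article.

```python
DISK_COST_PER_GB = 40.0  # dollars, multi-terabyte disk system (figure from the text)
TAPE_COST_PER_GB = 7.0   # dollars, large-system tape (figure from the text)
GB_PER_TB = 1_000        # the article's definition of a terabyte


def system_cost(terabytes, cost_per_gb):
    """Total media/system cost in dollars for a data set of the given size."""
    return terabytes * GB_PER_TB * cost_per_gb


size_tb = 50  # hypothetical data set size, not from the article
disk = system_cost(size_tb, DISK_COST_PER_GB)
tape = system_cost(size_tb, TAPE_COST_PER_GB)
print(f"disk: ${disk:,.0f}  tape: ${tape:,.0f}  ratio: {disk / tape:.1f}x")
```

For this hypothetical data set the disk system costs $2,000,000 and the tape system $350,000, a ratio of about 5.7 to 1, which falls within the five-to-10-times range cited earlier.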
Although tape has some disadvantages when compared to disk media, some of them are actually advantages in many applications. For example, the fact that tape is dismounted after use becomes an advantage to users who want to hold their data for long-term retention. For archival requirements, users do not want to keep their valuable data on a spinning platform. The National Storage Industry Consortium (NSIC) is an industry-wide group that projects technology developments and user requirements into the future (usually about five to six years out). NSIC comprises such companies as IBM, StorageTek, Seagate, Sony, Hewlett-Packard, and Quantum and has generated comparative information between tape and disk. Based on that information, NSIC expects tape to maintain better than a 3-to-1 advantage in the capacity/cost ratio when compared with hard-disk systems. However, currently there are several vertical markets--legal, financial, entertainment, government, geophysical, and medical imaging--that are already demanding WORM media without regard to the size of the data set. Each of these vertical markets, coincidentally, is generating large amounts of data and, in most cases, is already using extended data set management systems such as hierarchical storage management (HSM) and file storage management systems (FSMS). Many of these applications use WORM optical disks to meet archival requirements. This trend will likely continue as broadband communication encourages more data traffic deemed archivable by users.
WORM media is currently the preferred approach for those large data centers that do not erase data on tape. It is expected that the percentage of backup operations will continue to shift in favor of the large data set approach during the next few years, putting WORM systems in an even stronger position. Erasable media will continue to offer cost-benefits in selected back-up operations in the future.
Current Data Storage Technology Limits: A Brief Comparison
The "Storage Technology" graphic below provides a brief survey of existing and possible emerging technologies for data storage. This chart shows storage technologies in terms of cost of storage vs. the time needed to access the stored data. Flash memory is the most expensive and fastest storage while tape offers the lowest cost and the slowest access. Holographic storage, two-photon storage, and micro-mechanical storage devices are included, but the market availability of these is too far into the future to be considered for any near-term records management solutions.
The data storage density of devices based on traditional optical recording is limited by their physical ability to write data marks on a recording material of one wavelength. However, a major benefit of laser recording is its inherent ability to project a multiplicity of beams in close proximity, thus enabling high-capacity, multichannel recording.
Magnetic recording has two inherent disadvantages. One is that magnetic fields cannot be focused at much distance, so the read/write heads must be in close contact with the recording media. This leads to head-to-media spacings ("fly heights") of about 1/1000 the width of a human hair, requiring smooth media surfaces and limiting operation to a rigorously controlled media environment. This prohibits use of removable media in practical disk systems and limits data mark size in magnetic tape systems. The other is that, because of the small fly height and inherently reversible media, high data density magnetic disk storage is inherently incompatible with the need for both removable media and archival storage. The highest data density currently obtained by a system using removable, archival media is by green-laser-based optical tape systems, which are not yet available commercially. The specifications will offer a remarkable advantage over current storage solutions.
A Glimpse of the Future
Many articles about holographic storage have discussed its performance characteristics and limitations. Fundamentally, the technology is a page-oriented storage in which frames of perhaps 1,000 by 1,000 data points are imaged into a crystalline material and stored there by light patterns. Changing the input angle of one of the beams permits a new hologram to be stored in the same volume of material. This technology, then, could offer terabit storage capacities in small spaces. Steering beams to the various desired angles can be achieved mechanically in relatively low-cost systems. Terabit Storage Corp. expects large data capacities if adequate signal-to-noise ratio (SNR) can be obtained in a single module. Overlaying successive pages of information into the same physical media volume inherently reduces the SNR of previously stored data.
There are technological problems that limit the useful application of holographic storage. One of these issues is the lack of a suitable material for read/write applications. Holographic storage is essentially analog rather than digital, and data longevity and the desire for writing and then totally erasing the data at low input powers are opposing requirements in high-volume analog media.
Holographic storage could be preferred over disk if not for the prohibitive cost. Holographic page composer systems could be in production by 2005 with the performance characteristics mentioned, but the cost per module could be in the $2,000 range, or about $20 per gigabyte. This can be compared to the cost of $2 per gigabyte for 100-gigabyte magnetic disk drives expected in the same time frame (about $20 per gigabyte for full system cost).
Two-photon storage is a technology whereby the intersection of two optical beams in a volume storage media locates the data of interest. Specific characteristics are required in each beam. At a user price of several thousand dollars, a device of this performance is not expected to be competitive with hard-disk technology.
A limiting factor in two-photon systems is the need for short-pulse lasers in the pico-second range (i.e., 1 trillionth of a second). This requirement probably places the cost of such devices beyond commercial availability for the foreseeable future. A price of perhaps $2,000 at the optimistic 100-gigabyte capacity produces a cost of $20 per gigabyte. The technology is only competitive if ultra-short pulse lasers become available for less than $100, and that seems an unlikely prospect.
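The per-gigabyte figures quoted for holographic and two-photon modules follow from simple division, and the same arithmetic shows how far module prices would have to fall to match disk. The sketch below is a rough illustration using the $2,000 module price, 100-gigabyte capacity, and $2-per-gigabyte disk drive figure from the text. The function names are hypothetical, and the break-even price is a derived illustration, not a figure from the source.

```python
def cost_per_gb(module_price, capacity_gb):
    """Dollars per gigabyte for a storage module."""
    return module_price / capacity_gb


def breakeven_price(target_cost_per_gb, capacity_gb):
    """Module price needed to match a target cost per gigabyte."""
    return target_cost_per_gb * capacity_gb


module = cost_per_gb(2_000, 100)    # $2,000 module at 100 gigabytes
parity = breakeven_price(2.0, 100)  # price to match $2-per-gigabyte disk drives
print(f"module: ${module:.0f}/GB, parity with disk drives at ${parity:.0f} per module")
```

On these numbers a 100-gigabyte module would have to sell for about $200, one-tenth of the projected price, to match the raw drive cost of disk.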
Though similar in form (e.g., cartridge, enclosure), optical recording on phase-change, write-once tape media differs somewhat from magnetic tape recording due to differences in the write/read optical head and its media. Like magnetic tape, the optical tape substrate consists of a thin Mylar® base. Marks made on the specially formulated, optically active media layer are permanent and unalterable, so the data on them cannot be replaced with new information.
Optical tape promises significant advances over magnetic tape in several dimensions. Based on technology demonstrations, data rates from 100 megabytes per second to 300 megabytes per second appear possible by 2005. Access times below 15 seconds have been shown. A capacity of 1 terabyte on a single 4-inch-by-4-inch cartridge has also been demonstrated, and 5 terabytes are projected by switching to the thinner base material now in use for magnetic tape. A total system-storage cost of less than $1 per gigabyte is projected for the first-generation products.
A major limiting factor for most optical memory systems is that achieving sufficient storage capacity requires moving the media relative to the optical head (such as in disks and tape). This results in slow access or providing a large, instantaneous recording field, which is impractical. A potential solution to this problem is the use of "spectral-hole burning." Here, multiple bits of information are written/read at the same location in the media by varying the write/read wavelength. Media capable of writing a thousand or more bits in the same location have already been demonstrated. Such a system potentially provides 1-gigabyte capacity and microsecond access times and is not unduly expensive.
However, a limitation is that the stored data is slowly erased by thermal agitation and has to be refreshed every half an hour or less. The technology will probably not be cost competitive in modularities under several gigabytes and, although potentially useful, the technology is several years from any practical application.
Micro-Electro-Mechanical (MEMS)/Scanning Probe
Several data storage concepts based on a new micro-mechanical technology are now being researched. The development of atomic-force microscopes and nanometer-sized (1 nanometer = 1 billionth of a meter in width) probes has led to consideration of scanning probe-based data storage, which might be available by 2004 at the earliest. The technology is based on the premise that arrays of thousands of micron-sized probes can be fabricated like printed circuitry. In the 2004 time frame, costs must be judged relative to anticipated dynamic random access memory (DRAM) at $500 per gigabyte and hard disks at $2 per gigabyte, and must be no greater than $50 per gigabyte to be competitive. This means $50 for a 1-gigabyte chip, a price point that seems possible. In large-scale production, price equality with hard disks is plausible. The 1-gigabyte modularity, printed circuit board mounting, sub-millisecond access, and low power consumption make MEMS an attractive future technology.
Nano-scale recording is enabled by directing a precision electron beam onto a disk to write data at a density far beyond that achievable magnetically. This storage technology will easily outperform magnetic hard drives, rapidly surpassing the data-density limit of magnetic storage and, theoretically, will provide the ability to store many thousands of gigabytes on a CD-ROM-sized disk. In addition to offering far higher capacity and data rates, the new technology does not require a flying head--the mechanism that reads data from or writes data to a magnetic disk or tape--and, therefore, eliminates the possibility of "head crashes," the bane of existing hard drives. According to Disk/Trend Inc.'s Porter, an initial data density of 160 gigabytes per inch is expected by 2006, compared to the anticipated magnetic barrier of about 100 gigabytes per inch. The technology limit appears to be at least 500 times greater at 80 terabytes per inch. Data transfer rates well over a gigabyte per second should be achievable.
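The density figures above can be checked against one another. The text does not say which baseline the "500 times" comparison uses, so the sketch below simply computes the ratios, taking 1 terabyte as 1,000 gigabytes as defined earlier in this article. The variable names are hypothetical.

```python
GB_PER_TB = 1_000  # the article's definition of a terabyte

initial_density = 160         # gigabytes per inch, expected by 2006
magnetic_barrier = 100        # gigabytes per inch, anticipated magnetic limit
tech_limit = 80 * GB_PER_TB   # 80 terabytes per inch, expressed in gigabytes

print(tech_limit / initial_density)        # 500.0
print(tech_limit / magnetic_barrier)       # 800.0
print(initial_density / magnetic_barrier)  # 1.6
```

The factor of 500 matches the limit measured against the initial 160-gigabyte density. Measured against the magnetic barrier, the same limit is 800 times greater.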
Other Revolutionary Storage Technology
In the next 15 years, scientists will have narrowed the width of lines etched into semiconductors to less than one-tenth of a micron, meaning that electrical signals running through those circuits will contain so few electrons that adding or subtracting a single one could make a difference in the computer's functions. To control the movements of small groups of electrons, researchers are developing quantum dots that can corral rambunctious electrons, allowing them to escape only when zapped by a precisely sized boost of energy from outside. Some of the possible technological developments that could benefit from a new generation of scientific advances over the next two decades include:
* Electron Trapping Optical Memory (ETOM)--A new kind of optical disk in development uses the principle of electron trapping to store data. Data is stored by changing the state of electrons in the media. First-generation ETOM devices were unveiled in 1992. There is potential to store 14 gigabytes of uncompressed data on a single double-sided 5.25-inch disk with a transfer rate of 120 megabytes per second. ETOM media can store both analog and digital data. The technology remains in the lab as the developer works on a cheaper laser and improved disk manufacturing techniques.
* Liquid Crystal Technology--Data is written into specialized crystal material at 793 nanometers. The technology has shown an ability to create optical data storage densities at 8 gigabytes per square inch--a world record--and has the potential of reaching 100 gigabytes per square inch. The application is for optical switching and routing and will enable sub-gigabit-per-second to near-terabit-per-second data stream bandwidth. Devices developed will provide high speed and capacity and will work for archival storage in conjunction with magnetic disk or tape.
* Surface-Enhanced Raman Optical Data Storage (SERODS)--This highly experimental technique uses a method of scattering laser light on a molecular surface and offers storage densities as much as 100 times greater than the conventional optical disk.
If the existing "digital tsunami" continues, as most believe it will, over the next several years, the size of a greater portion of data sets will expand beyond the ability of the older back-up model to maintain them. In these cases, a more affordable technology will be needed than can be provided by projected magnetic disk and tape offerings. New technologies that take advantage of optical or nanoscale features are most likely to offer reasonable alternatives in the next three or four years. These technologies will present some operational and obsolescence risks to users, but such risks will be more than offset by the greatly improved price and performance features.
Ashton, Gary. "Future Trends in Storage, Interconnectivity, and Data Transfer." National Media Laboratory Report. 1996.
International Data Corp. "Worldwide Tape Drive Market 2000-2004." Available at www.idc.com (accessed 11 November 2003).
Porter, Jim. "A Fast 15 Years." Insight. July/August 2001. Available at www.datareader.org/0103/idema.html (accessed 12 November 2003).
Tobin, Paul. "The Coming Storage Explosion: A Terabyte in Your Neighborhood." THIC Conference Presentation. 2000.
UC Berkeley School of Information Management and Systems. "How Much Information?" Berkeley, CA: SIMS, 2000. Available at http://sims.berkeley.edu/edu/research/projects/how-much-info (accessed 11 November 2003).
Joe Straub has worked in the high-performance computing and data storage industries for the past 20 years. He is the co-founder of SRA Corp., SaxpySappy, and LOTS. He is active in advanced research and development and consults for Imation, e-Phocus, and Lockheed/Martin. He may be contacted at email@example.com.