A survey of next generation storage system architectures: beyond storage networking. (Storage Networking).
Storage Device Technologies vs. Storage Architectures
Advances in storage architecture should not be equated with advances in storage device technology. Storage device technologies continue to evolve, especially with respect to capacity. Every reader of this article is acutely aware of how disk drive density has steadily increased even while cost per bit has dramatically decreased. It appears that this trend will continue into the future. For example, researchers involved with IBM's Millipede project predict they will be able to hold several feature films on a postage stamp size memory card by 2005.
Although advances in storage device technology to date have been impressive, organizing and managing large amounts of ever expanding data continues to vex storage administrators. The answer to this dilemma lies in how the data is systematically organized or, in other words, in the storage system architecture. Like most technological innovation, storage architecture development passes through definite generations.
What Defines a Generation?
In his book The Innovator's Dilemma, Harvard professor Clayton Christensen defines a "disruptive technology" as an innovation that eventually revolutionizes a market. Not to be confused with continual evolutionary changes of a technology, the advent of a disruptive technology marks the beginning of that market's next generation. Architectures, as systems of intertwined technologies, similarly go through revolutionary disruptions and generational changes. Computing architectures have moved from mainframe architectures through client-server architectures to Internet architectures.
In most cases, revolutionary trends, and hence generational changes, are much easier to spot in hindsight than during the early stages of the revolution. Christensen documents that most enterprises based on prior technological generations miss the revolutionary change and fail to participate in the next generation. As evidence, note that few mainframe era computer manufacturers currently exist as independent entities.
Storage architectures have gone through similar generational changes, albeit at a slower pace.
First Generation Storage: Direct Attached Storage
First generation storage is typified as being tightly coupled to the computing system's processing unit via direct attached proprietary channels or standard buses such as SCSI and ATA. This is commonly referred to as a direct attached storage (DAS) configuration (Figure 1). These configurations were designed to focus first on performance and eventually on interoperability.
DAS configurations are primarily limited by the amount of storage that they can handle. Each channel or bus is limited to a relatively small number of devices, typically fewer than eight. The physical length of the channel is also limited to, at most, a few meters. This means that any physical disaster affecting the processing unit likely affects the attached storage as well. Most distressing, from the storage manager's perspective, is the difficulty of reconfiguring the storage, most commonly to increase its capacity. Adding storage requires extensive periods of system downtime to physically attach the storage, configure it logically within the system and then reallocate the new storage among the various applications.
Second Generation: Storage Networking
The innovation that defines the second generation of storage architectures involved decoupling storage from processing units and interconnecting various storage devices via a dedicated storage network (Figure 2). One of the first companies to popularize storage networking was Vinca Corporation, starting in 1992. Jay Carlson, then president of Vinca, recalls: "Those were heady times. While most of the industry was struggling to properly terminate a SCSI bus, we were envisioning networks of distributed storage modules executing 'SANware.'"
Like most disruptive technologies, storage networking was initially viewed as an interesting approach applicable to particular niche markets. Storage networking is now accepted as the preferred storage architecture with a market size of over $100 billion. Storage networking comes primarily in two flavors, Storage Area Networking (SAN) and Network Attached Storage (NAS). SANs typically make use of tightly coupled, block-based, low-latency Fibre Channel interconnect while NAS is typically a loosely coupled, file-based, higher-latency Ethernet interconnect. The selection of storage networking flavor is guided by the application it serves. In general, high performance transaction processing applications demand SAN configurations while lower performance, file access based applications do well with simpler to manage NAS configurations.
Is a New Generation Needed?
Storage networking addresses storage capacity issues satisfactorily but does not solve overall data management issues. Storage networks often require bolted-on external technologies to properly manage the storage. Management functions such as backup, archiving, cloning, replication, disaster recovery and snapshot copies must be acquired separately from third party vendors and suffer from lack of integration and inter-compatibility. The next generation of storage architectures should be able to address management functions as an inherent aspect of the storage architecture, not as an aftermarket add-on.
John Spiers, CTO of LeftHand Networks, elaborates: "Current network storage strives to provide an always bigger, better and faster box but doesn't inherently address fundamental issues of manageability, including scalability, availability, replication and storage clustering." Jim Pownell, CTO of Inflection Systems, agrees: "Data storage customers continue to encounter a lot of pain managing data, particularly over the long term."
Next Generation Storage Initiatives
There are several interesting initiatives underway in the information storage industry. Some are focused on device technology, some on enhanced performance interconnect and others on better manageability. The following is a survey of some interesting initiatives and trends that portend future storage architecture developments.
Enhanced Backup Solutions Initiative (EBSI). Storage industry futurist/visionary Michael Peterson has created a new organization, EBSI, to address advanced data protection issues. The EBSI's efforts will revolve around repairing what end-users have identified as "cracks in the armor" of current backup technology. According to Peterson, three primary backup issues top the concerns of end-users: meeting backup time schedules, confidence in the completeness of a backup routine, and the time it takes to restore data. Rather than a standards setting body, EBSI is intended to be a common ground between storage vendors and end-users. Storage vendors and end-users interested in participating in this initiative are directed to the EBSI website at www.enhancedbackup.com.
Network Unified Storage (NUS). NUS is being developed by LeftHand Networks of Boulder, Colorado. LeftHand Networks' Spiers believes that NUS is a revolutionary architecture because of its ability to effectively integrate both locally and remotely distributed standard storage. Spiers notes, "NUS is to storage as blades are to servers." NUS relies on a block-based protocol characterized as "iSCSI lite." This protocol removes the SCSI protocol's device specific handcuffs and adds extensive management capabilities. Although the protocol is currently proprietary to LeftHand Networks, Spiers indicates that he is working with the Storage Networking Industry Association (SNIA) to make the NUS protocol an open standard.
Direct Access File Systems (DAFS). The DAFS protocol is a new file-access protocol designed to take advantage of emerging RDMA (remote direct memory access) interconnect technologies such as InfiniBand, VI and iWARP. DAFS is an optimized method of shared data access designed to bypass performance bottlenecks inherent in current network-attached storage and direct-attached storage architectures. It is optimized for high-throughput, low-latency communication, and for the requirements of local file-sharing architectures.
Currently, both network-attached storage and direct-attached storage require the operating system to copy data from file systems buffers into application buffers. DAFS, in conjunction with its underlying specialized interconnect hardware, enables direct, memory-to-memory transfers between application servers and storage servers. The application server, in essence, treats the remote storage server semiconductor memory as though it were a mapped section of its own internal memory, with only minimal performance and latency degradations.
Object-Based Storage. Object-based storage (Figure 3) moves low-level storage functions into the storage device itself. This enhanced storage device is designated an Object-based Storage Device (OSD). An OSD is a network-attached storage device that presents an interface of arbitrarily named data objects of variable size rather than sequentially numbered fixed-size blocks. Behind that object interface, the device itself deals with data storage details, such as request scheduling and data layout. One or more specialized servers manage the associated metadata separately. By separating data storage, metadata storage and management, heavily accessed data channels can be optimized for high performance and distributed access.
Content Addressed Storage (CAS). A CAS is a storage system whose storage objects are identified by their contents, or by a part of their contents, rather than by their names or positions within a storage device.
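The difference between the two interfaces can be sketched in a few lines of Python. This is only a toy illustration: the class and method names (BlockDevice, ObjectStorageDevice, write_object and so on) are hypothetical and are not drawn from any OSD specification, and a real device would also handle scheduling, layout and security.

```python
class BlockDevice:
    """Conventional view: sequentially numbered, fixed-size blocks."""

    def __init__(self, block_count, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * block_count

    def write_block(self, lba, data):
        # The caller must track which block numbers belong to which data.
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[lba] = bytes(data)

    def read_block(self, lba):
        return self.blocks[lba]


class ObjectStorageDevice:
    """Object view: arbitrarily named objects of variable size."""

    def __init__(self):
        self._objects = {}  # the device, not the host, decides on layout

    def write_object(self, name, data):
        self._objects[name] = bytes(data)

    def read_object(self, name):
        return self._objects[name]


osd = ObjectStorageDevice()
osd.write_object("invoices/2003-04", b"variable-length payload")
disk = BlockDevice(block_count=8)
disk.write_block(3, b"x" * 512)
```

With the block interface the host must remember that its data lives in block 3; with the object interface it only needs the object's name.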
Storage industry powerhouse EMC provides a CAS-based networked storage solution designed exclusively for fixed content called Centera. With the Centera approach, applications access data objects based on a globally unique address that is derived from the object. That address, because it is unique to content, enables Centera to guarantee both the authenticity and integrity of the object. The application requires no knowledge of the physical or logical placement of content.
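As a rough illustration of content addressing in general, the sketch below derives an object's address from a cryptographic hash of its bytes. It does not describe how Centera computes its addresses; the choice of SHA-256 and the ContentAddressedStore class are assumptions made purely for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store: the address of an object is a hash of its content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Identical content always yields the same address (assumed: SHA-256).
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hashing on read verifies that the object has not been altered.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data


store = ContentAddressedStore()
addr = store.put(b"scanned contract, final version")
same_addr = store.put(b"scanned contract, final version")
other_addr = store.put(b"scanned contract, draft")
```

Because the address depends only on the content, storing the same bytes twice yields one address, and any change to the bytes yields a different one. The application never needs to know where the object physically resides.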
Caching Systems. Caching is a familiar method used in microprocessors to mitigate performance and capacity differences between frequently used memory locations and less used memory locations. Most modern processing units use a hierarchy of caches, starting with a small, high performance cache on the silicon of the processor itself. This on-chip cache is augmented by a larger, slightly slower performance cache on the substrate of the processor's package, and a larger, even slower, performance cache located on the processor's motherboard. Data held on local disk drives is often similarly cached in the motherboard's general semiconductor memory by the file system.
Storage architectures based on caching extend the caching hierarchy even further. Rather than static storage, the local hard disk drives of a client machine are treated as a cache for a LAN-attached storage server. The LAN-attached server may be, in turn, a cached subset of the complete enterprise dataset held, preferably, in an off-site storage facility. Each higher level of cache is constructed of respectively slower and, hence, less costly storage and interconnect technology.
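The three-level arrangement described above can be sketched as a chain of read-through caches. The Tier class and the tier names are hypothetical, and the sketch ignores eviction, writes and consistency, all of which a real caching architecture must address.

```python
class Tier:
    """One level of the hierarchy; misses are filled from the next level."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing  # the slower, larger tier behind this one
        self.data = {}

    def read(self, key):
        """Return (value, name of the tier that actually held the value)."""
        if key in self.data:
            return self.data[key], self.name
        if self.backing is None:
            raise KeyError(key)
        value, origin = self.backing.read(key)
        self.data[key] = value  # keep a local copy for the next read
        return value, origin


archive = Tier("off-site archive")
archive.data["report.doc"] = b"contents"
lan_server = Tier("LAN server", backing=archive)
client_disk = Tier("client disk", backing=lan_server)

first = client_disk.read("report.doc")   # served from the archive
second = client_disk.read("report.doc")  # now served from the client disk
```

The first read travels all the way to the archive and leaves copies on the LAN server and the client disk along the way; the second read is satisfied locally.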
Storage Messaging. Storage messaging proponent Carlson views the ultimate future storage architecture as an infinitely scalable message-based system. Storage messaging applies Oxford's C.A.R. Hoare's seminal "Communicating Sequential Processes" concepts to storage. The result is the long anticipated convergence of data storage and data communication. Convergence is possible because any communication channel has an inherent storage capacity (bandwidth times latency) and, conversely, any storage device has inherent communications channel characteristics (capacity divided by access time).
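The two relationships in parentheses are simple arithmetic and can be tried with illustrative figures. The link speed, latency, drive capacity and read time below are assumed values chosen for the example, not measurements from the article.

```python
def channel_capacity_bits(bandwidth_bps, latency_s):
    """Data 'stored' in flight on a channel: bandwidth times latency."""
    return bandwidth_bps * latency_s


def device_bandwidth_bps(capacity_bits, access_time_s):
    """Channel-like rate of a storage device: capacity divided by access time."""
    return capacity_bits / access_time_s


# Assumed example: a 1 Gbit/s link with 50 ms of one-way latency
# holds roughly 50 million bits (about 6 MB) in transit at any moment.
in_flight_bits = channel_capacity_bits(1e9, 0.05)

# Assumed example: a 1 TB device (8e12 bits) that takes 10,000 seconds
# to read end to end behaves like a channel of about 800 Mbit/s.
effective_bps = device_bandwidth_bps(8e12, 1e4)
```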
Futurist George Gilder has coined the term "storewidth" to describe this convergence. He points out that while bandwidth is abundant, connectivity is virtually absent. "Storewidth" is the conversion of abundant bandwidth and heterogeneous petabytes into accessible information.
The foundation of storage messaging is the notion that any data item should be treated as a message. All data is always considered to be in transit from the point at which it is created to the point it is destroyed. In general, each message contains a source, a destination, metadata, as well as the data payload itself. After creation, the messages travel seamlessly from device to device, channel to channel, until they are eventually destroyed. These messages, then, essentially manage themselves.
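A message of this kind might be modeled as follows. The StorageMessage type, the forward helper and the "hops" metadata key are hypothetical names introduced for illustration; the article does not define a concrete message format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StorageMessage:
    """A data item treated as a message: source, destination, metadata, payload."""
    source: str
    destination: str
    payload: bytes
    metadata: dict = field(default_factory=dict)


def forward(message, next_hop):
    """Return a copy addressed to next_hop, recording the stop just left."""
    hops = message.metadata.get("hops", []) + [message.destination]
    return StorageMessage(
        source=message.source,
        destination=next_hop,
        payload=message.payload,
        metadata={**message.metadata, "hops": hops},
    )


created = StorageMessage("app-server", "client-disk", b"order #1001",
                         {"retain_days": 30})
moved = forward(created, "lan-server")
archived = forward(moved, "off-site-archive")
```

The payload never changes as the message moves; only its destination and travel history do, and the retention metadata travels with it so that any device along the way could decide when the message should be destroyed.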
Carlson further clarifies, "Storage networking addresses storage quantity issues; storage messaging addresses storage quality issues. Think of the way our brain works. The information in our brain is not a collection of static data. It is a collection of messages, each with the capability of linking to and associating with other messages. These interlinked, distributed messages then become a coherent thought."
A Completely Different Way of Looking at Storage
Perhaps what will truly revolutionize the storage market are not novel architectures but rather a completely new way of looking at data storage. EBSI's Peterson points out, "Storage is now about business values, not price per megabyte. The storage manager needs to turn network storage into a business weapon. Companies should be generating profits from their storage; storage should be turned from a cost center into a profit center. It can be shown that an investment in storage produces, on average, a four times return on investment." It follows that storage is becoming an independent profit center of a company's most valuable tactical and strategic asset.
In many companies, the value of that company's intangible assets greatly exceeds that of its tangible assets. These intangible assets include the company's databases, business processes, customer goodwill, investor confidence and, most important, its employees. All of these assets are thoroughly enmeshed with the company's data.
Carlson agrees with this new view of data storage within an enterprise. "The value of a company's data needs to be recorded on its balance sheet and audited as any important financial asset is." Specifically, the storage manager should take on the characteristics of a business unit manager as opposed to an infrastructure manager.
If one were to ask ten storage systems visionaries about the future, one would likely receive ten very divergent opinions. Danish physicist Niels Bohr opined: "Predictions are hard to make, especially if they involve the future." The easiest prediction to make is that the amount of data under management will continue to grow while the cost per bit of storage media will continue to shrink. Managing storage will be an ever-increasing component of IT costs. Carlson also points out that the ratio of actual data with respect to metadata will continue to shrink. More and more of the data under management will be metadata. The good news is that the initiatives and architectures discussed in this article will substantially enhance the productivity of the individual data managers.
LeftHand Networks' Spiers predicts: "In the future, storage will be abstracted and virtualized to the point that complex management will be reduced to a simple point and click operation. Storage is only starting to catch up to where server management technology, such as clustering and consolidation, has been for some time."
Inflection Systems' Pownell also believes that storage will be much more virtualized in the future. "Future storage will be virtualized at a higher level, at a file level as opposed to today's server and device level. Any storage component should be able to go away and be replaced without affecting the data accessibility."
Alacritus CTO Roger Stager foresees, "Future innovations will move more data services into the SAN." Services such as backup, archiving, point-in-time copying and replication will be performed on SAN switches and edge devices rather than the host, as is currently common.
EBSI's Peterson points out that in a DAS environment, productivity can be measured as about 0.4TB per administrator. In a networked storage environment, productivity increases to about 3TB per administrator, or nearly an order of magnitude increase. The promise of next generation architectures is a productivity increase to at least 50TB per administrator.
Perhaps renowned economist Peter F. Drucker understood technological advancement best when he said: "The best way to predict the future is to create it." The future, then, is really in our own hands. To the technologist lies the responsibility of inventing the architectures that will carry data storage and management to the next generation; to the data manager lies the responsibility to elevate the recognized value of the enterprise's data to its deserved level.
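The size of these jumps is easy to check from the figures quoted. The short calculation below uses only the three numbers cited by Peterson; the variable names are ours.

```python
# Terabytes managed per administrator, as quoted above.
das_tb = 0.4
networked_tb = 3.0
next_gen_tb = 50.0

# Ratios between generations, rounded to one decimal place.
networked_vs_das = round(networked_tb / das_tb, 1)       # about 7.5 times
next_gen_vs_networked = round(next_gen_tb / networked_tb, 1)  # about 16.7 times
next_gen_vs_das = round(next_gen_tb / das_tb, 1)         # about 125 times
```

In other words, the move from DAS to networked storage is a gain of roughly 7.5 times, and the promised next generation would be more than a full order of magnitude beyond that.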
RELATED ARTICLE: Companies to Watch
The bulk of the practical attention to next generation storage architectures these days seems to be centered in small, venture capital-backed startup companies. While most are still operating in "black" mode, interesting companies to watch can be identified by keeping an eye on the movements of storage industry vision leaders. The following is a sampling of a few such companies:
* Vision Leader: John Spiers, formerly of Maxtor and MiniScribe.
* LeftHand Networks has developed Network Unified Storage (NUS). NUS allows customers to build affordable, intelligent and highly manageable storage area networks that use their existing IP infrastructure.
* Vision Leader: Garth Gibson, formerly of Carnegie-Mellon.
* Panasas is creating a highly scalable and manageable storage network by combining innovative distributed storage software (distributed file system) with low-cost, industry-standard hardware. The result: multi-gigabyte throughput performance and a dramatically reduced total cost of ownership for storage systems.
* Vision Leaders: Jim Pownell and Dave Therrien, formerly of StorageNetworks & Highground Systems.
* Inflection Systems is developing an exciting new data storage system.
* Vision Leaders: Roger Stager and Don Trimmer, formerly of Intelliguard.
* Alacritus is replacing tape libraries with the latest disk subsystem technology.
Al Mudrow is CTO of Yotta Systems (Orem, Utah)
Publication: Computer Technology Review
Date: Apr 1, 2003