The dumb disk is dead: viva the smart disk!
Microprocessor performance has largely kept pace with Moore's Law, with performance doubling roughly every 18 months. However, disk performance has not improved at anywhere near the same pace, causing today's CPUs to waste more time waiting for data to be read from disk into local memory. The additional microprocessor muscle has given most commercial applications only a partial boost, as predicted by Amdahl's law of balanced system performance.
The gap in raw performance between disk and the CPU/memory complex was huge when the first disks were designed and built 50 years ago, and their basic principles of operation have not changed. Disk capacity has been growing at a tremendous pace, about 60% per year, but the speed of data access has been growing more slowly. The average data-transfer rate, usually measured in megabytes per second (MB/s), has grown about 40% per year. The number of I/O operations per second (IOPS)--an important factor in the performance of most commercial applications--has grown at a mere 16% annual rate. That's a far cry from the 67% yearly improvement rate in CPU performance.
As a result, database systems and many other applications try hard to avoid disk I/O and perform as much work in memory as possible. In fact, this method of disk-I/O avoidance is the primary reason 64-bit processors have become the norm in high-end servers--each application or database process can use hundreds of gigabytes of main memory to keep a local copy of its data. Even disk-array makers such as market leaders EMC, Hitachi, and IBM are racing to increase their cache sizes as quickly as memory density and pricing allow--the more disk I/O that can be avoided, the better the system performs. With all of this work to avoid storing and retrieving data on them, why do we still use disk drives? Because the 50-year-old design remains the fastest, most cost-effective way to store changeable data persistently.
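To see how quickly these annual rates pull apart, the compounding can be worked through in a few lines of Python. This is only a rough sketch: the four rates are the ones quoted above, while the function names and the five-year horizon are assumptions chosen for illustration.

```python
# Annual growth rates quoted in the article, as fractions per year.
ANNUAL_GROWTH = {
    "cpu_performance": 0.67,
    "disk_capacity": 0.60,
    "disk_transfer_rate": 0.40,
    "disk_iops": 0.16,
}


def compound(rate: float, years: int) -> float:
    """Improvement factor after compounding `rate` once a year for `years` years."""
    return (1.0 + rate) ** years


def gap_vs_cpu(metric: str, years: int) -> float:
    """How many times further the CPU has improved than `metric` after `years` years."""
    cpu = compound(ANNUAL_GROWTH["cpu_performance"], years)
    return cpu / compound(ANNUAL_GROWTH[metric], years)


if __name__ == "__main__":
    years = 5  # illustrative horizon (an assumption, not a figure from the article)
    for name, rate in ANNUAL_GROWTH.items():
        print(f"{name:20s} {compound(rate, years):6.1f}x after {years} years")
    print(f"CPU-to-IOPS gap: {gap_vs_cpu('disk_iops', years):.1f}x")
```

Changing `years` shows how the gap between CPU performance and IOPS widens the longer the two rates compound, which is why avoiding disk I/O pays off more each year.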
For most desktop applications and many departmental and low-end enterprise applications, there's enough performance headroom to last several years. Of course, solid-state disks (pure-memory devices that simulate disk drives) provide blazing I/O performance, but they remain too costly to purchase and manage for most customers. Unless a breakthrough occurs in persistent storage, current performance trends in commercial data centers will force frequently accessed data to continue moving into memory devices, leaving disks to be used for "fixed-content" or "archival" data storage and to hold backup copies of memory-resident data. (Sound familiar? That's just how disk vendors currently compare Serial ATA drives to higher-performing SCSI or Fibre Channel disks.) What's needed is a new kind of device that's much faster, denser, and less expensive per MB than the current 50-year-old design. Some of the notions being kicked around research labs include tiny surface-mount disk arrays on a card; nano-electromechanical systems (NEMS), which are like micro-machines in silicon; and newer types of flash memory. No practical products appear to be near the horizon, but the need grows ever more dire. But merely replacing disks with something faster won't be enough--it would only move the bottleneck to the storage network. Fibre Channel throughput has improved fourfold in the last ten years, but latency improvement hasn't kept pace. Even if it did, storage-network performance improvement would still be dwarfed by the thirteen-fold improvement pace of CPU performance over the same period.
The key to performance improvement remains one that database developers already understand well: keeping the data that's most needed as electrically close to the CPU as possible. The current trend toward networked storage has obvious cost and manageability advantages, but it also further separates processing from storage--only to transfer huge amounts of data over the network in order to find the "important stuff" that should be cached in server memory. What if, instead, some of that processing were performed much closer to the storage media? This is the central idea of the "smart disk." Also known as "data-centric computing," this is a model of storage where all of the application intelligence associated with a set of data is located on the storage controller directly attached to the disk containing that data. It has the added advantage of lengthening the time available for a good replacement for today's disk drive to be developed.
A good first step in this direction is to off-load just a subset of functions from the main server--things like the detailed device control that has already been off-loaded to HBAs and NICs; transport-level protocols such as TCP/IP, iSCSI, and others are also heading in the same direction. The more radical end goal advocated by some researchers is to move entire applications to the device controllers, using higher-level protocols such as CORBA, COM+, Web Services/SOAP, RMI, or something else from the alphabet soup of distributed-programming acronyms to wire everything together. In effect, it's a more extreme form of scale-out clustering. This vision is certainly a long way off, if it comes to pass at all. But as current architectures become more strained--and as developers and customers get increasingly accustomed to thinking, programming, and managing IT components in an aggregate, clustered way--data-centric computing will start to look less and less bizarre.
www.illuminata.com
David Freund is practice leader, Information Architectures, at Illuminata Inc. (Nashua, NH)