SAN Cache: SSD In The SAN

Solid-State Disk (SSD) modules will be installed as shared file-caching facilities on Storage Area Networks (SANs). These SAN cache appliances will enable rapid growth of the SAN infrastructure by multiplying application performance and by supporting virtual storage addressing. Conversely, SAN architectures will expand the range of applications and environments that can exploit the full benefits of SSD by making it easy to share and easy to manage.

Storage Area Network (SAN) discussions generally extol the benefits of sharing storage devices and subsystems--including disk drives, RAID subsystems, tape drives, automated tape libraries, and perhaps other storage media such as CD-ROM or DVD. I would like to add another hardware category to the standard SAN architectural sketch: the Solid-State Disk (SSD) file cache appliance or "SAN cache" (Fig 1).

Earlier articles have documented the impact of disk drive access density on I/O performance ("Access Density--Key to Disk Performance" by Randy Kerns, Storage, Inc., Q2 1999) and the use of SSD to multiply application-server performance and scalability ("Disk I/O Performance Scaling: The File Caching Solution" by Michael Casey, www.soliddata.com/whitepapers/file_caching.html). The latter article introduced a key distinction between block caching and file caching (see Sidebar) and described the benefits of server-attached SSD as a performance multiplier in transaction-intensive applications such as e-mail, messaging, and e-business.

This article outlines the synergies between SSD and SAN technologies: SSD will become a key enabling technology for SAN performance and scalability, and SANs will enable enterprises and system integrators to exploit the full potential of SSD as a performance multiplier. These benefits will be most fully realized when the SAN infrastructure is automatically configured and managed as part of a virtual storage architecture.

SSD As Enabling Technology For SANs

When an intelligent solid-state disk subsystem is added to a SAN, it becomes a shared file caching facility for the application servers attached to the storage network. It also becomes available as part of the storage infrastructure, and system integrators and ISVs can use the file cache to enable functionality and performance that would be impossible to achieve with mechanical disk drives alone.

One key benefit is modular scalability of capacity and performance in the storage infrastructure. In a modular SAN architecture, a storage administrator or system integrator can configure the desired amount of capacity by adding disk drives or disk array modules. By adding SSD modules that are separate from the disk drive arrays, a SAN architect can independently "dial in" the desired amount of performance for transaction-intensive applications.
Fig 2 illustrates this concept of managing capacity and performance as two separate dimensions with a separate "control dial" for each. Initially, this is a metaphor for a manual configuration process; however, the process will ultimately become an automated capability of virtual storage architectures.

By exploiting an optimized mix of RAID and SSD modules, a SAN architecture can deliver cost-effective configurations for a wide range of applications--including those for which a single, monolithic approach is ineffective or needlessly expensive. Fig 3 positions a number of applications in terms of their high-water-mark requirements for response time and for bandwidth.

SANs Enhance The Scope Of SSD

SAN developments will make SSD file caching easier to connect and share, thus making SSD cost-effective for a wider range of operating environments and applications. Fibre Channel provides a number of benefits, even when it is used simply as a replacement for SCSI connections in server-attached storage configurations (which is what most server vendors support today). These FC benefits include longer supported distances from server to storage and improved robustness and flexibility in hot-plug configurations.

The benefits of Fibre Channel are more fully exploited in server clusters that employ FC connections between the server nodes and a shared storage facility.
A cluster architecture can use a shared SSD to make key files available to all servers in the cluster; this provides shared, non-volatile storage while avoiding the mechanical access latencies that are introduced by shared disk storage. High-availability cluster software--such as VERITAS Cluster Server and Hewlett-Packard's MC/ServiceGuard--will dramatically increase application recovery speed by maintaining file system journals (write logs) and related data structures on a shared, high-speed file cache.

As the industry moves toward switched-fabric SANs with heterogeneous server and operating system support, the deployment of SAN management software will enable enterprises to realize the benefits of file caching in a much broader range of applications. Whereas server-attached SSD is best suited to applications that can justify investment in a dedicated file-caching subsystem, a shared SAN cache can be used by applications that could not justify dedicated file caches. These include applications running on smaller NT and Linux servers, each of which might need only a fraction of the capacity provided by a robust SSD product.
In a SAN, many servers can share one file cache, and storage management software can allocate appropriate amounts to each server as the workload changes. The expanded range of applications will also include those that need a fast file cache only for an occasional workload spike, such as a month-end financial close that runs for 30 hours and needs to complete in less than three hours. It might be difficult to justify a dedicated file cache to speed up the month-end close, but it will be easy to justify allocating part of a shared SSD facility to that application for a few hours each month. The shared SAN connection, together with SAN management software, will make this dynamic allocation easy enough for widespread adoption.

Thus, file caching in the form of a shared SAN cache will be deployed to serve a much wider range of servers and applications. These developments will be further enhanced by virtual storage architectures.

Virtual Storage Architectures

The holy grail of SAN evolution is the Virtual Storage Architecture (VSA), in which SAN management software presents virtual disk volumes to the application servers and maps those virtual address ranges to physical storage devices connected through the SAN. Early examples include storage subsystems and SAN domain servers developed by XIOtech (recently acquired by Seagate) and ConvergeNet (recently acquired by Dell Computer). Other server and storage vendors such as Compaq Computer and Sun Microsystems are also developing virtual storage architectures.

Ultimately, virtual storage architectures will accept high-level requests from a SAN administrator and will automatically allocate physical resources based on storage policies defined for each class of storage. For example, an administrator might use a storage control console to request creation of a 50GB virtual disk volume that can deliver 10,000 I/Os per second (at an 8KB block size) with a maximum response time of one millisecond. The management software would then configure the appropriate combination of physical resources available on the SAN, such as fast disk, tape storage, and SSD.

Virtual storage architectures will make SSD easy to configure and easy to use. For example, VSA software facilities and services will enable easy migration of data from slow storage to fast storage without disrupting application availability.

SSD will also be a crucial component of the VSA infrastructure. Effective operation of a VSA requires very fast conversion of logical addresses to physical addresses. In a distributed SAN, the obvious place to store the address lookup tables will be a shared, high-speed SAN cache.

In the future, as Storage Area Networks (SANs) are widely deployed and supported by sophisticated storage management tools--such as virtual storage architectures and policy-based storage management consoles--solid-state file cache will become an easily managed, shared facility on the SAN.
As such, it will become attractive and cost-effective for architectural deployment in a wide range of applications.

Michael Casey is the vice president of marketing at Solid Data Systems (Santa Clara, CA).

File Caching With SSD

For many transaction-intensive applications, it is possible to identify a small set of files that consume most of the I/O activity and place those files in a high-performance cache. This approach typically uses Solid-State Disk (SSD) for file caching and increases overall transaction performance by a factor of 200% or more on the existing servers.

File caching differs from block caching in several respects. RAID cache--"block caching" in a RAID controller--is based on data blocks. Each block is identified by a SCSI block address (for example) and the RAID controller has no knowledge of which blocks are part of which file. It chooses what to cache (and what to flush from cache) based on historical usage statistics on the individual blocks (or disk tracks, in the case of EMC). The caching algorithm looks at the usage history and tries to determine what will be needed next by the application. Since the controller cache is much smaller than the total amount of data stored on the disk drives, only a small percentage of the data can be kept in the cache. A given data block may reside in cache for only a few seconds or minutes before it is flushed from cache.
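
The block-caching behavior described above can be illustrated with a minimal Python sketch. It assumes a least-recently-used (LRU) eviction policy, which is only one simple way to act on usage history; nothing here is meant to represent the algorithm of any particular RAID controller. The class name LRUBlockCache and its methods are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy block cache keyed by block address (assumed LRU policy).

    The cache knows nothing about files: it sees only block addresses
    and evicts whichever block was used least recently.
    """

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block address -> placeholder, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, block_addr):
        """Record one I/O to block_addr; return True on a cache hit."""
        if block_addr in self.blocks:
            self.blocks.move_to_end(block_addr)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block_addr] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # flush the least recently used block
        return False

    def hit_ratio(self):
        """Fraction of accesses so far that were served from the cache."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = LRUBlockCache(capacity_blocks=2)
    for addr in [1, 2, 1, 3, 2]:
        cache.access(addr)
    print("hit ratio:", cache.hit_ratio())
```

Because the cache holds only two blocks in this toy trace, a block that is still in use can be flushed before it is needed again, which is the short residency the sidebar describes.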
The effectiveness of block caching depends on the application and, usually, an application will reach a point of diminishing returns--a point where adding more cache will not deliver much additional performance improvement. Once the application reaches that point, the next step is to start caching entire files based on an understanding of application structure.

File caching depends on an understanding of the application structure--the identification of "hot files." Once the hot files have been identified, selected files are moved to the file cache--as a policy decision, not a statistical extrapolation. The hot files may reside on the SSD for days, weeks, or months. The "hit ratio" on data in the file cache is always 100%, since the entire file is always in the cache and available for access.

Two conditions are necessary to make solid-state file caching a good bet: (1) the application server must be I/O bound; (2) the I/Os must be skewed: a small percentage of the files must drive a large percentage of the I/O activity. For example, in many e-mail and messaging applications, the message queues represent a high percentage of the total I/O on a small percentage of the data. They are also very write-intensive files. This skewed I/O distribution is a feature of the application design and, thus, the application is a good fit for architectural adoption of solid-state file caching.

Typically, a small, I/O-intensive fraction of the data is moved from cached RAID to a separate, non-volatile file cache. In suitable applications, addition of a solid-state file cache to an existing server configuration can boost throughput by a factor of four or even eight. This enables the system administrator to deliver the required performance and service levels without purchasing and managing several additional servers and their associated storage.
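
The skew condition can be checked from per-file I/O counts. The Python sketch below is an illustration under stated assumptions, not a method taken from the article: the function names, the 80% I/O target, the 20% file-count threshold, and the sample file names and counts are all hypothetical. In practice the counts would come from whatever I/O monitoring the server provides.

```python
def select_hot_files(io_counts, target_share=0.8):
    """Pick the busiest files until they cover target_share of all I/O.

    io_counts maps file name -> number of I/Os observed.
    Returns (list of hot file names, fraction of total I/O they cover).
    The 0.8 default is an assumed threshold, not a figure from the article.
    """
    total = sum(io_counts.values())
    if total == 0:
        return [], 0.0
    hot, covered = [], 0
    busiest_first = sorted(io_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, count in busiest_first:
        if covered / total >= target_share:
            break
        hot.append(name)
        covered += count
    return hot, covered / total


def is_skewed(io_counts, hot_files, max_file_fraction=0.2):
    """True if the hot set is a small fraction of all files (assumed 20% cutoff)."""
    if not io_counts:
        return False
    return len(hot_files) / len(io_counts) <= max_file_fraction


if __name__ == "__main__":
    # Hypothetical counts for a messaging server: two files dominate the I/O.
    sample = {
        "msg_queue.db": 7000, "journal.log": 1500,
        "f1": 400, "f2": 300, "f3": 250, "f4": 200,
        "f5": 150, "f6": 100, "f7": 60, "f8": 40,
    }
    hot, share = select_hot_files(sample)
    print(hot, share, is_skewed(sample, hot))
```

Files selected this way are candidates for placement in the file cache as a policy decision; whether the move pays off still depends on the server being I/O bound in the first place.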