Clustered storage: improved utility for production computer clusters.

With origins in the DARPA Strategic Computing program of the mid-1980s, Beowulf-class cluster computing has evolved into the dominant approach for developing high performance computing systems. In recent years, once-prevalent vector computers from the likes of Cray, Convex, Alliant, and Fujitsu have given way to cluster computing configurations for even the most aggressive computing applications. Top500.org, which tracks the most powerful computing systems in the world, recently reported that nearly 70% of the fastest computer systems in the world are now labeled as cluster configurations. This share has been increasing steadily over the last few years.

Once the esoteric tool of PhD computer scientists, cluster computing technology has matured to become an integral part of many organizations' production computing environments. Approaches refined in traditional scientific computing applications (such as high energy physics research, weather prediction, and seismic data analysis) are the core of a new wave of production-oriented applications in areas such as drug discovery, circuit design and simulation, aerospace and automotive design and simulation, financial analytics, and digital media.
[FIGURE 1 OMITTED]

Concurrent with this evolution, more capable instrumentation, more powerful processors, and higher fidelity computer models serve to continually increase the data throughput required of these clusters. This trend applies pressure to the storage systems used to support these I/O-hungry applications, and has prompted a wave of new storage solutions based on the same scale-out approach as cluster computing.

Anatomy of Production High Throughput Computing Applications

Most of these high throughput applications can be classified as one of two processing scenarios: data reduction, or data generation. In the former, large input datasets--often taken from some scientific instrument--are processed to identify patterns and/or produce aggregated descriptions of the input. This is the most common scenario for seismic processing, as well as similarly structured analysis applications such as micro array data processing, or remote sensing. In the latter scenario, small input datasets (parameters) are used to drive simulations that generate large output datasets--often time sequenced--that can be further analyzed or visualized. Examples here include crash analysis, combustion models, weather prediction, and computer graphics rendering applications used to generate special effects and full-feature animated films.
These two data-intensive scenarios are depicted in Figure 1.

Divide and Conquer

To address these problems, today's cluster computing approaches utilize what is commonly called a scale-out or shared nothing approach to parallel computing. In the scale-out model, applications are developed using a divide-and-conquer approach; the problem is decomposed into hundreds, thousands or even millions of tasks, each of which is executed independently (or nearly independently). The most common decomposition approach exploits a problem's inherent data parallelism--breaking the problem into pieces by identifying the data subsets, or partitions, that comprise the individual tasks, then distributing those tasks and the corresponding data partitions to the compute nodes for processing.

These scale-out approaches typically employ single or dual processor compute nodes in a 1U configuration that facilitates rack-based implementations. Hundreds (or even thousands) of these nodes are connected to one another with high-speed, low latency interconnects such as Myricom's Myrinet or InfiniBand, or with commodity Gigabit Ethernet switches. Each compute node may process one or more application data partitions, depending on node configuration and the application's computation, memory, and I/O requirements. These partitioned applications are often developed using the Message Passing Interface (MPI) program development and execution environment.

The scale-out environment allows accomplished programmers to exploit a common set of libraries to control overall program execution and to support the processor-to-processor communication required of distributed high-performance computing applications. Scale-out approaches provide effective solutions for many problems addressed by high-performance computing.

It's All About Data Management

However, scalability and performance come at a cost--namely, the additional complexity required to break a problem into pieces (data partitions), exchange or replicate information across the pieces when necessary, then put the partial result sets back together into the final answer. This data parallel approach requires the creation and management of data partitions and replicas that are used by the compute nodes. Management of these partitions and replicas poses a number of operational challenges, especially in large cluster and grid computing environments shared amongst a number of projects or organizations, and in environments where core datasets change regularly. This is typically one of the most time consuming and complex development problems facing organizations adopting cluster computing.

Scalable shared data storage, equally accessible by all nodes in the cluster, is an obvious vehicle to provide the requisite data storage and access services to compute cluster clients. In addition to providing high bandwidth aggregate data access to the cluster nodes, such systems can provide nonvolatile storage for computing checkpoints and can serve as a results gateway for making cluster results immediately available to downstream analysis and visualization tools for an emerging approach known as computational steering. Yet until recently, these storage systems--implemented using traditional SAN and NAS architectures--have only been able to support modest-sized clusters, typically no more than 32 or 64 nodes.

Shared Storage Architectures

Storage Area Networks (SANs) and Network Attached Storage (NAS) are the predominant storage architectures of the day. SANs extend the block-based, direct attached storage model across a high-performance dedicated switching fabric to provide device sharing capabilities and allow for more flexible utilization of storage resources. LUN and volume management software supports the partitioning and allocation of drives and storage arrays across a number of file or application servers (historically, RDBMS systems). SATA-based SAN storage and further commoditization of FC switching and HBAs have fueled recent growth in the SAN market--and more widespread adoption in high performance computing applications.

Network Attached Storage (NAS) systems utilize commodity computing and networking components to provide manageable storage directly to users through their standard client interconnection infrastructure (100 Mbit or 1 Gb Ethernet) using shared file access protocols (NFS and CIFS). This class of storage includes both NAS appliances and systems constructed from DAS and SAN components that export their file systems over NFS and CIFS. NAS devices deliver large numbers of file "transactions" (ops) to a large client population (hundreds or thousands of users) but can be limited in implementation by what has been characterized as the filer bottleneck. Figure 2 depicts traditional DAS, SAN, and NAS storage architectures.

Each of these traditional approaches has its limitations in high performance computing scenarios. SAN architectures improve on the DAS model by providing a pooled resource model for physical storage that can be allocated and re-allocated to servers as required. But data is not shared between servers and the number of servers is typically limited to 32 or 64. NAS architectures afford file sharing to thousands of clients, but run into performance limitations as the number of clients increases.

[FIGURE 2 OMITTED]

While these storage architectures have served the enterprise computing market well over the years, cluster computing represents a new class of storage system interaction--one requiring high concurrency (thousands of compute nodes) and high aggregate I/O. This model pushes the limits of traditional storage systems.

Enter Scale-out Storage

So, why not just "scale-out" the storage architecture in the same way as the compute cluster--i.e., with multiple file servers that support the large cluster? In fact, many organizations have indeed tried this.
However, bringing additional file servers into the environment greatly complicates storage management. New volumes and mount points are introduced into the application suite and developers are tasked with designing new strategies for balancing both capacity and bandwidth across the multiple servers or NAS heads. Additionally, such an approach typically requires periodic reassessment and rebalancing of the storage resources--often accompanied by system down time. In short, these approaches "don't scale"--particularly from a manageability perspective.

Now on the horizon are a number of clustered storage systems capable of supporting multiple petabytes of capacity and tens of gigabytes per second aggregate throughput--all in a single global namespace with dynamic load balancing and data redistribution. These systems extend current SAN and NAS architectures, and are being offered by Panasas (ActiveScale), Cluster File Systems (Lustre), Red Hat (Sistina GFS), IBM (GPFS), SGI (CxFS), Network Appliance (SpinServer), Isilon (IQ), Ibrix (Fusion), TerraScale (Terragrid), ADIC (StorNext), Exanet (ExaStore), and PolyServe (Matrix). These solutions use the same divide-and-conquer approach as scale-out computing architectures--spreading data across the storage cluster, enhancing data and metadata operation throughput by distributing load, and providing a single point of management and single namespace for a large, high performance file system.
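
As a rough illustration of what "spreading data across the storage cluster" can mean, the sketch below stripes a file's bytes round-robin across a set of storage nodes and reads them back in order. It is a minimal sketch under stated assumptions, not a description of how any product named above works: the function names `stripe` and `reassemble`, the 4-byte stripe unit, and the in-memory dictionary standing in for storage nodes are all hypothetical.

```python
# Hypothetical sketch: round-robin striping of one file across storage nodes.
# Node count, stripe unit, and function names are illustrative assumptions.

STRIPE_UNIT = 4  # bytes per stripe unit; deliberately tiny for readability


def stripe(data: bytes, num_nodes: int, unit: int = STRIPE_UNIT) -> dict[int, list[bytes]]:
    """Cut data into fixed-size units and place unit i on node i % num_nodes."""
    if num_nodes < 1 or unit < 1:
        raise ValueError("num_nodes and unit must be positive")
    placement: dict[int, list[bytes]] = {node: [] for node in range(num_nodes)}
    for index, offset in enumerate(range(0, len(data), unit)):
        placement[index % num_nodes].append(data[offset:offset + unit])
    return placement


def reassemble(placement: dict[int, list[bytes]]) -> bytes:
    """Read the units back in their original order from all nodes."""
    num_nodes = len(placement)
    total_units = sum(len(units) for units in placement.values())
    out = bytearray()
    for index in range(total_units):
        # Unit i lives on node i % num_nodes, as that node's (i // num_nodes)-th unit.
        out += placement[index % num_nodes][index // num_nodes]
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(26))
    layout = stripe(payload, num_nodes=3)
    for node, units in layout.items():
        print(f"node {node}: {len(units)} units")
    print("round trip ok:", reassemble(layout) == payload)
```

Because consecutive units land on different nodes, a large sequential read or write is served by all nodes at once, which is the intuition behind the aggregate bandwidth gains described above. Real systems layer metadata management, redundancy, and rebalancing on top of a placement scheme like this.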
Clustered storage systems provide:

* Scalable performance, in both bandwidth and IOPS
* Uniform shared data access for compute cluster nodes
* Effective resource utilization, including automatic load and capacity balancing
* Multi-protocol interoperability to support a range of production needs, including in-place post-processing and visualization

Summary

The commoditization of computing and networking technology has advanced the penetration of cluster computing into mainstream enterprise computing applications. The next wave of technology commoditization--scalable networked storage architectures--promises to accelerate this trend, fueling the development of new applications and approaches that leverage the increased performance, scalability, and manageability afforded by these systems.

Bruce Moxon is chief solutions architect at Panasas (Fremont, CA) www.panasas.com