High Performance Computing: Past, Present and Future
High Performance Computing: A Landscape in Transition
Only a few years ago, the typical supercomputer was built from custom silicon, proprietary high-performance interconnects and specialized storage subsystems. Companies such as IBM, Amdahl, Cray and Fujitsu developed complex systems that took years to bring to market and cost millions to tens of millions of dollars. Such systems were available only to national and international government-funded research centers.
Compare that with the current trend in high performance computing: cluster computers built on personal computer architectures, commodity microprocessors, Gigabit Ethernet and standard networked storage architectures. These systems represent the growing wave of commodity supercomputers. They are developed and sold by traditional high-performance computing vendors, such as IBM; by high-volume PC manufacturers, such as Dell and HP; and by a new breed of cluster computing vendors that includes Linux NetworX, Rackable Systems and RackSaver. Such systems can be acquired for as little as a hundred thousand dollars, putting them within reach of a wide range of government, industry and academic institutions.
Cluster Computing: To Infinity and Beyond
The growth of cluster computing has been fueled by several key technology developments in recent years. First is the rapid advancement of CPU technology in accordance with Moore's Law: proprietary vector processors gave way first to the RISC-based processors of 1980s and early-1990s HPC systems, and more recently to commodity Intel processors whose integer and floating-point performance improvements have outpaced those of more specialized designs. Second is the commoditization of high-performance networking technology, which provides the interconnect cluster nodes require to communicate with one another. Last is the maturation of the software infrastructure needed to orchestrate the activities of hundreds or thousands of cluster nodes, making the task of effectively harnessing these hardware advances accessible to a much larger group of programmers.
This last area has been instrumental in simplifying the installation, management and use of cluster computing systems. Together, these developments make up a layer known as cluster middleware. Cluster management and monitoring software is now a standard component of cluster vendors' offerings; it allows remote management of cluster computing resources, including provisioning of additional nodes, node imaging (remote loading of the operating system) and application provisioning. Distributed resource management (DRM) software, such as Platform Computing's LSF, provides a single point of control for distributing and monitoring computing jobs across the cluster. Parallel programming libraries and compiler extensions, such as MPI and OpenMP, along with distributed debugging and monitoring tools, support the development of parallel applications that can effectively exploit these cluster architectures.
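The DRM role described above can be sketched in miniature. In this toy dispatcher (an illustration, not LSF's actual interface), jobs enter a single queue — the single point of control — and worker threads stand in for the cluster nodes that pull and run them; a real DRM performs the same choreography across machines.

```python
# Toy "distributed resource manager": one job queue, worker threads as nodes.
import queue
import threading

def node_worker(name, jobs, results):
    """Stand-in for a compute node: pull jobs until a shutdown signal arrives."""
    while True:
        job = jobs.get()
        if job is None:          # shutdown signal from the controller
            jobs.task_done()
            return
        func, arg = job
        results.append((name, func(arg)))  # record which "node" ran the job
        jobs.task_done()

def submit_all(job_list, n_nodes=3):
    """Submit every job from a single point of control and wait for completion."""
    jobs, results = queue.Queue(), []
    workers = [threading.Thread(target=node_worker,
                                args=(f"node{i}", jobs, results))
               for i in range(n_nodes)]
    for w in workers:
        w.start()
    for job in job_list:
        jobs.put(job)            # job submission
    jobs.join()                  # block until every job is done
    for _ in workers:
        jobs.put(None)           # tell each node to shut down
    for w in workers:
        w.join()
    return results

out = submit_all([(abs, -3), (len, "genome"), (max, [1, 5, 2])])
print(sorted(r for _, r in out))  # [3, 5, 6]
```

The queue decouples submission from execution, which is the essence of the DRM model: the submitter never needs to know which node runs which job.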
The result of these activities has been a broadening of high-performance computing. Traditional scientific high-performance computing applications such as high-energy physics research, environmental science and weather prediction, aerospace engineering, seismic data analysis, and signal and image processing, have been joined by a new wave of industrial applications. Drug discovery, circuit design, automobile design, financial analyses and digital media applications all benefit from the use of large cluster computing configurations.
Scalable Storage: The Final Frontier
These technology advances have removed many of the impediments encountered by early adopters of cluster computing technology. Still, one crucial development remains before cluster computing is fully accessible: scalable commodity storage architectures to complement the scalable computing and networking architectures (Figure 1).
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
Today's cluster computing approaches employ a scale-out, or data-parallel, computing model. In this model, applications take a divide-and-conquer approach: the computing problem is decomposed by identifying data partitions, each of which defines an independent task. Each task is then distributed to a compute node for processing. Because program inputs and outputs are typically maintained in centralized datasets residing on long-term storage subsystems, data-parallel applications must scatter portions of the input dataset to each compute node, then gather the partial results and combine them into the final output.
The scatter-gather process depicted in Figure 2 represents a typical bioinformatics sequence analysis processing scenario. Sequence analysis locates similarities between DNA or protein sequences that can be used to understand genetic similarities and differences from individual to individual and across species. It is at the core of current genomic research efforts. In this depiction, the genomic sequence database of interest is partitioned and distributed (scattered) amongst a number of cluster computing nodes. Query sequences are broadcast (copied) to the nodes, where they are compared with the respective target partitions. The partial results generated on each node are then combined (gathered) to form the final result set.
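The Figure 2 scenario can be reduced to a few lines. In this sketch the "similarity" test is simple substring matching purely for illustration — a real pipeline would run a sequence-comparison tool such as BLAST on each node — but the scatter, broadcast and gather phases are the same.

```python
# Toy scatter/gather sequence search: partition the database, broadcast
# the query, search each partition, and merge the partial result sets.

def scatter(database, n_nodes):
    """Partition the target sequence database across the compute nodes."""
    return [database[i::n_nodes] for i in range(n_nodes)]

def search_partition(query, partition):
    """Per-node search: return the sequences containing the query motif.
    (Substring matching stands in for real sequence alignment.)"""
    return [seq for seq in partition if query in seq]

def gather(partials):
    """Merge the per-node partial results into the final result set."""
    return sorted(hit for part in partials for hit in part)

database = ["ACGTACGT", "TTGACC", "GGGACGTT", "CCCTTT"]
query = "ACGT"                      # broadcast (copied) to every node
partials = [search_partition(query, part)
            for part in scatter(database, n_nodes=2)]
print(gather(partials))             # ['ACGTACGT', 'GGGACGTT']
```

On a real cluster, each `search_partition` call would run on a different node against its locally held partition, which is exactly why the next section's storage concerns arise.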
This scattering and gathering of data partitions is time consuming and susceptible to program errors and compute node failures, making the overall process difficult to manage. Additionally, the massively parallel nature of large cluster computing environments (with thousands of processing nodes) means the number of concurrent accesses to the file system can be very high. Moreover, many parallel applications process data in roughly synchronous waves, in which all of the processors perform I/O operations (sometimes to the same files) at nearly the same time. These two characteristics translate into two distinctive storage system requirements: highly concurrent access to shared data and high aggregate throughput.
To serve the needs of growing cluster computer installations, such storage must be scalable in both capacity and bandwidth, so that activity can be balanced across storage system components and adequate performance ensured. Traditional storage approaches alone, including direct-attached storage (DAS), network-attached storage (NAS) and storage area networks (SAN), are insufficient to address many of these key requirements.
Scalable Storage Approaches
In the last few years, a number of storage systems have been developed to complement cluster computing solutions. These include parallel and distributed file systems (sometimes called clustered file systems), such as DFS, PVFS, Sistina's GFS, Silicon Graphics' CXFS and IBM's GPFS. Each of these attempts to eliminate storage bottlenecks by distributing data across many storage devices and dividing file system access among a number of "metadata servers."
The most recent, and perhaps most promising, advances are coming out of an area known as object storage. Object storage provides a new way of distributing and parallelizing file systems to support highly scalable configurations--precisely those needed by cluster computing systems. An emerging standard, known as Object-based Storage Devices (OSD), has been developed by a technical working group within the Storage Networking Industry Association (SNIA) and is working its way through the American National Standards Institute (ANSI) standards process. This standard defines an object storage SCSI command set that can be carried over iSCSI, enabling intelligent network-attached storage devices that together form massively parallel storage systems. Panasas, a new network storage company deploying an object storage file system, recently demonstrated scalability for sequential I/O loads in excess of 10 gigabytes per second in cluster computing configurations. Additionally, the system posted record-breaking SPECsfs results, indicative of its ability to support general networked file system loads.
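The scaling idea behind object storage can be illustrated without the OSD command set itself. In the hypothetical sketch below (plain dictionaries stand in for storage devices; the stripe size and function names are invented for illustration), file data is striped as objects across many devices, so clients can move the stripes in parallel and aggregate bandwidth grows with the number of devices.

```python
# Illustrative striping of file data across stand-in "storage devices".

STRIPE = 4  # bytes per stripe unit (tiny, for illustration only)

def write_striped(data, devices):
    """Split data into stripe units and place them round-robin on devices."""
    for i in range(0, len(data), STRIPE):
        unit_no = i // STRIPE
        dev = devices[unit_no % len(devices)]
        dev[unit_no] = data[i:i + STRIPE]   # store object: unit number -> bytes

def read_striped(devices):
    """Reassemble the file by collecting stripe units from every device."""
    units = {}
    for dev in devices:
        units.update(dev)                   # each device could be read in parallel
    return b"".join(units[k] for k in sorted(units))

devices = [{} for _ in range(3)]            # three stand-in storage devices
write_striped(b"object storage stripes data", devices)
print(read_striped(devices))                # b'object storage stripes data'
```

In a real object storage system, the per-object placement map lives with the metadata service, and each device is an intelligent network endpoint rather than an in-memory dictionary; the parallelism shown here is what yields the aggregate-bandwidth scaling the article describes.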
Into the Future: Diskless Clusters
High-performance shared storage architectures capable of effectively serving thousands of compute nodes are further changing the HPC landscape. In an effort to improve manageability and increase system reliability, a number of researchers are developing diskless clusters that exploit these shared storage architectures. Diskless clusters simplify the node architecture by removing local disks and other non-essential components, significantly reducing the probability of node failure. This approach also reduces per-node cost and increases node density, providing improved price-performance. To exploit these improvements, new operating system approaches such as the Beowulf Distributed Process Space (BProc) project are under way to further simplify cluster management.
Proponents claim that such an approach provides higher availability and improves reliability, resource utilization and overall system performance. These benefits, together with a simplified management model, accelerate commodity supercomputing.
Bruce Moxon is manager, vertical marketing, for Panasas, Inc. (Fremont, CA)
Computer Technology Review (Storage Networking), Jan 1, 2004.