
Clustered storage: improved utility for production computer clusters.

With origins in the DARPA Strategic Computing program of the mid-1980s, Beowulf-class cluster computing has evolved into the dominant approach for developing high performance computing systems. In recent years, once-prevalent vector computers from the likes of Cray, Convex, Alliant, Fujitsu, and others have given way to cluster computing configurations for even the most aggressive computing applications. The TOP500 list, which tracks the most powerful computing systems in the world, recently reported that nearly 70% of the fastest computer systems in the world are now labeled as cluster configurations, a share that has increased steadily over the last few years.

Once the esoteric tool of PhD computer scientists, cluster computing technology has matured to become an integral part of many organizations' production computing environments. Approaches refined in traditional scientific computing applications (such as high energy physics research, weather prediction, and seismic data analysis) are the core of a new wave of production-oriented applications in areas such as drug discovery, circuit design and simulation, aerospace and automotive design and simulation, financial analytics, and digital media.


Concurrent with this evolution, more capable instrumentation, more powerful processors, and higher fidelity computer models serve to continually increase the data throughput required of these clusters. This trend applies pressure to the storage systems used to support these I/O-hungry applications, and has prompted a wave of new storage solutions based on the same scale-out approach as cluster computing.

Anatomy of Production High Throughput Computing Applications

Most of these high throughput applications can be classified as one of two processing scenarios: data reduction, or data generation. In the former, large input datasets--often taken from some scientific instrument--are processed to identify patterns and/or produce aggregated descriptions of the input. This is the most common scenario for seismic processing, as well as similarly structured analysis applications such as micro array data processing, or remote sensing. In the latter scenario, small input datasets (parameters) are used to drive simulations that generate large output datasets--often time sequenced--that can be further analyzed or visualized. Examples here include crash analysis, combustion models, weather prediction, and computer graphics rendering applications used to generate special effects and full-feature animated films. These two data-intensive scenarios are depicted in Figure 1.

Divide and Conquer

To address these problems, today's cluster computing approaches use what is commonly called a scale-out or shared-nothing approach to parallel computing. In the scale-out model, applications are developed using a divide-and-conquer approach; the problem is decomposed into hundreds, thousands, or even millions of tasks, each of which is executed independently (or nearly independently). The most common decomposition approach exploits a problem's inherent data parallelism--breaking the problem into pieces by identifying the data subsets, or partitions, that comprise the individual tasks, then distributing those tasks and the corresponding data partitions to the compute nodes for processing.
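The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not a cluster implementation: threads stand in for compute nodes, and the function names (`partition`, `process_partition`, `run_data_parallel`) and the sum-of-squares workload are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into num_partitions contiguous, near-equal slices."""
    base, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        size = base + (1 if i < extra else 0)
        parts.append(data[start:start + size])
        start += size
    return parts


def process_partition(part):
    """Hypothetical per-task work: reduce one partition to a partial result."""
    return sum(x * x for x in part)


def run_data_parallel(data, num_workers=4):
    """Decompose, process each partition independently, then combine."""
    parts = partition(data, num_workers)
    # Threads stand in for compute nodes; a real cluster would ship each
    # partition (or a reference to it on shared storage) to another machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_partition, parts))
    # Put the partial results back together into the final answer.
    return sum(partials)
```

Calling `run_data_parallel(list(range(10)), 3)` splits the ten values into partitions of four, three, and three elements and combines the three partial sums. The tasks never communicate, which is the "independently (or nearly independently)" case; problems that need to exchange data between partitions require more machinery.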

These scale-out approaches typically employ single- or dual-processor compute nodes in a 1U configuration that facilitates rack-based implementations. Hundreds (or even thousands) of these nodes are connected to one another with high-speed, low-latency interconnects such as Myricom's Myrinet or InfiniBand, or with commodity Gigabit Ethernet switches. Each compute node may process one or more application data partitions, depending on node configuration and the application's computation, memory, and I/O requirements. These partitioned applications are often developed using the Message Passing Interface (MPI) program development and execution environment.

The scale-out environment allows accomplished programmers to exploit a common set of libraries to control overall program execution and to support the processor-to-processor communication required of distributed high-performance computing applications. Scale-out approaches provide effective solutions for many problems addressed by high-performance computing.

It's All About Data Management

However, scalability and performance come at a cost--namely, the additional complexity required to break a problem into pieces (data partitions), exchange or replicate information across the pieces when necessary, then put the partial result sets back together into the final answer. This data parallel approach requires the creation and management of data partitions and replicas that are used by the compute nodes. Management of these partitions and replicas poses a number of operational challenges, especially in large cluster and grid computing environments shared amongst a number of projects or organizations, and in environments where core datasets change regularly. This is typically one of the most time consuming and complex development problems facing organizations adopting cluster computing.

Scalable shared data storage, equally accessible by all nodes in the cluster, is an obvious vehicle to provide the requisite data storage and access services to compute cluster clients. In addition to providing high bandwidth aggregate data access to the cluster nodes, such systems can provide nonvolatile storage for computing checkpoints and can serve as a results gateway for making cluster results immediately available to downstream analysis and visualization tools for an emerging approach known as computational steering.
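As an illustration of the checkpoint role mentioned above, the following sketch writes a job's state to a directory that is assumed to live on storage visible to every node. The file name, the JSON format, and the function names are hypothetical. The write-then-rename pattern keeps a reader from seeing a half-written checkpoint on POSIX local file systems; whether a given shared file system preserves that guarantee is an assumption that should be checked against its documentation.

```python
import json
import os
import tempfile


def save_checkpoint(directory, state):
    """Write state to <directory>/checkpoint.json via a temp file and rename."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, "checkpoint.json")
    # Create the temp file in the same directory so the rename stays on
    # one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage
        os.replace(tmp_path, final_path)  # swap the new checkpoint in
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def load_checkpoint(directory):
    """Return the last saved state, or None if no checkpoint exists."""
    path = os.path.join(directory, "checkpoint.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

A restarted job, possibly on a different node, would call `load_checkpoint` on the same shared directory and resume from the returned state. A downstream analysis tool could read the same file, which is the "results gateway" idea in miniature.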

Yet until recently, these storage systems--implemented using traditional SAN and NAS architectures--have only been able to support modest-sized clusters, typically no more than 32 or 64 nodes.

Shared Storage Architectures

Storage Area Networks (SANs) and Network Attached Storage (NAS) are the predominant storage architectures of the day. SANs extend the block-based, direct attached storage model across a high-performance dedicated switching fabric to provide device sharing capabilities and allow for more flexible utilization of storage resources. LUN and volume management software supports the partitioning and allocation of drives and storage arrays across a number of file or application servers (historically, relational database systems). SATA-based SAN storage and further commoditization of Fibre Channel (FC) switching and host bus adapters (HBAs) have fueled recent growth in the SAN market--and more widespread adoption in high performance computing applications.

Network Attached Storage (NAS) systems utilize commodity computing and networking components to provide manageable storage directly to users through their standard client interconnection infrastructure (100 Mbit or 1 Gb Ethernet) using shared file access protocols (NFS and CIFS). This class of storage includes both NAS appliances and systems constructed from DAS and SAN components that export their file systems over NFS and CIFS. NAS devices deliver large numbers of file "transactions" (ops) to a large client population (hundreds or thousands of users) but can be limited by what has been characterized as the filer bottleneck: every client request passes through a single file server (filer) head, whose processing and network capacity caps total throughput. Figure 2 depicts traditional DAS, SAN, and NAS storage architectures.

Each of these traditional approaches has its limitations in high performance computing scenarios. SAN architectures improve on the DAS model by providing a pooled resource model for physical storage that can be allocated and re-allocated to servers as required. But data is not shared between servers and the number of servers is typically limited to 32 or 64. NAS architectures afford file sharing to thousands of clients, but run into performance limitations as the number of clients increases.


While these storage architectures have served the enterprise computing market well over the years, cluster computing represents a new class of storage system interaction--one requiring high concurrency (thousands of compute nodes) and high aggregate I/O. This model pushes the limits of traditional storage systems.
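A back-of-the-envelope calculation shows why this model strains a single file server. The node count and per-node data rate below are assumed figures chosen only for illustration; the one fixed number is that a 1 Gb/s Ethernet link carries at most 125 MB/s before protocol overhead.

```python
def required_aggregate_mb_s(num_nodes, per_node_mb_s):
    """Total bandwidth the cluster asks of storage if all nodes read at once."""
    return num_nodes * per_node_mb_s


def per_node_share_mb_s(server_mb_s, num_nodes):
    """Bandwidth each node gets when one server's capacity is split evenly."""
    return server_mb_s / num_nodes


# Assumed figures, for illustration only.
NODES = 1000
PER_NODE_MB_S = 10.0
GIGABIT_LINK_MB_S = 125.0  # 1 Gb/s = 125 MB/s, before protocol overhead

demand = required_aggregate_mb_s(NODES, PER_NODE_MB_S)
share = per_node_share_mb_s(GIGABIT_LINK_MB_S, NODES)
print(f"demand: {demand:.0f} MB/s, single-server share per node: {share:.3f} MB/s")
```

Under these assumptions the cluster wants 10,000 MB/s in aggregate, while a single server behind one Gigabit link can give each node only 0.125 MB/s. The gap grows linearly with node count, which is why the storage has to scale out along with the compute.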

Enter Scale-out Storage

So, why not just "scale out" the storage architecture in the same way as the compute cluster--that is, with multiple file servers that support the large cluster? Many organizations have tried exactly this. However, bringing additional file servers into the environment greatly complicates storage management. New volumes and mount points are introduced into the application suite, and developers are taxed with designing new strategies for balancing both capacity and bandwidth across the multiple servers or NAS heads. Additionally, such an approach typically requires periodic reassessment and rebalancing of the storage resources--often accompanied by system downtime. In short, these approaches "don't scale"--particularly from a manageability perspective.
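The rebalancing burden can be made concrete with one hypothetical placement strategy. Assume, purely for illustration, that an application picks a file server for each file by hashing the file's path modulo the number of servers. The article does not say how any particular site placed its files; the function names here are likewise invented for the sketch.

```python
import hashlib


def server_for(path, num_servers):
    """Pick a server index for a path by hashing it modulo the server count."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def fraction_moved(paths, old_count, new_count):
    """Fraction of files whose assigned server changes when the count changes."""
    moved = sum(
        1 for p in paths
        if server_for(p, old_count) != server_for(p, new_count)
    )
    return moved / len(paths)
```

With this scheme, growing from four servers to five reassigns about 80% of all files, since a file stays put only when its hash gives the same remainder for both counts. Every reassigned file has to be copied to its new server and every client's view of mount points updated, which is the kind of disruptive rebalancing described above.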

Now on the horizon are a number of clustered storage systems capable of supporting multiple petabytes of capacity and tens of gigabytes per second of aggregate throughput--all in a single global namespace with dynamic load balancing and data redistribution. These systems extend current SAN and NAS architectures, and are being offered by Panasas (ActiveScale), Cluster File Systems (Lustre), Red Hat (Sistina GFS), IBM (GPFS), SGI (CXFS), Network Appliance (SpinServer), Isilon (IQ), Ibrix (Fusion), TerraScale (TerraGrid), ADIC (StorNext), Exanet (ExaStore), and PolyServe (Matrix).

These solutions use the same divide-and-conquer approach as scale-out computing architectures--spreading data across the storage cluster, enhancing data and metadata operation throughput by distributing load, and providing a single point of management and single namespace for a large, high performance file system.
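One common way to spread a file's data across a storage cluster is round-robin striping, sketched below. This is offered as an assumed example of the general technique; the article does not describe how any of the products named above lay out data, and the function names and parameters are illustrative.

```python
def locate(offset, stripe_size, num_nodes):
    """Map a byte offset in a file to (node index, offset within that node)."""
    stripe_index = offset // stripe_size
    node = stripe_index % num_nodes          # stripes rotate across nodes
    local_stripe = stripe_index // num_nodes  # full stripes already on this node
    return node, local_stripe * stripe_size + offset % stripe_size


def nodes_touched(offset, length, stripe_size, num_nodes):
    """List the storage nodes that serve a read of `length` bytes at `offset`."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({s % num_nodes for s in range(first, last + 1)})
```

With a 64-byte stripe over four nodes, a 256-byte read starting at offset 0 touches all four nodes, so a large sequential read draws on the combined bandwidth of the whole cluster. A 10-byte read at offset 100 touches only one node, leaving the others free to serve different clients.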

Clustered storage systems provide:

* Scalable performance, in both bandwidth and IOPS

* Uniform shared data access for compute cluster nodes

* Effective resource utilization, including automatic load and capacity balancing

* Multi-protocol interoperability to support a range of production needs, including in-place post-processing and visualization


The commoditization of computing and networking technology has advanced the penetration of cluster computing into mainstream enterprise computing applications. The next wave of technology commoditization--scalable networked storage architectures--promises to accelerate this trend, fueling the development of new applications and approaches that leverage the increased performance, scalability, and manageability afforded by these systems.

Bruce Moxon is chief solutions architect at Panasas (Fremont, CA).
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Storage Clustering
Author:Moxon, Bruce
Publication:Computer Technology Review
Date:Dec 1, 2004