
Storage clustering

Storage clustering, also referred to as grid storage, is a technology paradigm that extends the scalability and efficiency of storage area networks (SANs). Clustered storage is the storage counterpart of clustered computing: it provides an on-demand shared storage environment modeled on the way compute clusters pool processing resources. A storage cluster is a farm of linked storage servers that work together on similar tasks in a grid fashion. It grows by scaling out (adding more servers) rather than by scaling up (replacing servers with larger and more powerful machines).

Clustered storage systems typically consist of network-connected storage plus an administrative function that manages a collection of physical disks. To a client or application server, this collection appears as a highly available, block-level storage system: a large abstract pool of disks, or storage cluster, that is accessible to all clients on the network. A client can create a volume in the cluster on demand and draw on the entire capacity of the underlying physical resources, and additional storage resources can be incorporated into the cluster automatically. The result is a virtual volume pool that can tolerate and recover from disk, server, and network failures.
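To make the "large abstract pool" idea concrete, here is a minimal sketch, assuming a toy model in which each storage server contributes a fixed number of gigabytes and a volume is carved from the cluster-wide pool rather than from any single server. The class names `StorageServer` and `StorageCluster` are hypothetical and do not describe any vendor's product; the model ignores redundancy overhead and block placement.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical storage node contributing raw capacity (in GB)."""
    name: str
    capacity_gb: int


@dataclass
class StorageCluster:
    """Toy model: many servers presented to clients as one pool of capacity."""
    servers: list[StorageServer] = field(default_factory=list)
    volumes: dict[str, int] = field(default_factory=dict)

    def add_server(self, server: StorageServer) -> None:
        # A new node simply extends the pool; existing volumes are untouched.
        self.servers.append(server)

    @property
    def total_gb(self) -> int:
        return sum(s.capacity_gb for s in self.servers)

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.volumes.values())

    def create_volume(self, name: str, size_gb: int) -> None:
        # Volumes are allocated from the aggregate pool, so one volume may be
        # larger than the disks of any single server.
        if size_gb > self.free_gb:
            raise ValueError(f"only {self.free_gb} GB free in the pool")
        self.volumes[name] = size_gb


if __name__ == "__main__":
    cluster = StorageCluster()
    for i in range(3):
        cluster.add_server(StorageServer(f"node{i}", 1000))
    cluster.create_volume("db", 2500)  # bigger than any one 1,000 GB node
    cluster.add_server(StorageServer("node3", 1000))
    print(cluster.free_gb)
```

With three 1,000 GB nodes, a 2,500 GB volume fits even though no single node could hold it, and adding a fourth node grows the free pool to 1,500 GB without touching the existing volume.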

Clustered storage offerings are common in the market today, but most of them are built on legacy "scale-up" architectures. This legacy clustering technology requires a master controller, or "head," to coordinate tasks among the nodes in the cluster. Such an architecture can introduce a single point of failure and creates a bottleneck that limits scalability: as storage servers are added, they contend for the master controller's resources, and performance suffers. If the master controller sits in the I/O path (referred to as in-band), every I/O operation passes through a management software stack before it is stored on disk. This approach severely limits performance and can introduce reliability problems in the data path, because the controller performs operations on I/O that the application may not be aware of, possibly resulting in unpredictable behavior. This is especially true when caching is involved. The challenge has been to develop a clustering technology that keeps the centralized management benefits of these legacy scale-up technologies without sacrificing performance, reliability, or availability.
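The bottleneck argument can be illustrated with a deliberately simple throughput model. This is a sketch under stated assumptions, not a measurement: it assumes every storage server delivers the same fixed throughput, that an in-band master controller has a fixed capacity every I/O must pass through, and that a design with management outside the data path lets clients use each server directly. The function names and the 200 MB/s and 1,000 MB/s figures are hypothetical.

```python
def in_band_throughput(servers: int, per_server_mb_s: int, controller_mb_s: int) -> int:
    """Aggregate throughput when every I/O passes through one master controller.

    Hypothetical model: the controller's capacity caps the whole cluster.
    """
    return min(servers * per_server_mb_s, controller_mb_s)


def out_of_band_throughput(servers: int, per_server_mb_s: int) -> int:
    """Aggregate throughput when management sits outside the data path.

    Hypothetical model: clients talk to each storage server directly, so
    throughput grows with the number of servers.
    """
    return servers * per_server_mb_s


if __name__ == "__main__":
    # Illustrative numbers only: 200 MB/s per server, 1,000 MB/s controller.
    for n in (2, 4, 8, 16):
        print(n, in_band_throughput(n, 200, 1000), out_of_band_throughput(n, 200))
```

Under these assumed numbers, the in-band total stops growing once the cluster reaches five servers, while the direct-access total keeps rising with every server added. Real systems also contend for network links and disks, so the linear curve is an upper bound, not a prediction.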

Newly introduced storage clustering technology addresses these challenges by distributing the "master," or management, functionality across multiple storage servers. Each of these storage servers is in effect a purpose-built master controller and, combined with distributed system software, is designed to act as a peer in the storage cluster or grid. This storage clustering technology can be defined as a parallel and distributed system that enables the sharing, selection, and aggregation of storage resources distributed across multiple administrative domains. Because the management task is spread across multiple storage servers, there is no single point of failure, and the management function scales as the cluster scales.

This distributed management function is not in the data path (referred to as out-of-band): all I/O flows directly between the server application and the disk. Distributed storage clusters can deliver very high aggregate throughput by combining the processors, memory, and SAN and network interfaces of many storage servers, and the number of servers in a cluster can grow very large. All components can be upgraded and serviced while the cluster's overall functionality and services remain online. Replicating data across storage servers also eliminates the single point of failure common to other clustering technologies, allowing the cluster to survive the loss of one or more storage servers.
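The combination of out-of-band management and replication can be sketched as follows. This is a minimal illustration, assuming a hypothetical placement rule (each block's replicas go to consecutive servers in a fixed list, starting at `block_id % len(servers)`) that the client can compute for itself, so reads go straight to a storage server with no controller in between. The function names, server names, and the two-copy default are assumptions; the article does not describe a specific placement scheme.

```python
def place_replicas(block_id: int, servers: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct servers to hold a block (hypothetical placement rule)."""
    if copies > len(servers):
        raise ValueError("not enough servers for the requested replica count")
    start = block_id % len(servers)
    # Walk the server list from a block-dependent starting point, so the
    # replicas of one block always land on different servers.
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def read_block(block_id: int, servers: list[str], alive: set[str], copies: int = 2) -> str:
    """Return the server a client would read from, skipping failed servers.

    The client computes replica locations itself and reads directly from a
    storage server; no master controller sits in the I/O path.
    """
    for server in place_replicas(block_id, servers, copies):
        if server in alive:
            return server
    raise OSError(f"all replicas of block {block_id} are offline")


if __name__ == "__main__":
    servers = ["s0", "s1", "s2", "s3"]
    print(place_replicas(5, servers))  # prints ['s1', 's2']
    # s1 has failed, so the read is served by the second replica on s2.
    print(read_block(5, servers, alive={"s0", "s2", "s3"}))
```

With `copies` replicas on distinct servers, any block remains readable after the loss of up to `copies - 1` servers; surviving more simultaneous failures requires more copies.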

Distributing the storage management intelligence also allows storage to be managed and synchronized across multiple geographic locations. When the storage servers themselves perform remote data replication, the host systems are relieved of almost all of the replication burden.

The storage cluster is an aggregate resource with a single system image regardless of how many storage servers it contains, which greatly simplifies cluster administration. Configuration and management tasks that would otherwise be repeated on every server are performed once, in a single location, and automatically synchronized to all cluster members. To the attached servers, the cluster appears as a single storage system.
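The "configure once, synchronize everywhere" behavior can be sketched with a tiny model. It assumes a hypothetical `ClusterConfig` object that applies each administrative change to every member and gives new members a copy of the current settings. It deliberately ignores failures during propagation, which a real distributed system would have to handle (for example, with a quorum or consensus protocol); the setting names are made up.

```python
class ClusterConfig:
    """Hypothetical single-system-image configuration store."""

    def __init__(self, members: list[str]):
        self.members: dict[str, dict[str, str]] = {m: {} for m in members}

    def set(self, key: str, value: str) -> None:
        # One administrative action, applied to every cluster member.
        for settings in self.members.values():
            settings[key] = value

    def join(self, member: str) -> None:
        # A new member inherits a copy of the current cluster-wide settings.
        reference = next(iter(self.members.values()), {})
        self.members[member] = dict(reference)

    def consistent(self) -> bool:
        # True when every member holds identical settings.
        configs = list(self.members.values())
        return all(c == configs[0] for c in configs)
```

An administrator would call `set("snapshot_schedule", "hourly")` once, rather than logging in to each server, and a server that later joins with `join("node4")` starts with the same schedule.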

This single system image functionality is often referred to as virtualization, where the application servers see a single storage image that represents the aggregate capacity of all storage server arrays. Virtualization provides the capability to aggregate the storage environment so that the administrator does not have to plan and provision storage on disparate arrays.

More advanced virtualization technology can automatically load balance and allocate data across storage servers, optimizing performance and storage utilization. Dynamic load balancing removes system bottlenecks by keeping the load evenly distributed, even when components fail. If an application requires more storage resources, the distributed storage system can transparently tie the additional resources together and reallocate data on the fly. Storage servers can also be upgraded and serviced while the cluster stays online and the services it provides remain unaffected.
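The article does not say how data is reallocated when servers are added or removed. One widely used technique, swapped in here purely for illustration, is consistent hashing: blocks and servers are hashed onto a ring, and each block belongs to the next server point clockwise. Giving each server many "virtual nodes" (the hypothetical `vnodes` parameter) spreads load roughly evenly, and adding a server moves only the blocks the new server takes over. `HashRing` is an illustrative name, and MD5 is used only as a stable hash, not for security.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so placement is identical on every client and server.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping block IDs to storage servers (illustrative)."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash, server) points
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        # Each server owns many points on the ring to even out its share.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        # Only the failed or retired server's blocks change owner.
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def owner(self, block_id: int) -> str:
        if not self._ring:
            raise LookupError("ring has no servers")
        h = _hash(str(block_id))
        # First ring point at or after the block's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

For example, if a ring of servers `s0` to `s2` is extended with `s3`, roughly a quarter of the blocks change owner, and every block that moves goes to `s3`; the other servers keep their data in place. Removing `s3` again returns each of those blocks to its previous owner, which mirrors the "reallocate data on the fly" behavior described above.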

Summary

Managing large storage systems has historically been an expensive and complicated process. Often a single component failure can halt the entire system and require considerable time and effort to resume operation. Moreover, the capacity and performance of individual components in the system must be periodically monitored and balanced to reduce fragmentation and eliminate hot spots. This usually requires manually moving, partitioning, or replicating files and directories.

Distributed storage clustering, or grid storage, overcomes these limitations of legacy storage architectures and delivers a wide range of benefits:

* Scalability of performance, capacity and availability in small and modular increments

* Easy management of a single system image

* Non-disruptive data movement

* Higher utilization rates

* Lower hardware acquisition costs

* Ability to sustain failure of multiple elements without affecting data access

Distributed storage clustering greatly improves the ability to build cost-effective SANs. By simplifying management and allowing lower-cost SAN implementations, distributed storage clustering will allow the mid-tier market to take advantage of features available only with SANs, such as high availability, snapshots, replication, and remote data movement.

John Spiers is CTO and visionary for LeftHand Networks (Boulder, CO).

www.lefthandnetworks.com
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: Information storage and retrieval
Author: Spiers, John
Publication: Computer Technology Review
Geographic Code: 1USA
Date: Dec 1, 2004
Words: 1022