Storage clustering.


Storage clustering, also referred to as grid storage, is a new technology paradigm that pushes the scalability and efficiency boundaries of storage area networks (SANs) to new levels. Clustered storage is similar to clustered computing, providing an on-demand shared storage environment similar to the cluster model for compute resources. Storage clusters are made up of storage server farms linked together that work on similar tasks in a grid fashion by scaling out infrastructure as opposed to scaling up with larger and more powerful machines.

Clustered storage systems are typically made up of network-connected storage with an administrative function that manages a collection of physical disks. To a client or application server, this collection appears as a highly available block-level storage system that provides a large abstract pool of disks, or storage cluster. Storage clusters are accessible to all clients on the network. A client can create a volume in a cluster on demand to tap the entire capacity of the underlying physical resources. Furthermore, additional storage resources can be automatically incorporated into the storage cluster. Storage clustering provides clients with a virtual volume pool that can tolerate and recover from disk, server, and network failures.
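The pooling idea described above can be illustrated with a small sketch. This is a hypothetical model, not any vendor's implementation: the `StoragePool` class, its method names, and the disk names and sizes are all assumptions made for illustration. It shows only that a volume draws on the aggregate capacity of the pool rather than on one disk, and that added disks enlarge the pool.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical model of a cluster presenting many disks as one pool."""
    disk_capacities_gb: dict[str, int] = field(default_factory=dict)
    volumes_gb: dict[str, int] = field(default_factory=dict)

    def add_disk(self, disk_id: str, capacity_gb: int) -> None:
        # A newly added disk enlarges the shared pool immediately.
        self.disk_capacities_gb[disk_id] = capacity_gb

    @property
    def total_gb(self) -> int:
        return sum(self.disk_capacities_gb.values())

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.volumes_gb.values())

    def create_volume(self, name: str, size_gb: int) -> None:
        # A volume is carved from the whole pool, not from one disk.
        if name in self.volumes_gb:
            raise ValueError(f"volume {name!r} already exists")
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity in the pool")
        self.volumes_gb[name] = size_gb


pool = StoragePool()
pool.add_disk("server1-disk0", 500)
pool.add_disk("server2-disk0", 500)
pool.create_volume("db", 800)   # larger than any single disk
print(pool.free_gb)             # 200
```

A real cluster would also track where each block of the volume lives; this sketch models capacity accounting only.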

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

Clustered storage offerings are common in the market today, but the majority of these solutions are based on legacy "scale-up" architectures. This legacy storage clustering technology requires a master controller or "head" to coordinate tasks among the nodes in the cluster. This architecture can introduce a single point of failure and presents a bottleneck that limits scalability: as you add storage servers, they contend for the master controller's resources and performance is impeded. If the master controller function is in the I/O path (referred to as in-band), every I/O operation goes through a management software stack before it's stored on disk. This approach severely limits performance and can introduce reliability issues in the data path by performing master controller operations on I/O that the application may not be aware of, possibly resulting in unpredictable behavior. This is especially true when caching is involved. The challenge has been to develop a clustering technology that provides all the centralized management benefits of these legacy "scale-up" technologies without sacrificing performance, reliability and availability.
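The bottleneck argument can be made concrete with a deliberately idealized model. The function names and the figures (100 MB/s per storage server, a 400 MB/s controller) are assumptions chosen for illustration, not measurements. The model ignores network contention and caching and only contrasts a capped in-band path with a path that bypasses the controller.

```python
def in_band_throughput(servers: int, per_server_mbps: int, controller_mbps: int) -> int:
    # Every I/O passes through the controller, so it caps the aggregate.
    return min(servers * per_server_mbps, controller_mbps)


def out_of_band_throughput(servers: int, per_server_mbps: int) -> int:
    # Clients talk to storage servers directly; the aggregate grows with the cluster.
    return servers * per_server_mbps


for n in (2, 4, 8):
    print(n, in_band_throughput(n, 100, 400), out_of_band_throughput(n, 100))
# 2 200 200
# 4 400 400
# 8 400 800
```

Under these assumed numbers, adding servers beyond four yields no gain in the in-band case, which is the scaling limit the paragraph above describes.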

Newly introduced storage clustering technology addresses these challenges by distributing the "master" or management functionality across multiple storage servers. These storage servers are in effect purpose-built master controllers. These storage servers, combined with distributed system software, are designed to be peers in the storage cluster or grid. This storage clustering technology is defined as a parallel and distributed system that enables the sharing, selection and aggregation of storage resources distributed across multiple administrative domains. This implementation distributes the management task across multiple storage servers, eliminating any single point of failure and allowing the management function to scale as the cluster scales.

This distributed management function is not in the data path (referred to as out-of-band) and all I/O is performed directly from the server application to the disk. Distributed storage clusters can provide tremendous aggregate throughput by combining the strengths of multiple servers, aggregating processors and memory, with multiple SAN and network interfaces and a nearly unlimited number of storage servers. All components can be upgraded and serviced while the overall functionality and services of the cluster remain online. This distributed systems architecture also eliminates the single point of failure common with other clustering technologies by replicating data across storage servers, allowing the cluster to survive the loss of one or more storage servers.
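As a minimal sketch of how replication lets a cluster survive a server loss without a controller in the data path, consider the toy model below. Everything in it is an assumption made for illustration: the hash-based placement rule, the `Cluster` class, the server names, and the choice of two copies per block. The article does not specify how any actual product places or replicates data.

```python
import hashlib


def replicas_for(block_id: int, servers: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct servers for a block from a shared server list."""
    if copies > len(servers):
        raise ValueError("more copies requested than servers available")
    digest = hashlib.sha256(str(block_id).encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(copies)]


class Cluster:
    def __init__(self, servers: list[str], copies: int = 2) -> None:
        self.servers = servers
        self.copies = copies
        self.stores: dict[str, dict[int, bytes]] = {s: {} for s in servers}
        self.failed: set[str] = set()

    def write(self, block_id: int, data: bytes) -> None:
        # The client computes placement itself and writes to each replica
        # directly; no central controller sits in the data path.
        for server in replicas_for(block_id, self.servers, self.copies):
            if server not in self.failed:
                self.stores[server][block_id] = data

    def read(self, block_id: int) -> bytes:
        # Try each replica in turn and skip servers that are down.
        for server in replicas_for(block_id, self.servers, self.copies):
            if server not in self.failed and block_id in self.stores[server]:
                return self.stores[server][block_id]
        raise IOError(f"no surviving replica for block {block_id}")


cluster = Cluster(["s1", "s2", "s3", "s4"])
for i in range(100):
    cluster.write(i, f"block-{i}".encode())

cluster.failed.add("s2")  # simulate the loss of one storage server
assert all(cluster.read(i) == f"block-{i}".encode() for i in range(100))
```

With two copies on distinct servers, any single server failure leaves at least one readable replica of every block. Tolerating more simultaneous failures would require more copies.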

Distributing the storage management intelligence also allows storage to be managed and synchronized across multiple geographic locations. Performing remote data replication with distributed management almost completely removes the burden of replication from host systems.

The storage cluster is an aggregate resource with a single system image regardless of the number of storage servers in the cluster, greatly simplifying cluster administration. Configuration and management tasks that would otherwise have to be repeated many times can be performed in a single location and are automatically synchronized to all cluster members. The cluster appears as a single storage system for attachment to servers.

This single system image functionality is often referred to as virtualization, where the application servers see a single storage image that represents the aggregate capacity of all storage server arrays. Virtualization provides the capability to aggregate the storage environment so that the administrator does not have to plan and provision storage on disparate arrays.

More advanced virtualization technology includes the ability to automatically load balance and allocate data across storage servers, optimizing performance and storage utilization. Dynamic load balancing eliminates system bottlenecks by ensuring uniform load distribution even in the face of component failure. If an application requires more storage resources, the distributed storage system is able to efficiently tie the additional resources together transparently and reallocate data on the fly. Storage servers can also be upgraded and serviced while the overall functionality of the cluster remains online and services provided by the cluster remain unaffected.
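The reallocation of data when a storage server joins can be sketched as a simple rebalancing loop. This is an illustrative assumption, not the algorithm of any particular product: the `rebalance` function and the block-count balancing criterion are hypothetical, and a real system would also weigh capacity, I/O load, and replica placement.

```python
def rebalance(placement: dict[str, list[int]]) -> int:
    """Move blocks from the fullest server to the emptiest until block
    counts differ by at most one. Returns the number of blocks moved."""
    moved = 0
    while True:
        fullest = max(placement, key=lambda s: len(placement[s]))
        emptiest = min(placement, key=lambda s: len(placement[s]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            return moved
        placement[emptiest].append(placement[fullest].pop())
        moved += 1


# Two servers hold six blocks each; a third, empty server joins the cluster.
placement = {"s1": list(range(0, 6)), "s2": list(range(6, 12))}
placement["s3"] = []
moved = rebalance(placement)
print(moved, {s: len(blocks) for s, blocks in placement.items()})
# 4 {'s1': 4, 's2': 4, 's3': 4}
```

Only four of the twelve blocks move, which hints at why incremental rebalancing can run while the cluster stays online.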

Summary

Managing large storage systems has historically been an expensive and complicated process. Often a single component failure can halt the entire system and require considerable time and effort to resume operation. Moreover, the capacity and performance of individual components in the system must be periodically monitored and balanced to reduce fragmentation and eliminate hot spots. This usually requires manually moving, partitioning, or replicating files and directories.

Distributed storage clustering or grid storage overcomes these limitations of legacy storage architectures, delivering a wide range of benefits:

* Scalability of performance, capacity and availability in small and modular increments

* Easy management of a single system image

* Non-disruptive data movement

* Higher utilization rates

* Lower hardware acquisition costs

* Ability to sustain failure of multiple elements without affecting data access

Distributed storage clustering greatly impacts the ability to build cost-effective SANs. By simplifying the management task and allowing for lower-cost SAN implementations, distributed storage clustering will allow the mid-tier market to take advantage of features available only with SANs such as high availability, snapshot, replication, and remote data movement.

John Spiers is CTO and visionary for LeftHand Networks (Boulder, CO)

www.lefthandnetworks.com
COPYRIGHT 2004 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:Information storage and retrieval
Author:Spiers, John
Publication:Computer Technology Review
Geographic Code:1USA
Date:Dec 1, 2004
Words:1022