Achieving simplicity with clustered, virtual storage architectures.
To answer that question, we examine two architectural techniques, virtualization and clustering, that are common in the blade server and application worlds and are starting to enter the mainstream of storage. But how is an enterprise to choose which combination of storage virtualization and clustering, if any, is applicable, especially when scalable blade server clusters serve as the base for highly resilient, powerful application clusters? Can complexity be managed, or must it be eliminated? This article argues that only elimination has business value, and that only the combination of virtualization and clustering achieves simplicity; neither technique by itself can provide optimal reduction of complexity.
Virtualization Without Clustering
Storage virtualization takes many forms. However, virtualization by itself does not deliver optimal simplicity. If scalability still depends on individual, non-clustered elements, and those elements must still be managed individually, optimal simplicity cannot be achieved. It is not sufficient merely to 'move' complexity up one level; to achieve simplicity, complexity must be eliminated. Still, because many non-clustered virtual storage arrays exist in the marketplace, and many storage professionals think of them as clustered, the approach is worth exploring.
Virtualization without clustering is more accurately described as a redundancy technique than a clustering technique. In particular, there is no decision-making during non-clustered controller failover or failback; by definition there is only one option, the partner controller. Virtual disks may be created and used, but the controllers in each pair do not operate as a cluster: there may be multiple pairs, but there is no cluster. In other words, the virtualization does not extend to the controller elements across the infrastructure, which is a distinct inhibitor to simplicity. The business impact may be severe. For example, in a data recovery situation, non-clustered controller failover necessitates at best the physical relocation and complex reconnection of blade servers (physically moving the blade servers to the data), or at worst the reverse replication of data, since the replicated data cannot be accessed directly (physically moving the data to the blade servers). For large volumes of data, blade server reconnection or reverse replication may take hours or even days. The expense and risk of lost time, lost productivity and lost transaction opportunities are significant in such situations.
Another inhibitor to simplicity is that every blade server that uses the LUNs presented by a controller pair must have host-based failover software installed and running for dual-controller failover to succeed. Virtualized but non-clustered also means that a given set of physical disks is managed and accessed by one, and only one, fixed pair of controllers, and that any given LUN is managed and made accessible by one, and only one, controller. Failover of a controller to its paired counterpart may occur as a planned event, typically triggered by an administrator, or as an unplanned event caused by any failure between the blade server (initiator) and the controller. The failing controller redirects all traffic for its LUNs to its counterpart and must then inform the blade server(s) using those LUNs to change their communication path(s), because the paired controller by definition has a different SAN address (e.g. a Fibre Channel worldwide name) from the failing controller. This process takes anywhere from several tens of seconds to minutes. The end result is that all blade servers that were using LUNs on the failed controller now communicate with its paired counterpart. Failback is similar, although it is always initiated by an administrator.
This architecture does not lend itself to reducing complexity. In fact, it forces the enterprise to decide a priori which blade server(s) to connect to which controller pair(s), and how many disk drives (and of what size and speed) to place behind each controller pair. Once made, these choices cannot be changed without downtime and data loss. The process does not scale and is not optimal; it is a 'best guess'. Because the typical application and OS cannot tolerate a change in the LUN address advertised by the controller, host software is required to facilitate failover. Without this software, paired failover would result in loss of access to data volumes.
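The paired failover sequence can be sketched as a small Python model. This is a hypothetical illustration, not any vendor's implementation: the class names, the `wwn` strings and the `has_failover_software` flag are assumptions. It shows the two properties stressed above: a failing controller has exactly one possible failover target, and a host without failover software keeps addressing the old controller and loses access.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Controller:
    wwn: str            # hypothetical SAN address, e.g. a Fibre Channel WWN
    alive: bool = True


@dataclass
class ControllerPair:
    a: Controller
    b: Controller
    owner: dict = field(default_factory=dict)  # LUN name -> owning controller

    def partner(self, ctl):
        return self.b if ctl is self.a else self.a

    def fail(self, ctl):
        """Fail `ctl` and move its LUNs to the partner, the only possible target."""
        ctl.alive = False
        target = self.partner(ctl)
        moved = [lun for lun, owner in self.owner.items() if owner is ctl]
        for lun in moved:
            self.owner[lun] = target
        # The partner has a different address, so hosts must be told to re-path.
        return moved, target.wwn


@dataclass
class Host:
    paths: dict                  # LUN name -> WWN the host sends I/O to
    has_failover_software: bool  # hypothetical stand-in for host failover software

    def on_failover(self, luns, new_wwn):
        # Only hosts running failover software update their paths.
        if self.has_failover_software:
            for lun in luns:
                if lun in self.paths:
                    self.paths[lun] = new_wwn

    def can_reach(self, lun, pair):
        owner = pair.owner[lun]
        return owner.alive and self.paths.get(lun) == owner.wwn


a, b = Controller("wwn-A"), Controller("wwn-B")
pair = ControllerPair(a, b, owner={"lun0": a, "lun1": b})
with_sw = Host({"lun0": "wwn-A"}, has_failover_software=True)
without_sw = Host({"lun0": "wwn-A"}, has_failover_software=False)

luns, new_wwn = pair.fail(a)
for host in (with_sw, without_sw):
    host.on_failover(luns, new_wwn)

print(with_sw.can_reach("lun0", pair))     # True
print(without_sw.can_reach("lun0", pair))  # False
```

Failback is deliberately left out of the sketch; in the architecture described it is an administrator-initiated reversal of the same steps.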
Likewise, the placement of LUNs (volumes) across the network is an a priori decision. LUNs cannot be moved in a virtual but non-clustered architecture: a LUN created and accessed via pair A cannot be logically moved to pair B, and any attempt to do so results in loss of access to the LUN. This constraint is the leading cause of poor storage utilization. The premise is identical to that of placing LUNs on internal, direct-attached disks: once placed, a LUN can never be moved without downtime and reconfiguration of OSes and applications.
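Why pinned placement hurts utilization can be shown with a short Python sketch. The model, its capacities and the `PlacementError` exception are hypothetical, chosen only for illustration: each pair has its own fixed capacity, so a new LUN can be refused on pair A even while plenty of free space sits stranded behind pair B.

```python
class PlacementError(Exception):
    pass


class StaticArray:
    """Hypothetical model: each controller pair owns fixed capacity; LUNs are pinned."""

    def __init__(self, capacities_gb):
        self.capacity = dict(capacities_gb)          # pair name -> usable GB
        self.used = {pair: 0 for pair in self.capacity}
        self.placement = {}                          # LUN name -> pair name

    def create_lun(self, name, size_gb, pair):
        # The administrator chooses the pair a priori; only that pair's space counts.
        if self.used[pair] + size_gb > self.capacity[pair]:
            raise PlacementError(f"pair {pair} lacks {size_gb} GB free")
        self.used[pair] += size_gb
        self.placement[name] = pair

    def move_lun(self, name, new_pair):
        # No logical move between pairs exists in this architecture.
        raise PlacementError(f"cannot move {name} to pair {new_pair} without downtime")

    def total_free(self):
        return sum(self.capacity[p] - self.used[p] for p in self.capacity)


arr = StaticArray({"A": 1000, "B": 1000})
arr.create_lun("db", 900, "A")
arr.create_lun("logs", 100, "B")
print(arr.total_free())  # 1000
try:
    arr.create_lun("db2", 200, "A")   # fails even though 1000 GB are free overall
except PlacementError as err:
    print(err)                        # pair A lacks 200 GB free
```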
Clustering Without Virtualization
In clustering without virtualization, as opposed to the virtualized but non-clustered technique described above, the storage elements (controllers and drives) form a cluster. However, the volumes created and used by that cluster are not virtual; they are maximal physical volumes (i.e. 2 TB in capacity) that may employ the non-virtual technique of 'sparse allocation'. Virtual disks, in contrast, use online expansion, a very simple operation that does not require monitoring of the physically allocated blocks inside a sparsely allocated volume. The cluster consists of 2 to N non-virtual controllers arranged in a rack so that tightly coupled communication occurs over a common (or, in some cases, redundant) backplane. In current practice, N is usually 4 or 8, and at most 16. In addition, N is always a multiple of two; the architecture does not allow an odd number of controllers. Another name for this method is 'multiple 2-way (dual) clustering', since it is a collection of 2-way clusters. Because this collection is physically contained and tightly coupled within a single rack or enclosure, the system itself is not protected from failure. In essence, the entire system, although composed of several 2-way clusters, is at risk because its components cannot be physically distributed across distinct and separate locations. Non-virtualized clustering is a physically 'captive' technique.
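The contrast between a sparsely allocated maximal volume and an expandable virtual disk can be sketched in Python. The 512-byte block size and the per-block tracking set are assumptions for illustration; the point is that the sparse volume always advertises the full 2 TB and its real allocation must be watched, whereas the virtual disk is simply resized online.

```python
BLOCK_BYTES = 512                               # assumption: 512-byte blocks
MAX_VOLUME_BLOCKS = 2 * 1024**4 // BLOCK_BYTES  # the 2 TB maximal volume


class SparseVolume:
    """Non-virtual volume: always 2 TB; physical blocks are allocated on first write."""

    def __init__(self):
        self.size_blocks = MAX_VOLUME_BLOCKS
        self.allocated = set()  # block numbers actually backed by physical disk

    def write(self, block):
        if not 0 <= block < self.size_blocks:
            raise IndexError("block outside volume")
        self.allocated.add(block)

    def allocated_blocks(self):
        # Must be monitored so physical disks are not over-committed.
        return len(self.allocated)


class VirtualDisk:
    """Virtual disk: sized for current need and grown online in one step."""

    def __init__(self, size_blocks):
        self.size_blocks = size_blocks

    def expand(self, extra_blocks):
        if extra_blocks <= 0:
            raise ValueError("expansion must be positive")
        self.size_blocks += extra_blocks


vol = SparseVolume()
for blk in (0, 10, 10):
    vol.write(blk)
print(vol.size_blocks, vol.allocated_blocks())  # 4294967296 2

vdisk = VirtualDisk(1_000)
vdisk.expand(500)
print(vdisk.size_blocks)  # 1500
```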
In this method, the distribution of non-virtual LUNs throughout the cluster is decided a priori. As with non-clustered virtual arrays, one must decide before a LUN is created which controller will provision and present it. When active, a LUN is accessed via one, and only one, controller. In the case of any failure between the blade server (initiator) and its controller, the failing controller attempts to move management of its LUNs to its 'partner' in the rack, i.e. the controller physically coupled to it, since the controllers are arranged in pairs. However, unlike the non-clustered virtual method, the non-virtual cluster can locate and move LUNs to a controller other than the partner if the partner is down or otherwise not participating in the cluster. This is the distinguishing characteristic of non-virtual clustering compared with non-clustered virtualization. Like the non-clustered virtual method, this method requires every blade server that needs access to the LUNs in the cluster to install and run specialized software to handle failover within the cluster.
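The failover rule for non-virtual clustering (partner first, then any surviving controller) can be sketched in Python. The class is hypothetical, and the pairing scheme `i ^ 1` (controllers 0-1, 2-3, and so on) is an assumption for illustration. Host-side path changes are omitted here; as noted above, hosts still need failover software.

```python
class NonVirtualCluster:
    """Hypothetical model of 'multiple 2-way clustering' with pinned LUN ownership."""

    def __init__(self, n):
        if n < 2 or n % 2:
            raise ValueError("controller count must be an even number >= 2")
        self.alive = [True] * n
        self.owner = {}  # LUN name -> controller index, fixed at creation

    @staticmethod
    def partner(i):
        return i ^ 1  # assumption: controllers 0-1, 2-3, ... are the physical pairs

    def provision(self, lun, controller):
        # Chosen a priori, before the LUN is created.
        self.owner[lun] = controller

    def fail(self, i):
        self.alive[i] = False
        for lun, ctl in list(self.owner.items()):
            if ctl != i:
                continue
            p = self.partner(i)
            if self.alive[p]:
                self.owner[lun] = p  # normal case: the rack partner takes over
            else:
                # Distinguishing feature: fall back to any surviving controller.
                survivors = [c for c, up in enumerate(self.alive) if up]
                if not survivors:
                    raise RuntimeError("entire cluster is down")
                self.owner[lun] = survivors[0]


cluster = NonVirtualCluster(4)
cluster.provision("mail", 0)
cluster.fail(1)               # partner of 0 goes down first; "mail" is unaffected
cluster.fail(0)               # partner unavailable, so "mail" moves elsewhere
print(cluster.owner["mail"])  # 2
```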
The advantage of non-virtual clustering over non-clustered virtualization is its inherently greater reliability and resistance to unplanned downtime. Colloquially stated, 'if 2 is good, N is better'. However, non-virtual clustering is inflexible in several ways, because the pool of storage in one non-virtual cluster cannot be accessed by other non-virtual clusters. There is no provision for dynamic linking, in the virtual cluster sense, from one non-virtual cluster to another. As with paired failover, replication between multiple non-virtual cluster instances is required to achieve higher levels of data availability. Therefore, non-virtual clustering presents the same business drawbacks in data recovery scenarios as paired failover does.
In addition, non-virtual clustering is non-optimal in that the entire cluster can lose availability if the rack (or cabinet) housing all of its physical components is compromised. Nor does non-distributed clustering fare any better than paired-controller pseudo-clustering in resisting planned downtime.
Virtualization With Clustering
In contrast to the two methods described above (non-clustered virtualization and non-virtual clustering), a virtualized clustering architecture provides N virtualized controllers and virtual disks within a clustered system. Virtualized clustering results in optimal simplicity: simplicity of provisioning, of management, and of operation. It also provides time to debug failures, because the design not only allows non-stop operation but assumes from the outset that failures will occur. To keep things simple, the system merely removes the offending element from the cluster; since all services (paths, ports, controllers, links, disks, replication) are virtual, operation is not affected.
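The "remove the offending element" behavior can be sketched in Python. In this hypothetical model, hosts address a virtual target name (the `vtarget-` naming is an assumption) rather than a physical controller, so dropping a failed controller changes which member serves I/O but not the address the host uses. `min()` is only a deterministic stand-in for real path selection.

```python
class VirtualizedCluster:
    """Hypothetical model: hosts address virtual targets; any member can serve any LUN."""

    def __init__(self, controllers):
        self.members = set(controllers)
        self.targets = {}  # LUN name -> virtual target name presented to hosts

    def create_lun(self, lun):
        # No controller is chosen here; the LUN belongs to the cluster.
        self.targets[lun] = f"vtarget-{lun}"
        return self.targets[lun]

    def remove_member(self, controller):
        # Failure handling: drop the offending element; no LUN is re-addressed.
        self.members.discard(controller)

    def serve(self, lun):
        if not self.members:
            raise RuntimeError("no controllers left in the cluster")
        # The host-visible target never changes; min() stands in for path selection.
        return self.targets[lun], min(self.members)


vc = VirtualizedCluster({"c1", "c2", "c3"})
vc.create_lun("db")
print(vc.serve("db"))  # ('vtarget-db', 'c1')
vc.remove_member("c1")
print(vc.serve("db"))  # ('vtarget-db', 'c2')
```

Unlike the paired model, no host software is involved: the host's view is the same before and after the failure.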
Virtualized clustering delivers distinct business advantages (hard-dollar operational cost and time savings) over the other two techniques. A clustered design enables flexible investment in, and deployment and management of, storage resources. Within the cluster, virtualization of capacity, performance and location, combined with intelligent control, eliminates the complexity and static nature of the other two methods by abstracting physical complexity into an intuitive, dynamic management environment. For example, since all physical storage resources in the cluster are virtualized, administrators manage LUNs associated with blade servers or applications rather than specific physical arrays, RAID groups and drives, saving time and reducing the need for a large storage administration staff. Virtualized clustering is also designed for maximum data integrity and minimal recovery time in the event of a site failure. It removes the traditional design restrictions that cause outages of hours or days, enabling automatic use of dynamic virtual links to alternative storage clusters or single-node sites for immediate return to operations.
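Managing LUNs by server rather than by array, RAID group or drive can be sketched as a single capacity pool in Python. The extent model and the round-robin striping policy are assumptions chosen for illustration, not a description of any product's layout algorithm: the administrator supplies only a server, a LUN name and a size, and the pool decides which drives back each extent.

```python
class StoragePool:
    """Hypothetical model: every drive in the cluster contributes extents to one pool."""

    def __init__(self, drive_extents):
        self.free = dict(drive_extents)  # drive id -> free extents
        self.luns = {}                   # (server, LUN) -> drive id for each extent

    def create_lun(self, server, lun, extents):
        # The administrator names only the server, the LUN and its size.
        if extents > sum(self.free.values()):
            raise RuntimeError("pool exhausted")
        drives = sorted(self.free)
        layout, i = [], 0
        while len(layout) < extents:
            drive = drives[i % len(drives)]
            if self.free[drive] > 0:     # round-robin over drives with space left
                self.free[drive] -= 1
                layout.append(drive)
            i += 1
        self.luns[(server, lun)] = layout
        return layout


pool = StoragePool({"d1": 2, "d2": 2, "d3": 2})
print(pool.create_lun("blade7", "mail", 4))  # ['d1', 'd2', 'd3', 'd1']
print(pool.create_lun("blade9", "web", 2))   # ['d2', 'd3']
```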
Distributed clustering specifically seeks to increase flexibility in adding modules, replacing modules, and changing operations within individual modules without disrupting the processes active within the cluster. A distributed clustering storage system is characterized by the fact that each of its active components (in this case, storage controllers) is an independently operable entity. Said differently, each storage controller in a distributed cluster provides storage volume (LUN) access without requiring that a paired or partner controller exist.
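The difference between independent, distributed controllers and a captive single-rack cluster can be sketched in Python. This hypothetical model tracks controllers and their locations only, and assumes the data itself remains reachable from any surviving controller. An odd number of controllers is accepted, and losing one site leaves the others serving LUNs.

```python
class DistributedCluster:
    """Hypothetical model: independent controllers, each placed at a named location."""

    def __init__(self):
        self.location = {}  # controller id -> site or rack name

    def add(self, cid, site):
        # A single controller is a valid addition; no partner is required.
        self.location[cid] = site

    def remove(self, cid):
        self.location.pop(cid, None)

    def lose_site(self, site):
        lost = sorted(c for c, s in self.location.items() if s == site)
        for cid in lost:
            self.remove(cid)
        return lost

    def lun_accessible(self):
        # Any one surviving controller can present any LUN on its own.
        return bool(self.location)


spread = DistributedCluster()
for cid, site in [("c1", "east"), ("c2", "west"), ("c3", "west")]:  # N = 3 is allowed
    spread.add(cid, site)
print(spread.lose_site("west"), spread.lun_accessible())  # ['c2', 'c3'] True

captive = DistributedCluster()
for cid in ("c1", "c2", "c3", "c4"):
    captive.add(cid, "rack1")  # every component in one rack, the captive case
captive.lose_site("rack1")
print(captive.lun_accessible())  # False
```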
In summary, these techniques represent the current state of the art in storage architecture. Given its range of applications and business requirements, an enterprise must determine the best fit to minimize cost and complexity while maximizing business value and efficiency. Virtualized clustering offers the highest levels of resiliency, responsiveness and scalability across the widest range of environments. Although arrays that perform non-clustered virtualization and non-virtual clustering exist, they are inherently limited, both architecturally and, most importantly, from the viewpoint of a business trying to achieve maximum efficiency and value for its time and money. In other words, they limit the ability to achieve simplicity. Over time, simplicity always wins; unnecessary complexity will always prove to be a business inhibitor over the long haul.
Rob Peglar is vice president and chief technologist at XIOtech (Wildwood, MO).
Computer Technology Review, Dec 1, 2004. Topic: Storage Clustering.