Clustered Servers And Redundant I/O Ports.


x86-based servers deliver 99.9% uptime, two nines short of the five-nines goal. Clustering x86-based servers adds a fourth nine, and as site demand climbs, clusters can be expanded with compute nodes or migrated to SAN topologies with virtual cluster nodes to meet increasing transaction rates and expanding databases. The reliability of the individual servers within a cluster can be enhanced with redundant I/O ports. This additional level of fault tolerance reduces the number of cluster transitions and, hence, transition-related downtime. Decreasing the number of 30- to 60-second cluster transitions may seem like a meager availability improvement, but remember that the downtime budget is only five minutes a year and that more reliable cluster nodes also shrink the amount of time when system responsiveness is sluggish due to a failed node. Multiple server I/O ports also improve system performance by spreading the I/O workload across multiple channels.
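The downtime arithmetic behind these figures can be checked with a short Python sketch. The function names are illustrative only, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability):
    """Annual downtime budget, in minutes, for a fractional availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def transitions_within_budget(availability, transition_seconds):
    """Number of whole cluster transitions of a given length that fit in the budget."""
    budget_seconds = downtime_minutes_per_year(availability) * 60
    return int(budget_seconds // transition_seconds)


for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    minutes = downtime_minutes_per_year(availability)
    print(f"{label}: about {minutes:.1f} minutes of downtime per year")

print("60-second transitions allowed at five nines:",
      transitions_within_budget(0.99999, 60))
```

At five nines the budget is a little over five minutes a year, so only a handful of 60-second cluster transitions fit in it. This is why avoiding even one transition matters.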

Unfortunately, most x86 operating systems do not support the dynamic storage reconfiguration that is required to realize the availability and performance benefits of multiple server I/O ports. These OSs were not designed for storage to disappear on one I/O channel and reappear on another.

Consider the simple cluster configuration shown in Fig 1. Two nodes with redundant I/O ports (Host Bus Adapters, or HBAs) are connected to dual-active controllers via redundant switches, and four RAID-protected logical volumes are connected to the redundant controllers over redundant back-end fibre loops. The entire I/O subsystem, from the servers to the disks, is bulletproof.

Server 0 can be set up to balance the I/O workload across its two I/O ports; for example, it could be configured to access Volume A over data path 0 (the path emanating from HBA 0) and Volume B over data path 1. If any element in path 0 fails (HBA 0, Switch 0, Controller 0, or any of the links interconnecting them), Volume A is still physically accessible over path 1. The plumbing is in place for both servers to access all of the volumes over multiple paths. However, as previously mentioned, most x86 OSs cannot exploit this high-availability configuration because they cannot cope with dynamic path reconfiguration. If Server 0 were configured to access Volume A over path 0 and Volume A suddenly appeared on path 1, the file system would recognize Volume A on path 1 as a different logical volume from Volume A on path 0. In fact, some x86 OSs such as Windows NT require an access control mechanism to be installed on this configuration; otherwise, each server will access the volumes that it owns over both I/O channels and will see each logical volume with two different addresses.

Alternate Path Software (APS)

Alternate path software resides in the server(s) and enhances system performance and data accessibility. APS automatically balances the I/O workload across multiple I/O data paths to maximize path and storage resource efficiency. APS checks volume IDs and recognizes volumes that appear on multiple I/O channels as the same volume. When this occurs, APS presents a single image of the volume to the file system. While APS implementations vary in their level of sophistication, most products designed for x86 servers use a primary path/secondary path scheme. The system manager designates a primary I/O path between servers and logical volumes and an alternate path in case the primary path fails. In more advanced APS products, the software will perform this task and automatically balance the volumes across the multiple paths. In either case, APS will detect path failures and redirect I/O to the secondary path transparently to the OS. APS acts as a filter driver that provides a level of indirection between the OS and the I/O paths.
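The single-image idea can be sketched in a few lines of Python. This is an illustration only, not how any particular APS product is implemented. The function name, the `(path, volume_id)` tuples, and the volume ID strings are all hypothetical.

```python
from collections import defaultdict


def build_single_image(discovered):
    """Group (path, volume_id) sightings so each volume ID is presented once.

    Returns a dict mapping each volume ID to the ordered list of paths on
    which that volume was seen.
    """
    paths_by_volume = defaultdict(list)
    for path, volume_id in discovered:
        if path not in paths_by_volume[volume_id]:
            paths_by_volume[volume_id].append(path)
    return dict(paths_by_volume)


# Without APS, the OS would treat these four sightings as four volumes.
discovered = [(0, "VOL-A"), (0, "VOL-B"), (1, "VOL-A"), (1, "VOL-B")]
volumes = build_single_image(discovered)

for volume_id, paths in volumes.items():
    print(f"{volume_id}: one image, reachable over paths {paths}")
```

Keying on the volume ID rather than on the channel address is what lets the same volume keep one identity when it appears on a second path.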

With APS installed on the servers in our previous example (Fig 2), path 0 can be designated the primary path between Server 0 and Volume A and path 1 the secondary path. Now, if any element in path 0 fails, APS will detect the path failure and redirect I/O that the OS intended for path 0 to path 1 (HBA 1 -> Switch 1 -> Controller 1 -> Volume A). The OS will continue issuing I/O to Volume A using the path 0 address, and APS will redirect the I/O using the path 1 address. When the path failure is repaired, APS will sense the restoration and return to using path 0 for I/Os addressed to Volume A.
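The failover and failback behavior in this example can be sketched as a small routing layer in Python. The class and method names are hypothetical. A real filter driver sits in the OS I/O stack and detects failures itself, whereas here the failure and repair events are signalled by hand.

```python
class AlternatePathFilter:
    """Toy model of a primary/secondary path scheme for one volume."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.healthy = {primary: True, secondary: True}

    def mark_failed(self, path):
        self.healthy[path] = False

    def mark_repaired(self, path):
        self.healthy[path] = True

    def route(self):
        """Return the path that I/O addressed to the volume should use."""
        if self.healthy[self.primary]:
            return self.primary      # normal operation, or failback
        if self.healthy[self.secondary]:
            return self.secondary    # failover
        raise IOError("no healthy path to volume")


volume_a = AlternatePathFilter(primary=0, secondary=1)
before = volume_a.route()    # path 0
volume_a.mark_failed(0)
during = volume_a.route()    # redirected to path 1
volume_a.mark_repaired(0)
after = volume_a.route()     # back on path 0
print(before, during, after)
```

The caller always asks for "Volume A" and never names a path, which mirrors the indirection the OS sees.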

An example of APS is PATHPilot from Mylex. PATHPilot keeps applications running through fault detection, automatic I/O path failover and failback, and I/O rerouting. It enhances system performance through automatic volume load balancing. PATHPilot presents logical volumes to the OS over multiple fault-tolerant optical or copper pipes for continuous access to e-commerce data.

Fabrics

Fabrics connecting compute nodes and storage arrays can be designed to resemble miniature telephony networks and to provide a comparable level of service. Fabrics are composed of point-to-point links and switching elements that enable any-to-any connectivity and alternate source-to-destination routing. Fabrics are intrinsically more robust and scalable than systems with directly attached storage because switches are logically and physically isolated, and fabrics scale linearly as nodes are added: bandwidth automatically increases, assuming switch ports are available. Links and fabric elements can be replicated to supply surplus capacity for peak demand and to provide alternate paths through the fabric so that data transmissions can bypass path failures. Fibre Channel media have a relatively low error rate, and the protocol provides robust error detection and flow control mechanisms. Fabrics are capable of supporting the five-nines objective.

Dual-Active Controllers

Dual-active RAID controllers operate in tandem to share the workload, double I/O capacity, and make storage pools failure tolerant. If one controller fails, its port address fails over to its partner, which assumes the entire workload, thus ensuring continuous access to data. Storage arrays are designed so that failover and failback transitions occur transparently to operating systems. To ensure data integrity after the transition, advanced RAID controllers mirror data across controller caches and provide a mechanism that ensures that applications never access dirty cache pages. To provide complete I/O subsystem redundancy for e-commerce applications, fibre-to-fibre RAID controllers offer redundant back-end disk channels.

Cache Mirroring

Caching dramatically improves I/O performance in e-commerce applications. Data written to a cache is vulnerable to loss until it is made permanent on disk. Since controllers acknowledge writes as "complete" as soon as they are stored in cache, applications are oblivious to cached data that is lost due to power or controller failure. By maintaining a redundant copy of cached writes that have not been written to disk, cache mirroring protects against data loss.
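A minimal Python sketch of the idea follows, assuming two paired controllers that copy each unwritten block into the partner's cache before acknowledging the write. The class, the dictionary-based "disk", and the single shared dirty-block table per controller are simplifications for illustration, not a description of any real controller firmware.

```python
class Controller:
    """Toy write-back cache that mirrors unwritten data to a partner."""

    def __init__(self, name):
        self.name = name
        self.dirty = {}      # block number -> data not yet on disk
        self.partner = None

    def write(self, block, data):
        self.dirty[block] = data
        self.partner.dirty[block] = data   # mirror before acknowledging
        return "complete"

    def flush(self, disk):
        """Make cached writes permanent and drop both cached copies."""
        for block, data in list(self.dirty.items()):
            disk[block] = data
            self.partner.dirty.pop(block, None)
        self.dirty.clear()


c0, c1 = Controller("C0"), Controller("C1")
c0.partner, c1.partner = c1, c0
disk = {}

ack = c0.write(7, "order #1")   # acknowledged while still only in cache
# Suppose C0 now fails before flushing: the partner still holds the write.
c1.flush(disk)
print(ack, disk)
```

Because the mirror copy exists before the acknowledgment is returned, a write the application believes is complete survives the loss of either controller.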

Cache Coherency

In SAN topologies with multiple paths to data, mirrored caches require synchronization to ensure that I/Os always access the current state of data regardless of the path used to reach it. Advanced RAID controllers "lock" targeted areas of the disk pool to prevent simultaneous access to blocks from multiple systems. Before "unlocking" blocks, caches are flushed, returning stored data to a consistent state. I/Os are serialized in much the same way that a DBMS controls access to database records.
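The lock, update, flush, unlock sequence can be sketched in Python, with threads standing in for multiple systems. The `LockedRegion` class and its single lock over one region are hypothetical simplifications. Real controllers lock ranges of blocks and coordinate across two physical caches.

```python
import threading


class LockedRegion:
    """Toy model: one lock guards a region; the cache is flushed before unlock."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self._lock = threading.Lock()

    def update(self, block, fn):
        with self._lock:                      # "lock" the targeted area
            current = self.cache.get(block, self.disk.get(block, 0))
            self.cache[block] = fn(current)
            self.disk.update(self.cache)      # flush before "unlocking"
            self.cache.clear()


disk = {}
region = LockedRegion(disk)


def worker():
    for _ in range(100):
        region.update(0, lambda value: value + 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(disk[0])
```

Because every read-modify-write runs under the lock and is flushed before the lock is released, each accessor starts from the latest state and no update is lost.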

Channel Failover

Fibre Channel was designed for high-availability environments requiring fault-tolerant links. The new generation of RAID controllers implements fibre interconnects on the front end (to servers) and the back end (to disks). This enables redundant I/O paths to servers and to disks. In dual-active controller configurations, a pair of Fibre Channel Arbitrated Loops connects the disks to both controllers. If a loop failure prevents access over one of the channels, the controllers use the surviving loop to access the entire disk pool.

Kevin Smith is the senior director of business management and marketing for external products at Mylex (Fremont, CA).
COPYRIGHT 2000 West World Productions, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2000, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Smith, Kevin
Publication:Computer Technology Review
Date:May 1, 2000
Words:1311