Directors: The Enterprise SAN Building Blocks
They provide more availability and bandwidth than an aggregation of fabric switches

The promise of Storage Area Networks (SANs) is to implement high-availability storage enterprise-wide. SAN connectivity is not limited to servers, departmental subsystems, and workgroup clusters; it extends to the wide area network and metropolitan area network as well. SANs fit the need for enterprise storage connectivity because they allow for both performance today and growth tomorrow. However, the full promise of the SAN can only be delivered by a director, a storage network switch that provides enterprise-level high availability, scalability, performance, and management. A switching device that cannot deliver on all four criteria consistently, even during component failure, is not a director.

The Director Solution

A sort of "uber-switch," the director-class switch enables the enterprise SAN. A director in the center of the network allows core-to-edge SANs to benefit from 99.999 percent uptime while scaling equally in both performance and connectivity as devices are added or reconfigured on the fabric. A director simplifies the design of the fabric by providing more availability and bandwidth than a complex aggregation of fabric switches can provide. In theory, relatively large fabrics can be configured by connecting multiple low-port-count fabric switches.
But such fabrics require many interswitch links that create inherent bandwidth bottlenecks in the enterprise SAN backbone. They rapidly increase overall SAN complexity and latency, which can cause application failures that are difficult to pinpoint. An appropriate analogy is the relationship between controller-based RAID and JBOD (Just a Bunch Of Disks). IT departments invest in fully protected RAID storage products because, like directors, they deliver higher availability, scalability, and performance. JBOD, like fabric switches, is appropriate for less critical device connections away from the core of the fabric. To accommodate enterprise storage growth, directors offer aggregate throughput in excess of 3,200 MB/sec, which is the minimum level of bandwidth required for an enterprise SAN backbone. The 800 or 1,600 MB/sec provided by multiple fabric switches is not enough to deliver the level of service required to construct a core enterprise backbone.

Director Class Criteria

Making sure that your switch is director-class requires some analysis. High availability is the basic requirement, and it can be defined as "five-9s" (99.999%) uptime. This works out mathematically to about 5 minutes of downtime per calendar year without sacrificing performance (Table 1).
Availability at this level cannot be achieved by linking fabric switches, which have too many possible points of failure to reach 99.999% availability. Director-level availability requires redundant power and cooling, fully hot-swappable components, redundant logic with automatic failover, and, finally, non-disruptive concurrent code load and activation, the highest level of availability. Directors differ from fabric switches not just in terms of port count. The simple redundant power and cooling found in some fabric switches are not enough to ensure high availability. The fault-tolerant features of a director make it the only choice for enterprise backbone SAN connectivity. One of the most important differences with a director is automatic failover. Prior to switched SANs, the impact of a failure was typically limited to a single server or a small, homogeneous cluster. In a switched SAN fabric, however, a large number of heterogeneous servers can be affected by a switch failure.
For example, a 16-port fabric switch experiencing a motherboard failure could trigger path failover on 12 or more servers. The failure would be even more invasive in a multi-switch fabric. A director is designed for continuous operation with minimal performance degradation at all times. All critical components are redundant with automatic failover and can be repaired through hot swapping while switching operations continue at full performance levels. For example, if a control processor, memory module, or message path controller fails, the director continues to function with no performance degradation, and the failed component can be replaced while switching operations continue as usual. Even if a port card fails, the performance impact is limited to the four ports housed on that card, which can be hot swapped for fast repair. If a component fails in a typical fabric switch, the entire switch must be removed and repaired or replaced. This is unacceptable due to the downtime involved. Even if the failure can be isolated to a single port circuit, the entire fabric switch must be taken out of service for repair, meaning all ports are offline for the duration.
Finally, the director meets the high-availability yardstick defined as SAN Quality of Connection Class 5. SAN Quality of Connection (QoC) guidelines, established by Strategic Research Corp., provide a way to institute data-service guarantees through levels of simple and concise storage architecture definitions (Table 2). Due to the business-critical nature of an enterprise SAN, only QoC Class 5, which mandates no single point of failure, meets the high connectivity and performance requirements. Multiple linked fabric switches are cost-effective solutions for workgroups and departments that are not running mission-critical applications. But fabric switches, even multiple fabric switches, can by definition only reach QoC Class 3, or 99.9% availability, a potential of 5,000 path-minutes of downtime per year. Multiple fabric switches are not robust enough to form an enterprise SAN backbone.

Scalability. A director-class switch allows users to optimize all available ports on the system. A single director scales from four to 32+ ports, providing more available ports than a fabric switch. In a typical configuration, a 32-port director could be used to connect up to 14 dual-ported servers to a large RAID or tape library. This offers the attached devices full-duplex connections. Multiple directors can also be attached to build higher-port-count backbone fabrics while attaching into departments and workgroups via loop switches, thus preserving backbone bandwidth and integrity.
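The port arithmetic behind the scalability argument can be sketched in a few lines of Python. This is an illustrative model only, not a vendor sizing tool. The function names are hypothetical, and the full-mesh topology with a fixed number of interswitch links per switch pair is an assumption made here to show how clustering low-port-count switches consumes ports; the article does not specify a topology.

```python
def director_ports_left(total_ports=32, servers=14, ports_per_server=2):
    """Ports remaining for storage after attaching multi-ported servers."""
    used = servers * ports_per_server
    if used > total_ports:
        raise ValueError("servers need more ports than the director has")
    return total_ports - used


def full_mesh_device_ports(switches, ports_per_switch=16, isls_per_pair=1):
    """Return (device_ports, total_ports) for a full mesh of identical switches.

    Assumption (not from the article): every switch is linked to every
    other switch by `isls_per_pair` interswitch links, and each link
    consumes one port on each of the two switches it joins.
    """
    isl_ports = isls_per_pair * (switches - 1)  # ISL ports used per switch
    if isl_ports > ports_per_switch:
        raise ValueError("not enough ports per switch for this mesh")
    total = switches * ports_per_switch
    return switches * (ports_per_switch - isl_ports), total


if __name__ == "__main__":
    # 14 dual-ported servers on one 32-port director
    print(director_ports_left())
    # Five 16-port switches, fully meshed with two links per pair
    device, total = full_mesh_device_ports(5, isls_per_pair=2)
    print(device, "of", total, "ports face devices")
```

Under these assumptions, a 32-port director has 4 ports left for storage after attaching 14 dual-ported servers, while five fully meshed 16-port switches with two links per pair expose only 40 of their 80 ports to devices. Other topologies would give different ratios.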
Multiple fabric switches, due to their low port counts, sacrifice connectivity, performance, or port count as the enterprise SAN grows. In fact, fabric switches can lose half their ports and bandwidth to interswitch links when they are clustered to create higher-port-count fabrics (see Figure).

Performance. Director-class switches must perform even in failure conditions. This is possible in a 32-port director rather than a 16-port fabric switch because the director does not suffer from the interswitch congestion that multiple fabric switches incur. Directors are designed for direct data connectivity from port to port with predictable latency of less than two microseconds. With multiple fabric switches, data must repeatedly travel from switch to switch via interswitch links. Even products marketed as directors that are actually a multi-stage cluster of fabric switches in a single chassis suffer from this type of increased and unpredictable latency. Current directors provide 3,200 MB/sec or more of non-blocking aggregate bandwidth and the highest level of performance available. Even without taking high-availability issues into account, clustered switches can only achieve 800 or 1,600 MB/sec.

Enterprise-wide Management. To realize the promise of a SAN, a director must also work with enterprise-wide fabric management software. A centralized management infrastructure that manages high-end directors, fabric switches, and loop-connectivity devices permits enterprise SAN administrators to remotely manage all components of the SAN from one console. IT administrators can set up, configure, and control all of the interconnected directors and fabric switches in the enterprise, saving hours of administration time and cost. Enterprise SAN management software can also deliver high levels of access and security, including allowing multiple authorized users to change and configure zones and zone sets throughout the fabric without fear of data corruption. Enterprise SAN management software must allow for centralized management and nondisruptive service of the SAN through monitoring and user-configurable failover parameters. It also allows for single-console management, either through a local console or remotely. Remote management frees the administrator from geographical restrictions for activities such as downloading software updates and viewing and using diagnostic data. Enterprise SAN management software must allow IT departments to proactively manage fault and error conditions so they can be isolated and corrected as quickly as possible. Phone-home and e-mail notification must happen even before the IT department knows that there is a problem.
Management software must also feature efficient fault isolation to help avoid downtime and minimize enterprise impact. Finally, it must be interoperable with any SNMP-based tool as well as enterprise management applications. The management software dictates the quality of SAN security. Security is a significant concern because SANs by definition enable the consolidation and sharing of resources. Ideally, directors should support three levels of security for "bullet proof" SAN operation. LUN (logical unit number) security is the lowest level of physical security currently implemented. LUN-level security is the ability of the storage device to physically limit host access only to servers connected to a physical storage port. This prevents servers from accessing storage resources not allocated to them, which could potentially corrupt valuable data. Fabric-level security, more commonly referred to as zoning, is the ability of the fabric elements (switches and directors) to selectively limit the access of devices connected to the fabric. Zoning is somewhat analogous to the concept of "virtual LANs" in the networking world, where a fabric can be logically subdivided into multiple sub-fabrics. The highest level of security is application-level security. This level typically consists of a middleware application residing between the application and the I/O driver that provides for the coordination and assignment of storage resources between multiple applications. This approach is software-vendor specific and requires that all servers in the SAN use the same security software package.

Edge Connectivity And Management

A single director or multiple directors enable efficient connections at the core of the fabric. Fabric switches and loop switches provide connectivity to the edge of the fabric. These are optimized for connecting storage devices on a Fibre Channel loop into high-performance switched fabrics without compromising the performance or protocol integrity of the fabric. Edge switches provide loop configuration flexibility and scalability options. They integrate loop connectivity into a large, switched-fabric SAN. Using such switches, a director can also extend enterprise SAN management to the edge devices it controls, allowing for seamless fabric management and enterprise-wide control.

Director Criteria At-A-Glance

* 99.999 percent availability, equivalent to downtime of roughly five minutes a year and necessary for all e-business and business-critical applications.
* Fully redundant logic/firmware, which includes automatic failover.
* QoC Class 5 standards, providing a level of connection and performance that meets the standards for enterprise data delivery.
* Fully hot-swappable components, including port cards, with no network disruption.
* Non-disruptive code loading and activation.
* Proactive, enterprise-wide fabric management.
* Full-duplex, non-blocking scalability from 3,200 MB/sec of aggregate bandwidth.
* Interoperability with other storage management software and framework management applications.
* Security that protects data integrity.

Mark Henderson is the senior product manager at McDATA (Broomfield, CO).

Table 1. Classes of Availability

Class     Availability   Acceptable Downtime/Year
Class 1   90%            36.5 days, or unspecified
Class 2   99%            87.6 hours
Class 3   99.9%          8.76 hours
Class 4   99.99%         52.56 minutes
Class 5   99.999%        5.256 minutes
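
The downtime column in Table 1 follows directly from the availability percentage. A minimal Python sketch, assuming a 365-day year (the function and constant names are illustrative, not from the article):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Maximum downtime per year, in minutes, at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Reproduce the Table 1 rows (values are floating point, so format them)
    for pct in (90, 99, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% -> {minutes:.3f} minutes ({minutes / 60:.2f} hours)")
```

For 99.999% availability this gives about 5.256 minutes per year, which is why "five-9s" is usually summarized as roughly five minutes of downtime.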

Table 2. Classes of Availability

Class     Availability   Acceptable Downtime/Year
Class 1   90%            Unspecified
Class 2   99%            50,000 path-minutes
Class 3   99.9%          5,000 path-minutes
Class 4   99.99%         500 path-minutes
Class 5   99.999%        50 path-minutes

[Graph omitted]