Improving e-Business Server Availability.
High Reliability Through Redundancy
Single-homed servers (with one interface to the network) represent a single point of failure that significantly affects overall server availability. Enterprises use several approaches to improve reliability. The most common is to provide redundancy for the critical components in a server.
Better servers ship with RAID controllers that provide disk subsystem resiliency and include redundant power supplies and cooling systems. Unfortunately, after building in all that redundancy, customers often use only one adapter to connect to the network; if it fails or loses the link, all users lose connectivity. Customers seeking higher levels of reliability should be sure to invest in redundant teams of interface devices.
High Availability, High Performance
Use of redundant devices is far from the only way to enhance reliability. Advanced network driver functions can significantly improve the ability to provide uninterrupted service.
Managers implement these capabilities by using an intermediate driver for Microsoft Windows NT 4.0, Microsoft Windows 2000 Server, Novell NetWare 5.X or Red Hat Linux. Three popular options include link aggregation, load balancing, and fault tolerance.
Link aggregation is a method of combining multiple physical network links into a single logical link. For example, two Network Interface Cards (NICs) can be combined into one team or group. The network and software running across it will perceive these two NICs as one virtual connection. If one goes down, the other can still handle the traffic.
When one or more channels in a group fail, the software automatically detects the failure and rebalances the traffic across the remaining links without a loss of data. Once the failed link is restored, the system automatically reconfigures to use all active network links. This load balancing is transparent to the end user, who experiences no downtime.
A fault-tolerant NIC team eliminates this single point of failure. The fault-tolerant team provides dynamic failover access across multiple redundant connections to the network. When a bad cable, a lost link, or a failed adapter causes a failure on the primary network interface device (NID) link, the intermediate driver software will switch to the secondary adapter.
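The detect-and-rebalance behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's driver logic: the `Team` class, its method names, and the modulo-based flow assignment are all hypothetical.

```python
class Team:
    """Hypothetical model of an aggregated group of links (not a vendor API)."""

    def __init__(self, links):
        self.links = list(links)   # all configured links, in order
        self.up = set(self.links)  # links currently passing traffic

    def active_links(self):
        # Keep the configured order so assignments are predictable.
        return [link for link in self.links if link in self.up]

    def link_down(self, link):
        self.up.discard(link)

    def link_up(self, link):
        if link in self.links:
            self.up.add(link)

    def assign(self, flow_id):
        """Map a flow number onto one of the links that are still up."""
        active = self.active_links()
        if not active:
            raise RuntimeError("no active links in team")
        return active[flow_id % len(active)]


team = Team(["nic0", "nic1", "nic2"])
before = [team.assign(f) for f in range(6)]
team.link_down("nic1")   # simulated failure: traffic moves to nic0 and nic2
during = [team.assign(f) for f in range(6)]
team.link_up("nic1")     # link restored: all three links are used again
after = [team.assign(f) for f in range(6)]
```

In this sketch no flow is ever mapped to a failed link, and the original distribution returns as soon as the link is restored.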
Industry-standard servers, such as the PowerEdge family of products from Dell, support a number of vendor-proprietary NIC implementations for link aggregation, load balancing and fault tolerance. These include Cisco Fast EtherChannel and Gigabit EtherChannel, Intel Advanced Network Services, 3Com Dynamic Access, and Alteon Fault Tolerance. Additionally, the IEEE approved the 802.3ad port aggregation standard in March 2000. This standard offers increased bandwidth and failover between links in a group of devices and is expected to be adopted by all vendors, ensuring interoperability.
Users can realize increased throughput and availability with the use of real-time automatic load balancing and failover. The amount of bandwidth that users can add depends on the number of NIC ports and PCI slots in the server.
Network Components. It is important to understand the role of key network components in maintaining connectivity and availability. The network interface device (NID) provides a connection point into the network. The data is then transferred from the transmitting station (a PC or a server) to the destination as specified by the destination's Media Access Control (MAC) address, which is a unique number programmed in the adapter when it's built.
A device driver is a program that translates between a device and the programs that use it. The driver software connects an NID to the network protocols. The computer can then use an NID to send and receive data over a network. The OS defines the interface between the network protocol and the driver. Microsoft systems use the Network Driver Interface Specification (NDIS). These drivers support the use of multiple protocols within one system.
Miniport and Intermediate Drivers. NDIS is a family of networking driver standards that covers LAN, WAN, and intermediate drivers. Miniport drivers use the interfaces and functions provided by the NDIS wrapper, which performs common processing, while the miniport driver itself handles hardware-specific interactions.
The NDIS intermediate driver sits between the network protocols and the miniport drivers and processes the data passing between them.
Load-Balancing Methods. Users can implement network load balancing on a server to control only the outgoing traffic or both outgoing and incoming information and can select different algorithms to balance the traffic. The algorithms include round robin, MAC address, IP address, and IP address and TCP port address.
Round-Robin Algorithm. In a round-robin implementation, the intermediate driver selects an NID port for each packet, starting with the first port in the network group. The next packet is sent over the following port, and so on. After the last port in the group is used, the round robin starts over with the first port. Round robin is a simple algorithm that distributes packets evenly across all the network links while minimizing CPU processing.
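A minimal sketch of round-robin port selection, assuming a team is represented as a simple list of port names. The function name and the packet labels are hypothetical and chosen only for illustration.

```python
from itertools import cycle


def round_robin(ports, packets):
    """Pair each packet with the next port, wrapping after the last port."""
    port_cycle = cycle(ports)
    return [(pkt, next(port_cycle)) for pkt in packets]


ports = ["port0", "port1", "port2"]
schedule = round_robin(ports, ["p1", "p2", "p3", "p4", "p5"])
# p4 wraps around to port0 once port2 has been used.
```

Because consecutive packets of one session leave on different links, the receiver may see them out of order. The address-based methods avoid this by keeping a session on one link.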
MAC Address Algorithm. An alternative to the round-robin technique is the use of the MAC, or Layer 2, address. This algorithm selects the outbound port based on the destination MAC address of each frame. All frames reach the destination in order since all frames in a session go out over the same link.
However, the algorithm balances MAC addresses rather than traffic, and the load may not be equally balanced across all links. In theory, one link could reach 100 percent utilization while other links in a group have low utilization. In practice, most clients in a client/server environment use comparable amounts of bandwidth when connecting to the server with teamed NIDs.
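The following sketch shows one way a port could be chosen from a destination MAC address, and why the result balances addresses rather than traffic. The selection rule (last octet modulo the port count) and the MAC addresses are assumptions made for the example; real drivers may hash differently.

```python
def port_for_mac(dest_mac, num_ports):
    """Pick a port index from the destination MAC.

    Hypothetical rule: last octet of the address modulo the port count.
    """
    last_octet = int(dest_mac.split(":")[-1], 16)
    return last_octet % num_ports


# Made-up client addresses, for illustration only.
clients = [
    "00:a0:c9:00:00:01",
    "00:a0:c9:00:00:02",
    "00:a0:c9:00:00:03",
    "00:a0:c9:00:00:05",
]
assignment = {mac: port_for_mac(mac, 2) for mac in clients}
# Three of the four clients land on port 1 and only one on port 0.
```

A given client always maps to the same port, which preserves frame order, but nothing in the rule accounts for how much traffic each client generates.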
IP Address Algorithm. The destination IP, or Layer 3, address can be used instead of the MAC address as a port selection method for outbound transmission. The benefits of channel assignments based on the IP address are that client sessions across a router will be assigned to different ports in a team, and packets will reach the client in order.
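The benefit for routed clients can be illustrated with a small comparison. Frames bound for clients behind a router all carry the router's MAC address as their Layer 2 destination, so a MAC-based rule places them on one port, while an IP-based rule can spread them. The modulo rules, the router MAC, and the client addresses below are assumptions for the sketch.

```python
import ipaddress


def port_for_ip(dest_ip, num_ports):
    """Hypothetical rule: destination IP as an integer, modulo port count."""
    return int(ipaddress.ip_address(dest_ip)) % num_ports


ROUTER_MAC = "00:00:0c:00:00:10"  # made-up next-hop address
remote_clients = ["10.1.1.20", "10.1.1.21", "10.1.1.22", "10.1.1.23"]

# MAC-based selection: every remote client shares the router's MAC.
router_port = int(ROUTER_MAC.split(":")[-1], 16) % 2
by_mac = {ip: router_port for ip in remote_clients}

# IP-based selection: each remote client is considered individually.
by_ip = {ip: port_for_ip(ip, 2) for ip in remote_clients}
```

In this example the MAC-based mapping puts all four remote clients on one port, while the IP-based mapping alternates them across both.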
IP Address and TCP Port Address Algorithm. The TCP port address of the packet can be used in addition to the destination IP address to ensure that different sessions from a client are assigned to different ports in the team. As a result, the IP clients as well as the application socket layer share the load equally.
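Adding the TCP port to the selection key can be sketched as follows. Combining the two values with XOR before taking the modulo is an assumption made for this example, as are the function name and the sample sessions.

```python
import ipaddress


def port_for_session(dest_ip, dest_tcp_port, num_ports):
    """Hypothetical rule: XOR the destination IP with the TCP port, then modulo."""
    key = int(ipaddress.ip_address(dest_ip)) ^ dest_tcp_port
    return key % num_ports


# Three sessions to the same client, differing only in TCP port.
sessions = [("10.1.1.20", 1025), ("10.1.1.20", 1026), ("10.1.1.20", 1027)]
team_ports = [port_for_session(ip, tcp, 2) for ip, tcp in sessions]
```

With an IP-only rule all three sessions would share one port; here they are split across both, while each individual session still stays on a single link.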
Grouping a primary adapter and one or more secondary adapters in a logical or virtual team of adapters creates a fault-tolerant team. Failover from the primary to the secondary adapter requires that the secondary adapter take the MAC address of the failed adapter. Failover happens automatically when the system can no longer detect activity or a link on the primary adapter. The failover time depends on the time the NIC takes to switch MAC addresses. For effective fault tolerance, this time must be short enough to prevent application session timeouts.
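The failover sequence can be modeled in a short sketch. The class, its method names, and the MAC addresses are hypothetical, and swapping the two addresses (rather than copying one) is a choice made here so that the adapters never hold the same address at once.

```python
class FaultTolerantTeam:
    """Hypothetical two-adapter team: one active, one standby."""

    def __init__(self, primary, secondary, macs):
        self.primary = primary
        self.secondary = secondary
        self.macs = dict(macs)   # adapter name -> MAC currently in use
        self.active = primary

    def link_lost(self, adapter):
        """Fail over if the adapter that lost its link is the active one."""
        if adapter != self.active:
            return False
        standby = self.secondary if adapter == self.primary else self.primary
        # The standby takes the failed adapter's MAC so that peers keep
        # sending to the same address (assumption: the addresses are swapped).
        self.macs[standby], self.macs[adapter] = (
            self.macs[adapter],
            self.macs[standby],
        )
        self.active = standby
        return True


team = FaultTolerantTeam(
    "nic0", "nic1",
    {"nic0": "00:a0:c9:00:00:aa", "nic1": "00:a0:c9:00:00:bb"},  # made-up MACs
)
mac_before = team.macs[team.active]
switched = team.link_lost("nic0")   # simulated failure on the primary
mac_after = team.macs[team.active]
```

After the switch, the team still answers on the same MAC address, which is why clients do not need to relearn anything as long as the switch completes before their sessions time out.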
Multiple NICs in servers offer the benefit of traffic segmentation and failure isolation. However, this configuration is not fault tolerant and cannot scale easily. In addition, no bandwidth is available beyond what each NIC can provide. To use multiple network ports in a server more effectively, an enterprise can create a logical or virtual adapter by grouping together multiple physical adapters linked by an intermediate driver. The software stack in the OS treats such teams as one logical adapter. If a link or physical adapter fails, traffic dynamically rebalances over the remaining adapters on a load-balancing team or shifts from the primary adapter to the secondary adapter for a fault-tolerant team.
Highly Available Network Strategy
Clearly, high availability and high performance in the IT infrastructure are mandatory. The IT strategies for a highly available network include network link aggregation, load balancing, and fault tolerance, especially on the server. Link aggregation scales the available bandwidth by grouping multiple physical links together to form a single logical or virtual link. The redundant links provide both load balancing and fault tolerance. Traffic flows are redirected around a failed NIC or cable without interrupting applications.
Rich Hernandez is a senior engineer with the Server Networking and Communications Group within Dell's Enterprise Systems Group (Round Rock, TX).
Title Annotation: Technology Information
Publication: Computer Technology Review
Date: Feb 1, 2001