Consolidate storage without losing control.
Booming Despite the Bust
While other technology markets take a "sit tight" approach, those of us in the SAN business are enjoying a relatively aggressive market, where investments in networked storage translate directly into corporate advantages. Where customers are spending on technology, there's a good chance the motivation is to accommodate ever-expanding storage and the connectivity and management that come with it. As the migration from direct-attached storage to networked storage continues, and as existing SANs expand, new challenges of size and scope are arising, marked by a shift from deployment-oriented to production-oriented issues.
Now that SANs are widely deployed, the challenge is to make them scale, meet a broader set of requirements, and continue to improve on the initial consolidation and utilization ROI. One of the most realistic ways to measure a SAN's success is to gauge the day-to-day performance of the application(s) it supports. Most storage administrators are held accountable (albeit sometimes quite loosely) for the availability and performance of the business-level applications and services that the SAN hosts. In the data center, SAN performance is typically expressed through a number of metrics taken directly from the SAN devices themselves, which together give a rough indication of how well the SAN supports its applications. These metrics may come from the host (such as CPU utilization), from the storage array (such as disk-seek times), and from the network (such as port utilization statistics). Traditionally, administrators spend far more time evaluating performance at the host and array, where more tangible metrics are available, while the network has yet to be considered either as a suspect in performance degradation or as an opportunity for overall improvement. The main reasons are the limited amount of useful network performance information available in most SANs, and the false impression that a network has latent bandwidth when it is in fact congested and degrading application-level performance.
New Focus on Congestion
All networks are inherently susceptible to congestion. A network exists to let devices communicate and share resources, and that sharing in itself creates the opportunity for congestion. This is especially true in SANs, where many hosts (initiators) typically communicate with relatively few storage ports (targets). Many are under the false impression that a network built from "non-blocking" switches, which can service all ports at line rate, will never be the culprit when overall SAN performance falters. But no amount of raw switch performance can overcome the fundamental contention that arises when two initiators require access to the same target. The problem of network congestion is only now becoming more apparent, as SANs continue to scale and as virtualization and blade-server computing become more popular. As storage resources are virtually pooled and CPUs clustered, effectively all resources become shared, increasing the likelihood of congestion. Enterprise Storage Group founder and senior analyst Steve Duplessie has said, "Traffic congestion is not talked about enough in the storage network market, but it's a very ugly secret that will rear its head at the worst possible time as networks get bigger."
Most network protocols, including Fibre Channel, use standards-based methods for media access and for transmitting information among systems. Fibre Channel, the dominant protocol in storage network fabrics today, uses a carefully architected system of fabric device communications and credit-based transmission to ensure network stability and traffic integrity. When more traffic arrives from inbound ports (initiators) than a destination port (target) can handle, a situation that is increasingly likely in the new "any-to-any" SAN model, Fibre Channel switches, unlike Ethernet/IP switches, do not drop frames. Instead, the network uses a "back-off" technique that throttles all inbound traffic bound for the same destination to a level the target can accommodate.
At the device level, typical switches share the bandwidth of a storage port equally among all the servers that access storage through it. Even conventional director-class switches with Virtual Output Queue (VOQ)-based architectures manage all traffic bound for the same outbound port from the same queue. The result is lower aggregate throughput through the switch and the false impression that the network is "underutilized." Even where administrators note a port utilization of, say, 160MBps on a 200MBps link and conclude that the network is underutilized, congestion is likely holding back the performance of SAN applications. This presents a dilemma for managers of growing storage networks, who must extract the best possible performance from an infrastructure meant to consolidate resources and increase utilization levels.
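To make the contention point concrete, the snippet that follows computes a simple fan-in (oversubscription) ratio: total offered initiator bandwidth divided by the bandwidth of the target port they share. It is a minimal sketch assuming every initiator transmits at full line rate at the same moment (a worst case); the function name and the 200MBps link figures are illustrative assumptions, not measurements.

```python
def fan_in_ratio(initiator_rates_mbps, target_rate_mbps):
    """Ratio of offered initiator bandwidth to target port bandwidth.

    A value above 1.0 means the target port cannot absorb every initiator
    at line rate simultaneously, no matter how fast the switch core is.
    """
    if target_rate_mbps <= 0:
        raise ValueError("target_rate_mbps must be positive")
    return sum(initiator_rates_mbps) / target_rate_mbps


# Two hypothetical 200 MBps initiators contending for one 200 MBps target port.
ratio = fan_in_ratio([200, 200], 200)
print(f"fan-in ratio: {ratio:.1f}:1")  # fan-in ratio: 2.0:1
```

A "non-blocking" switch only guarantees that its core is not the bottleneck; any ratio above 1:1 at a target port still has to be resolved somewhere.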
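The no-drop behavior described above rests on credit-based flow control. The following is a toy, tick-based model of it under simplifying assumptions: each initiator may send one frame per tick only while it holds a credit, frames wait in a single queue at the congested target port, and each frame the target drains returns one credit to its sender. In real Fibre Channel, buffer-to-buffer credit is replenished with R_RDY primitives and managed per link rather than end to end, which this sketch glosses over; the function and parameter names are hypothetical.

```python
from collections import deque


def simulate_credit_flow(num_initiators, bb_credit, drain_per_tick, ticks):
    """Toy model of credit-based flow control toward one congested target.

    Returns (frames delivered per initiator, ticks each initiator spent
    stalled without credit, frames still queued at the end).
    Nothing is ever dropped: senders simply stall once credits run out.
    """
    credits = [bb_credit] * num_initiators
    queue = deque()                      # frames waiting at the target port
    delivered = [0] * num_initiators
    stalled_ticks = [0] * num_initiators

    for _ in range(ticks):
        # Transmit phase: every initiator tries to send one frame.
        for i in range(num_initiators):
            if credits[i] > 0:
                credits[i] -= 1
                queue.append(i)
            else:
                stalled_ticks[i] += 1    # backpressure: no credit, no send
        # Drain phase: the target accepts only a limited number of frames.
        for _ in range(min(drain_per_tick, len(queue))):
            sender = queue.popleft()
            delivered[sender] += 1
            credits[sender] += 1         # credit returned to the sender
    return delivered, stalled_ticks, len(queue)


delivered, stalled, backlog = simulate_credit_flow(
    num_initiators=3, bb_credit=4, drain_per_tick=2, ticks=100)
print(sum(delivered))  # 200
```

Three initiators each willing to send a frame per tick are throttled to the target's drain rate of two frames per tick, and the queue can never hold more frames than the total credits issued, which is why no frame has to be discarded.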
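One way to model a switch that shares a target port evenly, with no notion of which connection matters most, is max-min fair sharing. This is a swapped-in model for illustration, not a description of any particular switch's scheduler; the flow names and MBps demands are hypothetical, and the 200MBps capacity simply matches the link speed used in the example above.

```python
def max_min_share(demands, capacity):
    """Max-min fair split of one port's capacity across competing flows.

    Flows demanding less than an equal share get their full demand; the
    leftover capacity is split equally among the rest. No flow is favored.
    """
    alloc = {flow: 0.0 for flow in demands}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining and cap > 1e-9:
        share = cap / len(remaining)
        satisfied = {f: d for f, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining flow wants more than an equal share: split evenly.
            for f in remaining:
                alloc[f] += share
            break
        for f, d in satisfied.items():
            alloc[f] += d
            cap -= d
            del remaining[f]
    return alloc


# Three servers contend for storage port 1 (hypothetical demands in MBps).
even = max_min_share({"A": 120, "B": 120, "C": 80}, 200)
print({f: round(v, 1) for f, v in even.items()})    # {'A': 66.7, 'B': 66.7, 'C': 66.7}

uneven = max_min_share({"A": 150, "B": 20, "C": 80}, 200)
print({f: round(v, 1) for f, v in uneven.items()})  # {'A': 100.0, 'B': 20.0, 'C': 80.0}
```

In the first case a connection that needs 80MBps receives only about 66.7MBps, because a port-level share cannot tell it apart from its neighbors.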
The Right Kind of Smarts
In response to the fundamental, lurking problem of congestion, new levels of intelligence are emerging that offer visibility and control within the storage network, both to identify where congestion is compromising application performance and to dictate which resources receive what level of network service. While talk of intelligent fabrics and switches continues, much of that "intelligence" refers merely to the integration of storage services such as replication, copy, and even heavy-lifting tasks such as volume management. These integrated services are a compelling consideration, and another topic entirely, but they don't add real network-level intelligence in the classic sense. On the other hand, solutions are beginning to emerge that are built around a similar VOQ-based switching core but add the connection-level intelligence needed both to expose traffic patterns through management and to offer control over network resources such as bandwidth.
Architecturally, true connection-level intelligence in the SAN requires a separate queue for each connection, rather than simply one for each outbound port. Consider, for example, a configuration in which three servers, A, B and C, all communicate with storage array 1. When the aggregate demand on the switch port connected to array 1 exceeds that port's bandwidth, the bandwidth available to all three servers is likely to be compromised. This is especially risky when a performance-sensitive application is impacted by a bandwidth-heavy (even if only temporarily) but otherwise non-business-critical application. It is possible to avoid these situations by isolating application environments in separate SANs, or even VSANs (virtual SANs). However, the greatest storage network value comes from consolidation, which increases the likelihood of a many-to-one network profile. Also, as mentioned earlier, the trend toward abstracting storage through virtualization will only make it more difficult to arrange the physical network configuration so as to protect against congestion and deliver the necessary application service levels.
Building in connection-level intelligence dramatically changes how the switch manages traffic in this example. The network administrator can not only view traffic patterns in the context of each individual connection (instead of merely aggregate, port-based statistics) but also dictate how the switch should service each connection, in effect creating very specific network service levels to meet application-level performance requirements. The administrator of such a connection-oriented storage network would be able to analyze statistics for each connection, A-1, B-1 and C-1, rather than only the aggregate statistics for storage port 1. With this insight into the network connectivity profile, and an understanding of historical traffic patterns and usage requirements, the administrator can take control of individual switch resources, such as bandwidth, and allocate them on a per-connection basis. If server C hosts a performance-sensitive application or SAN service that requires a minimum bandwidth of 80MBps whenever it reads from or writes to storage port 1, a SAN solution based on connection-oriented switching with bandwidth control would guarantee that bandwidth as a network service. With connection-level intelligence in the network, then, the SAN can not only differentiate among traffic from multiple sources within the same port but also treat each stream differently. Once connection-level intelligence has been established as a core, fabric-wide capability, an administrator can introduce many other dimensions, such as time-of-day settings that alter the network profile on a regular schedule as enterprise priorities change, or on-the-fly changes to react to an urgent corporate priority.
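A minimal sketch of the difference between per-port and per-connection queuing, assuming frames can be classified by a (source, destination) pair. The `Frame` and `PerConnectionQueues` names and the frame sizes are hypothetical; a real switch does this classification in hardware, and the article does not describe any particular queue structure.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src: str          # initiator, e.g. server "A"
    dst: str          # target port, e.g. storage port "1"
    size_bytes: int


class PerConnectionQueues:
    """Queues and byte counters keyed by (source, destination) connection.

    A per-output-port design keeps one queue per dst; keying by the
    (src, dst) pair lets the switch see and schedule A-1, B-1, C-1 separately.
    """

    def __init__(self):
        self.queues = defaultdict(deque)
        self.bytes_seen = defaultdict(int)

    def enqueue(self, frame):
        conn = (frame.src, frame.dst)
        self.queues[conn].append(frame)
        self.bytes_seen[conn] += frame.size_bytes

    def connection_stats(self):
        """Bytes seen per connection: the view connection intelligence adds."""
        return dict(self.bytes_seen)

    def port_stats(self):
        """Aggregate bytes per destination port: all a per-port view shows."""
        totals = defaultdict(int)
        for (_, dst), nbytes in self.bytes_seen.items():
            totals[dst] += nbytes
        return dict(totals)


q = PerConnectionQueues()
for src, size in [("A", 2048), ("B", 2048), ("C", 1024), ("A", 2048)]:
    q.enqueue(Frame(src, "1", size))
print(q.port_stats())        # {'1': 7168}
print(q.connection_stats())  # {('A', '1'): 4096, ('B', '1'): 2048, ('C', '1'): 1024}
```

The port-level total hides the fact that server A is generating twice B's traffic and four times C's; the per-connection counters expose it.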
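The per-connection guarantee and time-of-day ideas can be sketched together. This assumes a simple two-step allocator: each connection first receives the lesser of its demand and its guaranteed floor, and leftover capacity is then shared max-min fairly over any unmet demand. The schedule format, the 08:00 to 18:00 window, the 200MBps port capacity, and the demand figures are all hypothetical, not taken from any product.

```python
from datetime import time


def allocate_with_guarantees(demands, guarantees, capacity):
    """Per-connection bandwidth allocation (MBps) with minimum guarantees.

    Step 1 reserves min(demand, guarantee) for each connection.
    Step 2 splits what is left max-min fairly over the unmet demand.
    Guarantees must be admitted so that their sum <= capacity.
    """
    if sum(guarantees.values()) > capacity:
        raise ValueError("guarantees exceed port capacity")
    alloc = {c: min(d, guarantees.get(c, 0)) for c, d in demands.items()}
    leftover = capacity - sum(alloc.values())
    unmet = {c: demands[c] - alloc[c] for c in demands if demands[c] > alloc[c]}
    while unmet and leftover > 1e-9:
        share = leftover / len(unmet)          # equal share for this round
        for c, need in list(unmet.items()):
            grant = min(need, share)
            alloc[c] += grant
            leftover -= grant
            if need - grant <= 1e-9:
                del unmet[c]                   # fully satisfied
            else:
                unmet[c] = need - grant
    return alloc


# Hypothetical schedule: C-1's 80 MBps floor applies only during business hours.
POLICY_SCHEDULE = [
    (time(8, 0), time(18, 0), {"C-1": 80}),
]


def guarantees_at(now, schedule=POLICY_SCHEDULE):
    """Return the guarantee set whose time window contains `now`."""
    for start, end, guarantees in schedule:
        if start <= now < end:
            return guarantees
    return {}


demands = {"A-1": 120, "B-1": 120, "C-1": 80}
print(allocate_with_guarantees(demands, guarantees_at(time(10, 0)), 200))
# {'A-1': 60.0, 'B-1': 60.0, 'C-1': 80}
```

Reserving floors before sharing the remainder keeps C-1's guarantee independent of how hard A-1 and B-1 push; outside the scheduled window, all three connections fall back to an even split of roughly 66.7MBps each.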
Storage integrators are beginning to realize that network connectivity should no longer be relegated to a line item in a SAN project, but can instead be a way to make their proposals unique. A SAN built on embedded connection intelligence contributes value throughout its lifecycle. In the proposal stage, integrators can confidently configure a solution and back it up with performance service levels. A proposal that includes network service-level guarantees (meeting and surpassing standard industry guidelines for disk allocation, memory utilization, over-subscription ratios, etc.) lets customers invest confidently in a solution with more than adequate service levels to meet any corporate mandate for SAN service. Connection intelligence also allows routine evaluations using network performance profiles that reveal opportunities for performance improvement as well as pending requirements for configuration updates. Storage integrators often evaluate the current state of the network as they prepare to add disk, expand volumes or introduce software that can affect traffic patterns. Gaining this insight and control saves integrators from over-provisioning the network (usually with extra hardware) and incurring unnecessary capital and management costs.
As the storage network cloud continues to clear, new solutions will continue to transform the SAN from mere "storage space" into a far more strategic network resource capable of powering high-performance applications on a global scale. At the same time, connection-level intelligence is exposing the susceptibility to congestion inherent in every SAN. VARs and storage integrators that build the right kind of intelligence into the storage network will provide a foundation for far more valuable storage virtualization, tiered storage and network-hosted storage services in the long run.
Eric Blonda is director of product marketing for Sandial (Portsmouth, NH)
Publication: Computer Technology Review
Date: Sep 1, 2003