Maintaining quality of service for WAN storage over IP.
IP is a mature, widely used network protocol that's emerging as the network of choice for storage-over-WAN applications. Not many years ago, IP was used only for file sharing and messaging between users. As more open systems platforms were deployed and IP enhancements were implemented, IP became the network of choice for data transfer to remote peripheral applications (even mainframe-based ones). Now, some of the largest banks in the U.S., as just one example, run their entire banking operations via IP networks. Until recently, the exceptions have been business continuity (disk mirroring) and disaster recovery (tape backup/restore) applications. These large, block-oriented, synchronous/semi-synchronous applications traditionally had to be tightly coupled directly to the processor owning the data. But improved quality of service (QoS) is now enabling widespread use of IP networks for wide area storage applications.
[FIGURE 1 OMITTED]
This article should give you a better understanding of the issues involved in Fibre Channel over IP QoS for storage applications, and introduce you to solutions that address these issues.
If your organization uses a dedicated, private network, there are distinct steps you and your consultants can take to design your network and select the right products that will enable you to manage these QoS issues. You should take care to design your network down to the application level--which is a big topic all by itself (and thus beyond the scope of this article). In particular, you should consider how you can design your network to use the least amount of bandwidth--the most expensive part.
If you use a public network, the QoS issues we discuss here are largely out of your control. You're guaranteed a certain QoS in the service level agreement (SLA) you have with your provider. However, by understanding the issues presented here, you'll be in a much better position to negotiate that QoS in the first place. (In other words, forewarned is forearmed!) For example, storage applications require less than 1% packet loss, and you will find that many providers simply cannot provide that level of service.
Storage Traffic Is Demanding
Applications such as disk replication and remote tape backup are high-speed streaming applications that can generate data streams exceeding 100 megabytes per second (MBps). Translated into networking terms, that's approximately 800 megabits per second (Mbps). Network speeds can be virtually unlimited, but the costs can be prohibitive for end users, who typically sign 3-5 year bandwidth contracts with monthly payments. Costs vary depending on the parameters the application requires, including network speed (in Mbps), length of the network route, packet loss, and jitter, to name a few. All of these parameters are considered when defining the SLA, to which the provider must adhere or face penalties imposed by the end user.
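The conversion above is simply a factor of eight bits per byte. As a rough sketch, the helper names below are illustrative only, and the 500 GB volume is a made-up figure; decimal units (1 GB = 1,000 MB) are assumed and protocol overhead is ignored.

```python
def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert megabytes per second (MBps) to megabits per second (Mbps)."""
    return mbytes_per_sec * 8  # 8 bits per byte


def seconds_to_transfer(gigabytes: float, link_mbps: float) -> float:
    """Time to move a data volume over a link, ignoring protocol overhead.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    megabits = gigabytes * 1000 * 8
    return megabits / link_mbps


stream_mbps = mbytes_to_mbits(100)                 # the 100 MBps stream from the text
print(stream_mbps)                                 # 800
print(seconds_to_transfer(500, stream_mbps))       # hypothetical 500 GB replication job
```

The same arithmetic run in reverse shows why bandwidth dominates cost: halving the link speed doubles the transfer window.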
QoS Issues for Storage Over IP Networks
There are several issues any enterprise IT organization should be aware of to sustain high-speed storage over an IP network. In large part these issues are manageable, or are becoming more so, as new advances in storage networking are developed.
Networks that people believe are error-free do, in fact, have errors. In the world of IP, lots of things can cause network impairment or stress when it comes to wide area storage networks, including packet loss, jitter, and latency. Packet loss, for example, is common in any discussion of IP networks--in fact, it's a long-known IP fact-of-life. But what some might consider "normal" levels of packet loss are simply not acceptable when it comes to storage applications over wide area networks.
The existence of packet loss does not negate the viability of storage traffic over IP. There are technologies that reduce the impact of packet loss on storage applications by performing error recovery and retransmission at the IP packet level, resulting in less retransmitted data and quicker recovery. Taking advantage of such technology is key to successful storage application performance, especially when bridging the wide area network.
Data Integrity Is Critical
One of the most important issues in using IP networks for wide area storage is reliable data transfer, or data integrity. Making sure the data is received and delivered intact is a key component of QoS, even though it is not associated with "standards" on IP networks. (One must remember that QoS has to hold end to end: from the server HBA, through the fabric and the WAN, and back again.)
IP networks are known for operating in a "send and forget" mode. That is, the sending device assumes the data will arrive at its destination, so it moves on to the next task without checking on the success or failure of what was sent previously. Error recovery in this case is usually an email from the intended recipient asking that the data be resent.
For storage over WAN, this is obviously unacceptable. The integrity of the data transfer is tightly coupled to the application, wherein the application has preset timers that tick away waiting for a response from the storage device. If the timer expires, the application will resend the data block. If network errors occur in large numbers, the performance of the application will drop dramatically. If errors to a specific storage device are persistent enough, the application may flag the storage device as being "dead" and no longer attempt data transfers to it--not good!
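The timer-and-resend behavior described above can be sketched roughly as follows. This is a simplified illustration, not any particular application's logic: `write_block`, `FlakyLink`, the two-second timeout, and the retry count are all hypothetical.

```python
from typing import Callable


def write_block(send: Callable[[bytes, float], bool],
                block: bytes,
                timeout_s: float = 2.0,
                max_retries: int = 3) -> bool:
    """Send one block, resending each time the response timer expires.

    `send` is a hypothetical transport callback: it returns True if the
    storage device acknowledged the block before `timeout_s` elapsed.
    Returns False when every attempt timed out, which is the point at
    which an application might flag the device as "dead".
    """
    for _attempt in range(1 + max_retries):
        if send(block, timeout_s):
            return True
    return False


class FlakyLink:
    """Stand-in for a lossy WAN path: times out a fixed number of times."""

    def __init__(self, timeouts_before_success: int) -> None:
        self.timeouts_left = timeouts_before_success
        self.attempts = 0

    def send(self, block: bytes, timeout_s: float) -> bool:
        self.attempts += 1
        if self.timeouts_left > 0:
            self.timeouts_left -= 1
            return False  # timer expired with no response
        return True       # acknowledged in time


recovering = FlakyLink(timeouts_before_success=2)
persistent = FlakyLink(timeouts_before_success=10)
print(write_block(recovering.send, b"data"), recovering.attempts)
print(write_block(persistent.send, b"data"), persistent.attempts)
```

Every timed-out attempt costs a full timer interval, which is why a high error rate drags application performance down long before the device is declared dead.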
To solve the issue of data integrity, a technology called "cyclic redundancy check" (CRC) is employed. For the CRC to be effective in a networked storage environment, the storage router must calculate the CRC as the data block is being received from the sending device and append the CRC to the data block for transfer across the network. The receiving storage router then calculates a CRC on its own and compares it to the CRC in the data block. If they match, data integrity is assured. If they don't match, an error is flagged, the data block is discarded, and a retransmission of the original block takes place from the sending device. The CRC process creates an extremely high level of data integrity assurance.
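The append-and-compare steps above can be sketched in a few lines. The sketch assumes the standard CRC-32 from Python's `zlib` module and a 4-byte trailer; the article does not say which CRC polynomial or framing a storage router actually uses, and `frame_block` and `check_block` are illustrative names.

```python
import struct
import zlib
from typing import Iterable, Optional


def frame_block(chunks: Iterable[bytes]) -> bytes:
    """Sending side: compute the CRC while the block arrives, then append it."""
    crc = 0
    body = bytearray()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)   # running CRC, updated per chunk
        body += chunk
    return bytes(body) + struct.pack(">I", crc)   # 4-byte big-endian trailer


def check_block(frame: bytes) -> Optional[bytes]:
    """Receiving side: recompute the CRC and compare it with the trailer.

    Returns the data block if the CRCs match, or None to signal that the
    block should be discarded and retransmitted.
    """
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC trailer")
    block, trailer = frame[:-4], frame[-4:]
    (sent_crc,) = struct.unpack(">I", trailer)
    return block if zlib.crc32(block) == sent_crc else None


frame = frame_block([b"disk block ", b"payload"])
print(check_block(frame))                 # block recovered intact

damaged = bytearray(frame)
damaged[3] ^= 0x01                        # flip one bit "in transit"
print(check_block(bytes(damaged)))        # None: discard and retransmit
```

Computing the CRC chunk by chunk mirrors the requirement that the router calculate it as the data is received, without first buffering the whole block.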
The most advanced storage routers available today (such as those available from my company, CNT) perform error recovery and retransmission at the IP packet level, resulting in much less retransmitted data and much quicker recovery. CNT's solution also uses an adaptive recovery algorithm that accommodates the kinds of loss common in most IP networks. So, while packet loss will reduce overall throughput, its impact can be made much smaller compared to other forms of recovery or other storage networking products.
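A rough model shows why recovering at the packet level retransmits so much less data than resending whole blocks. This is not CNT's adaptive algorithm. It assumes independent packet losses and retries that continue until success, and the figure of 64 packets per block is hypothetical.

```python
def expected_sends_packet_level(loss: float) -> float:
    """Average transmissions per packet when only lost packets are resent."""
    return 1.0 / (1.0 - loss)


def expected_sends_block_level(loss: float, packets_per_block: int) -> float:
    """Average transmissions per block when one lost packet forces the
    whole block to be resent."""
    block_success = (1.0 - loss) ** packets_per_block
    return 1.0 / block_success


LOSS = 0.01              # 1% packet loss
PACKETS_PER_BLOCK = 64   # hypothetical block size, in packets

packet_level = expected_sends_packet_level(LOSS)
block_level = expected_sends_block_level(LOSS, PACKETS_PER_BLOCK)
print(f"packet-level recovery: {packet_level - 1:.1%} extra traffic")
print(f"block-level recovery:  {block_level - 1:.1%} extra traffic")
```

Under these assumptions, packet-level recovery adds roughly 1% extra traffic, while block-level recovery comes close to doubling it.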
In our extensive experience in storage networking at CNT, we've found the threshold of packet loss to be a maximum of 1% for an effective storage network when using our products and technology. Less than 1% packet loss can usually yield a workable storage network. More than 1% causes more packet retries, and more of the accompanying latency variability, than is viable for most storage applications.
An Important New QoS Capability: Rate Limiting
While both packet loss and data integrity are critical factors that can affect QoS, another factor is the tendency of storage applications to "hog" bandwidth on a shared network resource. Storage applications can generate intense data rates--so much so that, if allowed, storage traffic can consume all the bandwidth of the network to which it is connected.
An IP technique that provides users the ability to utilize shared network capacity with multiple applications (or users) is called "rate limiting." It is specifically useful in storage over shared IP networks.
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
SLAs needed for high-end data traffic applications can increase costs for some organizations beyond their means. However, if the network resource can be shared among several applications, it becomes an expense that is more manageable, since it can be spread across several IT initiatives, users, or applications.
If a single network resource can be "allocated" or sized according to the different application needs, it can suffice for those multiple applications. However, when storage is introduced, that single network resource, which may share web access, client/server traffic, and FTP traffic, could easily be "run over." That is, there is too much traffic and the network circuit becomes oversubscribed. When a circuit becomes oversubscribed, some if not all of the applications will have trouble completing, if they complete at all.
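Oversubscription is easy to check on paper: add up what each application offers and compare the total with the circuit's capacity. The traffic figures below are invented purely for illustration.

```python
from typing import Dict, Tuple


def oversubscription(offered_mbps: Dict[str, float],
                     capacity_mbps: float) -> Tuple[float, float]:
    """Return (total offered load in Mbps, ratio of load to capacity).

    A ratio above 1.0 means the circuit is oversubscribed.
    """
    total = sum(offered_mbps.values())
    return total, total / capacity_mbps


# Hypothetical loads on a shared 1 Gbps (1000 Mbps) circuit.
before_storage = {"web": 150, "client_server": 200, "ftp": 100}
with_storage = dict(before_storage, storage=800)

print(oversubscription(before_storage, 1000))   # comfortably under capacity
print(oversubscription(with_storage, 1000))     # storage pushes it over
```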
Rate limiting is a parameter that can be set at an end point to limit the amount of network capacity a given application can use. So, for instance, if a single 1Gbps IP resource is supporting three applications that would consume its capacity in normal usage, each application would be assigned only that amount of resource it is "sized" to use. Using this technique, then, provides a quality of service and SLA for each individual user (or application). Rate limiting therefore protects the network from being oversubscribed when storage traffic is introduced.
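One common way to enforce such a limit at an end point is a token bucket, sketched below. The article does not say how any particular storage router implements rate limiting, so treat this as an illustration only. The class name, the per-application limits, and the burst size are all assumptions. Integer units keep the arithmetic exact: 1 Mbps is 1 kilobit per millisecond.

```python
class TokenBucket:
    """Token-bucket limiter. Tokens are kilobits; time is in milliseconds."""

    def __init__(self, rate_mbps: int, burst_kbits: int) -> None:
        self.rate_mbps = rate_mbps
        self.burst_kbits = burst_kbits
        self.tokens_kbits = burst_kbits   # start with a full bucket
        self.last_ms = 0

    def allow(self, size_kbits: int, now_ms: int) -> bool:
        """Admit a chunk only if enough tokens have accumulated."""
        earned = (now_ms - self.last_ms) * self.rate_mbps   # kbits earned
        self.tokens_kbits = min(self.burst_kbits, self.tokens_kbits + earned)
        self.last_ms = now_ms
        if size_kbits <= self.tokens_kbits:
            self.tokens_kbits -= size_kbits
            return True
        return False   # over the limit: the sender must hold this chunk


# Hypothetical sizing of a shared 1 Gbps (1000 Mbps) resource.
LINK_MBPS = 1000
limits_mbps = {"storage": 600, "client_server": 250, "web": 150}
assert sum(limits_mbps.values()) <= LINK_MBPS   # never size past capacity


def admitted_chunks(limit_mbps: int, chunk_kbits: int = 10_000,
                    interval_ms: int = 10, duration_ms: int = 1000) -> int:
    """Offer one chunk every interval and count how many get through."""
    bucket = TokenBucket(limit_mbps, burst_kbits=2 * chunk_kbits)
    return sum(bucket.allow(chunk_kbits, t)
               for t in range(0, duration_ms, interval_ms))


# Storage offers 10 Mbit every 10 ms (1000 Mbps) against its 600 Mbps limit.
print(admitted_chunks(limits_mbps["storage"]), "of 100 chunks admitted")
```

In this run the storage application tries to fill the whole link, yet only about 60 of its 100 chunks are admitted, leaving the rest of the capacity for the other two applications.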
Rate limiting adds to users' ability to apply multiple layers of QoS to expensive, shared network resources. Figures 1 and 2 illustrate how network capacity can be divided within a single network resource.
Storage Router Advancements
Several key technologies have been developed over the years to deal with network impairment issues in wide area networking in the mainframe environment. Recently, CNT has incorporated many of these features into a new, flexible, and highly manageable platform, the UltraNet Edge 3000 storage router, making these capabilities available for the first time to the broader storage networking market.
This new storage router platform offers users several advancements to enhance QoS:
* The ability for applications to share bandwidth--which is, far and away, the most expensive aspect of any wide area storage network
* Rate limiting, as discussed above
* Client traffic prioritization, which provides user-specified application priority for network access
The new management capabilities of this storage router help IT organizations drive down the cost of their storage infrastructure while meeting service level agreements with their internal customers. By far, the single largest cost of a remote storage solution is the cost of bandwidth. The CNT UltraNet Edge 3000 minimizes that investment by means of state-of-the-art hardware compression technology. Via the GUI, customers can allocate portions of their available bandwidth to specific applications during production hours, or during different times of day, ensuring quality of service across a shared network. These new capabilities help IT organizations reduce storage infrastructure costs while providing required data availability and protection. In essence, this new storage router technology enables what we call "flexible QoS."
Storage QoS Can Be Effectively Managed
While IP service providers have made gigantic improvements in the quality of service in their backbone networks, there is still more improvement to be made. For e-mail or Web browsing, the quality is pretty good; for more demanding high-performance storage networks, it pays to be careful.
Network traffic for daily client/server communications is usually sparse, intermittent and of brief duration, and is thus relatively impervious to congestion, packet loss, latency instability, and other issues commonly inherent to IP networks.
Storage traffic, on the other hand, consists of large chunks of densely packed data moving over long periods of time. These characteristics make storage traffic susceptible to network inconsistencies that can dramatically decrease data throughput. If affected storage traffic shares the same network with normal IP communications, then both types of network traffic are negatively impacted.
Most storage applications are more bandwidth intensive and latency intolerant than other applications that can use IP circuits. Applications such as OLTP databases simply cannot tolerate the latency and jitter that web or e-mail users consider common.
But, by taking advantage of the latest storage networking technology, as discussed above, IT organizations can ensure a predictable, manageable quality of service for their wide area storage applications.
Brian Larsen is senior director of Connectivity and Extension Products at CNT (Minneapolis, MN).
Title Annotation: Storage Networking; wide area network; Internet Protocol
Publication: Computer Technology Review
Date: Feb 1, 2004