Engineering challenges to storage system protocols: diagnosing problems involving multiple protocols presents complex engineering challenges.
The first noticeable trend in the storage market is its rapid migration from DAS to SAN-attached or network storage. By many estimates, about 70% of the current market is DAS, with about 30% externally attached systems. However, in four years those numbers are expected to reverse in favor of external-attached systems. From a test and measurement perspective, this evolution will pose significant challenges at the system level.
Another trend in the storage industry is the move from parallel interfaces to serial interfaces. Emerging technologies like Serial ATA (SATA) and Serial Attached SCSI (SAS) are joining the established Fibre Channel Arbitrated Loop (FC-AL) as storage device-level interfaces. Networked storage interconnects are currently dominated by Fibre Channel Switched Fabric; however, SCSI over TCP/IP (iSCSI) is gaining momentum for low-cost solutions that do not require the high performance provided by FC fabrics.
FC is defined for 1, 2, 4 and 10Gbits/sec speed categories, and iSCSI uses the Ethernet physical layer with 1 and 10Gbits/sec speed categories. It is expected that when 10-Gigabit Ethernet is widely used, iSCSI will be competing for a larger share of the SAN connectivity market against 10-Gbits/sec Fibre Channel.
Serial connections in storage architectures make sense for several reasons. First, serial connectivity is more cost effective, using less copper on the backplane. More importantly, parallel connections become far less efficient as traffic rates increase, making serial connections more scalable. At high speeds, assembling each byte correctly in reference to the clock becomes more difficult, particularly for parallel connections where the clock is running on a separate line. Although there are techniques currently available for getting more speed from parallel connections, the long-term answer is to switch to serial interfaces.
Finally, InfiniBand was expected to make a big splash as a new storage protocol for IO connections as well as for clustering (server-to-server or CPU-to-CPU messaging). However, with the emergence of PCI-Express and faster Ethernet-based connections, the adoption of InfiniBand is continually declining. While InfiniBand will enjoy some acceptance, particularly by server companies using it for clustering, acceptance will likely be limited.
DAS Protocol Evolution
DAS has been in use for a long time as a cost-effective local storage architecture. DAS devices are usually configured in Redundant Array of Independent Disks (RAID) mode (either by hardware, in Host Bus Adaptors (HBA), or by software) for fault tolerance and better performance. Currently, there are two primary protocols used in this architecture at the disk device level: Parallel ATA (PATA) and parallel SCSI--and both are rapidly being replaced. For lower cost implementations, PATA is being replaced by SATA, and parallel SCSI is giving way to SAS.
Next-generation SATA will support 3Gbits/sec wire rates yielding 300MB/sec maximum raw throughput, with future generations planned beyond 3Gbits/sec. A point-to-point host-to-target connection, SATA can connect multiple devices using port multipliers. Although SATA does not inherently support a dual-ported disk interface, several proprietary solutions are emerging. SATA-based RAID storage systems have already started shipping.
The latest specification for SCSI, referred to as Ultra 320 (U320), supports 320MB/sec of maximum raw data throughput. A peer-to-peer interface, U320 can support 16 devices on a single bus. For fault tolerance and shared data configurations, dual-ported disk devices are available. SCSI is defined for high performance applications.
SAS is challenging traditional SCSI. Standards have been planned for higher bandwidth connections up to 6Gbits/sec for the future. Also, the physical layer of SAS is defined as compatible with SATA 3Gbits/sec physical interface.
Both SAS and SATA devices can share a connectivity network using fan-out and edge expanders. SATA commands are transmitted over SAS connections as a "tunneling" protocol called SATA Tunneling Protocol (STP). SAS devices and systems will be in volume production as early as next year.
Fibre Channel Loop protocol is used within a storage system for very high performance and large spindle configurations. These loops, consisting of up to 127 devices, can run at 1- or 2-Gbits/sec serial rates, providing 200MB/sec (full duplex) or 400MB/sec (full duplex) maximum raw bandwidth. Fault tolerance and increased performance can be achieved using dual ring configurations. A 4Gbits/sec definition has been adopted by the Fibre Channel standards organization for future implementation.
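The throughput figures quoted in this section follow from the line encoding. As a rough sketch, assuming 8b/10b encoding (10 line bits per payload byte) and the nominal line rates quoted in the text, the arithmetic can be checked as follows; the function name is illustrative only:

```python
def payload_mb_per_sec(line_rate_gbps: float, full_duplex: bool = False) -> float:
    """Raw payload bandwidth of an 8b/10b-encoded serial link, in MB/sec.

    8b/10b sends 10 line bits for every 8 data bits, so each payload
    byte costs 10 bits on the wire. Nominal line rates are assumed.
    """
    bytes_per_sec = line_rate_gbps * 1e9 / 10
    mb_per_sec = bytes_per_sec / 1e6
    # Full duplex counts both directions of the link.
    return mb_per_sec * 2 if full_duplex else mb_per_sec


sata_3g = payload_mb_per_sec(3.0)                    # SATA at 3Gbits/sec
fc_1g = payload_mb_per_sec(1.0, full_duplex=True)    # FC loop at 1Gbit/sec
fc_2g = payload_mb_per_sec(2.0, full_duplex=True)    # FC loop at 2Gbits/sec
print(sata_3g, fc_1g, fc_2g)
```

Under these assumptions the results match the 300MB/sec, 200MB/sec, and 400MB/sec figures cited above.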
SAN/NAS Protocol Evolution
The remote storage concept evolved first for disaster recovery applications with remote backup. The remote storage elements are connected over LAN/WAN networks and perform file-level operations. SAN evolved to support distributed processing with shared storage. To share a storage element across different computing elements, connectivity to the storage elements must be available over a network that can move block data very quickly.
A switched architecture satisfies this requirement and, as a result, Fibre Channel (FC) was adopted. In almost all SAN implementations today, FC switched fabric is the dominant protocol.
FC fabric is a point-to-point switched protocol operating at 1 or 2 Gbits/sec serial rates. The switched architecture provides bandwidth multiplication for high performance storage networks. Fibre Channel switched fabric usually consists of several end-point devices (hosts and targets) and several layers of switches.
SATA, SAS and FC-AL are the protocols of choice for internal connectivity within a storage system, whereas external connectivity between hosts and storage systems is provided by FC switched fabric or iSCSI (IP storage) connections. FC switched fabric is a point-to-point protocol, whereas iSCSI is a networked protocol using the classical TCP/IP connectivity layer. However, in the first wave of iSCSI adoption, it is likely to be used as a point-to-point connection to avoid many issues (e.g., security, reliability, and bandwidth sharing) related to network connections.
The Storage Networking Industry Association (SNIA) is attempting to standardize the interfaces so that different components can interact without compatibility problems. But the important thing to realize is that even when the interfaces between different components are standardized, there are multiple protocols (FC, SATA, SAS, FC Loop) that may be mixed and matched throughout NAS and SAN architectures. This mix of protocols requires test and measurement tool manufacturers to provide engineers with products that will enable them to "see through" different protocols to efficiently manage storage systems and easily identify and correct potential problems.
Bus vs. Switched or Loop Protocols
As mentioned in the previous two sections, storage protocols of the future are moving away from "bus" architecture. A test instrument that can monitor traffic for every component on a bus can monitor only the traffic between the two end points of a point-to-point connection.
To monitor interactions between multiple components in a switched fabric, multi-channel analysis is required. This is a paradigm shift in test instrumentation resulting in usage and pricing changes.
Protocol analyzers are a vital tool for engineers to diagnose problems for all the major protocols from the storage devices to storage systems and applications. These tools are not a part of the network, but sit unobtrusively on the network to silently monitor the data--bit by bit.
At the lowest levels in all serial protocols, frames or packets carry higher-level protocol information. During analysis, all the serial bits must be captured, but only the relevant information should be presented to the engineer. For example, if the user suspects frame-level problems, only the low-level serial data should be presented as it flows through the wires. But if the user must analyze ATAPI-level commands (in SATA interface), the presentation layer should elevate the low-level packets to the ATAPI command level and show a transaction at that level.
There are a few important features required in serial protocol analyzers that would enable engineers to be very productive.
In SAN environments, debug can be very complex, with layers of switches and several end-point devices. To isolate faults in a short time in such a network, it would be essential to monitor several point-to-point links at the same time and collect protocol data in a time-correlated manner between all the links.
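One way to picture time-correlated capture is as a merge of per-link traces into a single timeline. The sketch below is illustrative only: it assumes every link's events carry timestamps from a shared clock and that each per-link trace is already in time order. The `Event` type, link names, and sample events are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp_ns: int   # assumed to come from a clock shared by all channels
    link: str           # hypothetical link label
    summary: str


def correlate(*traces: Iterable[Event]) -> Iterator[Event]:
    """Merge per-link traces (each already time-ordered) into one timeline."""
    return heapq.merge(*traces, key=lambda e: e.timestamp_ns)


host_link = [
    Event(100, "host-switch", "FCP_CMND READ"),
    Event(900, "host-switch", "FCP_RSP"),
]
disk_link = [
    Event(250, "switch-disk", "FCP_CMND READ"),
    Event(700, "switch-disk", "FCP_DATA"),
]

timeline = list(correlate(host_link, disk_link))
for ev in timeline:
    print(f"{ev.timestamp_ns:>6} ns  {ev.link:<12} {ev.summary}")
```

Read top to bottom, the merged timeline shows a command crossing the switch before the data and response come back, which is the kind of cross-link view that isolates where a fault was introduced.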
However, it should be noted that multi-link (multi-channel) analyzer equipment can get prohibitively expensive unless a modular approach is chosen, such as a solution that is enabled by cascaded analyzers. Using the cascaded approach, independent analyzers used in different locations can be brought together for a large configuration debug--practically at no additional cost.
Serial protocols carry idle frame patterns when there is no useful information transferred between the connected devices. This is to maintain a clock reference as well as to indicate that the physical connection is alive and well. Whereas idle frames are essential for physically maintaining a connection, they are not important for higher-level protocol layers that are concerned with actual information exchange.
At speeds greater than 1Gbit/sec, the idle frames associated with these serial protocols, if captured in an analyzer, can fill up the capture or trace buffer very quickly, leaving very little useful information in the capture. That makes it essential to filter out all information that is irrelevant to the protocol layer being debugged and store only essential information in the trace buffer.
For example, if SCSI over Fibre Channel is the layer of interest, READ SCSI commands from a host to a particular disk may be the only data the engineer wants to view. All other data can be filtered out, leaving only these commands in the memory. With effective filtering, the available trace buffer can be used very efficiently.
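The READ-command example can be sketched as a capture filter. This is a simplified illustration, not how any particular analyzer implements filtering (real instruments filter in hardware at line rate); the `Frame` fields and the frame-kind labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Frame:
    kind: str                     # hypothetical labels: "IDLE", "SCSI_CMD", "DATA"
    source: str = ""
    dest: str = ""
    opcode: Optional[str] = None  # set only for SCSI command frames


def keep_reads(frames: Iterable[Frame], host: str, disk: str) -> Iterator[Frame]:
    """Yield only SCSI READ commands sent from `host` to `disk`."""
    for f in frames:
        if (f.kind == "SCSI_CMD" and f.opcode == "READ"
                and f.source == host and f.dest == disk):
            yield f


# Mostly idle traffic with a handful of real frames mixed in.
traffic = [Frame("IDLE")] * 1000 + [
    Frame("SCSI_CMD", "host1", "disk3", "READ"),
    Frame("SCSI_CMD", "host1", "disk3", "WRITE"),
    Frame("SCSI_CMD", "host2", "disk3", "READ"),
    Frame("DATA", "disk3", "host1"),
]

trace_buffer = list(keep_reads(traffic, "host1", "disk3"))
print(len(traffic), "frames seen,", len(trace_buffer), "stored")
```

In this toy capture only one of the 1,004 frames reaches the trace buffer, which is why filtering stretches a fixed buffer so much further.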
This is very important if traffic from multiple channels is to be collected in the trace buffer. In a multiple channel scenario, one channel may be carrying more information than another channel. If fixed memory is allocated per channel, the memory used by the channel with less information transfer is not utilized properly. So, memory pooling between several channels can be implemented to increase trace buffer utilization.
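The benefit of pooling can be illustrated with a small comparison. The sketch below is a simplified model with hypothetical numbers: `demand` is the number of trace entries each channel wants to store, and the pooled case hands out slots in channel order, which matches a real time-interleaved pool only when total demand fits in the buffer (as it does here).

```python
from typing import Dict


def stored_fixed(demand: Dict[str, int], total_slots: int) -> Dict[str, int]:
    """Fixed allocation: each channel owns an equal share of the buffer."""
    share = total_slots // len(demand)
    return {ch: min(n, share) for ch, n in demand.items()}


def stored_pooled(demand: Dict[str, int], total_slots: int) -> Dict[str, int]:
    """Pooled allocation: any channel may use any free slot."""
    remaining = total_slots
    stored = {}
    for ch, n in demand.items():
        stored[ch] = min(n, remaining)
        remaining -= stored[ch]
    return stored


demand = {"ch0": 900, "ch1": 100}   # one busy channel, one quiet channel
fixed = stored_fixed(demand, total_slots=1000)
pooled = stored_pooled(demand, total_slots=1000)
print("fixed :", fixed, "total", sum(fixed.values()))
print("pooled:", pooled, "total", sum(pooled.values()))
```

With fixed allocation the busy channel loses 400 entries while 400 slots reserved for the quiet channel sit empty; with pooling all 1,000 entries are kept.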
Triggering is another feature that enables troubleshooters to narrow down a problem. Sometimes it is difficult to guess the right trigger condition, and engineers may have to do some amount of "trial and error" captures to narrow down to the problem area. If multiple analyzers can be hooked up to the same observation point, setting each for different trigger conditions will reduce the time of sequentially doing multiple runs. This is an expensive solution. However, if a single analyzer can allow for several independent and parallel trigger events to be set, it would greatly enhance the usage of an analyzer.
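Independent, parallel triggers can be pictured as several conditions evaluated against every frame in a single pass. The following is a minimal sketch under that assumption; the frame fields (`crc_ok`, `latency_us`) and trigger names are hypothetical, and a real analyzer would evaluate such conditions in hardware.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Trigger = Callable[[dict], bool]


def run_triggers(frames: Iterable[dict],
                 triggers: Dict[str, Trigger]) -> List[Tuple[int, str]]:
    """Check every trigger against every frame; return (frame index, trigger name) hits."""
    hits = []
    for i, frame in enumerate(frames):
        for name, condition in triggers.items():
            if condition(frame):
                hits.append((i, name))
    return hits


# Three guesses at the failure, armed at the same time.
triggers = {
    "crc_error": lambda f: f.get("crc_ok") is False,
    "abort": lambda f: f.get("type") == "ABTS",
    "slow_response": lambda f: f.get("latency_us", 0) > 500,
}

frames = [
    {"type": "FCP_CMND", "crc_ok": True},
    {"type": "FCP_RSP", "crc_ok": True, "latency_us": 800},
    {"type": "ABTS", "crc_ok": False},
]

hits = run_triggers(frames, triggers)
for index, name in hits:
    print(f"frame {index}: trigger '{name}' fired")
```

A single capture run reports which of the candidate conditions actually occurred, instead of one run per guess.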
Setting trigger conditions is not always easy. In some instances, parallel sequences might help reduce the time to debug. In other complex situations, it may be impossible to set trigger conditions. Some problems may be very infrequent with long gaps in between. To debug these problems, the only solution might be to collect data for several hours or days at a stretch and look for problem areas for further drilldown analysis. In these situations, it is not cost effective to collect data in semiconductor memory. Spooling trace data to disk as the data is being collected is one of the best ways to cost effectively implement a solution for catching intermittent problems.
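Spooling to disk can be sketched as batching trace entries in memory and appending them to a file as the capture proceeds. This is an illustrative outline only: the JSON-lines file format, the batch size, and the `fake_capture` generator standing in for analyzer hardware are all assumptions.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator


def spool(events: Iterable[dict], path: str, batch_size: int = 1000) -> int:
    """Append events to `path` as JSON lines, writing in batches of `batch_size`."""
    written = 0
    batch = []
    with open(path, "a", encoding="utf-8") as out:
        for event in events:
            batch.append(json.dumps(event))
            if len(batch) >= batch_size:
                out.write("\n".join(batch) + "\n")
                written += len(batch)
                batch.clear()
        if batch:  # write whatever is left at the end of the capture
            out.write("\n".join(batch) + "\n")
            written += len(batch)
    return written


def fake_capture(n: int) -> Iterator[dict]:
    """Stand-in for a long-running capture; emits a rare 'ABTS' among commands."""
    for i in range(n):
        yield {"t_ns": i * 1000, "type": "FCP_CMND" if i % 100 else "ABTS"}


path = os.path.join(tempfile.mkdtemp(), "trace.jsonl")
count = spool(fake_capture(2500), path, batch_size=1000)
print(count, "events spooled to", path)
```

Because only one batch is held in memory at a time, the capture length is bounded by disk space rather than by the analyzer's semiconductor memory.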
Finally, the visual presentation itself must enable easy viewing of specific information. Presentations are graphical, textual, or a combination of both. The key is in making it easier for engineers to view exactly what they need to locate a problem. Dimming certain colors to highlight a particular field enables faster packet-by-packet viewing by only looking for a particular color.
The purpose of the presentation layer is to use the monitor screen area to present maximum information for the layer in which the debug is being attempted. Engineers can start with the highest layer (for example, in an FC fabric, this may be the FCP or SCSI-over-FC layer) to narrow down to a problem FCP exchange. The exchange can then be drilled down to expose what sequences were involved. If the problem is evident, the analysis stops at this layer. If not, the engineer can drill further down to frame level for each of the sequences for that exchange.
In conclusion, as networked storage becomes more complex, using multiple protocols in different areas of the network, engineers will face many challenges in designing, installing, and maintaining storage environments--from devices to very complex SAN and clustered networks. Protocol analyzers must adapt to this changing environment to provide scalable products and enhancements that enable engineers to keep storage architectures running efficiently, regardless of protocol choice.
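The exchange, sequence, and frame drill-down described above can be modeled as a nested grouping of captured frames. The sketch below assumes each decoded frame already carries its exchange and sequence identifiers; the dictionary keys and sample frames are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

frames = [
    {"exchange": 0x10, "sequence": 1, "frame": 0, "info": "FCP_CMND READ"},
    {"exchange": 0x10, "sequence": 2, "frame": 0, "info": "FCP_DATA"},
    {"exchange": 0x10, "sequence": 2, "frame": 1, "info": "FCP_DATA"},
    {"exchange": 0x10, "sequence": 3, "frame": 0, "info": "FCP_RSP"},
    {"exchange": 0x11, "sequence": 1, "frame": 0, "info": "FCP_CMND WRITE"},
]


def build_tree(frames: Iterable[dict]) -> Dict[int, Dict[int, List[dict]]]:
    """Group frames as exchange -> sequence -> list of frames."""
    tree = defaultdict(lambda: defaultdict(list))
    for f in frames:
        tree[f["exchange"]][f["sequence"]].append(f)
    return tree


tree = build_tree(frames)

# Top level: one row per exchange.
print("exchanges:", [hex(x) for x in sorted(tree)])
# Drill into one exchange to see its sequences.
print("sequences in 0x10:", sorted(tree[0x10]))
# Drill into one sequence to see its frames.
for f in tree[0x10][2]:
    print("  frame", f["frame"], f["info"])
```

Each level shows only what is needed at that layer, and the engineer expands a node only when the problem is not yet evident.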
Srikumar Chandran is senior director of storage business at CATC (Santa Clara, CA)