Implementing PCI Express for storage

Worldwide production of information is growing rapidly and putting enormous pressure on storage capacity. Many organizations are dealing with this flood of new data by consolidating applications and data on a limited number of more powerful servers and higher capacity storage arrays. This, in turn, creates the need for higher I/O speeds.

Storage array performance is dependent on fast internal busses. Busses like PCI have developed extensions to increase capacity, but these often adversely impact cost and complexity. A newer solution is PCI Express technology, which brings special application and implementation considerations to bear on storage system design. Data flow, quality of service (QoS), software modifications, silicon component selection criteria and board layout are worthy of review for this next-generation PCI bus.

Storage Array Data Flow

The data path between clients and storage arrays is a journey across diverse networking devices and I/O busses. The weakest link in this network is typically an I/O bottleneck or a source of non-recoverable errors. For some, the PCI bus has been a weak link and PCI Express will alleviate their issues with bandwidth and robustness. Many clients, servers, and storage arrays implement PCI busses, creating five or more locations where PCI Express can improve the data flow.

Data Path: Figure 1 shows a network connecting clients to a storage array, as well as the locations of PCI busses and where PCI Express migrations are likely to occur. In this example, clients on an Ethernet LAN access multiple heterogeneous servers. The servers also connect to the storage array through an alternate interface. The data flow between a client and a storage array is described in the following list, which references Figure 1:

[FIGURE 1 OMITTED]

* The process begins with a client request to a general purpose server on the LAN. The client may be an individual PC or a workstation with specific data and application demands. The client is a CPU and chipset with a PCI bus, which interfaces to an Ethernet controller connected to the LAN.
* The server runs applications and manages data for clients. The server implements an architecture similar to the client's CPU/chipset combination. However, the server uses faster PCI-X busses to interface its Ethernet controller to the LAN.
* The server requires a connection to the storage array independent of the LAN. The server connects to a channel adapter that manages the communication with the switched backplane. The server uses another PCI-X bus to host a network controller that could support an Ethernet, iSCSI, Fibre Channel (FC) or custom bus.
* The channel adapter sends the message to the switch on an iSCSI, FC, InfiniBand or custom bus. In some cases, the switch may be more of a router, understanding how to send messages to specific devices as opposed to just steering messages to a particular port. The channel adapter handles messaging tasks such as TCP/IP offload, data queuing and bus translation.
* The storage array controller is often a special function server implementing PCI-X busses. One PCI-X link will host a network controller connected to the switch/router. The storage array controller runs applications, caches data and helps ensure the data integrity of the system. For low to midrange systems, the storage array controller may perform RAID functions.
* The storage array controller sends data requests to the disk adapter on a PCI-X bus.
  The disk adapter controls the disk array through an FC link and typically performs RAID functions.
* The disks implement an FC-AL (Fibre Channel-Arbitrated Loop) link. The arbitrated loop connects all the disk drive nodes together and is managed with a token-acquisition protocol.

In this example, requests from the client to the storage array travel across five PCI/PCI-X busses. The PCI/PCI-X busses are located in the client, the applications server and the storage array controller. In the second half of 2004, these PCI/PCI-X busses will begin to migrate to PCI Express. In addition, some switch/router interfaces in the storage array may transition from Ethernet, iSCSI, Fibre Channel or custom busses to PCI Express.

PCI Express offers cost advantages, scalability, and full-duplex operation. RAS (reliability, availability, serviceability) improvements can increase data integrity, a key criterion in storage. PCI Express enhances reliability by implementing differential pairs and an encode/decode scheme that embeds the clock in the data signal, alleviating signal/clock line timing skew. PCI Express supports two levels of Error Correction Codes (ECC) checking for both Data Link Layer and Transaction Layer errors. With parallel busses, a bus failure can bring down all the boards connected to the bus. Because PCI Express is a point-to-point bus, a link failure may be isolated from other boards so portions of the system continue to function and remain available. To aid serviceability, PCI Express supports features such as hot plug, power budgeting and power management.

[FIGURE 2 OMITTED]

QoS for Storage

Storage arrays maintain data for a wide variety of servers hosting a range of applications. PCI Express offers Quality of Service (QoS) features to provide higher bus bandwidth to priority data types.
Figure 2 shows a conceptual example of a storage array implementing PCI Express. A switch is connected to three servers as well as a root complex that manages the storage array. PCI Express implements Virtual Channels and Traffic Classes to provide a flexible control mechanism to shape data flow.

Each server is transmitting three data types, which are assigned a priority in terms of a Traffic Class (TC). There are eight TCs, with TC0 the lowest priority and TC7 the highest. The Administration Application Server assigns Error Signaling messages its highest priority, TC6. Since the primary responsibility of a storage array is to maintain data integrity, error-signaling messages receive the highest priority so error recovery sequences can be launched as quickly as possible. Data Backup is assigned to TC1 and Power Management is assigned to TC0. Data types maintain the same TCs throughout the system.

At each node, TCs are assigned to a Virtual Channel (VC). VCs provide a means for the application to allocate bus bandwidth. In Figure 2, the root has three VCs and assigns time slots as follows: one time slot for VC2, followed by two time slots for VC1, then another time slot for VC2 and finally two time slots for VC0. The sequence repeats itself continuously. VC2, which is mapped to TC6 for error signaling, has two of the six time slots. The two time slots are separated in the cycle so the latency is no greater than three time slots. Time slots can be assigned in one of three tables to allow applications to define flexible weighted round-robin arbitration schemes. These tables accommodate 32, 64, and 128 entries.

Software Modifications

PCI Express is software compatible with PCI and PCI-X systems. This means that existing operating systems and device driver software will function properly in a PCI Express system without change. However, to take advantage of additional features of PCI Express, software needs to be rewritten. In particular, system designers should consider taking advantage of PCI Express enhancements to interrupts, quality of service, power management and error correction and detection capabilities. The implementation of these features involves various levels of hardware and software modifications.

PCI Express supports inband Message Signaled Interrupts (MSI), similar to PCI-X. This mechanism reduces interrupt servicing latency and eliminates interrupt signal lines on the printed circuit board.

Quality of service has been discussed, and new software is required to configure this capability as well as to determine the arbitration schemes for virtual channels and ports. These mechanisms help manage data flow for specific transaction types.

Power budgeting features include mechanisms to query add-in cards for power requirements, so software can determine whether the new card can be supported from power delivery and cooling perspectives. Power management provides the means to place PCI-enabled devices into different power states (fully active, standby, sleep, off, etc.) depending upon the state of the storage array. In addition, software elements supporting hot-plug are defined by the PCI Express specification.

PCI Express supports more extensive error detection, signaling and logging than predecessor PCI busses.
New software is required to respond intelligently to this additional error information. To take full advantage of new PCI Express features, significant software coding is required. However, the backward legacy support of PCI Express allows system developers to migrate to these new features at their desired pace. Also, many of these features are system level, which may have limited impact on the basic functionality of endpoints that were originally designed for PCI and PCI-X.

Component Selection

The selection of PCI Express enabled devices encompasses data bandwidth, system architecture and usage model considerations. System designers should ensure their PCI Express subsystems are well-balanced and the required data traffic profile is realized.

Considerations: A storage array may employ a PCI Express topology with four component types. Figure 2 shows three of them: root complex, switch, and four endpoints corresponding to the four channel adapters. If any endpoint is not PCI Express capable, a bridge is required, such as a PCI Express to PCI/PCI-X bridge. The selection criteria for root complex, switch, endpoint and bridge components are driven by data bandwidth, system architecture and usage model considerations. The data bandwidth needs of different priority transactions of the storage array dictate requirements around port configuration, arbitration mechanisms and maximum payload size. System architecture and usage model determine the applicability of features such as peer-to-peer transfers, hot plug capability and power consumption.

The first order task is to properly provision priority bandwidth to meet guaranteed latency specifications.
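
As a rough illustration of what provisioning priority bandwidth against a latency target involves, the weighted round-robin schedule from the QoS example can be modeled as a repeating slot table. The sketch below is a minimal model, not driver code: the slot order is the sequence enumerated for Figure 2 (VC2, VC1, VC1, VC2, VC0, VC0), and the function names `bandwidth_share` and `worst_case_wait` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter

# Slot order as enumerated in the article's Figure 2 description:
# one slot for VC2, two for VC1, one for VC2, two for VC0, repeating.
SLOT_TABLE = ["VC2", "VC1", "VC1", "VC2", "VC0", "VC0"]


def bandwidth_share(table):
    """Fraction of the time slots each virtual channel owns in one cycle."""
    counts = Counter(table)
    return {vc: n / len(table) for vc, n in counts.items()}


def worst_case_wait(table, vc):
    """Largest number of slots from one grant to `vc` until its next grant.

    The table repeats, so the gap after the last grant wraps around
    to the first grant of the following cycle.
    """
    positions = [i for i, owner in enumerate(table) if owner == vc]
    if not positions:
        raise ValueError(f"{vc} has no slot in the table")
    n = len(table)
    gaps = [
        # A gap of 0 means the channel has a single slot: it waits a full cycle.
        (positions[(k + 1) % len(positions)] - p) % n or n
        for k, p in enumerate(positions)
    ]
    return max(gaps)


if __name__ == "__main__":
    shares = bandwidth_share(SLOT_TABLE)
    for vc in sorted(shares):
        wait = worst_case_wait(SLOT_TABLE, vc)
        print(f"{vc}: share={shares[vc]:.2f}, worst-case wait={wait} slots")
```

In this model each virtual channel owns the same number of slots, so the three shares are equal; what separates VC2 is that its two slots are spread across the cycle, which keeps its worst-case wait at three slots while VC1 and VC0 can wait five.
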
Error signaling transactions, for example, have a higher priority than data backup transactions. To ensure a PCI Express port complies with priority data flow requirements, port capability is computed using simple calculations involving link speed, width (lanes) and maximum payload size as well as more complex modeling of port and virtual channel arbitration.

Port arbitration can occur in two components, switches and the root complex. In Figure 2, the switch performs port arbitration for the four application servers to control the traffic flow between its four ingress ports and the root complex. A root complex with multiple ports performs port arbitration for peer-to-peer transactions and for access to a common egress port such as system memory. An examination of the virtual channel and port arbitration schemes, such as weighted round robin, is advised to ensure sufficient data flow for priority transactions.

The next task is to size non-priority data flow and to budget for incidentals such as Data Link Layer retries. At this point, it may be useful to consider whether the priority transactions are bursty in nature. If so, a special analysis of the arbitration schemes is warranted to confirm low-priority data flow is sufficient to ensure functional correctness.

[FIGURE 3 OMITTED]

Finally, storage array system architecture and usage model drive other component selection criteria. System architecture may require peer-to-peer transaction support, such as direct communication between two application servers across the switched backplane. The usage model may support hot swap of cards to increase the availability of the storage array. For some appliances, power consumption may be a key concern, especially if the system includes a mix of root complexes, bridges, switches and endpoints.

Selection: First generation root complexes, switches and bridges typically support one or two virtual channels with eight traffic classes. For most storage arrays, this configuration offers ample traffic shaping capability; high and low priority transactions may be split between the two virtual channels, with the high priority channel assigned greater bandwidth than the low priority channel.

Designers should select components with balanced capabilities. For example, connections between endpoints and switches default to the lowest common denominator for the number of virtual channels and maximum payload size. In other words, an endpoint can only transfer data as fast as the switch can handle, and vice versa. A switch supporting flexible port widths is useful for handling various combinations of ports and lane widths to match different endpoints.

In future generations of PCI Express components, root complexes with two or more ports will act as a switch.
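
The "simple calculations" on link speed, lane count and maximum payload size, together with the lowest-common-denominator rule for mismatched components, can be sketched as follows. This is a back-of-the-envelope estimate under stated assumptions, none of which come from the article: a first-generation rate of 2.5 Gbit/s per lane per direction, 8b/10b encoding, roughly 20 bytes of per-packet overhead, and lane count settling to the smaller of the two sides. The `PortCaps` class and both functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Assumptions (first-generation PCI Express, not stated in the article):
# 2.5 Gbit/s raw per lane per direction, with 8b/10b encoding so that
# 8 of every 10 transmitted bits carry data.
RAW_GBITS_PER_LANE = 2.5
ENCODING_EFFICIENCY = 8 / 10
# Assumed per-packet overhead in bytes (framing, sequence number,
# a three-doubleword header and the link CRC). Treat as a rough figure.
PACKET_OVERHEAD_BYTES = 20


@dataclass(frozen=True)
class PortCaps:
    """Hypothetical summary of what one side of a link supports."""
    lanes: int
    virtual_channels: int
    max_payload_bytes: int


def negotiated(a: PortCaps, b: PortCaps) -> PortCaps:
    """Each setting falls back to the smaller value of the two sides."""
    return PortCaps(
        lanes=min(a.lanes, b.lanes),
        virtual_channels=min(a.virtual_channels, b.virtual_channels),
        max_payload_bytes=min(a.max_payload_bytes, b.max_payload_bytes),
    )


def payload_mbytes_per_sec(link: PortCaps) -> float:
    """Estimated one-direction payload throughput in MB/s (10**6 bytes)."""
    data_mbytes = link.lanes * RAW_GBITS_PER_LANE * ENCODING_EFFICIENCY * 1000 / 8
    efficiency = link.max_payload_bytes / (
        link.max_payload_bytes + PACKET_OVERHEAD_BYTES
    )
    return data_mbytes * efficiency


if __name__ == "__main__":
    # Example values are made up to show a mismatched endpoint and switch port.
    endpoint = PortCaps(lanes=4, virtual_channels=1, max_payload_bytes=128)
    switch_port = PortCaps(lanes=8, virtual_channels=2, max_payload_bytes=256)
    link = negotiated(endpoint, switch_port)
    print(link)
    print(f"estimated payload throughput: {payload_mbytes_per_sec(link):.0f} MB/s")
```

With these made-up values, the wider and more capable switch port is held to the endpoint's four lanes, single virtual channel and 128-byte payload, which is the imbalance the selection guidance above warns against. The estimate ignores flow control, acknowledgements and retries, so it is an upper bound rather than a prediction.
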
From a topology perspective, a root complex that acts as a switch eliminates a discrete switch component, saving board space. On the other hand, the flexibility to choose among a wide variety of discrete switches and features is forgone.

PCI Express provides system designers greater flexibility to shape the data traffic flow of their system. During component selection, system designers balance the features and capabilities of various components to ensure bandwidth, architecture and usage model requirements are supported throughout the system.

Board Layout

Laying out fewer signal lines isn't always easier. Although PCI Express boasts greater I/O bandwidth per pin than prior PCI bus family members, the faster bit rate for PCI Express necessitates using different layout techniques. PCI design guides recommend routing signal lines along a similar path with trace lengths matching to within 25 mils. With PCI Express, the two signals of each differential pair in a lane must match trace lengths to within 5 mils. This stringent matching specification helps maintain the signal integrity of the differential pair. The impact is that "bumps" must be added to PCI Express signal lines to add trace length that compensates for bends.

Figure 3 illustrates trace length compensation, showing PCI Express and PCI bus routing from the chipset pins. In A, the PCI Express signal lines have additional bumps to match trace lengths, whereas these bumps are not required for PCI bus lines. For example, signal Z and signal Y make two bends after leaving the device pin, and signal Z cuts the corner more sharply than signal Y. Signal Z traverses a slightly shorter distance, which necessitates the addition of four bumps after point C for trace length compensation. Similarly, signal Y takes a shorter path than signal Z at point D and two compensation bumps were added to its trace.

The need to carefully match trace lengths has several repercussions.
First, layout designers need to spend additional time counting bends and planning the placement of compensation bumps nearby. Differential pairs require length bumps to be placed near bends to minimize line-to-line signal skew. Therefore, trace compensation is needed throughout signal traces and cannot be 'ganged' together in one location. Adding bumps is a manual process today, but in the future, one can anticipate layout tools to assist bump placement.

Second, layout designers quickly learn to relax the spacing between PCI Express signal lines so that bumps can be easily added intermittently without impacting the fan-out of the overall PCI Express bus. This may effectively double the line-to-line spacing (shown on the left side of Figure 3, part A).

Third, layout engineers also need to ensure trace lengths between lanes match to within 15 mils. This tolerance is less forgiving than PCI and also requires layout designers to pay attention to an additional set of constraints.

Fourth, layout designers are discouraged from switching layers when routing PCI Express signal lines. This restriction necessitates greater up-front planning of component placement on the board.

Although PCI Express busses have fewer traces than prior PCI busses, the strict trace length matching specifications make board layout a time intensive task.

Conclusion

PCI Express addresses the need for bus performance in storage arrays, capable of supporting 10Gbit/sec networks and beyond. Additional features for quality of service and power management provide for even more reliable storage arrays. PCI Express offers reduced complexity, although system designers must do additional work to capture all the benefits offered by new PCI Express features. PCI Express is a major departure from prior PCI busses. Although backward compatibility has been maintained, the impact of PCI Express on I/O bandwidth, system architecture and usage models for storage arrays will be significant.

www.intel.com

Craig Szydlowski is strategic marketing engineer at Intel Corporation (Santa Clara, CA).