Gigabit Ethernet and transport offload
Transport offload engines help relieve the TCP processing burden for Gigabit Ethernet. (Connectivity)

New processors arrive every few months with ever-increasing clock speeds and processing power. A new generation of Ethernet technology arrives every few years. The Internet continues its exponential growth. How do these three trends fit together, and what changes can we expect to see in Ethernet technology as a result? We start with three laws:
* Metcalfe's Law: The value of a network grows as the square of the number of users.
* Moore's Law: The number of transistors that can be integrated on a chip doubles every 18 months.
* Gilder's Law: Bandwidth rises three times faster than computer power.

The explosive growth of the Internet is explained by Metcalfe's Law and by the number of possible logical network interconnections, which grows with the number, N, of Internet nodes as N(N-1)/2.

Over a period of about 15 years, Ethernet has increased in speed from 10Mbps to today's 10Gb Ethernet, a factor of 1,000. Over a similar timeframe, advances in silicon technology, driven by Moore's Law, have allowed the CPU clock frequency in the average PC to increase from roughly 25MHz to 2.5GHz, roughly a factor of 100 increase in processing power. So, after 15 years of progress, the Internet is enormously larger and networks are 1,000 times faster, but processors are only about 100 times more powerful. The difference between these last two growth rates is a result of Gilder's Law and tells us that networking speed is increasing faster than processing power. As this gap continues to widen, a new technology, transport offload, is needed to close it, and will in turn enable new and exciting uses for Ethernet.

Transport Offload
The Internet relies on the combination of the Transmission Control Protocol and the Internet Protocol, or TCP/IP. The majority of Internet traffic is carried using TCP/IP packets over Ethernet. In theory, Gigabit Ethernet provides a full-duplex bandwidth of up to 2Gbps (1Gbps in each direction). Unfortunately, reaching this limit with a general-purpose processor requires dedicating nearly all of the host processing power to processing TCP/IP packets. As a rule of thumb, TCP/IP packet processing requires roughly 1MHz of CPU clock frequency for every 1Mbps of throughput.
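The rule of thumb above turns into simple arithmetic. The Python sketch below is a back-of-envelope model only: it assumes the 1MHz-per-Mbps figure holds linearly, and the function names are hypothetical, not part of any library. It uses the article's own figures: 2Gbps aggregate for full-duplex Gigabit Ethernet, a 2.5GHz PC processor, and the growth of Ethernet from 10Mbps to 10Gbps against CPU clocks growing from 25MHz to 2.5GHz.

```python
# Back-of-envelope model, assuming the article's rule of thumb holds linearly:
# TCP/IP processing costs ~1 MHz of CPU clock per 1 Mbps of throughput.
MHZ_PER_MBPS = 1.0


def tcpip_cpu_mhz(throughput_mbps: float) -> float:
    """CPU clock (MHz) consumed by TCP/IP processing at the given throughput."""
    return throughput_mbps * MHZ_PER_MBPS


def cpu_fraction(throughput_mbps: float, cpu_mhz: float) -> float:
    """Fraction of a CPU with the given clock spent on TCP/IP processing."""
    return tcpip_cpu_mhz(throughput_mbps) / cpu_mhz


# Full-duplex Gigabit Ethernet: 1 Gbps each way = 2,000 Mbps aggregate.
gige_full_duplex_mbps = 2 * 1000
print(tcpip_cpu_mhz(gige_full_duplex_mbps))       # 2000.0 (a 2 GHz CPU)
print(cpu_fraction(gige_full_duplex_mbps, 2500))  # 0.8 of a 2.5 GHz CPU

# Growth over ~15 years, using the article's figures:
# Ethernet 10 Mbps -> 10 Gbps, PC CPU clock 25 MHz -> 2.5 GHz.
net_growth = 10_000 / 10   # ratio of link speeds (Mbps)
cpu_growth = 2_500 / 25    # ratio of CPU clocks (MHz)
# Line-rate TCP/IP load relative to CPU capacity grows by the ratio of the two.
print(net_growth / cpu_growth)  # 10.0
```

Under the same assumption, full-duplex 10Gb Ethernet (20,000Mbps aggregate) would call for on the order of 20GHz of general-purpose CPU clock just for TCP/IP processing.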
A full-duplex throughput of 2Gbps for TCP/IP over Gigabit Ethernet requires running a general-purpose processor at 2GHz just to handle the TCP/IP packet processing. Taken together, Moore's Law and Gilder's Law show that general-purpose processors cannot keep up with future demands for TCP/IP packet processing. The problem is to find a technology that processes TCP/IP packets more effectively and more efficiently. The key new technology that solves this problem is transport offload. A transport-offload engine, or TOE, is a dedicated engine, typically a silicon chip, that relieves the host CPU of processing TCP/IP packets.

Transport Offload Applications
Once TOE technology relieves a host of the burden of processing TCP/IP packets, the combination of TOE and Ethernet becomes a transparent, high-speed, serial, packet-switched, low-cost interconnect. TOE technology will then enable a vast range of new applications for Ethernet. For example, Greg Papadopoulos, CTO at Sun Microsystems, said, "What we're seeing right now is a convergence of exponentials: Moore's Law meets Gilder's Law meets Metcalfe's Law. Call it the Net Effect. Fortunately, the Net Effect makes it not just possible but practical to redefine storage. It's not a disk drive anymore; it's a Net-based service with standard protocols as its open interface. As we move forward, that's going to be the model for just about everything we do." TOE technology is the key that allows TCP/IP over Ethernet to provide this type of transparent and ubiquitous service. IP storage is an example of a Net-based storage service that uses a standard protocol, iSCSI. iSCSI is a new protocol that is driving the initial need for TOEs.
In turn, TOE technology is enabling the development of other new protocols that will follow iSCSI. For nearly 20 years Ethernet has steadily increased in speed from 10Mbps to 10Gbps, and as a result an enormous Ethernet infrastructure has evolved. It is extraordinarily difficult to unseat an incumbent technology as strongly and deeply entrenched as Ethernet, and, playing on this strength, TOE technology acts to extend Ethernet into areas where it has not been used before. Performance, including the need for full linespeed throughput, is a key feature of TOE technology. Low latency is also very important in creating and enabling new uses for Ethernet. Cluster computing and direct-attached storage (DAS) are examples of existing applications that benefit from a low-latency interconnect. When you add low latency, comparable to that of InfiniBand for example, to TOE technology, Ethernet becomes an interconnect technology that can be used anywhere and everywhere, from backplanes to long-haul optical links.

TOE Architecture Challenges
A TOE is typically a single silicon chip or part of a larger chip. Several types of TOE are just beginning to become available, but we can group them into a few simple categories. TOEs may use either a programmable processor or hardwired logic for the engine. The architectural tradeoffs between these two approaches include scalability (to higher operating frequency), flexibility, latency and power consumption. TOEs that use hardwired logic are more efficient and can scale to higher speeds, but are harder to design. A processor-based TOE is easier to design and more flexible, but less efficient and hard to scale above 1Gbps. TOEs perform either full offload or partial offload of TCP/IP packet processing.
In a full-offload TOE, all TCP connection management is performed on-chip. A full-offload TOE is harder to design, but fits better with existing software interfaces, most of which are based on Berkeley sockets. In a partial-offload TOE, only the data processing is performed on-chip; everything else must be performed in the host TCP/IP stack. The hardware for a partial-offload TOE is easier to design, but the software that interfaces that hardware to a TCP/IP stack is very hard to design.

There are several ways to design a TOE. The challenge is to balance a series of conflicting objectives or boundary conditions; three of the most important are power, cost and performance. To keep power consumption reasonable (typically below 5W for the TOE chip), it is essential to keep both the amount of logic and the operating clock frequency low. To allow TOE-enabled Ethernet to be deployed anywhere and everywhere that a high-speed serial interconnect is needed, the TOE chip must also be low-cost. In 2002, 10/100Mbps Fast Ethernet cards sell for less than $10 and 1Gbps network cards sell for less than $50. To reach these price points, TOE designs must keep the amount of logic small, to reduce the silicon die size, and avoid a high clock frequency, which would require an advanced, and thus expensive, fabrication process.

Full linespeed performance is difficult to achieve, especially at small packet sizes. Consider the receive-path design for a continuous stream of 64-byte packets (we will ignore the inter-frame gap and preamble). At 1Gbps we receive a bit every nanosecond, so a 64-byte (512-bit) packet arrives roughly every 500ns. The processing that a full-offload TOE must perform on each packet includes: checking the Ethernet CRC (cyclic redundancy check), checking hardware addresses, checking the frame type, checking and processing any VLAN tags, stripping the Ethernet frame header, looking up the destination IP address in an address cache, checking for the correct protocol, calculating and checking the IP checksum, stripping the IP header, parsing the TCP flags, looking up the source and destination IP addresses and ports, creating the TCP pseudo-header checksum, calculating the TCP checksum, stripping the TCP header, and transferring the packet data to host memory by DMA (direct memory access). Using a processor running at 1GHz, we have only about 500 clock cycles to perform all of these operations. That is an impossible task, and it is the reason a general-purpose processor cannot deliver full linespeed for small packets at 1Gbps.
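The per-packet time budget follows directly from the link rate and the frame size. The Python sketch below reproduces that arithmetic; the function names and the optional overhead parameter are illustrative assumptions, not an API from the article. With overhead_bytes=0 it matches the text, which ignores the preamble and inter-frame gap; passing 20 adds the standard 8-byte preamble and start-of-frame delimiter plus the 12-byte inter-frame gap.

```python
def frame_time_ns(frame_bytes: int, link_gbps: float, overhead_bytes: int = 0) -> float:
    """Wire time for one back-to-back frame, in nanoseconds.

    overhead_bytes=0 ignores preamble and inter-frame gap, as the text does;
    20 (8-byte preamble/SFD + 12-byte gap) models real Ethernet framing.
    """
    bits = (frame_bytes + overhead_bytes) * 8
    return bits / link_gbps  # at 1 Gbps, one bit takes 1 ns


def cycle_budget(frame_bytes: int, link_gbps: float, cpu_ghz: float,
                 overhead_bytes: int = 0) -> float:
    """Clock cycles a processor has per frame before the next frame arrives."""
    # ns per frame multiplied by cycles per ns (a 1 GHz clock = 1 cycle/ns)
    return frame_time_ns(frame_bytes, link_gbps, overhead_bytes) * cpu_ghz


print(frame_time_ns(64, 1.0))          # 512.0 ns per 64-byte frame at 1 Gbps
print(cycle_budget(64, 1.0, 1.0))      # 512.0 cycles on a 1 GHz processor
print(cycle_budget(64, 1.0, 1.0, 20))  # 672.0 cycles with real framing overhead
print(cycle_budget(64, 10.0, 1.0))     # about 51 cycles at 10 Gbps
```

Even the more generous 672-cycle figure has to cover every receive-path step listed above, including a CRC and a TCP checksum computed across the packet, and the budget shrinks tenfold at 10Gbps.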
A TOE design that employs multiple processors can just barely perform these operations at 1Gbps, but such a design will typically use 5-10 processors running at 100-300MHz, fabricated on an advanced, and expensive, process below 0.18 micron. Besides the difficulty of arbitrating memory access among multiple processors, the power dissipation and cost of a multiple general-purpose processor solution are unattractive, and such a design certainly does not scale well to higher speeds. The only way to architect a TOE that achieves an optimum balance between power, cost and performance is to use special-purpose, highly optimized, hardwired logic machines. Each logic machine is designed and dedicated to one of the operations to be performed, and all the logic machines must operate in parallel. If we require 1Gbps throughput and use a 64-bit-wide data path to perform all operations in a massive pipeline, we can use a clock frequency of 1Gbps/64, or only about 16MHz. At 10Gbps, such a design still requires a clock frequency of only about 160MHz, well within the reach of a 0.18 micron process.

Michael J. S. Smith is chief technology officer at iReady (Santa Clara, Calif.). www.iready.com