Webway Sigalerts--And How To Unsnarl Them. Part 1--choked by its own access

This article is the first in a two-part series. The second part will appear in the November issue of CTR.

Measuring and improving Web site response is big business, and getting bigger. Many of the players were at the Keynote Global Internet Performance Conference in August, presented by Keynote Systems Inc., and there was much to be learned. Next month I'll review some of the information that was presented there; this month, I'll set the stage by discussing the problem of Internet congestion and the concomitant slowdown in Web site response, what causes it, and what the prognosis is.

The Web is being choked by its own success. Ironically, this is due to the very protocol whose robustness makes the Internet possible, the Transmission Control Protocol, the TCP in TCP/IP. Along with IP and the fundamental routing protocols
of the Internet, TCP has created a network that is at once a world library (over a billion pages and growing almost exponentially), a soapbox for everyone from genius to crank (and worse), and the beginnings of a highway for e-commerce. The problem is, even though almost everyone has a Porsche on their desk--not to mention what's possible in a modern data center or Web farm--the highway is all too often more like a potholed one-and-a-half lane country road.

The biggest complaint of users today is slow Web page response, which results from Internet congestion: too many packets flying around. Research suggests that one-third of users will abandon a page if it takes more than eight seconds to load, but that load time on today's Internet may average anywhere from three to nine seconds. And research at IBM almost 30 years ago (during the design of SNA) showed that true interactivity requires a response time of no more than two seconds. Much of the time, the world's largest interactive system isn't!

The Marching Morons

It could hardly be otherwise, and throwing more bandwidth at the problem won't help as much as some think.
The problem is that both the robustness and the congestion of the Internet arise because it's a dumb network--the intelligence is at the endpoints, much of it encapsulated in TCP, which is responsible for seeing that all the packets in a connection eventually reach the endpoint in the proper order, and, even more important, is also responsible for controlling congestion via flow control.

The routers that interconnect the various networks that comprise the Internet know nothing of flow control. They are simply buffers for packet input and packet output, plus a routing engine that knows how to figure out which router to relay each packet to. If either set of buffers fills up, either because packets are coming in too fast for the router engine, or because an output link can't take them out as fast as the router engine can decide their next hop, the router has no way to tell the rest of the network about this congestion. It simply starts dropping packets. It's up to TCP to detect this and throttle back its sending.

I Know It When I See It

Here's where it gets really interesting. TCP actually tries to cause congestion, at least momentarily, so that it can judge the capacity of the path(s) the packets it sends are taking. The TCP stack at the receiver (e.g. a browser) first establishes a connection, which takes three packets for standard traffic, but six for SSL, making encrypted traffic far more sensitive to congestion. During this exchange, the receiver advertises its "window," which is the maximum number of bytes that its input buffer can hold.

Then, in what's called the "slow-start" algorithm (which is actually exponential; go figure!), the TCP stack in the server sends one packet and waits for a special ACK packet back from the receiver. Each ACK contains the sequence number of the next packet expected, which is required for packet ordering, and also plays a role in congestion detection. Receiving that ACK, the server then sends two packets; if that is ACKed, then four, and so forth up to a threshold, where it throttles back to an increase of one packet each exchange. The number of packets sent during each exchange is called the "congestion window."

During this process, three things can happen. First, all the packets can get through, at which point TCP closes the connection with an exchange of special packets, ending with FIN. Second, the sender may not receive an ACK within a timeout period, which is determined by an algorithm that's only slightly pertinent to this discussion.
TCP then assumes the packet is lost, halves the threshold it's been working with (the point at which it slows the doubling progression to unit increments), drops back to a congestion window of one packet, and retransmits the next packet in the sequence, as determined by the last ACK received. In effect, TCP pushes the envelope until it stresses the connection beyond its capacity, causing congestion, and then backs off!

Third, the sender may receive a duplicate ACK, sent by the receiver when the latter received a packet whose sequence number is not the next one expected. Because packets in a single TCP connection can flow through different network routes with different latencies (transit time), they can become re-ordered. If three duplicate ACKs are received before the timeout (because each ensuing packet causes an identical ACK containing the sequence number of the last packet arriving in order), the TCP stack executes a fast re-transmit of the last sequence of packets, assuming that the congestion was only momentary, since only one packet was lost and the following ones got through. Otherwise, a timeout ensues and has the effect discussed above.

Costly Troublemaker

So TCP uses its own traffic to probe for congestion by attempting to cause it, by attempting to overload routers, which are the fundamental bottlenecks of the Internet.
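
The window behavior described above (doubling up to a threshold, unit increases after that, and the back-off on a timeout) can be sketched in a few lines of Python. This is a toy model, not a real TCP stack: the function name `simulate_cwnd` and the fixed `capacity` limit are assumptions made for illustration, loss is detected only by timeout, and the threshold is halved exactly as the text describes.

```python
def simulate_cwnd(rounds, threshold, capacity):
    """Return the congestion window (in packets) used in each round trip.

    `capacity` is a hypothetical fixed path limit: any round that sends
    more than `capacity` packets is treated as a loss found by timeout.
    """
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            # Timeout: halve the threshold and restart from one packet.
            threshold = max(threshold // 2, 1)
            cwnd = 1
        elif cwnd < threshold:
            # Slow start: double, but do not overshoot the threshold.
            cwnd = min(cwnd * 2, threshold)
        else:
            # Past the threshold: grow by one packet per round trip.
            cwnd += 1
    return history


print(simulate_cwnd(rounds=12, threshold=8, capacity=10))
# prints [1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]
```

The window climbs until it overshoots what the path can carry, collapses to one packet, and starts climbing again: every connection keeps probing until something is dropped.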
This is why more bandwidth won't help as much as one thinks; that, and Parkinson's Corollary. Just as work expands to fill the time available for it, so does Internet traffic expand to fill the bandwidth supplied.

E-commerce vendors and content providers constantly strive to strike a balance between fast page download and compelling content to create or maintain a competitive advantage. Vendors delivering applications over the Internet (so-called Application Service Providers, or ASPs) struggle with a similar balance between the speed of application response and the richness of the information it can supply or process. As the bandwidth available to them increases, they will naturally "test the envelope" of the Internet by adding more and richer content or "bigger" applications to their sites. And remember that under HTTP 1.0, each element of a Web page requires a separate TCP connection. (HTTP 1.1 changes this, but introduces in the process its own problems with congestion.)

Now imagine the impact of billions of TCP connections doing the same thing all at once! The result is tremendous burstiness: sudden increases and declines in demand. Of course, it should be noted that the majority of TCP connections are very short; something like 25% of all connections from a typical Web portal site--which are virtually all file transfers delivering a web page component--are only three packets long.
The distribution (probability) of file sizes, caused mostly by the tremendous increase in and dominance of HTTP traffic, is also part of the problem: it creates a self-similarity in network traffic (fractality) that makes it largely impossible to predict network traffic. What self-similarity means is that no matter at what scale you observe data network traffic--seconds, minutes, hours, days, or even longer--you see the same overall pattern of burstiness. This is in contrast to the voice network, which smooths out over larger scales, making it relatively easy, using standard queuing theory, to estimate demand over time.

As a result, the planning theory and tools that work so well for the phone system don't work at all for the Internet, and theorists and vendors alike are hard at work trying to develop new ways of describing network traffic with the aim of overcoming its unpredictability, which necessitates overbuilding the network. In 2002, it's estimated that the major carriers will spend $1.6 trillion on the data network infrastructure in North America alone!

Life And Death At The Browser Interface

OK. The Internet is slow. What's the bottom line? Simply that the quality of the user experience translates into money, either lost or gained. Consider business-to-consumer vendors, like Amazon.com, or a Web portal like Yahoo. Both of them live or die at the browser interface. Page load speed is a critical factor, in some cases even more important than uptime.
Customers may return to a site that was down when they visited; they rarely do so to a site that was too slow. The impact of slow Web page response is measured here in terms of customer conversion: the percentage of visitors to a web site or page who complete an activity of value to the site, such as a transaction, registration, or click-through. Speed up page response, and, all other things being equal, you make more money.

And it doesn't take much. Zona Research notes that at one site, decreasing the home page load time by only one second dropped the bail-out rate from 30% to 6-8%. Put another way, since the purpose of a home page is to be clicked through, that one-second decrease in load time increased the conversion rate for that page almost 33%, from 70% to around 93%. On an e-commerce site that's a third more potential transactions!

For business-to-business sites, or ASPs, the impact of slow response is more likely to be measured in lost productivity, although customer retention is important. But here the assumption is more that the web site is intended to save money by replacing more costly ways of doing the same job. If it's too slow, users simply abandon it and use the old way--e.g. phone or fax--thus eliminating the savings. Peter Sevcik, president of NetForecast, a networking consultancy, described the impact of a two-second increase in page response time (two seconds slower) on an insurance firm using a Web-based claims reporting service: they had to hire an entire additional shift of processing clerks to get the work done! And if the application was supplied by an ASP, do you think they kept the insurance firm's business?

As a VAR or systems integrator, if you are in any way involved with networking, these problems will have a fundamental impact on your bottom line. There are many solutions, and many ways to measure the problem. Next month I'll look at some of them.
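
For readers who want to check the Zona Research arithmetic quoted above, here is a minimal sketch in Python. The function name `conversion_gain` is hypothetical, and the 7% figure is an assumption: it is simply the midpoint of the quoted 6-8% bail-out range.

```python
def conversion_gain(bailout_before, bailout_after):
    """Turn two bail-out rates into click-through rates and a relative gain."""
    before = 1.0 - bailout_before   # share of visitors who click through
    after = 1.0 - bailout_after
    relative_gain = (after - before) / before
    return before, after, relative_gain


# 30% bail-out before; 7% after (assumed midpoint of the quoted 6-8%).
before, after, gain = conversion_gain(0.30, 0.07)
print(f"{before:.0%} -> {after:.0%}, relative gain {gain:.0%}")
```

With those inputs the click-through rate moves from 70% to 93%, a relative gain of roughly one third, which matches the figures in the text.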