Using Edge Caching To Speed Site Performance
As more Web sites are created using dynamic Web pages to serve their content, the Internet's ability to deliver that content in real time is decreasing. At the same time, users demand faster page delivery to match their faster connections. Edge caching is the technology that is driving the development of fast data delivery services. Caching allows Web sites to manage their content in complex database-driven publishing systems without having to serve pages in real time.
Currently only a handful of companies are doing dynamic edge caching--the majority of caching companies, including Akamai, CacheFlow, and Inktomi, are focused on caching the graphics for those pages. This article discusses how edge caching works, its main benefits, and how dynamic caching on the edge will evolve in the Internet of the future.
Edge caching is the ability to distribute content from a local Web server to caching servers that are closer to the end user--nearer the "edge". The technology is usually implemented in a service model, where a service company provides a cached version of a Web site and charges by the number of bytes served to users from the cache.
This type of service scales very well, since the company running the Web site only has to pay for the traffic it receives, instead of making initial capital investments in hardware and paying monthly charges for bandwidth. The company can then implement its marketing plan without worrying about whether the Web server can keep up with demand.
The end user requests a page from the cache server just as they would from any Web site, using a domain name that is specific to the caching server. When the caching server receives the request, it checks to see if it has a cached page that is specific to that URL. If it does, it serves that page. If not, it requests the page from the origin server and saves the output of that page to the cache. This reduces the work the origin Web server has to do, since the pages are served primarily from the cache.
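The hit-or-miss flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; the `fetch_from_origin` callable stands in for an HTTP request to the origin server.

```python
class EdgeCache:
    """Minimal sketch of an edge cache: serve from the local store on a
    hit, otherwise pull the page from the origin and save the output."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> page
        self.store = {}                             # path -> cached page
        self.origin_hits = 0                        # how often the origin worked

    def get(self, path):
        if path in self.store:            # cache hit: origin does no work
            return self.store[path]
        self.origin_hits += 1             # cache miss: ask the origin server
        page = self.fetch_from_origin(path)
        self.store[path] = page           # save the output for the next user
        return page
```

Requesting the same URL twice shows the benefit: the second request never reaches the origin, so the origin's load drops as the cache warms up.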
Another advantage of edge caching is that there are multiple edge servers handling requests. When the browser asks the DNS server to resolve the domain name of the caching server, the DNS server responds with the closest edge server to the user. The pages are then cached on this edge server. This reduces the number of hops the user has to take to get the requested pages. The transaction can be even faster if the pages are already cached from another user's traffic.
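As a rough sketch of that DNS step, a resolver can pick the edge server with the lowest measured round-trip time for the requesting network. The hostnames, client prefixes, and RTT figures below are invented for illustration; real services also weigh network topology and server load.

```python
# Hypothetical RTT table (milliseconds) from each edge server to two
# client network prefixes. All names and numbers are made up.
EDGE_SERVERS = {
    "edge-west.caching.com": {"203.0.113.0/24": 12, "198.51.100.0/24": 80},
    "edge-east.caching.com": {"203.0.113.0/24": 75, "198.51.100.0/24": 9},
}

def resolve(client_prefix):
    # Answer the DNS query with the edge server "closest" to this client,
    # approximated here as the one with the lowest round-trip time.
    return min(EDGE_SERVERS, key=lambda host: EDGE_SERVERS[host][client_prefix])
```

A client on the 203.0.113.0/24 network would be directed to the western edge server, so its pages are cached there, a few hops away.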
Because the edge caching server name is different from the origin server's, the URL for the page might be significantly different from the origin Web site's URL for that page--causing branding loss because the URL address may change from www.companyname.com to www.caching.com/account12345/. Because the server name changes, edge caching is primarily used for images or other in-page graphics and streaming media, where the server name does not need to appear in the address bar in order to display the content.
Graphics: The Beauty And The Beast
Graphics are the beauty of the Web. Site designers are always trying to add more without crossing the size threshold that slows the rendering time of the page. On average, the graphics in a page consume 80 percent of its overall download time. When a request for a page is made, the response is returned in the form of HTML. The browser then parses the page and finds all the references to the images. These images are then requested from the server, all at the same time.
One technique Web designers employ to speed downloads is to use the same graphic on multiple pages. This allows the browser to cache the graphics and reuse them without having to return to the server. But this method is different from caching on an edge server. While graphics re-use can remove the strain on the Web server for some pages, you must assume that all graphics will be requested on the first request to the home page.
Edge caching can help remove the strain of serving graphics from the Web server. Assume that one page request spawns twenty simultaneous requests for the twenty graphics needed to render the page. Handling the twenty requests for the graphics stresses the Web server, since all graphics come at the same time and the bandwidth needed to serve them is greater than the size of the page itself. Edge caching allows you to offload this stress to a third party without affecting the page display or the URL shown in the address bar.
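In practice, offloading graphics can be as simple as rewriting the image URLs in a page to point at the edge host while the page itself still comes from the origin. The sketch below is illustrative only (the `/images/` path convention and edge hostname are assumptions); a real site would more likely do this in its templates or publishing system.

```python
import re

def offload_images(html, edge_host):
    """Rewrite relative image URLs so the graphics are fetched from the
    edge caching host, while the page URL in the address bar is unchanged.
    Assumes site graphics live under a common /images/ path."""
    return re.sub(r'src="/images/',
                  'src="http://%s/images/' % edge_host,
                  html)
```

For example, `offload_images('<img src="/images/logo.gif">', "edge.caching.com")` points the logo at the edge server; the twenty simultaneous image requests then hit the cache instead of the origin.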
The ability to offload graphics to another machine is probably edge caching's biggest benefit for the Web administrator. Consider having a Web server that served only images, located at a faster remote facility. Changing the graphics URLs to point to the new server removes that load from the Web server. However, edge caching is more than just load reduction: it is the ability to cache closer to the end user.
The Big Sacrifice
Unlike the first releases of edge caching, which only cached images and streaming media, the next generation of edge caching caches dynamic pages. But using edge caching with dynamic pages has a tradeoff: you sacrifice real-time updates to gain performance.
In the early Internet, virtually all pages were created by hand--text files containing HTML saved to disk. However, as the Internet matured it became apparent that the average Webmaster could only maintain 200-300 pages by hand during a 40-hour work week. This is because as you add more pages, the maintenance needed on the existing pages grows with the size of the site. For example, if you have a hundred pages and you add another, on average you will need to modify twenty percent of the existing pages in order to link in the 101st page. This means you will need to modify 20 pages; if you have 200 pages, you will need to modify 40 pages in order to add one more. In a big site, perhaps 100,000 pages, you would have to modify 20,000 pages just to add a new product. Cross selling, up selling, and inter-category links compound the problem.
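The arithmetic above reduces to one rule of thumb, sketched here with the article's 20 percent figure:

```python
def pages_to_modify(existing_pages, fraction=0.20):
    # The article's rule of thumb: linking in one new page means touching
    # roughly 20 percent of the pages that already exist.
    return int(existing_pages * fraction)
```

So `pages_to_modify(100)` gives 20, `pages_to_modify(200)` gives 40, and at 100,000 pages a single new product means touching 20,000 pages by hand.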
In order to overcome this barrier, Webmasters created dynamic pages. By storing content in a database and giving the Web server rules for displaying that content, adding a page becomes as easy as adding a row to the database. Now sites of 100,000 pages are possible because the computer is doing the maintenance. However, as designed, most of the content publishing systems in use today generate pages on the fly, in real time, for each request. This means there is latency between the browser's request and the server's response, and that latency is sometimes noticeable to the user. In high-load situations, where the server is generating hundreds of pages a second, latency can increase to unacceptable levels.
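A toy version of such a publishing system makes the idea concrete. The product table and template below are invented; the point is that adding a row yields a new page with no hand-edited HTML, but every request pays the cost of generating the page.

```python
PRODUCTS = {  # the "database": adding a page is just adding a row
    1: {"name": "Widget", "price": "$9.99"},
    2: {"name": "Gadget", "price": "$19.99"},
}

TEMPLATE = "<html><h1>{name}</h1><p>Price: {price}</p></html>"

def render_product_page(product_id):
    # Generated on the fly for each request -- this per-request work is
    # the latency that caching the output later removes.
    return TEMPLATE.format(**PRODUCTS[product_id])
```

Adding product 3 to `PRODUCTS` instantly gives the site a third page, with no other pages modified.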
This problem is magnified by the fact that users are getting faster connections to the Internet. At a connection speed of 56Kbps, all Web pages display slowly. But as DSL and cable modems increase the user's connection speed, users begin to notice which Web sites serve pages quickly and which do not. This means the Web site administrator must serve pages quickly to hold the user's attention.
Edge caching solves this problem by caching the output of the dynamic pages, reducing the amount of page generation the server must perform. Reducing the load on the server allows more pages to be served with the same hardware, and removing the generation step from the equation results in better site performance. Typically 60-70 percent of the response time is spent generating the page, while 20-30 percent is needed to transport the bytes to the browser. By removing the response generation, the page is served much faster. When caching, the Web administrator is forced to trade real-time dynamic pages for pages that may be slightly stale. The pages might be just ten minutes old, but this sacrifice can significantly improve performance.
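The "ten minutes old" tradeoff is usually implemented as a time-to-live on each cached page. Here is a minimal sketch of that idea, assuming a `generate_page` callable standing in for the dynamic publishing system; the clock is injectable only so the behavior is easy to verify.

```python
import time

class TTLCache:
    """Sketch of caching dynamic output with a freshness window: pages may
    be up to max_age seconds stale, but the expensive generation step only
    runs when the cached copy has expired."""

    def __init__(self, generate_page, max_age=600, clock=time.time):
        self.generate_page = generate_page  # callable: path -> page
        self.max_age = max_age              # 600 s = the ten-minute example
        self.clock = clock                  # injectable for testing
        self.store = {}                     # path -> (page, cached_at)

    def get(self, path):
        entry = self.store.get(path)
        now = self.clock()
        if entry is not None and now - entry[1] < self.max_age:
            return entry[0]              # still fresh: no generation cost
        page = self.generate_page(path)  # expired or missing: regenerate
        self.store[path] = (page, now)
        return page
```

With a 600-second window, a page requested hundreds of times per second is generated at most once every ten minutes, which is where the 60-70 percent generation cost disappears.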
Wayne Berry is the president and CEO of Post Point Software (Bellingham, WA).
Title Annotation: Technology Information; Web sites
Publication: Computer Technology Review
Date: Jan 1, 2001