Hint-Based Cooperative Caching.
Caching is an essential feature of distributed file systems because it filters out accesses to the slower layers of the storage hierarchy [Howard et al. 1988; Nelson et al. 1993; Sandberg et al. 1985]. File accesses that hit in the local client cache avoid network accesses to the server. Local cache misses that hit in the server cache avoid expensive disk accesses.
Although local client caches are effective, the performance of a distributed file system is often limited by the poor server cache hit rate [Dahlin et al. 1994]. Server cache misses cause disk accesses, which are an order of magnitude slower than a server cache hit. As a result, the average time to access a file block increases considerably, limiting performance even when the hit rate to the local client caches is high. Such a phenomenon has been observed even when the server is lightly loaded [Riedel and Gibson 1996].
There are two principal reasons for the poor hit rate on the server cache: capacity and locality. First, the number of server accesses tends to be proportional to the number of clients. As more clients are added, the hit rate on the fixed-size server cache decreases. Second, and more importantly, local client caches exploit the locality in file references, leaving little locality in the accesses to the (downstream) server cache. Consequently, the server hit rate suffers [Dahlin et al. 1994].
The poor locality of accesses to the server reduces the effectiveness of increasing the server cache size. Adding more memory to the server does reduce the cache miss rate, but adding enough memory to achieve a significant effect is prohibitively expensive. Furthermore, as the server cache tends to duplicate the contents of the client caches, overall performance is improved more if the additional memory is distributed over all the clients [Dahlin et al. 1994].
The goal of this research is to improve the performance of distributed file systems by reducing the effect of the poor server cache hit rate. This is achieved by adding another logical layer to the storage hierarchy, one that allows clients to access file blocks in the caches of other clients. This layer of the storage hierarchy is called the cooperative cache, and improves performance by reducing server accesses.
The performance of a cooperative caching system depends in part on the level of coordination required to manage the cooperative cache. The cooperative cache is unique among the layers of the storage hierarchy in that it is shared by all the clients, yet distributed across them. This means that any operation on the cooperative cache could potentially require coordination among clients. The level of coordination required affects the cooperative cache's performance. Tight coordination based on global state allows the cache to be managed efficiently, but incurs overhead. Loose coordination based on local state reduces overhead, but may cause inefficiencies in managing the cache's resources. The focus of our research is to examine this trade-off, and design a cooperative cache that is coordinated enough to be effective, without imposing undue overhead.
The major contribution of our work is to show that a cooperative cache is better managed using (possibly incorrect) local information about the state of the system, or hints, rather than (correct) global information, or facts. For example, hints about the location of a file's blocks in the cooperative cache are forwarded to a client when it opens the file. These hints tell the client about the probable locations of the file's blocks in the cooperative cache, and avoid the need to contact a manager on every subsequent access to the file. Clients also maintain hints about the ages of blocks cached on other clients, allowing them to forward blocks to other clients without coordinating with a manager.
The benefit of hints is simple: hints are less expensive to maintain than facts, and as long as hints are highly accurate they will improve system performance. The use of hints enables simpler implementation strategies for cooperative caching and avoids the complexities of colocating managers with clients to reduce overhead in tightly coordinated algorithms. However, inaccurate hints can increase overhead and degrade performance. Thus the key challenge in designing a hint-based system is to ensure that the hints are highly accurate.
We have developed a cooperative caching system that uses hints (local state) instead of facts (global state) whenever possible. To evaluate the performance of hint-based cooperative caching, we first use trace-driven simulations to show that the hit ratios to the different layers of the storage hierarchy are as good as those of the tightly coordinated algorithms, but with significantly reduced overhead [Sarkar and Hartman 1996]. We also use the simulation results to estimate the benefits of colocating managers with clients, a technique that reduces network traffic, and find that colocation has little effect on the cache performance. The final result of our simulations show the effectiveness of hint-based cooperative caching over a wide range of network latencies; in contrast, the tightly coordinated algorithms do well only in low-latency environments. To validate our simulations we implemented a prototype hint-based system on a cluster of Linux machines, and measured its performance with real users over a period of one week. The prototype's performance was in line with that predicted by the simulation, reducing the average block access time to almost half that of NFS.
2. PREVIOUS COOPERATIVE CACHES
The cooperative cache is the layer in the storage hierarchy positioned between the local client cache and the server. The cooperative cache shares the physical memory of the clients with the local client cache, the distinction being that the cooperative cache in a client stores blocks accessed by other clients, while the local cache stores blocks accessed by the client itself. Whenever a client lacks a block in its local cache, it attempts to fetch the block from either the cooperative or the local caches of other clients. Similarly, when a client replaces a block from its local cache, it may forward the block to the cooperative cache on another client instead of discarding it. One complication is that the replaced block might be dirty. To simplify the protocols, and ensure that one client cannot lose data written by another client, all cooperative cache algorithms, including ours, write dirty blocks back to the server before forwarding them. This ensures that the server always has up-to-date copies of all blocks.
Cooperative caching involves three logical entities: clients, servers, and managers (Figure 1). Clients access file blocks stored on the servers, under the coordination of the managers. The coordination provided by the managers may include locating and replacing blocks in the client caches, but it varies among cooperative caching algorithms and serves as a major distinction between them.
[Figure 1 ILLUSTRATION OMITTED]
The cooperative cache is different from the other layers in the storage hierarchy because it is distributed over the clients and shares the client memories with the local client caches. As a result, the management of the cooperative cache may involve coordination between clients and managers. This coordination complicates cooperative caching; not only must clients coordinate to provide traditional caching functions such as block lookup, replacement, and consistency, but they must also coordinate to manage the size of the cooperative cache so that local cache hit rates are not affected. The cooperative cache also reduces server accesses, making the server cache less effective and suggesting that server memory should be reduced or used for another purpose.
The pioneering paper on cooperative caching by Dahlin et al.  described several schemes for implementing cooperative caching, and settled on one called N-Chance as providing the best performance with the lowest overhead. The xfs file system uses a refined version of N-Chance [Anderson et al. 1995]. A subsequent paper by Feeley et al.  described the Global Memory Service (GMS) that provides better performance than N-Chance as well as reduced overhead. Voelker et al. adapted GMS to deal with high client loads [Voelker et al. 1997].
The N-Chance algorithm dynamically partitions each client's cache into blocks needed by local applications and those stored in the cooperative cache. When a new block is brought into its cache, a client always replaces the oldest block on its LRU list. Whether the client forwards or discards the oldest block depends on the number of copies of the block in the client caches. Blocks with more than one copy are discarded, while blocks with only one copy, or singlets, are forwarded to another client chosen at random. A singlet is given a recirculation count of N when it is first forwarded. Should a forwarded block reach the end of a client's LRU list, its recirculation count is decremented, and it is only forwarded again if the count is positive (hence the name "N-Chance"). The recirculation count allows active clients that are forwarded blocks to reforward them, while ensuring that unused blocks are eventually flushed from the cache.
In the original N-Chance algorithm the file server is responsible for maintaining cache consistency and block location information. The server is contacted on all block movement in and out of the client caches and uses this information to satisfy block location requests from clients on a local cache miss. When a client writes a block, it informs the server which in turn invalidates copies of the block in other clients' caches. A client also contacts the server when it is necessary to determine the singlet status of a block.
The xfs file system uses a modified version of the N-Chance algorithm in which files are managed by managers that are logically distinct from the servers. Files are assigned to managers using a "First Writer" policy, in which the first client to write a file is designated its manager. This policy attempts to colocate a file's manager with the client that is using the file, eliminating network communication between the two.
GMS is more general than N-Chance because it is a distributed shared-memory system, of which cooperative caching is only one possible use. GMS is similar to N-Chance in that it uses managers to locate blocks in the client caches. Pages are distributed across managers to balance the load, with the exception that a nonshared page is always managed by the client that is using the page. Shared pages are assigned to managers using a global, static mapping. In this respect GMS differs from N-Chance, which uses the First Writer policy to assign files to managers. GMS also differs from N-Chance in that it is a shared-memory system and does not provide a consistency mechanism; consistency is the responsibility of the higher-level services built on top of GMS.
GMS decides to forward or discard a block depending on its number of copies. A local block is forwarded to the cooperative cache if no other copies exist (i.e., the block is a singlet). GMS differs from N-Chance in that each copy of a block is always tagged as to whether or not it is a singlet. The manager keeps track of the number of copies of each block and notifies the appropriate client when a block becomes a singlet.
GMS uses a more tightly coordinated replacement algorithm than N-Chance. Time is divided into epochs, and at the start of each epoch an initiator node collects information about blocks in the client caches and the client loads from all clients. The fraction of old blocks cached on each client is distributed to all the clients, and during the epoch the clients use this information to forward blocks to other clients such that the probability of choosing a client is proportional to its fraction of old blocks. In this way GMS approximates LRU replacement. The trade-off between accuracy and overhead can be adjusted by varying the length of the epoch.
3. HINT-BASED COOPERATIVE CACHING
The previous cooperative caching algorithms rely on managers to control the cooperative cache, performing such operations as keeping track of where blocks are located and which blocks should be replaced. Although this reliance on managers allows these algorithms to make accurate decisions, it also increases manager load. Our goal is to relax the control of the cooperative cache, and allow clients to make decisions based on local information. This should improve performance by allowing clients to access the cache without consulting a manager. This allows a manager in a hint-based system to support more clients, and makes it possible to simplify the system architecture by not having to implement a distributed management service across a collection of managers.
Reducing clients' dependence on managers is achieved through hints. The decisions made using hints may not be accurate, but managing hints is less expensive than managing facts. Hints do not need to be consistent throughout the system, eliminating the need for coordination of state changes. As long as the overhead eliminated by not using facts more than offsets the cost of making mistakes, the strategy of using hints will pay off. To offset some of the penalty caused by incorrect hints, we use the server memory as a discard cache that holds blocks that were potentially replaced in error. This reduces the number of server disk accesses caused by incorrect hints.
3.1 Block Lookup
A client handles a local cache miss by retrieving the missing block from the lower levels of the storage hierarchy. This procedure is called block lookup and requires locating blocks in the caches of other clients. As a result, a block lookup in cooperative caching requires information about the contents of the client caches.
Previous cooperative caches use managers to keep track of which blocks are present in the client caches. Each manager is responsible for managing the location information for a subset of the file system's blocks, and a client performs a block lookup by sending a request to the manager for the desired block. All block movement in and out of the client caches must be reported to the managers, ensuring that they always have up-to-date information about the locations of blocks in the client caches. Block lookup is thus perfect, always returning the correct location of a block in the client caches if one exists. The downside is that the overhead is high; not only must clients contact a manager on every block lookup, but also when blocks move in and out of client caches. As the aggregate size of the application working sets approaches the aggregate size of the client memories, blocks move between client caches rapidly, greatly increasing the manager load. At some point the overhead of managing the cooperative cache exceeds its benefit, and it should be abandoned in favor of accessing the server directly, as noted by Feeley et al. .
Our system uses hints to keep track of block locations, trading accuracy for reduced overhead. A client obtains block location hints from the manager when the file is opened, and uses those hints to access the file's blocks. This not only avoids contacting the manager on every block lookup, but also provides the client with the locations of all the file's blocks, potentially allowing it to optimize the order in which it fetches blocks. However, if the location hints are not accurate then the lookups will not succeed, and thus the advantage of not contacting the manager is lost. The success of this strategy depends on maintaining highly accurate hints, while providing a mechanism for dealing with occasional errors. The lookup mechanism must return the correct location of a block in the storage hierarchy even if the block's location hints are incorrect.
3.1.1 Location Hint Accuracy. Clients use location hints to find blocks in the cooperative cache. One possibility is for the hints to store the locations of all copies of a block, but clearly this is overkill, as a client only really needs to know the location of one copy. Furthermore, as blocks move between clients the hints will quickly become out-of-date. For these reasons, our hint-based system keeps track of only one copy of each block, the master copy. The master copy of a block is the first copy to be cached by any client and is therefore the copy originally fetched from the server (Figure 2).
[Figure 2 ILLUSTRATION OMITTED]
As a result, a block location hint contains only the (probable) location of the block's master copy, and the locations of other copies are not recorded. Hints therefore refer to a relatively stable part of the block location state, increasing hint accuracy. When a client opens a file, the manager gives the client a set of hints containing the probable location of the master copy for each of the file's blocks. The manager obtains the file's block location hints from the last client to have opened the file, because that client is also likely to have accessed the contents of the file recently. As a result, this last client is likely to have accurate block location hints for the file. When a client forwards a master copy of a block to another client, both clients update their hints to reflect the new location of the master copy.
3.1.2 Handling Incorrect Hints. The lookup mechanism must ensure that a block lookup is successful, regardless of whether the hints are right or wrong. This is not as difficult as it sounds because cooperative caches employ write-through client caches, so that the server always has an up-to-date copy of each block. This simplifies the lookup mechanism:
(1) On a local cache miss the client consults its hint information for the block.
(2) If the hint information contains the probable location for the master copy of the block, the client forwards its request to this location. Otherwise, the request is sent to the server.
(3) The client that receives a forwarded request for a block consults its hint information for the block and proceeds to Step 3.1.2.
The result is that the access request is passed between clients until one is found that either has a copy of the block, or has no hint for the block, in which case the request is passed to the server. Although infinite loops can be avoided by having clients forward a previously seen request to the server, it is possible that a request will visit every client in the system before being handled. If this happens frequently the resulting block access times will be abysmal. A production system should therefore limit the number of times a block request is passed between clients before the block request is forwarded to the server. On the other hand, our results indicate that the block location hints are highly accurate (97%-98%) and that only 0.002% of the block requests are forwarded more than four times. This implies that even if we impose a limit that a block request can be forwarded at most four times, the cooperative cache hit ratio would decline by only 0.002% and would have a negligible effect on the average block access time (less than 0.01ms). Our prototype therefore does not limit the number of times it forwards a request before diverting it to the server.
Our measurements show this algorithm works well when the working set of the client applications fits easily in the aggregate size of the client memories. However, if the working set approaches the aggregate memory size, the replacement rate on the client cache will increase significantly. This will cause master copies to move rapidly between clients, rendering the location hints for the blocks inaccurate at a much faster rate. The inaccurate hints will decrease the hit ratio on the client caches, increase costly server accesses, and degrade performance.
3.2 Replacement Policy
The cache replacement policy determines the order in which cache blocks are replaced. In the steady state, every new block brought into the cache must replace an existing block. For a local client cache this is a local decision: the client always replaces the least valuable block in its cache (in terms of its effect on the cache hit rate), ideally the block that is accessed the farthest in the future. As the future is usually unknown, most caches use some variation of the least recently used (LRU) policy for replacing blocks.
The replacement decision is quite a bit more complicated in a cooperative cache. Replacing the least valuable block in a cooperative cache requires finding it among all the blocks cached on all the clients. The client caching the new block can no longer simply discard its locally least valuable block, but must first consider its value with respect to all the other blocks in the cooperative cache. If the client's block is the least valuable overall then it can indeed discard it; otherwise, it must forward the block to the client with the least valuable block so that it can be replaced.
Unfortunately, it is prohibitively expensive for every client to find and replace the least valuable block in the cooperative cache on every block replacement. For this reason N-Chance simply forwards blocks at random, while GMS distributes block age information at epoch boundaries and within epochs chooses clients as forwarding targets based on the distribution of old blocks. In contrast, our hint-based approach is more loosely coordinated; clients collect block age information from each other and use these locally maintained hints to make replacement decisions. No communication with a manager is needed when replacing a block. By exchanging block age information among themselves, the clients are able to maintain high enough hint accuracy to replace blocks effectively, without incurring the overhead of communicating with a manager.
3.2.1 Forwarding. The first decision a client must make when replacing a block is whether or not to forward the block to another client. A block should only be forwarded if it is not the least valuable block in the entire cooperative cache, and it should be forwarded to the client having the least valuable block. A block's value is in part determined by how many copies of it exist; only one copy of a block is needed in the cooperative cache to satisfy subsequent accesses. Multiple copies of the same block do not increase the cache hit rate, and may indeed decrease it by reducing the cache's effective size. A duplicate avoidance mechanism is therefore needed to prevent duplicate copies of the same block. To this end, an N-Chance client only forwards singlets, blocks with only one copy cached, and contacts the manager if a block is not known to be a singlet. GMS also only forwards singlets, but relies on the managers to notify clients when blocks become singlets.
These systems rely on the manager to determine whether or not a block should be forwarded. To avoid the overhead of maintaining the singlet status of every block, we propose a forwarding mechanism in which the copy to be forwarded is instead predetermined. In particular, only the master copy of a block is forwarded to the cooperative cache, and all other copies are discarded. The forwarding decision is therefore reduced to distinguishing the master copy from other copies. Clients make local copies from the master copies, but since only master copies are forwarded, and each block has only one master copy, there should be at most one copy of a block in the cooperative cache.
A complication with this scheme is that incorrect location hints can lead to the creation of multiple master copies of the same block. An incorrect block location hint may cause a client to fetch a block from the server, even though another client already has a master copy. This will create two master copies of the same block. Thus, the performance of the hint-based forwarding policy relies on the accuracy of the block location hints. Fortunately, our measurements show that less than 0.01% of the server accesses create a second master copy of a block.
Another potential drawback of the master copy algorithm is that it has different forwarding behavior than N-Chance or GMS. Instead of only forwarding the last existing copy of a block, the master copy algorithm forwards the first (master) copy. Our system will forward the master copy when it is replaced, even though other copies of the block may exist. However, our simulation results show that only 1.97% of all forwardings are unnecessary forwardings of a master copy.
3.2.2 Best-Guess Replacement. A forwarded block should be sent to the client with the least valuable block. Our hint-based algorithm chooses a target based on local information about the state of the client caches. We refer to this as best-guess replacement because each client chooses a target client that it believes has the system's least recently accessed ("oldest") block. The goal is to approximate global LRU, without requiring tight coordination with a manager or excessive communication between the clients. The challenge is that the block age information is distributed among all the clients, making it expensive to determine the best block to replace.
In best-guess replacement, each client maintains an oldest-block list that is a sorted list of what the client believes to be the age of the oldest block on each client. The algorithm is simple: a client forwards a block to the target client at the head of its list.
The high accuracy of best-guess replacement comes from exchanging oldest-block ages. When a block is forwarded, both clients exchange the age of their current oldest block, allowing each client to update its oldest-block list. This exchange of block ages allows both active clients (clients that are accessing the cooperative cache) and idle clients (clients that are not) to maintain accurate lists. Active clients have accurate lists because they frequently forward blocks. Idle clients are the targets of the forwards, keeping their lists up-to-date as well. Active clients will also tend to have young blocks, preventing other clients from forwarding blocks to them. In contrast, idle clients will tend to accumulate old blocks and therefore be the target of most forwarding requests.
Best guess replacement also adapts to changes in the behavior of a client. An active client that becomes idle will initially not be forwarded blocks, but its oldest block will age relative to the other blocks in the system. Eventually this block will be the oldest on the lists, and be replaced. On the other hand, an idle client that becomes active will initially have an up-to-date list because of the blocks it was forwarded while idle. Other clients may erroneously forward blocks to the newly active client, but once they do, their updated oldest-block lists will prevent them from making the same mistake twice.
Although best-guess replacement is based on local information, its deviation from the true global LRU replacement order is bounded. Consider a distributed file system with N clients. When one client forwards a block to another client, they exchange the ages of their current oldest blocks. After this exchange, these clients may subsequently replace blocks older than these two blocks, but they will not choose to replace younger blocks until these two are replaced. Therefore, after all possible exchanges between the N- i clients that do not have the globally LRU block, no client will replace a block younger than the globally LRU block. There are
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
possible exchanges, so the replacement of the globally LRU block can be delayed by at most this many replacements. In other words, if a global LRU replacement policy replaces a particular block in the ith replacement, then best-guess replacement will replace the block at most by the (i + (N - 1) (N - 2)/2)th replacement.
One problem with allowing clients to select forwarding targets independently is that multiple clients may simultaneously choose the same target, overloading it. This forwarding storm happens when several clients believe that the same target client has the oldest block. N-Chance avoids this problem by forwarding blocks at random, and GMS by forwarding blocks probabilistically using the distribution of old blocks across the clients.
Although the hint-based system does not rely on randomness explicitly, best-guess replacement does reduce the probability that multiple clients would simultaneously forward their blocks to the same target. This is because each client that forwards a block to the same target receives a different answer for the age of the oldest block on the target, since each forwarded block replaces a different oldest block. Over time, the clients' oldest-block lists will contain different block age information even if they start out identical, reducing the probability of simultaneously choosing the same block to replace. This does not mean that a forwarding storm is impossible, however. For example, if there is one client that has all the oldest blocks, all the other clients will choose it as a target. Although we did not observe a forwarding storm in either our simulation or prototype, the possibility of one occurring still exists.
Best guess replacement could be modified to work with client loads as well as with block ages. The two clients involved in a replacement would simply exchange the age of their oldest blocks and their CPU loads, and CPU load could then be used to implement a load-balanced replacement policy. Though research has shown that client loads may be an issue [Voelker et al. 1997], we did not observe high client loads in the course of our experiments, and we did not incorporate client load considerations into our replacement policy.
3.3 Cache Consistency
The level of coordination in cooperative caching affects cache consistency. One solution is to use block-based consistency, but this requires contacting the manager on every local cache miss to locate an up-to-date copy, making hints useless for block lookup or replacement. Our system uses file-based consistency, in which the consistency of a file's cached blocks are checked when the file is opened. This consistency check is done as part of fetching the file's block location hints from the manager. Note that the benefit of file-based consistency diminishes with smaller files, since they will occupy only one or two blocks anyway. One possibility is to perform consistency over groups of files, but we have not investigated this option.
3.4 Discarded Cache
One drawback of best-guess replacement is that incorrect hints can cause erroneous replacements. A block may be forwarded to a client that does not have the oldest block; indeed, a block could be forwarded to a client whose oldest block is actually younger than the forwarded block. To offset these mistakes, we use the file server memory as a discard cache that holds blocks possibly replaced in error. A client chooses a forwarding target because it believes the target contains the oldest block. The target client considers the replacement to be in error if it does not agree with this assessment, and forwards this block to the discard cache. Subsequent accesses to the block can then be handled by the discard cache.
Two heuristics are used to determine whether the replacement of a block is erroneous. First, the target client checks whether the replaced block is a master copy. If the rep]aced block is not a master copy then it is discarded, and the replacement is not considered erroneous. Only master copies are accessed in the client caches, and mistakenly replacing a nonmaster copy will not affect the hit rate.
If the replaced block is a master copy, the target client compares the replaced block's age with the ages of the blocks on its oldest-block list. If the block is younger than any of the blocks on the list, then the replacement is deemed an error, and the block is forwarded to the discard cache. Otherwise, the block is discarded (Table I). The heuristic could probably be improved by basing it on more than only the age of the oldest block in each client cache, if this information were available to each client; however, the amount of information is limited by the best-guess replacement algorithm.
Table I. Discard Cache Policy. This table shows the policy for sending blocks to the discard cache. A master copy is old if it is older than all blocks in the oldest-block list; otherwise it is considered young.
Type of Block Action Nonmaster copy Discard Old master copy Discard Young master copy Send to discard cache
An alternative way of handling replacement mistakes is for the target client to consult its oldest-block list and forward the block to the target at the head of the list. However, this could trigger a cascade of block forwardings. The discard cache catches these mistakes without inducing such a cascade.
4. SIMULATION RESULTS
We use simulation to compare the performance of our hint-based system against N-Chance and GMS, as well as two ideal algorithms. The simulator of the hint-based and ideal algorithms we wrote ourselves. We obtained the N-Chance simulator from the developers of that system, and modified it to incorporate additional functionality employed in the xfs file system [Anderson et al. 1995]. In the modified system, a manager preferentially forwards a request to the client caches instead of a server, improving the cooperative caching hit rate and reducing the server load.
Similarly, we obtained the GMS simulator from its developers and modified it to add file-based consistency. GMS system does not maintain consistency; we added a consistency mechanism identical to that used in the hint-based algorithm.
4.1 Experimental Setup
The simulators were driven by the traces of the Sprite distributed file system [Baker et al. 1991]. These traces cover four two-day periods, and record file system accesses by user programs, such as opening and closing files, and seeking on file descriptors. Since these traces record application-level behavior, they are not Sprite-specific. Actual read and write events were not recorded, but can be inferred from file offsets in other records. We restricted our use of the traces to the main file server allspice. Table II shows statistics for each of the trace periods, while Table III shows the simulation parameters.
Table II. Trace Period Statistics. This table contains statistics for the four trace periods. Active clients refers to the number of clients that actually used the cooperative cache at any point during the period.
Trace Parameter 1 2 3 4 Block reads 276,628 2,011,915 261,023 343,189 Unique blocks accessed 53,349 13,108 33,063 75,273 Active clients 32 24 38 34
Table III. Simulation Parameters. This table describes the environment used to evaluate the various cooperative caching algorithms.
Clients 42 Managers 1 Server Cache Size 128 MB Local Memory Access Time 0.25 ms Disk Access Time 15.85 ms Warm-up Block Accesses 400,000 Servers 1 Client Cache Size 16 MB Block Size 8 KB Remote Memory Access Time 1.25 ms Write Policy write-through Message Latency 0.2 ms
Most of the simulation parameters are derived from the original study on cooperative caching by Dahlin et al. , simplifying performance comparisons with their published results. Although some of these parameters are now a bit dated, they were obtained from real systems.
Our simulation also assumed that there was a single manager handling functions such as cache consistency and block location. This makes it easier to measure the manager load imposed by the different systems, without introducing an algorithm to distribute the load over multiple managers. In our analysis, however, we do consider the effect of colocation on the results.
4.2 Evaluation Criteria
We use the following metrics to evaluate cooperative caching performance:
--Average Block Access Time: This metric is the average time required to access a file block. The access time is determined by the hit rates in the different layers of the storage hierarchy. Algorithms that make better use of the local and cooperative caches to avoid disk accesses will have smaller access times. Access time is only measured for block reads because all algorithms use write-through caches, so that writes are unaffected by cooperative caching.
--Overhead: This metric is the work required to implement cooperative caching. This overhead is broken down into manager load and network messages for block lookup and replacement. The manager load is expressed as the number of messages sent and received by the manager. This is a reasonable measure of manager load because each message represents work by the manager to coordinate cooperative caching.
4.3 Ideal Algorithms
For comparison purposes, the performance of two ideal cooperative caching algorithms--Optimal and Global LRU--is included. These algorithms assume accurate block lookup, duplicate avoidance, and consistency, but differ in their replacement policy. They provide a lower bound on average block access time, and thus an absolute yardstick against which other algorithms may be measured. As these algorithms are not practical, their overhead is ignored, and the mechanisms for implementing lookup, replacement, consistency, or duplicate avoidance are unspecified.
4.3.1 Optimal. The Optimal replacement policy for a cache always replaces the block whose next access is farthest in the future. It has been shown that this replacement policy is optimal because it minimizes the number of cache misses and therefore has the minimal block access time [Belady 1966].
The Optimal replacement algorithm must be slightly modified for use in a cooperative cache. There may be several blocks that are accessed equivalently far in the future (e.g., never), such that replacing any of them has the same effect on the hit rate. The cost of replacing a local block is less expensive than a remote, however; thus the Optimal algorithm preferentially replaces local blocks, if possible. Thus, the Optimal algorithm minimizes both the average block access time because it replaces a block accessed furthest in the future and the number of forwards because it chooses local blocks among all blocks accessed equivalently far in the future.
4.3.2 Global LRU. The Global LRU algorithm is the distributed version of the traditional LRU algorithm. The Global LRU algorithm approximates the Optimal algorithm by replacing the globally LRU block. In a real implementation, this algorithm is prohibitively expensive, as a client must query all other clients to collect information about their LRU blocks on every replacement.
The Global LRU replacement policy both determines the globally LRU block and minimizes the number of copies of a block. When a client needs to forward a block to the cooperative cache, it first checks to see if the block is a duplicate. If so it is discarded; otherwise it is forwarded to the client storing the globally LRU block, and the globally LRU block is discarded.
4.4 Simulation Results
Simulating the various cooperative caching algorithms on the Sprite traces validates the benefits of the hint-based approach. The performances of the algorithms are compared in terms of average block access time, manager load, lookup messages, and replacement messages. The effectiveness of the discard cache is also measured, as is the sensitivity of the block access time to variations in the simulation parameters.
Our earlier work on cooperative caching [Sarkar and Hartman 1996] also presented simulation results, but we subsequently discovered two errors in the simulation that affected some of the measurements. Although these errors did not affect our overall conclusions, we fixed our simulation and present the corrected results here.
The first error was related to the incorrect processing of the file identifier field in the traces by the simulators for the hint-based, GMS, Optimal, and Global LRU algorithms. This reduced the number of unique file identifiers, and erroneously increased the local cache hit ratio due to the resulting improved locality. The second error was a version-numbering problem in the hint-based algorithm that increased disk accesses and reduced cooperative cache hits.
On rerunning the experiments, we found that the errors had little affect on the average block access times in all the trace periods except for the second. In the second period, the average block access time of the affected algorithms was incorrectly reduced by a factor of three. Also affected was the sensitivity analysis, revealing a small divergence between the performance of the hint-based and ideal algorithms.
4.4.1 Block Access Time. The average block access times for the GMS and hint-based algorithm are very close to the idea] algorithms, and they spend similar amounts of time handling hits to the different levels of the storage hierarchy (Figure 3). The average block access time of the hint-based algorithm is about 3% worse than that of the other algorithms in the third period. This is caused by more block movement between client caches in this period than the other periods, and this decreases block location hint accuracy (Table IV).
[Figure 3 ILLUSTRATION OMITTED]
Table IV. Block Location Hint Accuracy in the Sprite traces. The row Hint Correctness refers to the percentage of correct block location hints in each trace period. The column Total refers to the combined hint accuracy across all periods.
Period 1 2 3 4 Total Hint Correctness (%) 99.68 99.98 99.07 99.54 99.94
The algorithms' block access times are very similar, even though the hint-based algorithm avoids manager communication. The small difference is caused by the small message latency used in the simulation, which reduces the benefit of avoiding manager contact. Obviously, higher message latencies, such as might be found on lower-performance or wide-area networks, would improve the hint-based performance relative to the other algorithms that must contact a manager. A analysis of the effect of network latencies on block access time appears at the end of this section. The hint-based performance would also benefit from a larger difference between computer and network speeds, although we have not quantified this effect.
The N-Chance algorithm has more disk accesses than the others in all of the trace periods except for the third. This is caused by N-Chance's random replacement policy, coupled with the low degree of sharing in these periods. The policy of randomly choosing a target client for replacement increases the probability that the replaced block was being used by the target client. The low degree of sharing makes it likely that there are no other copies of the randomly replaced block in the client memories. Consequently, a future access to the replaced block would likely result in a disk access. Further evidence of this phenomenon was also reported by Feeley et al. .
One possibility for reducing the block access times in the N-Chance and GMS systems is to colocate managers with the clients. Ideally, a block's manager would be the same machine as the client that is using it. Successful colocation will reduce block access times by eliminating network communication between the client and manager. Our simulation did not simulate colocated managers, but we can estimate the effect on the block access time. The xfs study [Anderson et al. 1995] found that the client was colocated with the manager in 12% of the read misses. This allows us to recompute the average block access times assuming that 12% of the read misses went directly to the client or server with the block, as appropriate. The block access time for direct client accesses is reduced by 0.2 ms (the message latency) to 1.45 ms, and server accesses to 16.05 ms. The overall effect is to reduce the average block access time by less than 0.01%.
4.4.2 Lookup Messages. The overhead imposed by lookup messages in a hint-based cooperative caching system depends on the accuracy of hints. If a hint is accurate, block lookup takes two messages, one to send a block request to a client or the server and one to respond with the requested block. If a hint is inaccurate, there is an additional message for each time the block request has to be forwarded to another client or the server.
The simulation shows that the block location hints for the client caches are highly accurate (Table IV). For only 0.01% of the local cache misses (averaged across all periods) is the desired block in the client caches, but the hints say otherwise. Conversely, when a hint says a block is in the client caches, it is correct for 99.94% of all local cache misses. Of these correct hints, 99.93% point to the actual location of the block, while the remaining result in requests being forwarded. The high hint accuracy and the small number of forwarded requests translate into an average of only 2.001 messages to perform a block lookup. In comparison, both N-Chance and GMS always require three messages per block lookup when the manager is not colocated, and two messages when it is. Since the manager is colocated in 12% of the lookups, these algorithms require an average of 2.88 messages per lookup.
4.4.3 Manager Load. The load imposed on the manager is one measure of the overhead of an algorithm. The less work a manager must do on the client's behalf, the more the number of clients it can handle. The results show that managing the client cache consistency imposes a very small load on the manager (Figure 4). This does not mean that the choice of a consistency algorithm does not affect system performance, only that it does not contribute significantly to manager load. File-based consistency is still important for enabling the use of hints for replacement and lookup.
[Figure 4 ILLUSTRATION OMITTED]
Replacement and lookup traffic account for nearly all of the manager load for the N-Chance and GMS algorithms. The clients must contact the manager each time a block is forwarded and each time a lookup is done, whereas the hint-based algorithm allows the clients to perform these functions themselves. The result is that the manager load is much higher for N-Chance and GMS.
The average message size in the hint-based algorithm is an average of 21% larger than in N-Chance and GMS, due to sending multiple hints when a file is opened, rather than one at a time as blocks are accessed. This can increase the manager load, but since the average size is still smaller than an Ethernet packet we do not expect a significant effect.
Colocation will also affect manager load, since it reduces the number of messages the manager must handle. The xfs paper does not provide statistics on the rate of colocation when blocks are forwarded, so we assumed that the manager was always colocated with the client forwarding the block. Even using this best-case assumption, we found that the manager load of the tightly coordinated algorithms is at least seven times higher than that of the hint-based algorithm in all of the traces.
4.4.4 Replacement Messages. About half of manager load in the N-Chance and GMS systems is caused by block replacement messages. The simulation allows us to normalize the results in terms of the number of replacement messages per local cache miss, and categorize what causes them (Figure 5). The N-Chance and GMS algorithms have three sources of replacement messages: forwarding the block to another client and notifying the manager; notifying the manager when a block is deleted; and exchanging messages between the clients and the manager to determine when a block should be discarded as opposed to forwarded. Except for the actual forwarding of the block to another client, all messages involve the manager, increasing its load. For best-guess replacement the only message required is the one to forward the master copy of a block to another client. This dramatically reduces the total number of replacement messages required per local cache miss for the hint-based algorithm.
[Figure 5 ILLUSTRATION OMITTED]
Manager colocation can reduce the amount of replacement-related manager traffic. A colocated manager during forwarding reduces the number of replacement messages by one. Similarly, no messages are required to delete a block or query its singlet status if the client and the manager for the block are colocated. The xfs study found that the manager is colocated in 68% of the block deletes, but does not provide numbers for colocation during forwarding nor singlet queries.
4.4.5 Discard Cache. The 128MB of server memory is a large fraction of the system's memory, and how it is used can have a large effect on the overall system performance. One possibility is to use it as a traditional disk cache; another is as part of the cooperative cache. Our hint-based system instead uses it as a discard cache, to mask mistakes made by the best-guess replacement policy. Unfortunately, the default 16MB client cache size used in the simulations leads to a cooperative cache that is so large that few accesses make it to the server. To measure the effectiveness of the discard cache compared to the other ways of using the server memory, we ran the simulations with the client caches reduced to 4MB and the server to 16MB. Reducing the cache sizes increases the miss rates on the local and cooperative caches, and therefore the load on the server.
The results (Table V) indicate that when the server memory is used as a traditional disk cache, it has a very low hit rate of 0.45% because most of the blocks it stores are duplicated in the local and cooperative caches. This results in an average block access time of 3.23 ms. If the server memory is instead used as part of the cooperative cache, the hit rate increases by nearly a factor of 4 to 1.83%, causing the access time to drop to 2.77 ms. Using the memory as a discard cache, however, further increases the hit rate to 2.46% and reduces the access time by nearly 10% to 2.57 ms.
Table V. Server Memory Usage. This table shows how different uses of the server memory affect the block access time of the hint-based system. Server memory is used as either a traditional disk cache, as part of the cooperative cache, or as a discard cache. The results are averaged across all trace periods.
Server Memory Hit Ratio Block Access Time (ms) Disk Cache 0.45% 3.23 Cooperative Cache 1.83% 2.77 Discard Cache 2.46% 2.57
4.4.6 Sensitivity Analysis. The results presented in the previous sections were based on a single system configuration, in which the number of clients, client cache size, number of servers, and other parameters were fixed. Although the hint-based algorithm performed well under the chosen configuration, its sensitivity to variations in the environment is also important. This section presents the sensitivity of the block access time to two environmental variables: the client cache size and the fraction of the clients that actively use the cooperative cache.
First, we measured the block access time as the cache size is varied from 4MB to 16MB (Figure 6). The remaining system parameters are unchanged. A smaller client cache increases the load on cooperative caching in two ways: first, it increases the local cache miss rates and therefore accesses to the caches of other clients; and second, it reduces the size of the cooperative cache. Even with 4MB caches the algorithms do a good job of finding and using the available idle memory, producing access times that are close to optimal. The exception is the N-Chance algorithm, whose policy of randomly forwarding blocks hurts performance when the working set size of client applications starts to approach the aggregate size of the client memories.
The average block access time of the hint-based algorithm is about 5% worse than the optimal when the client cache size is reduced to 4MB. This is because block locations hints are less accurate when the effective size of the cooperative cache is reduced.
The sensitivity of the block access time to the fraction of clients that are using the cooperative cache is important to the viability of the hint-based approach. Increasing the fraction of clients that use the cooperative cache increases the demand on the cache, and decreases the effective cooperative cache size. This combined effect increases the importance of managing the cooperative cache efficiently. As the fraction of clients using the cooperative cache is increased, the performance of the N-Chance algorithm declines at a faster rate than the remaining algorithms (Figure 7). Again, this is due to the random forwarding of blocks to other clients in N-Chance. The remaining algorithms all perform close to the optimal.
[Figure 7 ILLUSTRATION OMITTED]
Finally, we measured the sensitivity of the block access time to network latency. We reran the simulations using network latencies of 2 ms and 20 ms to represent MAN and WAN latencies, respectively [Caceres et al. 2000]. As the network latency increases, the average block access time of the hint-based algorithm increases more slowly than that of the other algorithms. This is primarily due to the latency of the additional message they incur (Figure 8). Furthermore, the average block access time of the hint-based algorithm is closer to that of the optimal algorithms than N-Chance and GMS. We conclude that the hint-based algorithm has the potential to perform well over a broad class of network latencies ranging from fast LANs to slower WANs.
[Figure 8 ILLUSTRATION OMITTED]
5. PROTOTYPE PERFORMANCE
After the simulation results indicated the benefits of the hint-based approach, we developed a prototype to demonstrate its usefulness under real workloads. We implemented the prototype on a cluster of machines running Linux [Beck et al. 1996] and NFS [Sandberg et al. 1985]. Linux and NFS imposed several constraints on the prototype:
--Linux uses the Global Clock algorithm to replace cache blocks, rather than LRU. The Global Clock algorithm is an approximation of the LRU algorithm, so the results obtained will not be identical to those from the simulation. The effect of not using an exact LRU order is hard to predict without detailed workload information, but both theory and practice suggest that this effect is minimal [Easton and Franaszek 1979; Voelker et al. 1997].
--The simulation assumed that clients do not reboot, an assumption that does not hold in the real world. A reboot is a problem because the blocks in the client's cache are lost, creating incorrect hints on other clients that refer to the now-empty cache.
We solved this problem in the prototype by maintaining a boot sequence number on each client that is incremented when the client reboots. When a block location hint is created, it is tagged with the boot sequence number of the client containing the block. Each client keeps track of the current boot sequence numbers of the other clients. This information is exchanged when a client receives a hint from another client, or uses a hint to contact another client. If the boot sequence number contained in a hint does not match the current boot sequence number of the client, the hint is not used.
--NFS does not provide strict cache consistency. NFS clients keep their caches consistent by periodically asking the NFS server whether a cached file had been modified. While the semantics of this consistency are not the same as used in the simulation, the granularity of this consistency mechanism is file-based, making it suitable for the hint-based approach.
--We could not implement the discard cache in the server because of the constraints imposed by the setup. The server is the main file server for the department and runs proprietary software, making modifications impossible. However, we did implement a discard cache on a designated machine to measure its usefulness.
--Finally, we were unable to obtain server cache miss rates from the server. As a result, server accesses were not segregated into cache and disk accesses.
5.1 Experimental Setup
We measured the performance of the hint-based cooperative caching file system on a cluster of 8 Pentium client workstations over a period of one week. Each workstation was a 200MHz Pentium Pro running Linux v2.0.23 and NFS v2, connected by 100Mbps switched Ethernet. The workstations were located on the desktops of faculty and students in the Department of Computer Science, and were configured so that most of the application binaries were accessed through the cooperative cache; the only binaries that bypassed the cooperative cache were those used during initial booting. The workstations rosewood, pelican, blackoak, delphin, carta, and omega were used by students and had 64MB of memory. The workstations roadrunner and cicada were used by faculty and had 128MB of memory. The server was a Network Appliance F520 machine. There was a single manager running on rosewood that coordinated the cooperative cache.
The workstations ran a mix of applications typical of an academic UNIX environment. In addition to the normal activities of UNIX users, several research groups used the workstations to edit, compile, and debug C source code. The workstations were also used as video servers and to develop Java-based applications. We used the workstations to build Linux kernels used in the hint-based cooperative caching prototype, and to perform some of the simulation runs.
The block size and average access times are shown in Table VI. The average times to fetch a 4KB block in the local cache, the cache of another client and the server are shown in the rows Local Memory Latency, Remote Memory Latency, and Server Access Latency. These values are the average latency to fetch 10,000 blocks from each layer of the storage hierarchy. The server access latency was measured repeatedly to account for differences at different times of the day and week. The average roundtrip time for forwarding a block using kernel-level SunRPC was also measured by taking the average for 10,000 messages and is shown in the row Forward Latency.
Table VI. Experimental Setup Parameters. This table lists the average access times and block size in the experimental setup. The average times to fetch a 4KB block in the local cache, the cache of another client, and the server are shown in the rows Local Memory Latency, Remote Memory Latency, and Server Access Latency. The average roundtrip time for forwarding a block using kernel-level SunRPC is shown in the row Forward Latency.
Local Memory Latency 0.1 ms Remote Memory Latency 0.5 ms Server Access Latency 12 ms Forward Latency 0.5 ms Block Size 4 KB
We measured the performance of the prototype by collecting statistics from every workstation every 15 minutes. The statistics included the hits to the various layers of the storage hierarchy, forwards, hint requests, hint accuracy, and the sizes of the local and cooperative caches.
5.2 Prototype Results
We evaluated the prototype performance using criteria similar to those used in the simulation. First, we measured the benefit of hint-based cooperative caching by estimating the average block access time with and without cooperative caching. To measure the effect of hint-based cooperative caching on the network and client activity, we monitored the overhead of maintaining hints in the file system. Finally, we evaluated each of the various design decisions made in hint-based cooperative caching--the accuracy of block location hints and the performance of best-guess replacement. We extrapolated the utility of the discard cache by designating an idle machine to serve as the host for an independent discard cache. All replacement errors were forwarded to the discard cache on this machine rather than the server. We monitored the hit rate on the discard cache as a measure of its effectiveness.
5.2.1 Client Activity Profiles. To understand better the performance of the file system, it is important to get an idea of the activity profile of every client. A good indicator of this is the composition of each client's cache (Figure 9). We draw two conclusions from these data. First, the ratio between the number of local cache and cooperative cache blocks depended on the level of activity of clients. Highly active clients such as carta had a larger fraction of local cache blocks, whereas mostly idle clients such as cicada had a correspondingly larger fraction of cooperative cache blocks. Second, the average percentage of local cache nonmaster copy blocks in the client caches was around 27.5%, which means that 27.5% of the blocks that clients accessed came from the cooperative cache on other clients.
[Figure 9 ILLUSTRATION OMITTED]
5.2.2 Block Access Time. One of the benefits of cooperative caching is that it can improve the average block access time (Table VII). We calculated the average block access time from the distribution of reads to the various layers of the storage hierarchy and the average times needed to access each layer. The average block access time with cooperative caching includes the extra remote client accesses caused by incorrect hints.
[TABULAR DATA VII NOT REPRODUCIBLE IN ASCII]
Measuring the average block access time without cooperative caching is difficult because we collected measurements of a running prototype that used cooperative caching. A controlled experiment would require running a control system on the same workload without cooperative caching; this was infeasible, so instead we extrapolated the performance of a system without cooperative caching from the measurements of the prototype. The simplest way of computing noncooperative caching performance from the prototype measurements is to replace cooperative cache hits in the latter with server accesses in the former. However, this ignores the fact that the cooperative cache may reduce the size of the local caches, and therefore the local cache hit ratios. Determining the exact effect of cooperative caching on the local cache hit rate is difficult; instead, we made the assumption that each forward to the cooperative cache of a client replaced a local cache block, decreasing the size of the local cache. Furthermore, we assumed that the replaced block would eventually be accessed again, resulting in one additional local cache miss and server access. Thus, if the number of local cache blocks in a client decreased over a time interval, we assumed that each forward to the cooperative cache of the client during that interval was responsible for replacing a local cache copy, increasing the local cache misses and server accesses by one. This is a conservative assumption, although it does not represent the worst case. Each forward decreases the size of the local cache by one block, which may cause the client's working set to no longer fit in the cache. If this happens, it is possible for an access pattern to occur in which every access results in a local cache miss (for all replacement policies except Optimal). Lacking any way to measure this effect in the prototype, we simply assume that such an access pattern does not happen.
The measurements show that the average block access time for all machines with cooperative caching was 1.01 ms, which was 85% faster than the average block access time without cooperative caching. Almost half of all local cache misses hit in the caches of other clients, reducing server accesses by the same amount. The result corroborates previous results [Dahlin et al. 1994] and shows that cooperative caching can reduce the average block access time by nearly a factor of two. This estimation used conservative assumptions, and in reality, cooperative caching's benefit would probably be larger.
5.2.3 Overhead. The effect of hint-based cooperative caching on the network and client activity is a concern. The biggest concern is the overhead messages required to manage the cooperative cache, which include hint requests to managers and clients, the extra accesses due to incorrect block location hints, and the forwards to the cooperative cache. Another concern is the overhead imposed on a client due to servicing requests for cache blocks.
We determined the overhead in the prototype by measuring the average rate (Figure 10) and maximum rate (Figure 11) of overhead messages processed by the clients during the week. Note that consistency did not account for any of the messages, because the manager does not enforce strict consistency and allows files to become temporarily inconsistent according to NFS conventions.
[Figures 10-11 ILLUSTRATION OMITTED]
The conclusion is that the rate of overhead messages in any client was not significant. The average overhead message size was 2218 bytes, and the cumulative average and maximum throughput requirements were 35 and 755Kbps, respectively. Given that the experiment was run on a 100Mbps switched Ethernet network, this represents an extremely small percentage of the available bandwidth.
Finally, we measured the overhead of servicing block requests on the clients. Our measurements indicated that there were an average and maximum of 0.07 and 23 block requests per second on a client. This translated to a throughput requirement of 2.25Kbps (average) and 738Kbps (maximum), insufficient to disrupt the client. We did not see the disruptions observed in GMS [Voelker et al. 1997], as the distribution of idle clients was significantly less concentrated.
5.2.4 Block Location Hints. To evaluate the accuracy of block location hints, we investigated whether hint-based cooperative caching locates blocks present in the caches of other clients (Table VIII). As can be seen, hints were correct an average of about 98% of the time. This result is almost identical to that obtained from the simulation.
[TABULAR DATA VIII NOT REPRODUCIBLE IN ASCII]
The results also showed that tagging block location hints with client boot sequence numbers solved the problem of reboots. There were 15 reboots of all the client machines in the experimental setup in the time period of one week, with some machines not having been rebooted at all, and some rebooted as many as five times. To verify the benefit, we did further experimentation without incorporating client boot sequence numbers. These results showed that with the same rate of rebooting, the block location hint accuracy dropped to 80%.
As a final measure, the average number of messages required for block lookup was found to be 2.002, indicating that lookup requests were rarely forwarded.
5.2.5 Best-Guess Replacement. To investigate the performance of best-guess replacement, we examined whether or not the prototype was removing the least valuable blocks from the client caches. One way to do this is to verify that the least recently used block is always the one replaced. Unfortunately, Linux does not use the LRU algorithm in its cache, making this impossible to verify.
Instead, we evaluated the accuracy of best-guess replacement by monitoring the activity of clients that were the targets of high rates of forwards from other clients. The activity of a client was estimated based on the number of forwards from that client, as forwarding blocks to another client is a definite sign of activity. The activity forwards of a client is the number of blocks it forwards during the time when the client itself is forwarded many blocks from other clients. If a client has many activity forwards, it means the client was busy when other clients forwarded it blocks. In contrast, having few activity forwards indicates that the client was idle when it was the target of forwards from other clients. To represent this better, we define the forward ratio of a client as the ratio of the activity forwards of the client to the total number of forwards from the client. A large forward ratio indicates that the client was busy when there was a high rate of forwards to that client, implying mistakes in best-guess replacement. On the other hand, if best-guess replacement was replacing the least valuable blocks from the client caches, then the forward ratio would be low.
Since what constitutes a "high" rate of forwarding is subjective, we measured forward ratios during intervals in which the forwarding rates to the client were 100, 500, and 1000 per hour (Figure 12). A higher forwarding rate to a client makes it less likely the client was actively forwarding blocks during the period of high rate of forwards, and consequently the forward ratio is lower.
[Figure 12 ILLUSTRATION OMITTED]
Overall, the forward ratio of all clients was between 0.2-1.1% during the periods of high rates of forwards. This means that clients were rarely busy when they were the target of high rates of forwards, and that best-guess replacement was doing a good job of selecting the target client. In fact, the maximum observed forward ratio was only 2.5%, assuming a forward rate of 100 per hour or greater.
5.2.6 Discard Cache. The clients contained 640MB of memory, which was much larger than the total average cache size of 150MB (Figure 9). As a result, there was relatively little forwarding traffic between the clients and relatively few chances for replacement errors. We therefore expected the discard cache to have little effect on overall system performance.
The measurements confirm this analysis (Table IX). The number of replacement errors was less than 1% of the total number of replacements, further testifying that best-guess replacement was doing a good job of choosing the target clients for replacement. Moreover the hit rate on the discard cache was very low at 0.01%, implying that there was no performance benefit by adding a discard cache to the experimental setup. However, as the simulation results have shown, the discard cache can become useful if the working set size of client applications starts to approach the aggregate size of the client memories.
[TABULAR DATA IX NOT REPRODUCIBLE IN ASCII]
6. RELATED WORK
Cooperative caching for file systems developed from remote memory research. The idea of remote memory servers in distributed systems was first introduced by Comer and Griffioen . Felten and Zahorjan proposed the use of idle machines as remote memory servers [Felten and Zahorjan 1991]. Franklin et al.  introduced the concept of remote client servers to extend the traditional client-server database architecture. They also proposed using the server memory to hold singlets that are replaced from client memories. This is similar to our discard cache, except their system forwards all singlets to the server when they are replaced, whereas our system only forwards master copies that are believed to have been replaced in error. Left et al.  showed that memory must be dynamically partitioned between local and remote client needs to maximize hit rates.
Our use of hints to perform block lookup is similar to the techniques used to perform page lookup in distributed shared-memory systems. Li and Hudak  describe several strategies for managing distributed shared pages, including a dynamic distributed manager algorithm in which nodes send page requests to the probable owner of the page. If the target node does not have the page, it forwards the request to the node it believes to be the probable owner. Unlike the hint-based algorithm we propose, all nodes keep track of probable owner information for all pages, so that the request eventually reaches the correct owner. Their results show that the probable owner information is quite accurate, and that the actual number of forwards is very small. Our hint-based algorithm also differs in that blocks can be forwarded to the cooperative cache, necessitating a distributed replacement policy. The work of Li and Hudak relies on the virtual memory systems of the individual machines to swap pages to disk, rather than forwarding pages to other nodes.
Shared Web caches also make use of hints to reduce overhead and access latency. Message latencies are higher in the Internet than in local area networks, so there is a considerable reduction in latency if a client contacts a Web proxy cache server directly instead of going through a location manager. For example, in the Summary Cache model [Fan et al. 1998], each proxy cache server maintains a summary of the contents of caches of other proxy cache servers and updates these summaries periodically. As in hint-based cooperative caching, the Summary Cache was able to reduce the message traffic between proxy cache servers considerably, without compromising the hit rate on these proxy cache servers. A similar mechanism is found in the Cache Digest protocol used in the Squid Internet Object cache.(1) One important variation from this approach is the protocol found in CrispySquid that tries to optimize latency in cache traffic rather than the overall hit ratio [Rabinovich et al. 1998].
Cooperative caching is also related to multiprocessor caching in shared-memory machines [Lenoski et al. 1990]. However, message latencies are typically greater across a network than a shared memory interconnect, leading to different techniques for reducing communication costs. This is a focus of distributed shared-memory research [Carter et al. 1991].
The discard cache is similar in purpose to the hardware-based victim cache proposed by Jouppi  to improve microprocessor performance. A victim cache is a small fully associative miss cache that is placed between a direct-mapped processor cache and the main memory system. The victim cache is loaded with the victim of a cache miss rather than the missed cache line itself. As a result, two cache lines that conflict in the processor cache can be cached in the victim cache, increasing performance. In essence, the victim cache catches replacement mistakes made by the processor cache because it is direct-mapped. The discard cache, in contrast, catches mistakes made because of incomplete information about block ages.
Cooperative caching is a technique that allows clients to access blocks stored in the caches of other clients. This enables some of the local cache misses to be handled by other clients, reducing server accesses and improving performance. However, cooperative caching requires coordination between clients and managers because of the distributed nature of the cooperative cache. Previous cooperative caching algorithms achieved this coordination by maintaining global information about the system state. This article presents a hint-based cooperative caching system that allows decisions to be made on loosely coordinated local state. The use of hints also allows for simpler implementation strategies for cooperative caching without involving the complexity of colocating managers with clients. Trace-driven simulations show that the hint-based system's hit ratios are as good as those of the previous systems, while substantially reducing overhead. The block access time of the hint-based system was comparable with more tightly coupled systems, but its manager load was reduced by at least a factor of seven. The prototype verified the simulations' results under a real workload, and showed that adding hint-based cooperative caching to NFS reduced the block access time by almost half.
We would like to thank Mike Dahlin for providing the N-Chance simulator and the Sprite trace information, and Mike Feeley for clarifications on the GMS system. We would also like to thank Olaf Kirch for clarifications on the NFS client implementation in Linux. Wanda Chiu, Matti Hiltunen, Mudita Jain, Dave Lowenthal, Anup Kuzhiyil, Rajesh Sundaram, Todd Proebsting, and the anonymous reviewers at OSDI and TOCS all provided much-appreciated comments on an early version of this article. We are also grateful to Miguel Castro for discovering the issue related to the incorrect processing of the file identifier in the simulations. Finally, we thank Andy Bavier, Patrick Bridges, Ian Murdock, Larry Peterson, and Oliver Spatschek for participating in the measurement of the hint-based cooperative caching file system.
(1) Rousskov, A. 1998. http://squid.nlanr.net/Squid/CacheDigest.
ANDERSON, T. E., DAHLIN, M. D., NEEFE, J. M., PATTERSON, D. A., ROSELLI, D. S., AND WANG, R. Y. 1995. Serverless network file systems. ACM SIGOPS Oper. Syst. Rev. 29, 5 (Dec.), 109-126.
BAKER, M. G., HARTMANN, J. H., KUPFER, J. H., SHIRRIF, K. W., AND OUSTERHOUT, J. K. 1991. Measurement of a distributed file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP '91, Pacific Grove, CA, Oct. 13-16), H. M. Levy, Chair. ACM Press, New York, NY, 198-212.
BECK, M., BOHME, H., DZIADZKA, M., KUNITZ, U., AND MAGNUS, R. 1996. Linux Kernel Internals. Addison-Wesley, Reading, MA.
BELADY, L. A. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5, 2, 79-101.
CACERES, R., DUFFIELD, N., FELDMANN, A., GREENBERG, J. D., GREER, A., JOHNSON, R., KALMANEK, T., KRISHNAMURTHY, C. R., LAVELLE, B., MISHRA, D., RAMAKRISHNAN, P. P. R. J., TRUE, K. J., AND VAN DER MEMLE, F. D. 2000. Measurement and analysis of IP network usage and behavior. IEEE Commun. Mag. 38, 5 (May), 144-151.
CARTER, J. B., BENNETT, J. K., AND ZWAENEPOEL, W. 1991. Implementation and performance of Munin. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP '91, Pacific Grove, CA, Oct. 13-16), H. M. Levy, Chair. ACM Press, New York, NY, 152-164.
COMER, D. E. AND GRIFFIOEN, J. 1990. A new design for distributed systems: The remote memory model. In Proceedings of the Summer 1990 Conference on USENIX (June). USENIX Assoc., Berkeley, CA, 127-135.
DAHLIN, M., WANG, R., ANDERSON, T., AND PATTERSON, D. 1994. Cooperative caching: Using remote client memory to improve file system performance. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (Monterey, CA, May). USENIX Assoc., Berkeley, CA, 267-280.
EASTON, M. C. AND FRANASZEK, P. A. 1979. Using bit scanning in replacement decisions. IEEE Trans. Comput. 28, 2 (Feb.), 133-141.
FAN, L., CAO, P., ALMEIDA, J., AND BRODER, A. 1998. Summary cache: A scalable wide-area cache sharing protocol. In Proceedings on SIGCOMM 1998 Conference (Feb.).
FEELEY, M. J., MORGAN, W. E., PIGHIN, F. H., KARLIN, A. R., AND LEVY, H. M. 1995. Implementing global memory management in a workstation cluster. In Proceedings of the 15th ACM Symposium on Operating System Principles (SOSP, Copper Mountain Resort, Colorado, U.S., 3-6 Dec.). ACM Press, New York, NY, 201-212.
FELTEN, E. W. AND ZAHORJAN, J. 1991. Issues in the implementation of a remote memory paging system. 91-03-09 (Mar.). University of Washington, Seattle, WA.
FRANKLIN, M. J., CAREY, M. J., AND LIVNY, M. 1992. Global memory management in a client-server DBMS architectures. In Proceedings of the 18th VLDB Conference (Aug.). 596-609.
HOWARD, J. H., KAZAR, M. L., MENEES, S. G., NICHOLS, D. A., SATYANARAYANAN, M., SIDEBOTHAM, R. N., AND WEST, M. J. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51-81.
JOUPPI, N. 1990. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the 17th International Symposium on Computer Architecture (ISCA '90, Seattle, WA, May). IEEE Press, Piscataway, NJ, 364-373.
LEFF, A., YU, P. S., AND WOLF, J. L. 1991. Policies for efficient memory utilization in a remote caching architecture. In Proceedings of the First International Conference on Parallel and Distributed Information Systems (Miami Beach, FL, Dec.). 198-207.
LENOSKI, D., LAUDON, J., GHARACHORLOO, K., GUPTA, A., AND HENNESSY, J. 1990. The directory-based cache coherence protocol for the DASH multiprocessor. In Proceedings of the 17th International Symposium on Computer Architecture (ISCA '90, Seattle, WA, May). IEEE Press, Piscataway, NJ, 148-159.
LI, K. AND HUDAK, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov.), 321-359.
NELSON, M. N., WELCH, B. B., AND OUSTERHOUT, J. K. 1993. Caching in the Sprite network file system. ACM Trans. Comput. Syst. 11, 2 (May), 228-239.
RABINOVICH, M., CHASE, J., AND GADDE, S. 1998. Not all hits are created equal: Cooperative proxy caching over a wide-area network. In Proceedings of the Third International WWW Caching Conference (June).
RIEDEL, E. AND GIBSON, G. 1996. Understanding customer dissatisfaction with underutilized distributed file servers. In Proceedings of the 5th NASA Goddard Space Flight Center Conference on Mass Storage Systems and Technologies (Sept.).
SANDBERG, R., GOLDBERG, D., KLEIMAN, S., WALSH, D., AND LYON, B. 1985. Design and implementation of the Sun network file system. In Proceedings of the Summer USENIX Conference (June). USENIX Assoc., Berkeley, CA, 119-130.
SARKAR, P. AND HARTMAN, J. H. 1996. Efficient cooperative caching using hints. In Proceedings of the 2nd Symposium on Operating System Design and Implementation (Nov.). 35-46.
VOELKER, G. M., JAMROZIK, H. A., VERNON, M. K., LEVY, H. M., AND LAZOWSKA, E. D. 1997. Managing server load in global memory systems. In Proceedings of the ACM SIGMETRICS Conference (June). 127-138.
Received: June 1998; revised: January 2000 and July 2000; accepted: September 2000
Authors' addresses: P. Sarkar, Storage Servers and Systems Department, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120; J. H. Hartman, Department of Computer Science, The University of Arizona, Tucson, AZ 85721.
|Printer friendly Cite/link Email Feedback|
|Author:||SARKAR, PRASENJIT; HARTMAN, JOHN H.|
|Publication:||ACM Transactions on Computer Systems|
|Date:||Nov 1, 2000|
|Previous Article:||Manageability, Availability, and Performance in Porcupine: A Highly Scalable, Cluster-Based Mail Service.|
|Next Article:||Java Consistency: Nonoperational Characterizations for Java Memory Behavior.|