Latency-Sensitive Hashing for Collaborative Web Caching

Kun-Lung Wu and Philip S. Yu
IBM T. J. Watson Research Center
30 Saw Mill River Road
Hawthorne, NY 10532
{klwu,psyu}@us.ibm.com

Abstract

Many geographically distributed proxies are increasingly used for collaborative web caching to improve performance. In hashing-based collaborative web caching, response times can be negatively impacted for those URL requests hashed into geographically distant or overloaded proxies. In this paper, we present and evaluate a latency-sensitive hashing scheme for collaborative web caching. It takes into account latency delays due to both geographical distances and dynamic load conditions. Each URL request is first hashed into an anchor hash bucket, with each bucket mapping to one of the proxies. Then, a small number of nearby hash buckets are examined to select the proxy with the smallest latency delay to the browser. Trace-driven simulations are conducted to evaluate the performance of this new latency-sensitive hashing. The results show that (1) in the presence of load imbalance due to skew in request origination or hot-spot references, latency-sensitive hashing effectively balances the load by hashing requests among geographically distributed proxies and (2) when the overall system is lightly loaded, latency-sensitive hashing effectively reduces latency delays by directing requests to geographically closer proxies.

Keywords: Latency-Sensitive Hashing, Proxy Caching, Collaborative Web Caching, Performance, and Load Balancing.


1. Introduction

The Internet has been growing explosively. As a consequence, user response times for accessing the Web have become increasingly unsatisfactory. One popular approach to improving Web performance is to deploy proxy cache servers between clients and content servers. With proxy caching, most client requests can be serviced by the proxy caches, reducing latency delays. Network traffic on the Internet can also be significantly reduced, alleviating network congestion. In fact, many commercial companies provide hardware and software products and solutions for Web caching, such as Inktomi, Network Appliance and Akamai Technologies. Some of them use geographically distributed data centers for collaborative web caching. In other words, many geographically distributed proxies are increasingly used to cooperate in web caching.

To cooperate in web caching, a coordinating protocol is generally required. Hash routing, such as the cache array routing protocol (CARP) [Ross97] [Vall98] and consistent hashing [Karg99], is an emerging approach to coordinating a collection of cooperating proxy caches. Hashing partitions the entire URL space among the caches, creating a single logical cache. Each cache is responsible for requests belonging to the assigned partition. Requests are sent to the proper proxy caches based on the hash values of the corresponding URLs. Hashing can be computed either by the browsers or by the domain name servers (DNS).
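As a simple illustration (not the exact CARP or consistent-hashing computation, which use more elaborate mappings), the following Python sketch partitions the URL space with a plain modulo hash; the proxy names and the choice of hash function are assumptions for illustration only.

```python
# Minimal sketch of hash routing: every URL maps to exactly one proxy,
# so the cooperating caches form a single logical cache without duplication.
import hashlib

PROXIES = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]  # assumed proxy IDs

def hash_route(url: str) -> str:
    """Return the proxy responsible for the partition containing this URL."""
    digest = hashlib.md5(url.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big")
    return PROXIES[bucket % len(PROXIES)]

# A browser or DNS computing the same function always reaches the same proxy.
print(hash_route("http://www.example.com/index.html"))
```

Because every party computes the same mapping, no query messages are needed to locate an object; the trade-off, discussed next, is that the chosen proxy may be far away or overloaded.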

As more and more geographically distributed proxies are used in collaborative web caching, however, response times tend to be negatively impacted for those requests hashed into geographically distant proxies or overloaded proxies. Distant proxies tend to incur longer network latency delays. Overloaded proxies can cause significant delays as well, no matter how close they are to the browsers. As a result, the overall system response times can be significantly degraded. Therefore, there is a strong need to consider the latency issue in hashing-based web caching among geographically distributed proxies.

One obvious solution to this latency problem is to avoid hashing requests into geographically distant proxies. In [Karg99], a user's geographical region is encoded into the hash value and sent by the browser to a DNS in its geographical region. The DNS then maps the encoded hash value to a proxy cache within the same region. Thus, requests are served only by proxies in a geographically close region. In this paper, we refer to this approach as the geographically clustered hashing, or GCH. It works well if the proxies within a region can adequately service all the requests originated within the same region. However, if workloads are skewed among regions, proxies in one region may be overloaded while those in another region are under-loaded. As a result, the degree of collaboration among proxies is limited by geographical locations.

In contrast, one can simply hash requests into all cooperating proxy caches regardless of geographical locations. This approach is referred to, in this paper, as the geographically distributed hashing, or GDH. Compared with GCH, load tends to be more balanced among all the geographically distributed cooperating caches. However, GDH does not take into account network latency delays due to geographical distances. It does not deal with hot spots, either. In the presence of hot spots, all the references to the hot spots are hashed into the same proxies. As a result, the proxies that handle the hot spots can easily become overloaded.

In this paper, we propose a new latency-sensitive hashing scheme, or LSH, for collaborative web caching. Similar to GDH, it hashes requests into all proxies. However, it also takes into account latency delays and potentially overloaded proxies. In latency-sensitive hashing, a request is first hashed into an anchor hash bucket. Each hash bucket is mapped to one of the geographically distributed proxies. Then, a selection algorithm is used to pick a proxy among a small number of hash buckets adjacent to the anchor hash bucket. The selection aims to reduce network latency while avoiding overloaded proxies.

Trace-driven simulations were conducted to evaluate latency-sensitive hashing. We compared the average response times of client requests using the three hashing schemes. Specifically, we examined the impacts of skew in request origination (more browsers in one geographical region are active than those in others) and hot-spot references (a particular URL is accessed by many browsers). The results show that (1) in the presence of high skew in request origination, the geographically clustered hashing scheme suffers the most because of limited collaboration; (2) in the presence of hot-spot references, both geographically clustered hashing and geographically distributed hashing degrade in performance because they cannot effectively balance the load; and (3) latency-sensitive hashing is very effective in dealing with skew in request origination and hot-spot references because it takes into account not only network latency but also potential load imbalance.

As an alternative to hashing, a class of query-based approaches can also be used to coordinate a collection of geographically distributed proxies [Bowm94] [Danzig] [FanL98] [Wess98]. Such query-based approaches use the Internet Cache Protocol (ICP) to ask neighboring proxies for a missed object. A client browser sends its request to a configured proxy cache. If found locally, the requested object is returned. Otherwise, the configured proxy asks, on the browser's behalf, its neighboring proxies for a copy of the missed object via multicast. As more geographically distributed proxies are used for collaborative web caching, the overhead of locating an object among neighboring proxies increases. For more detailed comparisons of query-based and hashing-based approaches to collaborative web caching, readers are referred to [Ross97].

There are many other papers on various issues of collaborative web caching, such as [FanL98] [Malp95] [Ross97] [WuY99a] [YuMa98]. However, none has dealt specifically with the network latency issues. Various load balancing techniques have also been proposed for the Web, such as [CaoI97] [Cola98] [Cisco] [Dias96] [Gold97] [WuY99b]. Most of them try to distribute requests evenly among a local cluster of servers. None has considered the latency issue of geographically distributed proxies.

The paper is organized as follows. Section 2 describes the details of the new latency-sensitive hashing scheme. Section 3 then presents our approach to evaluating the latency-sensitive hashing, including the simulation model, system parameters and workload characteristics. Section 4 shows the results from trace-driven simulations. Finally, Section 5 summarizes the paper.


2. Latency-sensitive hashing

To understand the basic idea of the new latency-sensitive hashing, let us first look at the geographically distributed hashing, GDH, through an example. Fig. 1 shows three geographically distributed clusters of proxies. Each cluster contains 3 proxies. The proxies within the same cluster are geographically close to each other and to the browsers residing in the same geographical region. Fig. 2 shows an example of hashing two requests, u1 and u2, originating from the proximity of cluster 1. It also shows the network latency delays from region 1 to all the proxies. The latency delays are 0.2 seconds to proxies a, b and c, 1.0 second to proxies d, e and f, and 1.0 second to proxies g, h and i.

Assume that the hash values are partitioned into a sufficiently large number of hash buckets and each hash bucket is mapped to a proxy. Fig. 2 shows u1 is hashed into proxy a and u2 is hashed into proxy h. Assuming both requests can be found locally in their respective proxy caches, the response times can still differ quite substantially. For request u2, the network latency of 1.0 second is substantially larger than 0.2 seconds for request u1. This example illustrates the potential problem of hashing requests into more and more geographically distributed proxies. For those that are hashed into geographically distant proxies, the network latency can be a problem.


Fig. 1. Three geographically distributed clusters of proxies.

Fig. 2. An example of a geographically distributed hashing.

Such a network latency issue can be addressed if latency is taken into account during hashing. Fig. 3 shows an example of applying latency-sensitive hashing to the same requests as in Fig. 2. In Fig. 3, requests are first hashed into anchor hash buckets, as in Fig. 2. For example, request u1 is hashed into the hash bucket mapped to proxy a and u2 is first hashed into the one mapped to proxy h. However, a selection algorithm is then applied to a small number of hash buckets close to the anchor hash bucket to pick a proxy that can lower the latency delay. For example, in Fig. 3, after examining 3 hash buckets that are mapped to proxies h, c and f, proxy c is chosen to serve request u2 because proxy c lowers the latency. For request u1, however, the same proxy a is chosen since it provides the smallest latency delay. Note that we examine neighboring hash buckets of the anchor bucket in the hash value space, not the geographically neighboring proxies of the one mapped to the anchor hash bucket.

Obviously, in latency-sensitive hashing, both the mapping of hash buckets to proxies and the selection window size are important to its performance. These two issues are interdependent. With proper assignment, the window size need not be large. The objective is to have, for every request, a high probability of finding within the selection window size a proxy that is geographically close to the browser. If we model the geographical distribution of cooperative proxies as distributed clusters, similar to Fig. 1, then each hash bucket can be mapped to one proxy from each cluster in a round-robin fashion. And the selection window size can simply be the number of clusters. For instance, in Fig. 1 and 2, we assign one proxy from each cluster to every consecutive hash bucket in a round-robin fashion. As a result, any selection window of size 3 will have a proxy from each one of the 3 clusters.
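The sketch below, which builds on the hash-routing example in Section 1, illustrates this round-robin assignment and the windowed selection for the equal-sized clusters of Figs. 1-3. The bucket count, latency values and helper names are assumptions for illustration; load conditions are ignored here and discussed in Section 3.

```python
# Sketch of latency-sensitive hashing for 3 equal clusters of 3 proxies.
import hashlib

CLUSTERS = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
NUM_BUCKETS = 900  # assumed; a multiple of the 9-bucket round-robin pattern

def bucket_to_proxy(bucket: int) -> str:
    # Consecutive buckets take one proxy from each cluster in turn, so any
    # window of 3 consecutive buckets contains a proxy from every cluster.
    cluster = CLUSTERS[bucket % len(CLUSTERS)]
    return cluster[(bucket // len(CLUSTERS)) % len(cluster)]

def mean_latency(region: int, proxy: str) -> float:
    # 0.2 s within the browser's own cluster, 1.0 s across clusters (Fig. 2).
    return 0.2 if proxy in CLUSTERS[region] else 1.0

def lsh_route(url: str, region: int, window: int = 3) -> str:
    digest = hashlib.md5(url.encode()).digest()
    anchor = int.from_bytes(digest[:4], "big") % NUM_BUCKETS
    candidates = [bucket_to_proxy((anchor + k) % NUM_BUCKETS) for k in range(window)]
    # Choose the candidate with the smallest latency to the requesting browser.
    return min(candidates, key=lambda p: mean_latency(region, p))
```

With this assignment, a window anchored at a bucket mapped to proxy h can contain proxies c and f as well, and a region-1 request such as u2 is then redirected to proxy c, as in Fig. 3.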


Fig. 3. An example of a latency-sensitive hashing.

If each cluster contains the same number of proxies, the mapping of hash buckets to proxy IDs is relatively simple. The proxy IDs can be directly assigned to the hash buckets in a straightforward manner. However, the number of proxies within a cluster can be different. In such cases, the mapping of hash buckets to proxies for latency-sensitive hashing is not obvious if the requests are to be evenly distributed to all proxies. As a result, we need a general mechanism to map hash buckets to proxy IDs.

In this paper, we present an indirect mapping scheme for latency-sensitive hashing. Instead of directly mapping each hash bucket into a proxy ID, we map each hash bucket to an index of a proxy ID array. From this proxy ID array, we then obtain the proxy ID for the hash bucket. Once we hash a URL to a proxy ID through this indirect mapping, we can select a proxy within the proxy ID array to lower the latency delay as well as balance the load. The indirect mapping scheme consists of two parts. The first part is the construction of the proxy ID array, PA. The second part is the assignment of the indices of the proxy ID array to hash buckets. Let us assume that there are N total clusters and the number of proxies in each cluster is denoted as Ci, i = 0, 1, ..., N-1.

The proxy ID array is constructed as follows. Proxy IDs are selected in a two-level round-robin fashion. Namely, we first select a cluster and then, within each cluster, we select a proxy ID, all in a round-robin manner. The size of PA is computed as N * LCMc, where LCMc is the l.c.m. of Ci, i = 0, 1, ..., N-1. As an example, assume we have two clusters, with one cluster containing 2 proxies, a and b, and the other cluster containing 3 proxies, c, d and e. Fig. 4 shows the assignment of proxy IDs to the proxy ID array. There are a total of 12 elements in the proxy ID array since the l.c.m. of 2 and 3 is 6. For the proxy ID array, we also show the indices to the array elements. If we treat the proxy ID array as a circular array, then any window of size N in the proxy ID array will have at least one proxy from each cluster.
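A minimal Python sketch of this two-level round-robin construction is given below (the helper names are assumptions); for the two-cluster example of Fig. 4 it produces a 12-element array in which a appears at indices 0, 4 and 8 and c at indices 1 and 7.

```python
# Sketch of the proxy ID array (PA) construction: a round robin over clusters,
# and a round robin over the proxies within each selected cluster.
from functools import reduce
from math import gcd

def lcm(values):
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def build_proxy_array(clusters):
    """clusters: one list of proxy IDs per cluster; returns PA of size N * LCMc."""
    n = len(clusters)
    size = n * lcm([len(c) for c in clusters])
    cursors = [0] * n                       # per-cluster round-robin position
    pa = []
    for slot in range(size):
        j = slot % n                        # first level: pick the cluster
        cluster = clusters[j]
        pa.append(cluster[cursors[j] % len(cluster)])  # second level: pick a proxy
        cursors[j] += 1
    return pa

print(build_proxy_array([["a", "b"], ["c", "d", "e"]]))
# ['a', 'c', 'b', 'd', 'a', 'e', 'b', 'c', 'a', 'd', 'b', 'e']
```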


Fig. 4. An example of an indirect mapping scheme for LSH.

After the proxy ID array is constructed, we can then create the mapping of hash buckets to the indices of the proxy ID array. We first construct a segment of hash buckets, where each hash bucket is assigned an index of the proxy ID array. This segment is then repeated a large enough number of times, say a few hundred, to achieve good hashing results. Let nj denote the number of times a proxy ID from cluster j appears in the proxy ID array. The hash bucket segment is constructed to create the indirect mapping such that each proxy ID is mapped to LCMp hash buckets, where LCMp is the l.c.m. of nj, j = 0, 1, ..., N-1. Therefore, the total size of the hash bucket segment is LCMp * (C0 + C1 + ... + CN-1), i.e., LCMp times the total number of proxies.

As an example, Fig. 4 also shows the assignment of the indices of a proxy ID array to a hash bucket segment. Note that a proxy from the first cluster appears 3 times and a proxy from the second cluster appears 2 times in the proxy ID array. Each proxy ID is therefore assigned to 6 hash buckets in the segment, and the size of the hash bucket segment is 5 * 6 = 30. Since proxy a appears 3 times in the proxy ID array and the indices to its occurrences are 0, 4 and 8, we simply assign the first two hash buckets with 0, the next two hash buckets with 4 and the following two hash buckets with 8. Similarly, proxy c appears 2 times in the proxy ID array and the indices to its occurrences are 1 and 7. So, we assign 1 to three hash buckets and 7 to another three hash buckets.
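Continuing the sketch above (reusing lcm() and build_proxy_array()), the segment can be built as follows; the exact ordering of buckets within the segment may differ from Fig. 4, but each proxy still receives LCMp hash buckets.

```python
# Sketch of assigning proxy ID array indices to a hash bucket segment so that
# every proxy is mapped to LCMp hash buckets.
def build_bucket_segment(clusters, pa):
    n = len(clusters)
    # nj: how many times a single proxy from cluster j appears in PA.
    appearances = [sum(1 for p in pa if p == clusters[j][0]) for j in range(n)]
    lcm_p = lcm(appearances)
    segment = []
    for idx, proxy in enumerate(pa):
        j = next(k for k, c in enumerate(clusters) if proxy in c)
        # Each occurrence of a cluster-j proxy in PA gets LCMp / nj buckets.
        segment.extend([idx] * (lcm_p // appearances[j]))
    return segment

clusters = [["a", "b"], ["c", "d", "e"]]
pa = build_proxy_array(clusters)
segment = build_bucket_segment(clusters, pa)
print(len(segment))  # 30 = LCMp * (2 + 3) = 6 * 5
```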

This indirect mapping scheme can easily be generalized to the case where clusters have proxies with unequal processing power. Let Ri be the relative processing power of the proxies in cluster i. The total size of the hash bucket segment then becomes LCMp * (C0R0 + C1R1 + ... + CN-1RN-1). In the above example in Fig. 4, if the proxies in the first cluster are twice as powerful as those in the second cluster, the size of the hash bucket segment is 6 * (2*2 + 3) = 42, and we assign 4 hash buckets with 0, 4 hash buckets with 4 and another 4 hash buckets with 8.

Note that the example shown in Fig. 2 represents a special case of the indirect mapping. For example, the size of the proxy ID array for the case of Fig. 1 would have been 3 * 3 = 9. Since each proxy ID would appear in this proxy ID array only once, the size of the hash bucket segment would be 1 * 9 = 9. As a result, there is a one-to-one correspondence between the hash bucket segment and the proxy ID array. Therefore, we can directly map the proxy IDs to the hash buckets as indicated in Fig. 2. For the rest of this paper, we assume that each cluster contains the same number of proxies.

Finally, besides network latency delays, load balancing is also important in latency-sensitive hashing. If load balancing is not considered, all the requests from the proximity of a cluster might be hashed into the proxies in the same cluster, and the scheme degenerates into the geographically clustered hashing, GCH. So, in latency-sensitive hashing, the load conditions of the proxies being examined must be considered. If the load of a proxy is too high, it should not be selected. Generally, it is easier for the DNS to obtain the load conditions of all the proxies. Thus, the DNS is a better place to implement latency-sensitive hashing, especially the selection step.


3. Performance evaluation

We implemented a trace-driven simulator that models the three hashing schemes, GCH, GDH and LSH. There were a total of 9 proxies, organized into 3 geographical clusters (see Fig. 1). Each cluster has 3 proxies. Each proxy has the same amount of computing resources.

A request originates from the proximity of one of the proxies, say proxy p1. Then it is hashed into one of the proxies, say p2, incurring a latency delay, L. If the requested object can be found locally, then the request is returned to the browser, incurring another latency delay. If not found locally, then a cache miss delay, Cmiss, is added for the proxy to obtain the object from the content server. Both L and Cmiss were assumed to be uniformly distributed. The mean latency delay is determined by a latency matrix that models the network latency delays among the proxies. A request from the proximity of proxy p1 hashed into proxy p2 incurs a network latency delay which is uniformly distributed with the mean specified in the matrix between p1 and p2. Note that the latency matrix can be dynamically updated by the DNS to reflect the changing network and load conditions.

For each proxy, we implemented a CPU server and a cache manager. For the cache manager, we implemented an LRU stack. For the CPU server, there is a FIFO service queue. The service time for processing an HTTP request or reply is defined as Thttp. And the service time for looking up an object from its cache or storing an object into its cache is defined as Tcache. As a result, the response time for a request whose object can be found locally is (L + Thttp + Tcache + Thttp + L + Q). Here, Q represents the queue delay the request incurs waiting for the CPU. On the other hand, the response time is (L + Thttp + Tcache + Cmiss + Tcache + Thttp + L + Q), if the requested object is a cache miss. In the simulations, we assumed that Tcache is 1/2 * Thttp.
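As an illustrative sketch (the uniform range around each mean is an arbitrary choice for illustration, since only the means are specified above), the response-time accounting can be expressed as follows.

```python
# Sketch of the simulator's response-time accounting for one request.
import random

T_HTTP = 0.045           # CPU service time for an HTTP request or reply (s)
T_CACHE = T_HTTP / 2     # cache lookup/store service time (s)

def response_time(mean_latency, cache_hit, queue_delay, mean_miss=1.0):
    # L and Cmiss are uniformly distributed; the +/- 50% range is illustrative.
    latency = random.uniform(0.5 * mean_latency, 1.5 * mean_latency)
    if cache_hit:
        return latency + T_HTTP + T_CACHE + T_HTTP + latency + queue_delay
    c_miss = random.uniform(0.5 * mean_miss, 1.5 * mean_miss)
    return (latency + T_HTTP + T_CACHE + c_miss + T_CACHE
            + T_HTTP + latency + queue_delay)
```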

In our simulations, we conducted experiments to see the impacts of skew in request origination and hot-spot references. For request origination skew, we assumed that the distribution of client IDs around the proximity of proxies follows a Zipf-like distribution [WuY99b] [Zipf49]. Zipf(x, M) is a parametric distribution where the probability of selecting the i-th item is proportional to 1/i^(1-x), where x is a parameter and i belongs to {1, ..., M}. Fig. 5 shows three distributions of clients around the proximity of the 9 proxies, including Zipf(1.0, 9), Zipf(0.5, 9) and Zipf(0.25, 9). Zipf(1.0, 9) represents no skew and corresponds to a uniform distribution. Namely, client IDs are uniformly distributed around all proxies. Zipf(0.25, 9) represents the high skew case. In other words, a great majority of clients are in the proximity of proxies a, b and c.
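For illustration, client IDs can be assigned to proxy proximities with a sketch like the following (helper names are assumptions):

```python
# Sketch of the Zipf-like placement of clients: P(i) proportional to 1/i^(1-x).
import random

def zipf_weights(x: float, m: int):
    weights = [1.0 / (i ** (1.0 - x)) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def assign_client_proximity(x: float, m: int = 9) -> int:
    """Return the index (0..m-1) of the proxy whose proximity the client joins."""
    return random.choices(range(m), weights=zipf_weights(x, m), k=1)[0]

# x = 1.0 gives a uniform spread (no skew); x = 0.25 concentrates most clients
# around the first few proxies (high skew), as in Fig. 5.
```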


Fig. 5. Distributions of clients around the proximity of each proxy cache.

To model references to a hot spot, we injected additional requests to a specific page from different clients during the simulations. Each client is equally likely to generate such requests to the hot-spot object. We varied the percentage of total references to this hot page in the simulations. For example, a 5% hot-spot reference rate means that each simulated request has a 95% chance of coming from the trace and a 5% chance of being a request for the hot page. The arrival time of a hot-spot reference was determined as follows. For each trace entry, a probability function was used to determine whether a hot-spot reference should be issued at that moment. The client ID for such a hot-spot reference was chosen randomly from all clients, so that every client is equally likely to issue requests for the hot spot.
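A minimal sketch of this injection step is shown below; the names and the representation of the trace as (client, URL) pairs are assumptions for illustration.

```python
# Sketch of hot-spot injection: with probability hot_rate, the next simulated
# request targets the hot page and comes from a uniformly chosen client.
import random

def next_request(trace_iter, clients, hot_rate=0.05, hot_url="HOT_PAGE"):
    if random.random() < hot_rate:
        return random.choice(clients), hot_url    # hot-spot reference
    return next(trace_iter)                       # (client_id, url) from the trace
```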

The proxy traces collected between 08/29/1996 and 09/09/1996 from Digital [DEC96] were used to drive our simulations. The sizes of these 12 days' traces vary from about 300K to over 1.3M entries per day. The results are similar for each trace. For this paper, we concentrate our reports on two of the largest traces with over 1.3M entries each, the 09-04 and 09-06 traces. These traces were used to represent requests made to the 9 proxies.

To determine the proper cache sizes for our simulations, we computed through simulation the maximal buffer size needed for each proxy cache if no replacement were to be required. For large traces, the size is about 620M bytes per cache. As a result, we used 310M bytes as the default cache size for most of our simulations.

The mean latency delays between any pair of proxies were defined in a latency matrix. The matrix element (a, a) represents the mean latency delay for a request originated in the proximity of proxy a and hashed also into proxy a. In the simulations, we assumed that mean intra-cluster latencies are all the same. Namely, any request originated in the proximity of a cluster will have the same mean latency delay to all the proxies in the same geographical cluster. Moreover, the mean inter-cluster latency is 1.0 second, compared with the mean intra-cluster latency of 0.2 seconds. In order to illustrate the issue of latency, we also assumed that the mean cache miss delay is 1.0 second, same as the mean inter-cluster latency delay.
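For illustration, the mean-latency matrix for the 9-proxy configuration can be written down directly from the cluster membership (a sketch with assumed names):

```python
# Sketch of the 9x9 mean-latency matrix: 0.2 s within a cluster, 1.0 s across.
INTRA, INTER = 0.2, 1.0
CLUSTER_OF = {p: i
              for i, cluster in enumerate([["a", "b", "c"],
                                           ["d", "e", "f"],
                                           ["g", "h", "i"]])
              for p in cluster}

LATENCY = {(p, q): INTRA if CLUSTER_OF[p] == CLUSTER_OF[q] else INTER
           for p in CLUSTER_OF for q in CLUSTER_OF}
```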

In the simulations, we first assigned each client to the proximity of a proxy so that we could compute the latency delay from the latency matrix. The default CPU service time for processing an HTTP request, Thttp, was set to 0.045 seconds. This was chosen so that the average CPU utilization during the peak hours (between 5:00 am and 12:00 noon PDT) is around 65%~70%, representing a moderately loaded system. But we also used 0.030 seconds to examine the performance of a more lightly loaded system.

For the latency-sensitive hashing, the default selection window size was 3. The selection algorithm was very simple. If a proxy within the selection window improves latency and it is not currently overloaded, then it is chosen. Note that the mean intra-cluster latency delay of 0.2 seconds is much smaller than the mean inter-cluster latency delay of 1.0 second. Thus, so long as a proxy within the selection window is not overloaded and is in the same cluster as the browser originating the request, it will be chosen. In our simulations, the DNS collects the CPU load conditions of all proxies every 5 seconds.

We considered a proxy as being overloaded if (1) the maximum utilization of all proxies, i.e., the utilization of the most heavily utilized proxy at the moment, exceeds 80% and the utilization of the proxy exceeds the mean utilization of all proxies; or (2) the maximum utilization of all proxies is smaller than 80% and the utilization of the proxy is 20% greater than the mean utilization of all proxies. Note that the utilization of the most heavily utilized proxy indicates the level of load imbalance over the system. The idea was to be more conservative in selecting a geographically closer proxy if the overall system is highly skewed. On the other hand, if the overall system is less skewed, then we want to be more aggressive in hashing requests into geographically closer proxies. We did vary the parameters for the overload conditions and found that the results were not very sensitive to them. However, the performance of the latency-sensitive hashing is sensitive to the time interval between collecting the CPU load conditions.
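The overload test and the selection step can be sketched as follows; the interpretation of "20% greater than the mean" as an absolute 20-percentage-point margin, and all helper names, are assumptions for illustration.

```python
# Sketch of the overload test and the windowed proxy selection used by LSH.
def is_overloaded(util, mean_util, max_util):
    if max_util >= 0.80:
        # System already skewed: avoid any proxy above the mean utilization.
        return util > mean_util
    # System not skewed: tolerate up to 20 points above the mean (assumed reading).
    return util > mean_util + 0.20

def select_proxy(candidates, latency_to_browser, utils):
    """candidates: proxies mapped to the window of buckets, anchor proxy first.
    latency_to_browser / utils: dicts keyed by proxy ID."""
    mean_util = sum(utils.values()) / len(utils)
    max_util = max(utils.values())
    chosen = candidates[0]                      # default: the anchor proxy
    for p in candidates[1:]:
        if (latency_to_browser[p] < latency_to_browser[chosen]
                and not is_overloaded(utils[p], mean_util, max_util)):
            chosen = p
    return chosen
```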


4. Simulation results

4.1. Impact of request origination skew

Fig. 6 shows the average response times of the three hashing schemes under different degrees of skew using the 09-04 trace. The skew was generated following the Zipf-like distributions described in Fig. 5. As the degree of skew in request origination increases, the average response time of the geographically clustered hashing increases significantly. In GCH, requests originated in one cluster are serviced only by the proxies in the same cluster. As a result, the proxies in cluster 1 become overloaded while those in cluster 3 are under-loaded. Therefore, the average response time over all requests for the GCH scheme is much worse than for the other two schemes under the high skew condition. This problem can also be seen in Fig. 7 in the coefficient of variation of response time. The coefficient of variation is defined as the ratio of the standard deviation of the response time to the mean response time; it measures the variation of response times about the mean. In the case of GCH under the high skew condition, the coefficient of variation is substantially larger than those of the other two schemes. Even in the case of no skew in request origination, the coefficient of variation of response time is still higher for GCH. This suggests that load tends to be more unbalanced for GCH, even with no skew in request origination.


Fig. 6. The impact of request origination skew on average response time.

Fig. 7. The impact of request origination skew on coefficient of variation.

Figs. 8 and 9 show the level of load imbalance of the three hashing schemes under no skew in request origination and high skew in request origination, respectively. In these two figures, we show the cumulative frequency of the maximal utilization among the proxies during the peak hours. The idea is that, by looking at the most utilized proxy, one can deduce whether the load level of a collection of proxies is balanced or not [Cola98]. The server with the maximum utilization changes over time. However, if the maximum utilization is low, it means that no proxy is overloaded during that time period. In general, the load level of the system is more balanced if the cumulative frequency curve moves towards the upper-left corner and less balanced if it moves towards the lower-right corner. For example, in Fig. 8, the probability of the maximum utilization being less than or equal to 70% is only 0.2 for GCH. This indicates that there is an 80% chance that the maximum utilization is greater than 70% for GCH. In contrast, there is only about a 22% chance that the maximum utilization exceeds 70% during the observation period for GDH. Thus, the geographically clustered hashing GCH is less balanced while the geographically distributed hashing GDH is more balanced. The dramatic degradation of GCH in the average response time and the coefficient of variation due to request origination skew can be clearly explained by the dramatic level of load imbalance shown in Fig. 9.


Fig. 8. The level of load imbalance with no skew.

Fig. 9. The level of load imbalance with high skew.

Figs. 6-9 clearly show that GCH is very sensitive to skew in request origination. Because requests originating from one cluster are serviced only by the proxies within the same geographical cluster, GCH cannot effectively utilize proxies in other clusters to help balance the load. On the other hand, GDH is immune to skew in request origination. This is because hashing is based on the URL, and thus the load distribution among the proxies remains the same regardless of skew in request origination.

Similar to GDH, the latency-sensitive hashing LSH can distribute requests among all the proxies. As a result, LSH handles skew in request origination much better than GCH. However, compared with GDH, LSH is slightly less balanced (see Figs. 6-9). This is because, in order to lower latency delays, LSH tends to choose a proxy within the same cluster as the browser originating the request. Nevertheless, LSH performs almost as well as GDH even in the presence of high skew in request origination.

4.2. Impact of hot-spot references

Even though GDH is not susceptible to skew in request origination, it can be very sensitive to references to hot spots. Because each URL is hashed into the same proxy cache no matter which browser issues the request, GDH can become quite unbalanced in the presence of hot-spot references.

Fig. 10 shows the average response times of the three hashing schemes in the presence of hot-spot references for both the 09-04 and 09-06 traces. For these experiments, there was no skew in request origination. For both traces, with 5% hot-spot references, the performance of GDH degrades significantly. On the other hand, GCH is less susceptible to hot-spot references for the 09-04 trace while highly sensitive for the 09-06 trace. For both traces, the latency-sensitive hashing handles hot-spot references rather effectively and is almost insensitive to them. This is because LSH can select different proxies to offload the hot-spot references originating from different browsers.


Fig. 10. The impact of hot-spot references on average response time.

Fig. 11. The level of load imbalance with hot-spot references.

Fig. 11 shows the impact of hot-spot references on the level of load imbalance for the 09-06 trace. For these experiments, 5% references to the hot spot were introduced, but there was no skew in request origination. As indicated in Fig. 11, GDH becomes extremely unbalanced in the presence of 5% hot-spot references. In contrast, the latency-sensitive hashing is the most balanced among the three.

4.3. Impact of selection window size

Now that we have demonstrated that latency-sensitive hashing effectively handles skew in request origination as well as hot-spot references, we examine the impact of a system design parameter specific to LSH: the selection window size, w. We define w to be the number of hash buckets LSH examines, including the anchor bucket. Thus, LSH with w=1 is equivalent to geographically distributed hashing, GDH; requests are simply hashed into the proxies mapped to the anchor hash buckets. In this section, we examined four values of w: 1, 2, 3 and 4.

Fig. 12 shows the average response times for both the 09-04 and 09-06 traces under different values of w. For these experiments, there was no skew in request origination and no hot-spot references. Furthermore, Thttp was set to 0.03 seconds so that the overall system is lightly loaded; the overall CPU utilization during the peak hours was about 45%. Under such lightly loaded and well-balanced conditions, LSH performs better as the selection window size increases from 1 to 3. Since we use only 3 clusters in our experiments, LSH performs the same for w=3 and w=4.

Fig. 13 also shows the average response times of latency-sensitive hashing with various values of w, but when the overall system is relatively unbalanced. In contrast to Fig. 12, for the experiments in Fig. 13, there were 5% hot-spot references and high skew in request origination. Moreover, the system was moderately loaded, with Thttp set to 0.045 seconds; the average CPU utilization among the proxies was about 65%~70%. In the presence of high skew in request origination as well as hot-spot references, LSH also performs better with w > 1. However, the average response time for w=2 was slightly better than those for w=3 and w=4.

The differences in average response time between Fig. 12 and Fig. 13 with respect to the selection window size can be explained by the level of load imbalance shown in Figs. 14 and 15. For the lightly loaded and relatively well-balanced system in Fig. 14, the maximum utilization never exceeded 55% in all cases. As a result, a larger w enables more requests to be hashed into geographically closer proxies, and thus the average response time is better. On the other hand, for the moderately loaded and unbalanced system in Fig. 15, the chance that the maximum utilization exceeds 80% is very high in all cases. Therefore, compared with w=2, the selection window size of 3 may cause too many requests to be hashed into geographically closer proxies, resulting in a slightly less balanced system. However, it is still beneficial to have the selection choice with w > 1, because the system is highly unbalanced when there is no choice, as in the case of w=1.


Fig. 12. The impact of selection window size when the system is lightly loaded and balanced.

Fig. 13. The impact of selection window size when the system is moderately loaded and unbalanced.

Fig. 14. The level of load imbalance when the system is lightly loaded and well balanced.

Fig. 15. The level of load imbalance when the system is moderately loaded and unbalanced.

5. Summary

In this paper, we presented a new latency-sensitive hashing scheme for collaborative web caching. Latency-sensitive hashing becomes increasingly important as more and more geographically distributed proxy caches are used for collaborative web caching. It takes into account latency delays due to network distances as well as skew in workload. It first hashes a request into an anchor hash bucket. Then, it selects a proxy from a small number of nearby hash buckets with the goal of reducing the latency delay and improving load balance.

Trace-driven simulations were used to evaluate latency-sensitive hashing (LSH). We compared latency-sensitive hashing with geographically clustered hashing (GCH) and geographically distributed hashing (GDH). GCH hashes requests originated from one region into proxies within the same region. GDH hashes requests to all proxies regardless of geographical locations. The results show that (1) in the presence of skew in request origination among geographical regions, GCH performs poorly because collaboration is limited within the same region; (2) in the presence of hot-spot references, GDH fails because hot-spots are hashed into the same proxies; (3) LSH effectively handles both skew in request origination and hot-spot references by hashing requests among geographically distributed proxies; and (4) when the overall system is lightly loaded, LSH effectively reduces latency delays by hashing requests into geographically closer proxies.


References

[Bowm94] C. M. Bowman et al., "The Harvest Information Discovery and Access System," in Proc. of 2nd Int. World Wide Web Conference, pp. 763-771, 1994.

[CaoI97] P. Cao and S. Irani, "Cost-Aware WWW Proxy Caching Algorithm," in Proc. of USENIX Symp. on Internet Technologies and Systems, pp. 193-206, 1997.

[Cisco] Cisco Systems, "Scaling the Internet Web Servers," http://www.cisco.com/, Nov. 1997. White Paper.

[Cola98] M. Colajanni, P. S. Yu and D. Dias, "Analysis of Task Assignment Policies in Scalable Distributed Web-Server Systems," IEEE Trans. on Parallel and Distributed Systems, 9(6):585-600, June 1998.

[Danzig] P. Danzig, "NetCache architecture and deployment," http://www.netapp.com/technology/level3/3029.html.

[Dias96] D. M. Dias et al., "A Scalable and Highly Available Web Server," in Proc. of IEEE COMPCON Conf. on Technologies for the Information Superhighway, pp. 85-92, 1996.

[DEC96] Digital Equipment Corporation, "Digital's Web Proxy Traces," http://ftp.digital.com/pub/DEC/traces/proxy/webtraces.html, 1996.

[FanL98] L. Fan et al., "Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol," in Proc. of ACM SIGCOMM 98, pp. 254-265, 1998.

[Gold97] G. Goldszmidt and G. Hunt, "NetDispatcher: A TCP Connection Router," Technical Report, RC 20853, IBM T. J. Watson Research Center, May 1997.

[Karg99] D. Karger et al., "Web Caching with Consistent Hashing," in Proc. of 8th Int. World Wide Web Conference, pp. 125-135, 1999.

[Malp95] R. Malpani, J. Lorch and D. Berger, "Making World Wide Web Caching Servers Cooperate," in Proc. of 4th World Wide Web Conference, 1995.

[Ross97] K. W. Ross, "Hash-Routing for Collections of Shared Web Caches," IEEE Network Magazine, pp. 37-44, Nov.-Dec. 1997.

[Vall98] V. Valloppillil and K. W. Ross, "Cache Array Routing Protocol v1.0," Internet Draft, http://ircache.nlanr.net/Cache/ICP/draft-vinod-carp-v1-03.txt, Feb. 1998.

[Wess98] D. Wessels, "Squid Internet Object Cache," http://squid.nlanr.net/Squid/, 1998.

[WuY99a] K.-L. Wu and P. S. Yu, "Load Balancing and Hot Spot Relief for Hash Routing among a Collection of Proxy Caches," in Proc. of IEEE Int. Conf. on Distributed Computing Systems," pp. 536-543, 1999.

[WuY99b] K.-L. Wu and P. S. Yu, "Local Replication for Proxy Web Caches with Hash Routing," in Proc. of ACM Int. Conf. on Information and Knowledge Management, pp. 69-76, 1999.

[YuMa98] P. S. Yu and E. A. MacNair, "Performance Study of a Collaborative Method for Hierarchical Caching in Proxy Servers," Computer Networks and ISDN Systems, 30:215-224, 1998.

[Zipf49] G. K. Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, MA, 1949.


Vitae

Kun-Lung Wu received the B.S. degree in Electrical Engineering from the National Taiwan University, Taipei, Taiwan, in 1982, and the M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign, in 1986 and 1990, respectively.

Since 1990, he has been with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, where he is a member of the Software Tools and Techniques Group. His current research interests include infrastructure design issues of the WWW, personalization and data mining tools for the WWW, Internet applications and pervasive computing. He has published extensively in the areas of Web caching, database performance, disk subsystems, transaction and query processing, multimedia systems, mobile computing and reliable computing. Dr. Wu has published more than 50 papers in refereed journals and conferences and over 30 research reports. He holds or has applied for 15 US patents.

Dr. Wu is a member of the ACM and the IEEE Computer Society. He has served as an organizing and program committee member on various conferences. He has received various IBM awards, including Research Division Award and Invention Achievement Awards.

Philip S. Yu received the B.S. Degree in E.E. from National Taiwan University, Taipei, Taiwan, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University. He is with the IBM Thomas J. Watson Research Center and currently manager of the Software Tools and Techniques group. His current research interests include data mining, Internet applications and technologies, database systems, multimedia systems, parallel and distributed processing, disk arrays, computer architecture, performance modeling and workload analysis. Dr. Yu has published more than 270 papers in refereed journals and conferences. He holds or has applied for 92 US patents.

Dr. Yu is a Fellow of the ACM and a Fellow of the IEEE. He is a member of the IEEE Data Engineering steering committee. He is on the advisory board of IEEE Transactions on Knowledge and Data Engineering. He was an editor of IEEE Transactions on Knowledge and Data Engineering and also a guest co-editor of the special issue on mining of databases. In addition to serving as program committee member on various conferences, he was the program co-chair of the 11th Intl. Conference on Data Engineering and the program chair of the 2nd Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing. He served as the general chair of the 14th Intl. Conference on Data Engineering. He has received several IBM and external honors including a Best Paper Award, 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 30th plateau of Invention Achievement Awards. He also received an IEEE Region 1 Award for contributions of "New Concepts in Electrical Engineering". Dr. Yu is an IBM Master Inventor and was recognized as one of IBM's top ten leading inventors in 1999.