Probabilistic location and routing

Abstract
We propose probabilistic location to enhance the per- formance of existing peer-to-peer location mechanisms in the case where a replica for the queried data item exists close to the query source. We introduce the attenuated Bloom filter , a lossy dis- tributed index. We describe how to use these data structures for document location and how to maintain them despite document motion. We include a detailed performance study which indicates that our algorithm performs as desired, both finding closer repli- cas and finding them faster than deterministic algorithms alone. I. INTRODUCTION Today's exponential growth in network bandwidth and stor- age capacity has inspired a whole new class of distributed, peer- to-peer storage infrastructures. Systems such as Farsite ( 1), Freenet (2), Intermemory (3), OceanStore (4), CFS (5), and PAST (6) seek to capitalize on the rapid growth of resources to provide inexpensive, highly-available storage without ce ntral- ized servers. The designers of these systems propose to achi eve high availability and long-term durability in the face of in di- vidual component failures through replication and coding tech- niques. Although wide-scale replication has the potential to incre ase availability and durability, it introduces two important c hal- lenges to system architects. First, if replicas may be place d anywhere in the system, how should we locate them? Sec- ond, once we have located one or more replicas, how should we route queries to them? We can formulate the combination of these two operations as a single location and routingproblem that efficiently routes queries from a client to the closest r eplica adhering to certain properties, such as the replica with the short- est network path to the client or the replica residing on the l east loaded server. In many cases, combining location and routing into a single, compound operation yields the greatest flexib ility to route queries quickly with minimal network overhead. The importance of such location-independent routingtechniques is well recognized in the community, and several proposals such as CAN (7), Chord (8), Pastry (9), and Tapestry (10) are cur- rently under study. These existing schemes share the characteristic that in the worst case, a location and routing operation requires O(logN) sequential network messages to search a distributed system of N servers. Some of these algorithms use substantially fewer messages to perform their task in the common case. This scala- bility is commendable, and it allows for the total query rout ing time to be close to optimal when the replica is far from the query source. However, as the replica approaches the location of the query source, the performance of the existing algorit hms quickly diverges from optimality. This divergence is easy t o

This publication has 14 references indexed in Scilit: