Peer-to-peer information retrieval using self-organizing semantic overlay networks
- 25 August 2003
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 175-186
- https://doi.org/10.1145/863955.863976
Abstract
Content-based full-text search is a challenging problem in Peer-to-Peer (P2P) systems. Traditional approaches have either been centralized or use flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSearch distributes document indices through the P2P network based on document semantics generated by Latent Semantic Indexing (LSI). The search cost (in terms of different nodes searched and data transmitted) for a given query is thereby reduced, since the indices of semantically related documents are likely to be co-located in the network. We also describe techniques that help distribute the indices more evenly across the nodes, and further reduce the number of nodes accessed using appropriate index distribution as well as using index samples and recently processed queries to guide the search. Experiments show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5 KB of data during the search, whereas the top 15 documents returned by pSearch and LSI have a 91.7% intersection.
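
As a rough illustration of two ideas the abstract relies on, the sketch below derives LSI semantic vectors with a truncated SVD and computes a top-n intersection between two result lists. It is a toy under stated assumptions, not pSearch itself: the five-term vocabulary, the four documents, the choice k=2, and the function names lsi_fit, rank, and top_overlap are all hypothetical, and the placement of indices on overlay nodes is not modeled at all.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix.

    Returns the term basis U_k and one k-dimensional semantic
    vector per document (rows of the second return value).
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k = u[:, :k]
    doc_vecs = (s[:k, None] * vt[:k]).T  # equals (U_k^T A)^T
    return u_k, doc_vecs

def rank(u_k, doc_vecs, query, n):
    """Project the query with U_k^T and return the n documents
    with the highest cosine similarity in the semantic space."""
    q = u_k.T @ query
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    sims = (doc_vecs @ q) / norms
    return [int(i) for i in np.argsort(-sims)[:n]]

def top_overlap(a, b):
    """Fraction of result list a that also appears in result list b."""
    return len(set(a) & set(b)) / len(a)

# Hypothetical vocabulary (rows): peer, network, overlay, recipe, sauce.
# Hypothetical documents (columns): two about networking, two about cooking.
A = np.array([
    [1, 0, 0, 0],  # peer
    [1, 1, 0, 0],  # network
    [0, 1, 0, 0],  # overlay
    [0, 0, 1, 2],  # recipe
    [0, 0, 1, 1],  # sauce
], dtype=float)

u_k, doc_vecs = lsi_fit(A, k=2)

# Query containing only the term "overlay".
query = np.array([0, 0, 1, 0, 0], dtype=float)
hits = rank(u_k, doc_vecs, query, n=2)
print(sorted(hits))
```

In this toy corpus the query matches both networking documents, including the one that never contains the word "overlay", because the two share a direction in the reduced space. That closeness of related documents is what makes it plausible to store their indices near one another. The top_overlap helper shows the shape of a top-n intersection measure for a single query; how the paper aggregates its reported figure across queries is not stated in the abstract.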