Clustering for approximate similarity search in high-dimensional spaces
- 7 August 2002
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 14 (4) , 792-808
- https://doi.org/10.1109/tkde.2002.1019214
Abstract
We present a clustering and indexing paradigm (called Clindex) for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one would like to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few IOs and perform significantly better than other approaches. Our scheme is based on finding clusters and, then, building a simple but efficient index for them. We analyze the trade-offs involved in clustering and building such an index structure, and present extensive experimental results.Keywords
This publication has 28 references indexed in Scilit:
- PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spacesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- CLARANS: a method for clustering objects for spatial data miningIEEE Transactions on Knowledge and Data Engineering, 2002
- PBIRPublished by Association for Computing Machinery (ACM) ,2001
- PBIR - perception-based image retrievalACM SIGMOD Record, 2001
- Efficient search for approximate nearest neighbor in high dimensional spacesPublished by Association for Computing Machinery (ACM) ,1998
- Query by image and video content: the QBIC systemComputer, 1995
- Nearest neighbor queriesPublished by Association for Computing Machinery (ACM) ,1995
- The TV-tree: An index structure for high-dimensional dataThe VLDB Journal, 1994
- An algorithm for approximate closest-point queriesPublished by Association for Computing Machinery (ACM) ,1994
- The K-D-B-treePublished by Association for Computing Machinery (ACM) ,1981