PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 244-255
- https://doi.org/10.1109/icde.2000.839417
Abstract
In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures-the so-called “curse of dimensionality”. This also affects approximately correct (AC) algorithms, which return as results a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the resultKeywords
This publication has 14 references indexed in Scilit:
- Independent quantization: an index compression technique for high-dimensional data spacesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- A cost model for similarity queries in metric spacesPublished by Association for Computing Machinery (ACM) ,1998
- Approximate nearest neighborsPublished by Association for Computing Machinery (ACM) ,1998
- A cost model for nearest neighbor search in high-dimensional data spacePublished by Association for Computing Machinery (ACM) ,1997
- Nearest neighbor queries in metric spacesPublished by Association for Computing Machinery (ACM) ,1997
- The SR-treePublished by Association for Computing Machinery (ACM) ,1997
- Distance-based indexing for high-dimensional metric spacesPublished by Association for Computing Machinery (ACM) ,1997
- Combining fuzzy information from multiple systems (extended abstract)Published by Association for Computing Machinery (ACM) ,1996
- Efficient and effective Querying by Image ContentJournal of Intelligent Information Systems, 1994
- The R*-tree: an efficient and robust access method for points and rectanglesACM SIGMOD Record, 1990