PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces

7 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 244-255
https://doi.org/10.1109/icde.2000.839417

Abstract

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures-the so-called “curse of dimensionality”. This also affects approximately correct (AC) algorithms, which return as results a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, where the error bound ε can be exceeded with probability δ and both ε and δ parameters can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object in order to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be efficiently solved and the error controlled. Then, we provide experimental evidence that indexing can further speed-up the retrieval process by up to 1-2 orders of magnitude without giving up the accuracy of the result

Keywords

This publication has 14 references indexed in Scilit:

Independent quantization: an index compression technique for high-dimensional data spaces
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
A cost model for similarity queries in metric spaces
Published by Association for Computing Machinery (ACM) ,1998
Approximate nearest neighbors
Published by Association for Computing Machinery (ACM) ,1998
A cost model for nearest neighbor search in high-dimensional data space
Published by Association for Computing Machinery (ACM) ,1997
Nearest neighbor queries in metric spaces
Published by Association for Computing Machinery (ACM) ,1997
The SR-tree
Published by Association for Computing Machinery (ACM) ,1997
Distance-based indexing for high-dimensional metric spaces
Published by Association for Computing Machinery (ACM) ,1997
Combining fuzzy information from multiple systems (extended abstract)
Published by Association for Computing Machinery (ACM) ,1996
Efficient and effective Querying by Image Content
Journal of Intelligent Information Systems, 1994
The R*-tree: an efficient and robust access method for points and rectangles
ACM SIGMOD Record, 1990