Accounting for boundary effects in nearest neighbor searching
- 1 January 1995
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 336-344
- https://doi.org/10.1145/220279.220315
Abstract
Given n data points in d-dimensional space, nearest neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performance of many nearest neighbor algorithms tends to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest neighbor searching by the bucketing and k-d tree algorithms. We assume m^d points uniformly distributed in dimension d, where m is a fixed integer ≥ 2. Further, we assume that distances are measured in the L1 metric. Our analysis is tight in the limit as d approaches infinity. Empirical evidence is presented showing that the analysis applies even in low dimensions.
∗ A preliminary version of this paper appeared in the Proc. of the 11th Annual ACM Symp. on Computational Geometry.
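The bucketing setup the abstract describes can be illustrated with a short sketch: n = m^d points drawn uniformly from the unit cube are hashed into an m × ... × m grid (so each cell holds one point on average), and a query is answered by visiting cells in expanding rings around the query's cell until no unvisited cell can contain a closer point under the L1 metric. This is a minimal illustration written for this summary, not the authors' implementation; the function names, the ring-by-ring visiting order, and the choice of Python are assumptions. The search returns the nearest point, its L1 distance, and the number of cells visited, which is the quantity the paper analyzes.

```python
import itertools
import random

def build_grid(points, m):
    """Hash points of [0,1)^d into an m x ... x m grid of m^d cells."""
    grid = {}
    for p in points:
        cell = tuple(min(int(x * m), m - 1) for x in p)
        grid.setdefault(cell, []).append(p)
    return grid

def l1(p, q):
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nn_bucketing(grid, m, d, q):
    """Visit cells in expanding rings around the query's cell; stop once
    every unvisited cell is provably farther than the best point found."""
    qcell = tuple(min(int(x * m), m - 1) for x in q)
    side = 1.0 / m
    best, best_dist, cells_visited = None, float('inf'), 0
    for r in range(m):                      # ring radius, measured in cells
        if best_dist <= (r - 1) * side:     # any cell in ring r is >= (r-1)*side away in L1
            break
        for off in itertools.product(range(-r, r + 1), repeat=d):
            if max(abs(o) for o in off) != r:
                continue                    # keep only the outer shell of the ring
            cell = tuple(c + o for c, o in zip(qcell, off))
            if any(c < 0 or c >= m for c in cell):
                continue                    # cell lies outside the data cube (the boundary effect)
            cells_visited += 1
            for p in grid.get(cell, []):
                dist = l1(p, q)
                if dist < best_dist:
                    best, best_dist = p, dist
    return best, best_dist, cells_visited

if __name__ == "__main__":
    random.seed(0)
    m, d = 4, 3                             # n = m^d = 64 uniformly distributed points
    pts = [tuple(random.random() for _ in range(d)) for _ in range(m ** d)]
    grid = build_grid(pts, m)
    query = tuple(random.random() for _ in range(d))
    print(nn_bucketing(grid, m, d, query))
```

In this sketch the pruning test relies only on the L1 lower bound (r-1)/m for cells in ring r, and cells falling outside the unit cube are skipped rather than wrapped, which is where the boundary effects studied in the paper enter.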