New techniques for best-match retrieval
- 1 April 1990
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 8 (2) , 140-158
- https://doi.org/10.1145/96105.96111
Abstract
A scheme to answer best-match queries from a file containing a collection of objects is described. A best-match query is to find the objects in the file that are closest (according to some (dis)similarity measure) to a given target. Previous work [5, 331] suggests that one can reduce the number of comparisons required to achieve the desired results using the triangle inequality, starting with a data structure for the file that reflects some precomputed intrafile distances. We generalize the technique to allow the optimum use of any given set of precomputed intrafile distances. Some empirical results are presented which illustrate the effectiveness of our scheme, and its performance relative to previous algorithms.Keywords
This publication has 23 references indexed in Scilit:
- NEAREST NEIGHBOUR SEARCHING IN BINARY SEARCH TREES: SIMULATION OF A MULTIPROCESSOR SYSTEMJournal of Documentation, 1987
- Nearest neighbour searching in serial files using text signaturesJournal of Information Science, 1985
- Hierarchical file organization and its application to similar-string matchingACM Transactions on Database Systems, 1983
- Expected-time complexity results for hierarchic clustering algorithms which use cluster centresInformation Processing Letters, 1983
- Tree structures for high dimensionality nearest neighbor searchingInformation Systems, 1982
- The nearest neighbour problem in information retrievalACM SIGIR Forum, 1981
- Optimal Expected-Time Algorithms for Closest Point ProblemsACM Transactions on Mathematical Software, 1980
- On the estimation of the number of desired records with respect to a given queryACM Transactions on Database Systems, 1978
- A probabilistic minimum spanning tree algorithmInformation Processing Letters, 1978
- The best-match problem in document retrievalCommunications of the ACM, 1974