Fast parallel similarity search in multimedia databases
Open Access
- 1 June 1997
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 26 (2) , 1-12
- https://doi.org/10.1145/253262.253263
Abstract
Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods to not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique - in contrast to other declustering methods - guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.Keywords
This publication has 14 references indexed in Scilit:
- Nearest neighbor queriesPublished by Association for Computing Machinery (ACM) ,1995
- Efficient and effective Querying by Image ContentJournal of Intelligent Information Systems, 1994
- Techniques for automatically correcting words in textACM Computing Surveys, 1992
- Molecular docking using shape descriptorsJournal of Computational Chemistry, 1992
- Fast K-dimensional tree algorithms for nearest neighbor search with application to vector quantization encodingIEEE Transactions on Signal Processing, 1992
- Basic local alignment search toolJournal of Molecular Biology, 1990
- Optimal file distribution for partial match retrievalPublished by Association for Computing Machinery (ACM) ,1988
- Proximity: Fundamental AlgorithmsPublished by Springer Nature ,1985
- Disk allocation for Cartesian product files on multiple-disk systemsACM Transactions on Database Systems, 1982
- An Algorithm for Finding Best Matches in Logarithmic Expected TimeACM Transactions on Mathematical Software, 1977