A cost model for query processing in high dimensional data spaces
- 1 June 2000
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 25 (2) , 129-178
- https://doi.org/10.1145/357775.357776
Abstract
During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research topic in multimedia databases is similarity search in large data sets. Most current approaches that address similarity search use the feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, similarity search is transformed into a neighborhood search in feature space. Multidimensional index structures are usually applied when managing feature vectors. Query processing can be improved substantially with optimization techniques such as blocksize optimization, data space quantization, and dimension reduction. To determine optimal parameters, an accurate estimate of index-based query processing performance is crucial. In this paper we develop a cost model for index structures for point databases such as the R*-tree and the X-tree. It provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric and a maximum metric. The problems specific to high-dimensional data spaces, called boundary effects, are considered. The concept of the fractal dimension is used to take the effects of correlated data into account.Keywords
This publication has 35 references indexed in Scilit:
- Similarity query processing using disk arraysACM SIGMOD Record, 1998
- Using extended feature objects for partial similarity retrievalThe VLDB Journal, 1997
- S3ACM SIGMOD Record, 1997
- The TV-tree: An index structure for high-dimensional dataThe VLDB Journal, 1994
- Similar shape retrieval using a structural feature indexInformation Systems, 1993
- Refinements to nearest-neighbor searching ink-dimensional treesAlgorithmica, 1991
- A retrieval technique for similar shapesACM SIGMOD Record, 1991
- Basic local alignment search toolJournal of Molecular Biology, 1990
- Analysis of an Algorithm for Finding Nearest Neighbors in Euclidean SpaceACM Transactions on Mathematical Software, 1979
- An Algorithm for Finding Best Matches in Logarithmic Expected TimeACM Transactions on Mathematical Software, 1977