High dimensional similarity joins: algorithms and performance evaluation
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 466-475
- https://doi.org/10.1109/icde.1998.655809
Abstract
Current data repositories include a variety of data types, including audio, images and time series. State of the art techniques for indexing such data and doing query processing rely on a transformation of data elements into points in a multidimensional feature space. Indexing and query processing then take place in the feature space. We study algorithms for finding relationships among points in multidimensional feature spaces, specifically algorithms for multidimensional joins. Like joins of conventional relations, correlations between multidimensional feature spaces can offer valuable information about the data sets involved. We present several algorithmic paradigms for solving the multidimensional join problem, and we discuss their features and limitations. We propose a generalization of the Size Separation Spatial Join algorithm, named Multidimensional Spatial Join (MSJ), to solve the multidimensional join problem. We evaluate MSJ along with several other specific algorithms, comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets. Our experimental results indicate that MSJ, which is based on space filling curves, consistently yields good performance across a wide range of dimensionalities.Keywords
This publication has 12 references indexed in Scilit:
- High-dimensional similarity joinsIEEE Transactions on Knowledge and Data Engineering, 2002
- Partition based spatial-merge joinACM SIGMOD Record, 1996
- Fast subsequence matching in time-series databasesPublished by Association for Computing Machinery (ACM) ,1994
- Efficient processing of spatial joins using R-treesPublished by Association for Computing Machinery (ACM) ,1993
- An algorithm for computing the overlay of k-dimensional spacesPublished by Springer Nature ,1991
- The R*-tree: an efficient and robust access method for points and rectanglesPublished by Association for Computing Machinery (ACM) ,1990
- Redundancy in spatial databasesPublished by Association for Computing Machinery (ACM) ,1989
- Computational GeometryPublished by Springer Nature ,1985
- R-treesPublished by Association for Computing Machinery (ACM) ,1984
- The K-D-B-treePublished by Association for Computing Machinery (ACM) ,1981