Self-spacial join selectivity estimation using fractal concepts
- 1 April 1998
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Information Systems
- Vol. 16 (2) , 161-201
- https://doi.org/10.1145/279339.279342
Abstract
The problem of selectivity estimation for queries of nontraditional databases is still an open issue. In this article, we examine the problem of selectivity estimation for some types of spatial queries in databases containing real data . We have shown earlier [Faloutsos and Kamel 1994] that real point sets typically have a nonuniform distribution, violating consistently the uniformity and independence assumptions. Moreover, we demonstrated that the theory of fractals can help to describe real point sets. In this article we show how the concept of fractal dimension, i.e., (noninteger) dimension, can lead to the solution for the selectivity estimation problem in spatial databases. Among the infinite family of fractal dimensions, we consider here the Hausdorff fractal dimension D 0 and the “Correlation” fractal dimension D 2 . Specifically, we show that (a) the average number of neighbors for a given point set follows a power law, with D 2 as exponent, and (b) the average number of nonempty range queries follows a power law with E − D 0 as exponent ( E is the dimension of the embedding space). We present the formulas to estimate the selectivity for “biased” range queries, for self-spatial joins, and for the average number of nonempty range queries. The result of some experiments on real and synthetic point sets are shown. Our formulas achieve very low relative errors, typically about 10%, versus 40%–100% of the formulas that are based on the uniformity and independence assumptions.Keywords
This publication has 14 references indexed in Scilit:
- Efficient and effective Querying by Image ContentJournal of Intelligent Information Systems, 1994
- Fast subsequence matching in time-series databasesPublished by Association for Computing Machinery (ACM) ,1994
- Optimal histograms for limiting worst-case error propagation in the size of join resultsACM Transactions on Database Systems, 1993
- Mining association rules between sets of items in large databasesPublished by Association for Computing Machinery (ACM) ,1993
- Analytical results on the quadtree decomposition of arbitrary rectanglesPattern Recognition Letters, 1992
- On the propagation of errors in the size of join resultsPublished by Association for Computing Machinery (ACM) ,1991
- An optimized box-assisted algorithm for fractal dimensionsPhysics Letters A, 1990
- The R*-tree: an efficient and robust access method for points and rectanglesPublished by Association for Computing Machinery (ACM) ,1990
- Implications of certain assumptions in database performance evauationACM Transactions on Database Systems, 1984
- R-treesPublished by Association for Computing Machinery (ACM) ,1984