Optimal and approximate computation of summary statistics for range aggregates
- 1 May 2001
- conference paper
- Published by Association for Computing Machinery (ACM)
- p. 227-236
- https://doi.org/10.1145/375551.375598
Abstract
Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on “selectivity estimation”, that is, computing summary statistics on the underlying data and using that to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries.First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries).Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-known wavelet-based methods have this property.We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.Keywords
This publication has 9 references indexed in Scilit:
- Towards estimation error guarantees for distinct valuesPublished by Association for Computing Machinery (ACM) ,2000
- Optimal histograms for hierarchical range queries (extended abstract)Published by Association for Computing Machinery (ACM) ,2000
- Approximate computation of multidimensional aggregates of sparse data using waveletsPublished by Association for Computing Machinery (ACM) ,1999
- The Aqua approximate query answering systemPublished by Association for Computing Machinery (ACM) ,1999
- Wavelet-based histograms for selectivity estimationPublished by Association for Computing Machinery (ACM) ,1998
- Random sampling for histogram constructionPublished by Association for Computing Machinery (ACM) ,1998
- Online aggregationPublished by Association for Computing Machinery (ACM) ,1997
- Statistical profile estimation in database systemsACM Computing Surveys, 1988
- Accurate estimation of the number of tuples satisfying a conditionPublished by Association for Computing Machinery (ACM) ,1984