Optimal and approximate computation of summary statistics for range aggregates

1 May 2001

conference paper
Published by Association for Computing Machinery (ACM)

p. 227-236
https://doi.org/10.1145/375551.375598

Abstract

Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on “selectivity estimation”, that is, computing summary statistics on the underlying data and using that to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries.First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries).Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-known wavelet-based methods have this property.We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.

Keywords

This publication has 9 references indexed in Scilit:

Towards estimation error guarantees for distinct values
Published by Association for Computing Machinery (ACM) ,2000
Optimal histograms for hierarchical range queries (extended abstract)
Published by Association for Computing Machinery (ACM) ,2000
Approximate computation of multidimensional aggregates of sparse data using wavelets
Published by Association for Computing Machinery (ACM) ,1999
The Aqua approximate query answering system
Published by Association for Computing Machinery (ACM) ,1999
Wavelet-based histograms for selectivity estimation
Published by Association for Computing Machinery (ACM) ,1998
Random sampling for histogram construction
Published by Association for Computing Machinery (ACM) ,1998
Online aggregation
Published by Association for Computing Machinery (ACM) ,1997
Statistical profile estimation in database systems
ACM Computing Surveys, 1988
Accurate estimation of the number of tuples satisfying a condition
Published by Association for Computing Machinery (ACM) ,1984