Fast incremental maintenance of approximate histograms
- 1 September 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 27 (3) , 261-298
- https://doi.org/10.1145/581751.581753
Abstract
Many commercial database systems maintain histograms to summarize the contents of large relations and permit efficient estimation of query result sizes for use in query optimizers. Delaying the propagation of database updates to the histogram often introduces errors into the estimation. This article presents new sampling-based approaches for incremental maintenance of approximate histograms. By scheduling updates to the histogram based on the updates to the database, our techniques are the first to maintain histograms effectively up to date at all times and avoid computing overheads when unnecessary. Our techniques provide highly accurate approximate histograms belonging to the equidepth and Compressed classes. Experimental results show that our new approaches provide orders of magnitude more accurate estimation than previous approaches.An important aspect employed by these new approaches is a backing sample , an up-to-date random sample of the tuples currently in a relation. We provide efficient solutions for maintaining a uniformly random sample of a relation in the presence of updates to the relation. The backing sample techniques can be used for any other application that relies on random samples of data.Keywords
This publication has 21 references indexed in Scilit:
- Fast, small-space algorithms for approximate histogram maintenancePublished by Association for Computing Machinery (ACM) ,2002
- STHolesPublished by Association for Computing Machinery (ACM) ,2001
- Independence is goodPublished by Association for Computing Machinery (ACM) ,2001
- Space-efficient online computation of quantile summariesPublished by Association for Computing Machinery (ACM) ,2001
- Join synopses for approximate query answeringPublished by Association for Computing Machinery (ACM) ,1999
- Self-tuning histogramsPublished by Association for Computing Machinery (ACM) ,1999
- A logarithmic time sort for linear size networksJournal of the ACM, 1987
- Random sampling with a reservoirACM Transactions on Mathematical Software, 1985
- Implications of certain assumptions in database performance evauationACM Transactions on Database Systems, 1984
- Access path selection in a relational database management systemPublished by Association for Computing Machinery (ACM) ,1979