Improved histograms for selectivity estimation of range predicates
- 1 June 1996
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 25 (2) , 294-305
- https://doi.org/10.1145/235968.233342
Abstract
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.Keywords
This publication has 20 references indexed in Scilit:
- Optimal histograms for limiting worst-case error propagation in the size of join resultsACM Transactions on Database Systems, 1993
- Statistical profile estimation in database systemsACM Computing Surveys, 1988
- Simultaneous estimation of several percentilesSIMULATION, 1987
- The P 2 algorithm for dynamic calculation of quantiles and histograms without storing observationsCommunications of the ACM, 1985
- Random sampling with a reservoirACM Transactions on Mathematical Software, 1985
- Accurate estimation of the number of tuples satisfying a conditionPublished by Association for Computing Machinery (ACM) ,1984
- Access path selection in a relational database management systemPublished by Association for Computing Machinery (ACM) ,1979
- Probability Inequalities for Sums of Bounded Random VariablesJournal of the American Statistical Association, 1963
- The Kolmogorov-Smirnov Test for Goodness of FitJournal of the American Statistical Association, 1951
- Confidence Limits for an Unknown Distribution FunctionThe Annals of Mathematical Statistics, 1941