New sampling-based summary statistics for improving approximate query answers
- 1 June 1998
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 27 (2) , 331-342
- https://doi.org/10.1145/276304.276334
Abstract
In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.Keywords
This publication has 15 references indexed in Scilit:
- Probabilistic counting algorithms for data base applicationsPublished by Elsevier ,2003
- Query processing and optimization in Oracle RdbThe VLDB Journal, 1996
- The space complexity of approximating the frequency momentsPublished by Association for Computing Machinery (ACM) ,1996
- Processing queries for first-few answersPublished by Association for Computing Machinery (ACM) ,1996
- Optimal histograms for limiting worst-case error propagation in the size of join resultsACM Transactions on Database Systems, 1993
- APPROXIMATE-a query processor that produces monotonically improving approximate answersIEEE Transactions on Knowledge and Data Engineering, 1993
- A linear-time probabilistic counting algorithm for database applicationsACM Transactions on Database Systems, 1990
- Random sampling with a reservoirACM Transactions on Mathematical Software, 1985
- Approximate counting: A detailed analysisBIT Numerical Mathematics, 1985
- Counting large numbers of events in small registersCommunications of the ACM, 1978