Statistical profile estimation in database systems
Open Access
- 1 September 1988
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 20 (3) , 191-221
- https://doi.org/10.1145/62061.62063
Abstract
A statistical profile summarizes the instances of a database. It describes aspects such as the number of tuples, the number of values, the distribution of values, the correlation between value sets, and the distribution of tuples among secondary storage units. Estimation of database profiles is critical in the problems of query optimization, physical database design, and database performance prediction. This paper describes a model of a database profile, relates this model to estimating the cost of database operations, and surveys methods of estimating profiles. The operators and objects in the model include build profile, estimate profile, and update profile. The estimate operator is classified by the relational algebra operator (select, project, join), the property to be estimated (cardinality, distribution of values, and other parameters), and the underlying method (parametric, nonparametric, and ad-hoc). The accuracy, overhead, and assumptions of methods are discussed in detail. Relevant research in both the database and the statistics disciplines is incorporated in the detailed discussion.
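As a rough illustration of the build, estimate, and update operators named in the abstract, the following minimal Python sketch keeps a single-column profile and derives cardinality estimates from it. It is not the paper's model: the `Profile` class, all function names, the equi-width histogram, and the bucket count are assumptions made for this example. The equality and join estimates use the common textbook uniformity assumption (a parametric method), and the range estimate uses the histogram (a nonparametric method).

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical single-column profile (names are illustrative only)."""
    n_tuples: int      # number of tuples
    n_distinct: int    # number of distinct values
    lo: float          # smallest value seen at build time
    hi: float          # largest value seen at build time
    buckets: list      # tuple counts per equi-width histogram bucket


def _bucket_index(p, v):
    """Map a value to its bucket, clamping values outside [lo, hi]."""
    width = (p.hi - p.lo) / len(p.buckets)
    if width == 0:
        return 0
    i = int((v - p.lo) / width)
    return max(0, min(i, len(p.buckets) - 1))


def build_profile(values, n_buckets=4):
    """Build operator: scan the column once and summarize it."""
    p = Profile(len(values), len(set(values)), min(values), max(values),
                [0] * n_buckets)
    for v in values:
        p.buckets[_bucket_index(p, v)] += 1
    return p


def update_profile(p, v, is_new_value):
    """Update operator: account for one inserted tuple without a rescan.

    The caller must say whether v is a new distinct value (an assumption;
    this sketch does not track the value set itself).
    """
    p.n_tuples += 1
    if is_new_value:
        p.n_distinct += 1
    p.buckets[_bucket_index(p, v)] += 1


def estimate_eq(p):
    """Estimate |select(col = c)| assuming values are uniformly frequent."""
    return p.n_tuples / p.n_distinct


def estimate_range(p, a, b):
    """Estimate |select(a <= col < b)| from the histogram.

    Assumes tuples are spread evenly inside each bucket.
    """
    width = (p.hi - p.lo) / len(p.buckets)
    if width == 0:
        return float(p.n_tuples) if a <= p.lo < b else 0.0
    total = 0.0
    for i, count in enumerate(p.buckets):
        b_lo = p.lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
        total += count * overlap / width
    return total


def estimate_join(p, q):
    """Estimate |R join S| on the profiled columns.

    Textbook formula |R| * |S| / max(d_R, d_S), which assumes uniform
    values and that the smaller value set is contained in the larger one.
    """
    return p.n_tuples * q.n_tuples / max(p.n_distinct, q.n_distinct)


r = build_profile([1, 2, 2, 3, 5, 8, 8, 9])
s = build_profile([2, 2, 8])
print(estimate_eq(r), estimate_range(r, 3, 7), estimate_join(r, s))
```

The sketch trades accuracy for overhead in the way the abstract describes: the uniformity-based estimates need only two counters, while the histogram costs more storage and maintenance but responds to skew in the value distribution.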
This publication has 26 references indexed in Scilit:
- Estimating block selectivities. Information Systems, 1984
- Implications of certain assumptions in database performance evaluation. ACM Transactions on Database Systems, 1984
- Estimating record selectivities. Information Systems, 1983
- Optimization Algorithms for Distributed Queries. IEEE Transactions on Software Engineering, 1983
- Estimating block accesses and number of records in file management. Communications of the ACM, 1982
- Query processing in a system for distributed databases (SDD-1). ACM Transactions on Database Systems, 1981
- Support for repetitive transactions and ad hoc queries in System R. ACM Transactions on Database Systems, 1981
- Variable Kernel Estimates of Multivariate Densities. Technometrics, 1977
- Analysis and performance of inverted data base structures. Communications of the ACM, 1975
- Estimation of a multivariate density. Annals of the Institute of Statistical Mathematics, 1966