How many bins should be put in a regular histogram
Open Access
- 31 January 2006
- journal article
- research article
- Published by EDP Sciences in ESAIM: Probability and Statistics
- Vol. 10, 24-45
- https://doi.org/10.1051/ps:2006001
Abstract
Given an n-sample from some unknown density f on [0,1], it is easy to construct a histogram of the data based on some given partition of [0,1], but not much is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations and often involve some smoothness properties of f. Our purpose in this paper is to give an automatic, easy-to-program and efficient method to choose the number of bins of the partition from the data. It is based on bounds on the risk of penalized maximum likelihood estimators due to Castellan and on heavy simulations which allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
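The abstract describes the method only in words, so the following is a minimal sketch of bin selection by penalized maximum likelihood, not a reproduction of the paper's procedure. It assumes data in [0,1] and picks the number of bins D that maximizes the histogram log-likelihood minus a penalty. The penalty D - 1 + (log D)^2.5 and the search range 1 ≤ D ≤ n / log n are assumptions not stated in the abstract, and the function names `penalized_loglik` and `choose_bins` are hypothetical.

```python
import math
import random


def penalized_loglik(data, d):
    """Penalized log-likelihood of the regular d-bin histogram on [0, 1]."""
    n = len(data)
    counts = [0] * d
    for x in data:
        # Clamp the index so that x == 1.0 falls in the last bin.
        counts[min(int(x * d), d - 1)] += 1
    # The histogram density equals d * N_j / n on bin j; empty bins add nothing.
    loglik = sum(c * math.log(d * c / n) for c in counts if c > 0)
    # ASSUMPTION: penalty form d - 1 + (log d)^2.5, not given in the abstract.
    penalty = d - 1 + math.log(d) ** 2.5
    return loglik - penalty


def choose_bins(data, d_max=None):
    """Return the number of bins in 1..d_max with the largest penalized log-likelihood."""
    n = len(data)
    if n < 2:
        return 1
    if d_max is None:
        # ASSUMPTION: search range 1..n/log(n), not given in the abstract.
        d_max = max(1, int(n / math.log(n)))
    return max(range(1, d_max + 1), key=lambda d: penalized_loglik(data, d))


if __name__ == "__main__":
    random.seed(0)
    # Illustrative sample only: 200 draws from a Beta(2, 5) density on [0, 1].
    sample = [random.betavariate(2, 5) for _ in range(200)]
    print("selected number of bins:", choose_bins(sample))
```

With a single bin both the log-likelihood and the assumed penalty are zero, so more bins are selected only when the gain in fit exceeds the penalty.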