Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions
- 1 February 1997
- journal article
- Published by MIT Press in Neural Computation
- Vol. 9 (2) , 349-368
- https://doi.org/10.1162/neco.1997.9.2.349
Abstract
The task of parametric model selection is cast in terms of a statistical mechanics on the space of probability distributions. Using the techniques of low-temperature expansions, I arrive at a systematic series for the Bayesian posterior probability of a model family that significantly extends known results in the literature. In particular, I arrive at a precise understanding of how Occam's razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory. These results require a measure on the space of model parameters and I derive and discuss an interpretation of Jeffreys' prior distribution as a uniform prior over the distributions indexed by a family. Finally, I derive a theoretical index of the complexity of a parametric family relative to some true distribution that I call the razor of the model. The form of the razor immediately suggests several interesting questions in the theory of learning that can be studied using the techniques of statistical mechanics.
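As a rough illustration of how probability theory can embody Occam's razor, the following sketch compares two model families on coin-flip data: a parameter-free fair coin and a one-parameter Bernoulli family with Jeffreys' prior, which for this family is the Beta(1/2, 1/2) distribution. The coin-flip setting and all function names are assumptions chosen for illustration. The code computes exact marginal likelihoods for this toy case and is not the low-temperature expansion developed in the paper.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_fair(n, k):
    """Log probability of one specific sequence of n flips with k heads
    under the parameter-free fair-coin model (k does not matter here)."""
    return n * math.log(0.5)


def log_evidence_jeffreys(n, k):
    """Log marginal likelihood of the same sequence under Bernoulli(theta),
    with theta integrated against Jeffreys' prior Beta(1/2, 1/2)."""
    return log_beta(k + 0.5, n - k + 0.5) - log_beta(0.5, 0.5)


def log_bayes_factor(n, k):
    """Log evidence ratio: positive values favor the one-parameter family,
    negative values favor the simpler fair-coin model."""
    return log_evidence_jeffreys(n, k) - log_evidence_fair(n, k)


if __name__ == "__main__":
    for n, k in [(100, 50), (100, 55), (100, 80)]:
        print(n, k, log_bayes_factor(n, k))
```

With 50 heads in 100 flips the log Bayes factor is negative, so the simpler model is preferred even though the larger family contains it. With 80 heads in 100 flips the sign flips, because the improved fit outweighs the penalty for spreading prior mass over the whole parameter range.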
This publication has 8 references indexed in Scilit:
- Network information criterion-determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks, 1994
- A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 1992
- Bayesian Interpolation. Neural Computation, 1992
- Minimum complexity density estimation. IEEE Transactions on Information Theory, 1991
- Information-theoretic asymptotics of Bayes methods. IEEE Transactions on Information Theory, 1990
- Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1987
- Stochastic Complexity and Modeling. The Annals of Statistics, 1986
- Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 1984