Consistent inference of probabilities in layered networks: predictions and generalizations
- 1 January 1989
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 403-409 vol.2
- https://doi.org/10.1109/ijcnn.1989.118274
Abstract
The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework. By imposing the consistency condition that the error minimization be equivalent to a likelihood maximization for training the network, the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture. This statistical description enables them to evaluate the probability of a correct prediction of an independent example, after training the network on a given training set. The prediction probability is highly correlated with the generalization ability of the network, as measured outside the training set. This suggests a general and practical criterion for training layered networks by minimizing prediction errors. The authors demonstrate the utility of this criterion for selecting the optimal architecture in the continuity problem. As a theoretical application of the statistical formalism, they discuss the question of learning curves and estimate the sufficient training size needed for correct generalization, in a simple example.
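The Gibbs-ensemble idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a finite ensemble of one-parameter threshold "networks", a uniform prior over them, a 0/1 error per example, and an inverse-temperature parameter `beta`; all function and variable names are hypothetical. Each network is weighted by exp(-beta * training error), and the prediction probability of a new example is that example's likelihood averaged over the weighted ensemble.

```python
import math

def predict(threshold, x):
    # Toy "network": a single threshold unit on a scalar input (assumption).
    return 1 if x > threshold else 0

def training_error(threshold, examples):
    # Additive 0/1 error over the training set.
    return sum(1 for x, y in examples if predict(threshold, x) != y)

def gibbs_posterior(thresholds, examples, beta):
    # Gibbs weights exp(-beta * E) / Z over the ensemble, uniform prior assumed.
    errors = [training_error(t, examples) for t in thresholds]
    unnorm = [math.exp(-beta * e) for e in errors]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def prediction_probability(thresholds, examples, new_example, beta):
    # Ensemble-averaged likelihood of one independent example.
    # For a binary output, the per-example likelihood exp(-beta * e) is
    # normalized over the two possible outputs (e = 0 and e = 1).
    x, y = new_example
    posterior = gibbs_posterior(thresholds, examples, beta)
    norm = 1.0 + math.exp(-beta)
    total = 0.0
    for p, t in zip(posterior, thresholds):
        e = 1 if predict(t, x) != y else 0
        total += p * math.exp(-beta * e) / norm
    return total

thresholds = [0.0, 0.5, 1.0]          # hypothetical ensemble
train = [(0.2, 0), (0.8, 1)]          # hypothetical training set
posterior = gibbs_posterior(thresholds, train, beta=2.0)
p_one = prediction_probability(thresholds, train, (0.9, 1), beta=2.0)
p_zero = prediction_probability(thresholds, train, (0.9, 0), beta=2.0)
```

In this toy setting only the threshold 0.5 fits both training examples, so it receives the largest weight, and the label 1 for the input 0.9 gets a higher prediction probability than the label 0. Raising `beta` concentrates the ensemble further on the zero-error networks.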
This publication has 7 references indexed in Scilit:
- Statistical mechanics of neural networks near saturation. Published by Elsevier, 2004
- On the capabilities of multilayer perceptrons. Journal of Complexity, 1988
- Stochastic Complexity and Modeling. The Annals of Statistics, 1986
- Alternative approach to maximum-entropy inference. Physical Review A, 1984
- Optimization by Simulated Annealing. Science, 1983
- Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 1982
- Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Transactions on Electronic Computers, 1965