Simplifying Neural Networks by Soft Weight-Sharing
- 1 July 1992
- journal article
- Published by MIT Press in Neural Computation
- Vol. 4 (4), 473-493
- https://doi.org/10.1162/neco.1992.4.4.473
Abstract
One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. Simple versions of this approach include penalizing the sum of the squares of the weights or penalizing the number of nonzero weights. We propose a more complicated penalty term in which the distribution of weight values is modeled as a mixture of multiple gaussians. A set of weights is simple if the weights have high probability density under the mixture model. This can be achieved by clustering the weights into subsets with the weights in each cluster having very similar values. Since we do not know the appropriate means or variances of the clusters in advance, we allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations on two different problems demonstrate that this complexity term is more effective than previous complexity terms.
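The penalty the abstract describes is the negative log-probability of the network's weights under an adaptive Gaussian mixture. The sketch below is a minimal NumPy illustration of how such a penalty could be evaluated for a fixed set of mixture parameters; the function and variable names are hypothetical, and the paper itself additionally adapts the mixture's means, variances, and mixing proportions jointly with the network weights and scales the penalty against the data-misfit error.

```python
import numpy as np

def soft_weight_sharing_penalty(weights, mixing, means, variances):
    """Negative log-likelihood of the weights under a Gaussian mixture.

    weights:   1-D array of all network weights
    mixing:    mixture proportions (nonnegative, sum to 1)
    means:     component means
    variances: component variances
    """
    # Density of every weight under every Gaussian component: shape (n_weights, n_components)
    diff = weights[:, None] - means[None, :]
    densities = np.exp(-0.5 * diff**2 / variances[None, :]) / np.sqrt(2 * np.pi * variances[None, :])
    # Mixture density per weight, then sum of negative logs over all weights
    mixture = densities @ mixing
    return -np.sum(np.log(mixture + 1e-12))

# Weights that cluster near the component means sit in high-density regions
# and therefore incur a smaller penalty than weights spread far from any mean.
mixing = np.array([0.5, 0.5])
means = np.array([0.0, 0.5])
variances = np.array([0.01, 0.01])
w_clustered = np.array([0.01, -0.02, 0.49, 0.51, 0.0, 0.5])
w_spread = np.array([0.1, -0.3, 0.7, -0.6, 0.9, -0.8])
print(soft_weight_sharing_penalty(w_clustered, mixing, means, variances))
print(soft_weight_sharing_penalty(w_spread, mixing, means, variances))
```

In training, the gradient of this term pulls each weight toward the mean of the component most responsible for it, which is what produces the soft clustering of weight values described above.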