Using an MDL-based cost function with neural networks
- 27 November 2002
- conference paper
- Published by the Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 3 (ISSN 1098-7576), pp. 2384-2389
- https://doi.org/10.1109/ijcnn.1998.687235
Abstract
The minimum description length (MDL) principle is an information-theoretic method for learning models from data. This paper presents an approach to using an MDL-based cost function efficiently with neural networks. As usual, the cost function can be used to adapt the parameters of the network, but it can also include terms that measure the complexity of the network's structure, and can thus be applied to determine the optimal structure. The basic idea is to convert a conventional neural network so that each parameter and each neuron output is assigned a mean and a variance. This greatly simplifies the computation of the description length and of its gradient with respect to the parameters, which can then be adapted using standard gradient descent.
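The abstract's central idea lends itself to a short illustration. The paper's exact formulation is not reproduced here; the sketch below uses the closely related variational ("bits-back") approximation of MDL, in which every parameter carries a learnable mean and variance and the total cost is the expected data code length plus the KL cost of coding the Gaussian weights against a Gaussian prior. The toy data, the prior and noise variances, and all names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of an MDL-style cost for a network whose parameters each
# carry a mean and a variance. Cost = E_q[-log p(D | w)] + KL(q(w) || p(w)),
# a variational ("bits-back") approximation of the description length.
# All settings below are illustrative assumptions.
import torch

torch.manual_seed(0)

# Toy regression data: y = 2x + noise (assumed, not from the paper).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2.0 * x + 0.1 * torch.randn_like(x)

# Gaussian "posterior" over one weight and one bias: means and log-variances.
mu = torch.zeros(2, requires_grad=True)              # [w_mean, b_mean]
log_var = torch.full((2,), -3.0, requires_grad=True)

prior_var = 1.0   # assumed N(0, 1) prior over parameters
noise_var = 0.01  # assumed observation noise variance

opt = torch.optim.Adam([mu, log_var], lr=0.05)

for step in range(500):
    opt.zero_grad()
    std = torch.exp(0.5 * log_var)
    # Reparameterised sample of the parameters (one sample per step).
    w = mu + std * torch.randn(2)
    pred = w[0] * x + w[1]
    # Data term: code length of the targets under the model (in nats).
    nll = 0.5 * ((y - pred) ** 2).sum() / noise_var
    # Model term: KL(q || p) for diagonal Gaussians, the cost of coding weights.
    var = torch.exp(log_var)
    kl = 0.5 * (var / prior_var + mu ** 2 / prior_var
                - 1.0 - log_var + torch.log(torch.tensor(prior_var))).sum()
    loss = nll + kl  # total description length (up to constants)
    loss.backward()
    opt.step()

print("learned parameter means:", mu.detach())
```

Minimizing this sum trades data fit against parameter precision: widening a weight's variance shortens its code length but blurs the predictions, which is the trade-off an MDL-based cost function exploits.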