On overfitting, generalization, and randomly expanded training sets

Abstract
An algorithmic procedure is developed for randomly expanding a given training set to combat overfitting and improve the generalization ability of backpropagation-trained multilayer perceptrons (MLPs). The training set is first partitioned by K-means clustering, and for each cluster a locally most entropic colored Gaussian estimate of the joint input-output probability density function is formed. The number of clusters is chosen so that the resulting overall colored Gaussian mixture exhibits minimum differential entropy upon global cross-validated shaping. Numerical studies on real and synthetic data examples drawn from the literature illustrate and support these theoretical developments.
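
The clustering and resampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the function names `kmeans` and `expand_training_set`, the `ridge` regularizer, and the use of relative cluster sizes as mixing weights are all hypothetical choices made for illustration. The number of clusters `k` is fixed by hand here, so the minimum-differential-entropy selection and the cross-validated shaping are not shown. Each cluster is summarized by its sample mean and full (non-diagonal, i.e. colored) covariance, since a Gaussian is the maximum-entropy density for a given mean and covariance.

```python
import numpy as np


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on the rows of Z; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def expand_training_set(X, Y, k=3, n_new=200, ridge=1e-6, seed=0):
    """Draw n_new synthetic (x, y) pairs from per-cluster joint Gaussians."""
    rng = np.random.default_rng(seed)
    Z = np.hstack([X, Y])              # joint input-output vectors
    d = Z.shape[1]
    labels = kmeans(Z, k, seed=seed)

    # Assumption: mixing weights are the relative cluster sizes.
    counts = np.bincount(labels, minlength=k)
    draws = rng.multinomial(n_new, counts / counts.sum())

    samples = []
    for j in range(k):
        if draws[j] == 0:
            continue
        Zj = Z[labels == j]
        mu = Zj.mean(axis=0)
        # Full covariance of the cluster; zero matrix for a singleton.
        cov = np.cov(Zj, rowvar=False) if len(Zj) > 1 else np.zeros((d, d))
        # Assumption: a small ridge keeps the covariance well conditioned.
        cov = np.atleast_2d(cov) + ridge * np.eye(d)
        samples.append(rng.multivariate_normal(mu, cov, size=draws[j]))

    Z_new = np.vstack(samples)
    return Z_new[:, :X.shape[1]], Z_new[:, X.shape[1]:]


# Usage: expand a small synthetic regression set, then train on both parts.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
Y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(60, 1))
X_new, Y_new = expand_training_set(X, Y, k=3, n_new=200)
X_aug, Y_aug = np.vstack([X, X_new]), np.vstack([Y, Y_new])
```

In this sketch the synthetic pairs are simply appended to the original data before training. Sampling inputs and outputs jointly, rather than adding noise to the inputs alone, keeps the local input-output correlation of each cluster in the generated points.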
