On-line versus off-line learning in the linear perceptron: A comparative study

Abstract
The spherical perceptron with $N$ inputs and a linear output does not achieve optimal generalization when trained by minimizing the standard quadratic cost function $E=\frac{1}{2}\sum_{\mu=1}^{\alpha N}(b_\mu-h_\mu)^2$, where $b_\mu$ and $h_\mu$ are the outputs of the rule (teacher) and hypothesis (student) networks for example $\mu$ and there are $\alpha N$ examples. We derive an optimal algorithm for on-line learning of examples which outperforms the standard iterative (off-line) algorithm for $\alpha$ up to 0.71. The optimized on-line algorithm suggests a class of cost functions for off-line learning, which we then proceed to study using the replica method. The optimized cost function within that class has the suggestive form $E_N=\Gamma\,\frac{1}{\alpha N}\sum_{\mu=1}^{\alpha N}\bigl[-\ln P(b_\mu\mid h_\mu)\bigr]-\Gamma\ln Z$, where $Z$ is a normalization constant, $P(b_\mu\mid h_\mu)$ is the conditional probability of the output datum $b_\mu$ given the hypothesis output $h_\mu$, and $\Gamma$ is a learning parameter, analogous to a temperature, which decreases in a well-defined manner along the learning process.
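To make the setting concrete, the following NumPy sketch pits an off-line least-squares minimization of the quadratic cost above against a single-pass on-line rule whose shrinking step size loosely plays the role of the annealed parameter $\Gamma$; the annealing schedule, dimensions, and normalizations here are illustrative assumptions, not the paper's derived optimal algorithm. (For a Gaussian likelihood $P(b_\mu\mid h_\mu)\propto\exp[-(b_\mu-h_\mu)^2/2\Gamma]$, the term $-\ln P$ reduces to the quadratic cost up to $\Gamma$-dependent constants, which is how the two cost functions connect.)

    import numpy as np

    # Toy teacher-student setup for the spherical linear perceptron
    # (an illustrative sketch, not the paper's actual algorithm).
    rng = np.random.default_rng(0)
    N = 200                    # number of inputs
    alpha = 0.5                # examples per weight, so P = alpha * N
    P = int(alpha * N)

    # Teacher (rule) and student (hypothesis) weights on the sphere |w|^2 = N.
    B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)
    J = rng.standard_normal(N); J *= np.sqrt(N) / np.linalg.norm(J)

    X = rng.standard_normal((P, N))   # the alpha*N example inputs
    b = X @ B / np.sqrt(N)            # teacher outputs b_mu

    # Off-line: minimize E = (1/2) sum_mu (b_mu - h_mu)^2 exactly by least squares.
    J_off, *_ = np.linalg.lstsq(X / np.sqrt(N), b, rcond=None)

    # On-line: one example per step, with an assumed 1/(1 + t/N) annealing
    # schedule standing in for the decreasing parameter Gamma.
    for mu in range(P):
        h = X[mu] @ J / np.sqrt(N)            # student output h_mu
        eta = 1.0 / (1.0 + mu / N)
        J += eta * (b[mu] - h) * X[mu] / np.sqrt(N)
        J *= np.sqrt(N) / np.linalg.norm(J)   # restore the spherical constraint

    # Teacher-student overlap R (R -> 1 means perfect generalization).
    for name, w in (("off-line", J_off), ("on-line", J)):
        R = w @ B / (np.linalg.norm(w) * np.linalg.norm(B))
        print(f"{name:8s} overlap R = {R:.3f}")

Sweeping $\alpha$ in this toy gives an informal feel for the small-$\alpha$ regime (up to $\alpha\approx 0.71$) in which the abstract reports the on-line algorithm outperforming the off-line one.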
