Why error measures are sub-optimal for training neural network pattern classifiers

Abstract
Pattern classifiers that are trained in a supervised fashion (e.g., multi-layer perceptrons, radial basis functions, etc.) are typically trained with an error measure objective function such as mean-squared error (MSE) or cross-entropy (CE). These classifiers can in theory yield (optimal) Bayesian discrimination, but in practice they often fail to do so. We explain why this happens. In so doing, we identify a number of characteristics that the optimal objective function for training classifiers...
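As a small illustration of why an error measure and the classification decision need not agree, the sketch below computes MSE and CE for a single one-hot target and checks whether the largest output falls on the target class. It is not taken from the paper and is not its argument. The helper names (`mse`, `cross_entropy`, `is_correct`) and the example output vectors are hypothetical, chosen only to show that a misclassified output can score a lower error than a correctly classified one.

```python
import math

def mse(output, target):
    """Mean-squared error between an output vector and a one-hot target."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)

def cross_entropy(probs, target):
    """Cross-entropy -sum(t * log p) for a probability vector and a one-hot target.
    Only the target class contributes, so zero probabilities elsewhere are safe."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

def is_correct(output, target):
    """True if the largest output lies at the target class (argmax decision rule)."""
    predicted = max(range(len(output)), key=output.__getitem__)
    return predicted == target.index(1)

# Hypothetical two-class outputs for the target class 0.
target2 = [1, 0]
right_mse = [0.2, 0.1]   # correctly classified, but far from the target
wrong_mse = [0.5, 0.55]  # misclassified, but closer to the target in MSE
print(is_correct(right_mse, target2), mse(right_mse, target2))
print(is_correct(wrong_mse, target2), mse(wrong_mse, target2))

# Hypothetical three-class probability vectors for the target class 0.
target3 = [1, 0, 0]
right_ce = [0.34, 0.33, 0.33]  # correct by a narrow margin
wrong_ce = [0.45, 0.55, 0.0]   # misclassified, yet more mass on the target
print(is_correct(right_ce, target3), cross_entropy(right_ce, target3))
print(is_correct(wrong_ce, target3), cross_entropy(wrong_ce, target3))
```

In both cases the misclassified output has the smaller error, so driving the error measure down does not by itself guarantee fewer classification mistakes.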