Bayesian Framework for Least-Squares Support Vector Machine Classifiers, Gaussian Processes, and Kernel Fisher Discriminant Analysis
- 1 May 2002
- journal article
- Published by MIT Press in Neural Computation
- Vol. 14 (5), 1115-1147
- https://doi.org/10.1162/089976602753633411
Abstract
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks such as the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by first mapping the input vector in a nonlinear way to a high-dimensional kernel-induced feature space, in which a linear large-margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function, and the solution follows from a (convex) quadratic programming (QP) problem. In least-squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least-squares cost function and equality instead of inequality constraints, and the solution follows from a linear system in the dual space. Implicitly, the least-squares formulation corresponds to a regression formulation and is also related to kernel Fisher discriminant analysis. The least-squares regression formulation has advantages for deriving analytic expressions within a Bayesian evidence framework, in contrast to the classification formulations used, for example, in Gaussian processes (GPs). The LS-SVM formulation has clear primal-dual interpretations, and without the bias term, one explicitly constructs a model that yields the same expressions as have been obtained with GPs for regression. In this article, the Bayesian evidence framework is combined with the LS-SVM classifier formulation. Starting from the feature space formulation, analytic expressions are obtained in the dual space at the different levels of Bayesian inference, while posterior class probabilities are obtained by marginalizing over the model parameters. Empirical results obtained on 10 public domain data sets show that the LS-SVM classifier designed within the Bayesian evidence framework consistently yields good generalization performance.
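As background to the dual-space formulation summarized in the abstract, the sketch below illustrates how a basic LS-SVM classifier is trained by solving a single linear system rather than a QP. It is a minimal illustration under stated assumptions, not the article's Bayesian procedure: an RBF kernel is assumed, and the regularization constant `gamma` and kernel width `sigma` are treated as fixed, hand-chosen hyperparameters, whereas the evidence framework described above infers such quantities and yields posterior class probabilities.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-sq / (2.0 * sigma**2))

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    # Solve the LS-SVM classifier dual linear system
    #   [ 0        y^T            ] [ b     ]   [ 0   ]
    #   [ y   Omega + I / gamma   ] [ alpha ] = [ 1_N ]
    # with Omega[i, j] = y_i * y_j * K(x_i, x_j) and labels y_i in {-1, +1}.
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Latent output y(x) = sum_i alpha_i y_i K(x, x_i) + b; the class label is its sign.
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy usage on synthetic (hypothetical) data; labels must be in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
alpha, b = lssvm_train(X, y, gamma=10.0, sigma=1.0)
print(lssvm_predict(X, y, alpha, b, X[:5], sigma=1.0))
```

Because the dual problem is a linear system, standard dense or iterative solvers apply directly; the Bayesian levels of inference discussed in the article sit on top of this formulation to select `gamma`, the kernel parameters, and to produce class probabilities.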
This publication has 16 references indexed in Scilit:
- Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, 2001.
- An Expectation-Maximization Approach to Nonlinear Component Analysis. Neural Computation, 2001.
- The evidence framework applied to support vector machines. IEEE Transactions on Neural Networks, 2000.
- Recurrent least squares support vector machines. IEEE Transactions on Circuits and Systems I: Regular Papers, 2000.
- Moderating the outputs of support vector machine classifiers. IEEE Transactions on Neural Networks, 1999.
- Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation, 1999.
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.
- The connection between regularization operators and support vector kernels. Neural Networks, 1998.
- Bayesian classification with Gaussian processes. Published by the Institute of Electrical and Electronics Engineers (IEEE), 1998.
- Probable networks and plausible predictions — a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 1995.