Bayesian Framework for Least-Squares Support Vector Machine Classifiers, Gaussian Processes, and Kernel Fisher Discriminant Analysis
- 1 May 2002
- journal article
- Published by MIT Press in Neural Computation
- Vol. 14 (5), 1115-1147
- https://doi.org/10.1162/089976602753633411
Abstract
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks such as the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by first mapping the input vector in a nonlinear way to a high-dimensional kernel-induced feature space, in which a linear large-margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function, and the solution follows from a (convex) quadratic programming (QP) problem. In least-squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least-squares cost function and equality instead of inequality constraints, and the solution follows from a linear system in the dual space. Implicitly, the least-squares formulation corresponds to a regression formulation and is also related to kernel Fisher discriminant analysis. The least-squares regression formulation has advantages for deriving analytic expressions within a Bayesian evidence framework, in contrast to the classification formulations used, for example, in Gaussian processes (GPs). The LS-SVM formulation has clear primal-dual interpretations, and without the bias term, one explicitly constructs a model that yields the same expressions as have been obtained with GPs for regression. In this article, the Bayesian evidence framework is combined with the LS-SVM classifier formulation. Starting from the feature space formulation, analytic expressions are obtained in the dual space at the different levels of Bayesian inference, while posterior class probabilities are obtained by marginalizing over the model parameters. Empirical results obtained on 10 public domain data sets show that the LS-SVM classifier designed within the Bayesian evidence framework consistently yields good generalization performance.
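As background to the dual-space formulation summarized in the abstract, the sketch below illustrates how a basic LS-SVM classifier is trained by solving a single linear system rather than a QP. It is a minimal illustration under stated assumptions, not the article's Bayesian procedure: an RBF kernel is assumed, and the regularization constant `gamma` and kernel width `sigma` are treated as fixed, hand-chosen hyperparameters, whereas the evidence framework described above infers such quantities and yields posterior class probabilities.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-sq / (2.0 * sigma**2))

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    # Solve the LS-SVM classifier dual linear system
    #   [ 0        y^T            ] [ b     ]   [ 0   ]
    #   [ y   Omega + I / gamma   ] [ alpha ] = [ 1_N ]
    # with Omega[i, j] = y_i * y_j * K(x_i, x_j) and labels y_i in {-1, +1}.
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma=1.0):
    # Latent output y(x) = sum_i alpha_i y_i K(x, x_i) + b; the class label is its sign.
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy usage on synthetic (hypothetical) data; labels must be in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
alpha, b = lssvm_train(X, y, gamma=10.0, sigma=1.0)
print(lssvm_predict(X, y, alpha, b, X[:5], sigma=1.0))
```

Because the dual problem is a linear system, standard dense or iterative solvers apply directly; the Bayesian levels of inference discussed in the article sit on top of this formulation to select `gamma`, the kernel parameters, and to produce class probabilities.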
This publication has 16 references indexed in Scilit:
- Financial time series prediction using least squares support vector machines within the evidence framework. IEEE Transactions on Neural Networks, 2001.
- An Expectation-Maximization Approach to Nonlinear Component Analysis. Neural Computation, 2001.
- The evidence framework applied to support vector machines. IEEE Transactions on Neural Networks, 2000.
- Recurrent least squares support vector machines. IEEE Transactions on Circuits and Systems I: Regular Papers, 2000.
- Moderating the outputs of support vector machine classifiers. IEEE Transactions on Neural Networks, 1999.
- Comparison of Approximate Methods for Handling Hyperparameters. Neural Computation, 1999.
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.
- The connection between regularization operators and support vector kernels. Neural Networks, 1998.
- Bayesian classification with Gaussian processes. Published by the Institute of Electrical and Electronics Engineers (IEEE), 1998.
- Probable networks and plausible predictions — a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 1995.