Super Learner
- 16 January 2007
- journal article
- research article
- Published by Walter de Gruyter GmbH in Statistical Applications in Genetics and Molecular Biology
- Vol. 6 (1), Article 25
- https://doi.org/10.2202/1544-6115.1309
Abstract
When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that creates a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner in prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.
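The two-stage procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The three candidate learners (`fit_mean`, `fit_linear`, `fit_knn`), the squared-error loss, and the decision to restrict the weights to the simplex (non-negative, summing to one) and fit them by projected gradient descent are all hypothetical choices made here. V-fold cross-validation gives, for every observation, a prediction from each candidate fitted without that observation. The weights then minimize the cross-validated risk of the weighted combination, and finally each candidate is refit on the full data and combined with those weights.

```python
import random
from statistics import mean


# Hypothetical candidate learners for one covariate. Each takes training
# data (xs, ys) and returns a prediction function x -> y_hat.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    # Ordinary least squares with a single covariate.
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: ybar + slope * (x - xbar)


def fit_knn(xs, ys, k=5):
    # k-nearest-neighbour average of the outcome.
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


LEARNERS = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}


def cv_predictions(xs, ys, learners, v=5, seed=0):
    """V-fold CV: predict each observation from fits that never saw it."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::v] for f in range(v)]
    z = {name: [0.0] * n for name in learners}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for name, fit in learners.items():
            predict = fit(tx, ty)
            for i in fold:
                z[name][i] = predict(xs[i])
    return z


def project_to_simplex(w):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(w, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:
            theta = (css - 1.0) / j
    return [max(wj - theta, 0.0) for wj in w]


def fit_weights(z, ys, steps=1000):
    """Minimize the CV squared-error risk of sum_j w_j * z_j over the simplex
    by projected gradient descent (a hypothetical choice of weight family and solver)."""
    names = list(z)
    cols = [z[name] for name in names]
    n, k = len(ys), len(cols)
    frob = sum(c * c for col in cols for c in col)
    # Step 1/L, where L = (2/n) * ||Z||_F^2 bounds the gradient's Lipschitz constant.
    step = n / (2.0 * max(frob, 1e-12))
    w = [1.0 / k] * k
    for _ in range(steps):
        resid = [sum(w[j] * cols[j][i] for j in range(k)) - ys[i] for i in range(n)]
        grad = [2.0 / n * sum(cols[j][i] * resid[i] for i in range(n)) for j in range(k)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(k)])
    return dict(zip(names, w))


def super_learner(xs, ys, learners=LEARNERS, v=5, seed=0):
    z = cv_predictions(xs, ys, learners, v, seed)
    cv_risk = {name: mean((p - y) ** 2 for p, y in zip(z[name], ys)) for name in z}
    weights = fit_weights(z, ys)
    # Refit every candidate on the full data; combine with the CV-chosen weights.
    full = {name: fit(xs, ys) for name, fit in learners.items()}

    def predict(x):
        return sum(weights[name] * full[name](x) for name in full)

    return predict, weights, cv_risk
```

A call such as `predict, weights, cv_risk = super_learner(xs, ys)` returns the combined predictor together with the fitted weights and each candidate's cross-validated risk. Taking `min(cv_risk, key=cv_risk.get)` instead of using the weights recovers the simpler strategy of selecting a single learner by cross-validation, as in the earlier articles cited above. Replacing the squared error with another loss would follow the abstract's remark about loss-based parameters, although the gradient in `fit_weights` would then have to change to match.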
References
- SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation. Nature Genetics, 2008.
- The cross-validated adaptive epsilon-net estimator. Statistics & Decisions, 2006.
- Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Statistical Methodology, 2005.
- Logic Regression. Journal of Computational and Graphical Statistics, 2003.
- Multivariate Adaptive Regression Splines. The Annals of Statistics, 1991.