Boosting for high-dimensional linear models
Preprint, 30 June 2006
Abstract
We prove that boosting with the squared error loss, $L_2$Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as $O(\exp(\text{sample size}))$, assuming that the true underlying regression function is sparse in terms of the $\ell_1$-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the $\ell_1$-norm. We also propose an $\mathit{AIC}$-based method for tuning, namely for choosing the number of boosting iterations. This makes $L_2$Boosting computationally attractive, since it avoids the multiple runs of the algorithm that the commonly used cross-validation requires. We demonstrate $L_2$Boosting on simulated data, in particular where the predictor dimension is large relative to the sample size, and on a difficult tumor-classification problem with gene expression microarray data.
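To make the procedure concrete, here is a minimal sketch of $L_2$Boosting with a componentwise linear base learner and an AIC-type stopping rule. It is not the paper's reference implementation: the function name `l2boost`, the step size `nu`, the iteration budget, and the use of the Hurvich-Simonoff-Tsai corrected AIC with the trace of the boosting operator as degrees of freedom are illustrative assumptions consistent with the abstract.

```python
import numpy as np

def l2boost(X, y, max_iter=200, nu=0.1):
    """Componentwise L2Boosting with AICc-based stopping (illustrative sketch).

    Each iteration fits the current residuals against the single best
    predictor by least squares and takes a shrunken step of size nu.
    The boosting hat matrix B_m is tracked so that trace(B_m) can serve
    as the degrees of freedom in a corrected AIC criterion.
    """
    n, p = X.shape
    intercept = y.mean()
    yc = y - intercept
    beta = np.zeros(p)
    resid = yc.copy()
    B = np.zeros((n, n))            # boosting operator: fitted values = B @ yc
    col_ss = (X ** 2).sum(axis=0)   # per-column sums of squares
    aicc_path, beta_path = [], []

    for _ in range(max_iter):
        coefs = X.T @ resid / col_ss        # componentwise LS coefficients
        j = np.argmax(coefs ** 2 * col_ss)  # predictor with largest RSS reduction
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
        # Hat-matrix update: B <- B + nu * H_j (I - B), with H_j the
        # least-squares projection onto predictor j.
        Hj = np.outer(X[:, j], X[:, j]) / col_ss[j]
        B = B + nu * Hj @ (np.eye(n) - B)
        df = np.trace(B)                    # effective degrees of freedom
        sigma2 = (resid @ resid) / n
        # Corrected AIC (Hurvich-Simonoff-Tsai form); guard the denominator.
        denom = 1 - (df + 2) / n
        aicc = np.log(sigma2) + (1 + df / n) / denom if denom > 0 else np.inf
        aicc_path.append(aicc)
        beta_path.append(beta.copy())

    m_star = int(np.argmin(aicc_path))      # AICc-optimal stopping iteration
    return intercept, beta_path[m_star], m_star + 1


if __name__ == "__main__":
    # Hypothetical p >> n example in the regime the abstract targets.
    rng = np.random.default_rng(0)
    n, p = 50, 200
    X = rng.standard_normal((n, p))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)
    intercept, beta, m = l2boost(X, y)
    print(f"stopped at iteration {m}, nonzero coefficients: "
          f"{np.flatnonzero(np.abs(beta) > 1e-8)}")
```

Because the stopping iteration is selected along a single boosting path via the AIC criterion, the algorithm is run once, which is the computational advantage over cross-validation noted in the abstract.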
All Related Versions
- Version 1, 2006-06-30, ArXiv
- Published version: The Annals of Statistics, 34(2), 559–583.