“Preconditioning” for feature selection and regression in high-dimensional problems
Open Access
- 1 August 2008
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 36 (4), 1595-1618
- https://doi.org/10.1214/009053607000000578
Abstract
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a “preconditioned” response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
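The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: features are screened by absolute correlation with the outcome as a stand-in for the univariate scores used in supervised principal components, only the first principal component is kept, the threshold and LASSO penalty are fixed by hand rather than chosen by cross-validation, and the LASSO is solved by cyclic coordinate descent (the paper's own solver may differ). The function names `supervised_pc_fit`, `lasso_cd` and `precondition_then_lasso`, and the latent-variable demo data, are hypothetical.

```python
import numpy as np


def supervised_pc_fit(X, y, threshold):
    """Step 1: supervised principal components (simplified sketch).

    Keeps features whose absolute correlation with y exceeds `threshold`
    (an assumed stand-in for univariate regression scores), takes the first
    principal component of the kept columns, and regresses y on it.
    Returns the fitted values, i.e. the preconditioned response.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute correlation of each centred feature with the centred outcome
    scores = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    keep = np.abs(scores) > threshold
    if not keep.any():
        raise ValueError("threshold removed every feature")
    # First left singular vector of the reduced, centred matrix (unit norm)
    U, _, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    u1 = U[:, 0]
    # Least-squares fit of yc on u1; u1 has unit norm, so the slope is u1 @ yc
    return y.mean() + (u1 @ yc) * u1


def lasso_cd(X, y, lam, n_iter=200):
    """Step 2: LASSO via cyclic coordinate descent (assumed solver).

    Minimises (1/(2n)) * ||yc - Xc b||^2 + lam * ||b||_1 on centred data.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (excluding j)
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def precondition_then_lasso(X, y, threshold, lam):
    """Two-step procedure: precondition y, then run the LASSO on it."""
    y_tilde = supervised_pc_fit(X, y, threshold)
    return lasso_cd(X, y_tilde, lam)


if __name__ == "__main__":
    # Hypothetical latent-variable data: p >> n, ten features load on u
    rng = np.random.default_rng(0)
    n, p = 60, 300
    u = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    X[:, :10] += 2.0 * u[:, None]
    y = u + 1.5 * rng.standard_normal(n)  # noisy outcome
    beta = precondition_then_lasso(X, y, threshold=0.25, lam=0.05)
    print("selected features:", np.flatnonzero(beta))
```

The only change from a plain LASSO fit is that the second step is run against the preconditioned response `y_tilde` instead of the raw `y`; as the abstract notes, forward stepwise selection could be substituted for `lasso_cd` at that point.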