Persistence in high-dimensional linear predictor selection and the virtue of overparametrization

Top Cited Papers

Open Access

1 December 2004

journal article
Published by Bernoulli Society for Mathematical Statistics and Probability in Bernoulli

Vol. 10 (6) , 971-988
https://doi.org/10.3150/bj/1106314846

Abstract

Let Zi = (Y i;Xi 1;:::;Xi m), i = 1;:::;n, be i.i.d. random vectors, Zi » F; F 2 F. It is desired to predict Y by P fljXj, where (fl1;:::;flm) 2 Bn µ Rm, under a prediction loss. Suppose that m = nfi, fi > 1, i.e., there are many more explanatory variables than ob- servations. We consider sets Bn restricted by the maximal number of non-zero coe-cients of their members, or by their l1 radius. We study the following asymptotic question: How 'large' may the set Bn be, so that it is still possible to select empirically a predictor whose risk under F is close to that of the best predictor in the set. Sharp bounds for orders of magnitudes are given under various assumptions on F. Algo- rithmic complexity of the ensuing procedures is also studied. The main message of this paper and the implications of the above derived orders are that under various sparsity assumptions on the optimal predictor there is \asymptotically no harm" in introducing many more explana- tory variables than observations. Furthermore, such practice can be beneflcial in comparison with a procedure that screens in advance a small subset of explanatory variables. Another main result is that 'Lasso'-type procedures, i.e., optimization under l1 constraint, could be e-cient in flnding optimal sparse predictors in high dimensions.

Keywords

This publication has 15 references indexed in Scilit:

Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations
Bernoulli, 2004
Least angle regression
The Annals of Statistics, 2004
Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)
Statistical Science, 2001
Functional aggregation for nonparametric regression
The Annals of Statistics, 2000
Asymptotic distribution of the errors in scalar and vector quantizers
IEEE Transactions on Information Theory, 1996
The Risk Inflation Criterion for Multiple Regression
The Annals of Statistics, 1994
The Smallest Eigenvalue of a Large Dimensional Wishart Matrix
The Annals of Probability, 1985
Asymptotic Behavior of $M$-Estimators of $p$ Regression Parameters when $p^2/n$ is Large. I. Consistency
The Annals of Statistics, 1984
How Many Variables Should be Entered in a Regression Equation?
Journal of the American Statistical Association, 1983
Robust Regression: Asymptotics, Conjectures and Monte Carlo
The Annals of Statistics, 1973