Persistence in high-dimensional linear predictor selection and the virtue of overparametrization
Top Cited Papers
Open Access
- 1 December 2004
- journal article
- Published by Bernoulli Society for Mathematical Statistics and Probability in Bernoulli
- Vol. 10 (6) , 971-988
- https://doi.org/10.3150/bj/1106314846
Abstract
Let Zi = (Y i;Xi 1;:::;Xi m), i = 1;:::;n, be i.i.d. random vectors, Zi » F; F 2 F. It is desired to predict Y by P fljXj, where (fl1;:::;flm) 2 Bn µ Rm, under a prediction loss. Suppose that m = nfi, fi > 1, i.e., there are many more explanatory variables than ob- servations. We consider sets Bn restricted by the maximal number of non-zero coe-cients of their members, or by their l1 radius. We study the following asymptotic question: How 'large' may the set Bn be, so that it is still possible to select empirically a predictor whose risk under F is close to that of the best predictor in the set. Sharp bounds for orders of magnitudes are given under various assumptions on F. Algo- rithmic complexity of the ensuing procedures is also studied. The main message of this paper and the implications of the above derived orders are that under various sparsity assumptions on the optimal predictor there is \asymptotically no harm" in introducing many more explana- tory variables than observations. Furthermore, such practice can be beneflcial in comparison with a procedure that screens in advance a small subset of explanatory variables. Another main result is that 'Lasso'-type procedures, i.e., optimization under l1 constraint, could be e-cient in flnding optimal sparse predictors in high dimensions.Keywords
This publication has 15 references indexed in Scilit:
- Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observationsBernoulli, 2004
- Least angle regressionThe Annals of Statistics, 2004
- Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)Statistical Science, 2001
- Functional aggregation for nonparametric regressionThe Annals of Statistics, 2000
- Asymptotic distribution of the errors in scalar and vector quantizersIEEE Transactions on Information Theory, 1996
- The Risk Inflation Criterion for Multiple RegressionThe Annals of Statistics, 1994
- The Smallest Eigenvalue of a Large Dimensional Wishart MatrixThe Annals of Probability, 1985
- Asymptotic Behavior of $M$-Estimators of $p$ Regression Parameters when $p^2/n$ is Large. I. ConsistencyThe Annals of Statistics, 1984
- How Many Variables Should be Entered in a Regression Equation?Journal of the American Statistical Association, 1983
- Robust Regression: Asymptotics, Conjectures and Monte CarloThe Annals of Statistics, 1973