Abstract
We derive sharp performance bounds for least squares regression with $L_1$ regularization from the perspectives of parameter estimation accuracy and feature selection quality. The main result, proved for $L_1$ regularization, extends a similar result in [Ann. Statist. 35 (2007) 2313--2351] for the Dantzig selector. It gives an affirmative answer to an open question in [Ann. Statist. 35 (2007) 2358--2364]. Moreover, the result leads to an extended view of feature selection that allows less restrictive conditions than some recent work. Based on these theoretical insights, a novel two-stage $L_1$-regularization procedure with selective penalization is analyzed. It is shown that if the target parameter vector can be decomposed as the sum of a sparse parameter vector with large coefficients and another less sparse vector with relatively small coefficients, then the two-stage procedure can lead to improved performance.

Comment: Published at http://dx.doi.org/10.1214/08-AOS659 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
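To make the idea of a two-stage procedure with selective penalization concrete, the following is a minimal illustrative sketch, not the procedure as specified in the paper. It assumes that the second stage simply removes the $L_1$ penalty from coefficients whose first-stage magnitude exceeds a fixed threshold, and it uses a plain proximal gradient (ISTA) solver. The names `weighted_lasso`, `two_stage_lasso`, `lam`, and `threshold`, as well as the synthetic data, are hypothetical choices made for illustration only.

```python
import numpy as np

def weighted_lasso(X, y, penalties, n_iter=2000):
    """Minimize (1/(2n))*||y - X b||^2 + sum_j penalties[j]*|b_j| with ISTA."""
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Coordinate-wise soft thresholding; a zero penalty leaves z unchanged.
        b = np.sign(z) * np.maximum(np.abs(z) - step * penalties, 0.0)
    return b

def two_stage_lasso(X, y, lam, threshold):
    """Assumed two-stage scheme: unpenalize coefficients found large in stage 1."""
    d = X.shape[1]
    # Stage 1: ordinary L1 regularization with a uniform penalty.
    b1 = weighted_lasso(X, y, np.full(d, lam))
    # Stage 2: selective penalization (assumption: fixed magnitude threshold).
    large = np.abs(b1) > threshold
    b2 = weighted_lasso(X, y, np.where(large, 0.0, lam))
    return b1, b2, large

# Synthetic target: two large coefficients plus many small ones.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
beta = np.full(d, 0.02)
beta[:2] = [3.0, -3.0]
y = X @ beta + 0.1 * rng.standard_normal(n)

b1, b2, large = two_stage_lasso(X, y, lam=0.5, threshold=1.0)
print("stage 1, large coefficients:", b1[:2])
print("stage 2, large coefficients:", b2[:2])
```

In this toy setting the first stage shrinks the two large coefficients toward zero by roughly the penalty level, while the second stage re-estimates them without that shrinkage. This is one way to read the abstract's claim of improved performance when the target splits into a sparse large part and a less sparse small part.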