The composite absolute penalties family for grouped and hierarchical variable selection

Open Access

1 December 2009

Journal article, published by the Institute of Mathematical Statistics in The Annals of Statistics

Vol. 37 (6A), pp. 3468–3497
https://doi.org/10.1214/07-aos584

Abstract

Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L₁-penalized squared-error minimization method, the Lasso, has been popular in regression models and beyond. In this paper, we combine different norms, including L₁, to form an intelligent penalty that adds side information to the fitting of a regression or classification model in order to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the L₁ and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter can be selected without cross-validation. CAP is shown to improve on the predictive performance of the Lasso in a series of simulated experiments, including cases with p ≫ n and possibly misspecified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
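The abstract describes CAP penalties as a within-group norm combined by an across-group norm. A minimal sketch of how such a penalty could be evaluated follows. It assumes the form T(β) = Σ_k ‖β_{G_k}‖_{γ_k}^{γ₀}, which is our reading of the construction, so consult the paper for the exact definitions and any group weights. The names `group_norm` and `cap_penalty` are hypothetical. With γ₀ = 1 and every γ_k = ∞, the penalty is the L₁/L∞ subfamily that the abstract associates with iCAP. Overlapping groups G₁ = {1, 2} and G₂ = {2} (written `[[0, 1], [1]]` with 0-based indices) count β₂ twice, which is one way to encode "β₁ before β₂".

```python
import math
from typing import List, Optional, Sequence


def group_norm(values: Sequence[float], gamma: float) -> float:
    """L_gamma norm of one group's coefficients; gamma = math.inf gives the max norm."""
    if gamma < 1:
        raise ValueError("within-group exponent must be >= 1 to be a norm")
    if gamma == math.inf:
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** gamma for v in values) ** (1.0 / gamma)


def cap_penalty(
    beta: Sequence[float],
    groups: List[List[int]],
    gamma0: float = 1.0,
    gammas: Optional[List[float]] = None,
) -> float:
    """Hypothetical CAP-style penalty: sum_k ||beta[G_k]||_{gamma_k} ** gamma0.

    groups may overlap; overlapping patterns are what encode hierarchy.
    gammas defaults to L_inf within every group (the L1/L_inf subfamily when gamma0 = 1).
    """
    if gammas is None:
        gammas = [math.inf] * len(groups)
    total = 0.0
    for idx, gamma in zip(groups, gammas):
        total += group_norm([beta[i] for i in idx], gamma) ** gamma0
    return total


# Grouped selection: two nonoverlapping groups, L_inf within, L1 across.
print(cap_penalty([1.0, -2.0, 0.0, 0.0], [[0, 1], [2, 3]]))  # 2.0

# Singleton groups with gamma0 = 1 reduce to the Lasso's L1 penalty.
print(cap_penalty([1.0, -2.0, 0.0, 0.0], [[0], [1], [2], [3]]))  # 3.0

# Hierarchy via overlap: beta_2 alone costs more than beta_1 alone.
print(cap_penalty([1.0, 0.0], [[0, 1], [1]]))  # 1.0
print(cap_penalty([0.0, 1.0], [[0, 1], [1]]))  # 2.0
```

In the last two calls, a nonzero β₂ is charged by both groups while β₁ is charged only once. Under this sketch, a penalized fit therefore tends to activate β₁ before β₂. Fitting the model itself (for example, with BLASSO or the iCAP path algorithm the abstract mentions) is outside the scope of this snippet.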

All related versions: Version 1, 2009-09-02, arXiv
