Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR
- 26 February 2008
- journal article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 64 (1) , 115-123
- https://doi.org/10.1111/j.1541-0420.2007.00843.x
Abstract
Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
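The abstract describes the penalty only in words. As a minimal sketch, assuming the usual statement of the OSCAR penalty as an L1 term plus a weighted sum of pairwise maxima, sum_j |b_j| + c * sum_{j<k} max(|b_j|, |b_k|), the Python below evaluates it in two equivalent ways. The function names and the tuning-parameter name `c` are illustrative choices, not taken from this page, and the code only evaluates the penalty; it does not fit a model.

```python
from itertools import combinations


def oscar_penalty(beta, c):
    """Assumed OSCAR penalty: L1 norm plus c times the sum of pairwise maxima."""
    abs_beta = [abs(b) for b in beta]
    l1_term = sum(abs_beta)
    # Each unordered pair of coefficients contributes the larger absolute value.
    pairwise_term = sum(max(a, b) for a, b in combinations(abs_beta, 2))
    return l1_term + c * pairwise_term


def oscar_penalty_sorted(beta, c):
    """Same quantity written as a weighted sum of sorted absolute values."""
    abs_sorted = sorted(abs(b) for b in beta)  # ascending order
    # The value at 0-based rank r is the maximum in exactly r pairs,
    # so it carries weight c * r + 1.
    return sum((c * rank + 1) * a for rank, a in enumerate(abs_sorted))


if __name__ == "__main__":
    beta = [0.0, 2.0, -2.0, 0.5]  # one zero, two coefficients tied in magnitude
    print(oscar_penalty(beta, c=0.5))
    print(oscar_penalty_sorted(beta, c=0.5))
```

With `c = 0` the expression reduces to the plain L1 (lasso) penalty. The pairwise-maximum term is what rewards coefficients for sharing a common magnitude, which corresponds to the grouping behavior the abstract describes.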
This publication has 11 references indexed in Scilit:
- Piecewise linear regularized solution paths. The Annals of Statistics, 2007
- Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005
- Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005
- Sparsity and Smoothness Via the Fused Lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2004
- Finding predictive gene groups from microarray data. Journal of Multivariate Analysis, 2004
- Least angle regression. The Annals of Statistics, 2004
- Simultaneous gene clustering and subset selection for sample classification via MDL. Bioinformatics, 2003
- Supervised harvesting of expression trees. Genome Biology, 2001
- A Multivariate Exponential Distribution. Journal of the American Statistical Association, 1967