Penalized feature selection and classification in bioinformatics
Open Access
- 24 April 2008
- journal article
- review article
- Published by Oxford University Press (OUP) in Briefings in Bioinformatics
- Vol. 9 (5), 392-403
- https://doi.org/10.1093/bib/bbn027
Abstract
In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifiers and to provide more insight into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques, which belong to the family of embedded feature selection methods, for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data.
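To make the idea of an embedded, penalized method concrete, the following is a minimal sketch, not taken from the article, of one common instance: logistic regression with an L1 (lasso-type) penalty, fitted by proximal gradient descent. The function names (`fit_l1_logistic`, `soft_threshold`), the toy data, the step size and the value of `lam` are all assumptions chosen for illustration. The intercept is omitted for brevity. The point it shows is that the penalty sets the weights of uninformative features to exactly zero, so selection happens during classifier fitting rather than as a separate step.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero, returning exactly 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam, step=1.0, n_iter=500):
    """Minimize mean logistic loss + lam * sum(|w_j|) by proximal gradient descent.

    Illustrative only: no intercept, fixed step size, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label for each sample.
        resid = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        # Gradient of the mean logistic loss with respect to each weight.
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: feature 0 determines the class; features 1 and 2 are
# balanced within each class and therefore carry no class information.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, -1.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_l1_logistic(X, y, lam=0.1)
print(weights)
```

On this toy data the two uninformative features receive weights of exactly zero, while the informative feature keeps a positive weight that the penalty holds at a finite value even though the classes are perfectly separable. Larger values of `lam` shrink that weight further. In practice `lam` would typically be chosen by cross-validation.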