Molecular diagnosis - Classification, model selection and performance evaluation
- 1 January 2005
- journal article
- research article
- Vol. 44 (3), 438-443
Abstract
Objectives: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces. Methods: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small sample-size situations, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation. Results: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation. Conclusions: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is a must.
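The difference between in-loop and out-of-loop feature selection can be illustrated with a minimal sketch. It is not taken from the article: the pure-noise data, the class-mean-difference filter, the nearest-centroid classifier and all function names are assumptions chosen for illustration. Because the labels carry no signal, an honest accuracy estimate should hover around chance. Selecting genes once on the full data set before leave-one-out cross-validation (out-of-loop) lets the held-out sample influence the gene list and tends to inflate the estimate, whereas repeating the selection inside each fold (in-loop) does not.

```python
import random


def make_noise_data(n_samples=20, n_genes=1000, seed=0):
    """Pure-noise 'expression' matrix with balanced, arbitrary class labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, genes):
    """Per-class mean of the given genes, computed over the given rows only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return means


def select_genes(X, y, rows, k):
    """Filter step: keep the k genes with the largest absolute class-mean difference."""
    all_genes = range(len(X[0]))
    means = class_means(X, y, rows, all_genes)
    scores = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(all_genes, key=lambda g: scores[g], reverse=True)[:k]


def predict(X, y, train_rows, test_row, genes):
    """Nearest-centroid prediction for one held-out sample."""
    means = class_means(X, y, train_rows, genes)
    dist = {
        label: sum((X[test_row][g] - m) ** 2 for g, m in zip(genes, means[label]))
        for label in (0, 1)
    }
    return min(dist, key=dist.get)


def loocv_accuracy(X, y, k=10, select_in_loop=True):
    """Leave-one-out accuracy with gene selection inside or outside the loop."""
    all_rows = list(range(len(X)))
    # Out-of-loop: genes are chosen once, using every sample including test ones.
    genes_outside = None if select_in_loop else select_genes(X, y, all_rows, k)
    correct = 0
    for test_row in all_rows:
        train_rows = [i for i in all_rows if i != test_row]
        if select_in_loop:
            # In-loop: genes are re-selected from the training rows of this fold.
            genes = select_genes(X, y, train_rows, k)
        else:
            genes = genes_outside
        correct += predict(X, y, train_rows, test_row, genes) == y[test_row]
    return correct / len(all_rows)


X, y = make_noise_data()
biased = loocv_accuracy(X, y, select_in_loop=False)
unbiased = loocv_accuracy(X, y, select_in_loop=True)
print(f"out-of-loop selection: {biased:.2f}")
print(f"in-loop selection:     {unbiased:.2f}")
```

The same principle extends to nested-loop cross-validation: a tuning parameter such as `k` would be chosen by an inner cross-validation run on the training rows of each outer fold, so that the outer loop still estimates the performance of the complete selection-plus-fitting procedure.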