A protocol for building and evaluating predictors of disease state based on microarray data
Open Access
- 7 April 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (19) , 3755-3762
- https://doi.org/10.1093/bioinformatics/bti429
Abstract
Motivation: Microarray gene expression data are increasingly employed to identify sets of marker genes that accurately predict disease development and outcome in cancer. Many computational approaches have been proposed to construct such predictors. However, there is as yet no objective way to evaluate whether a new approach truly improves on the current state of the art. In addition, no ‘standard’ computational approach has emerged that enables robust outcome prediction.

Results: An important contribution of this work is the description of a principled training and validation protocol that allows objective evaluation of the complete methodology for constructing a predictor. We review the possible choices of computational approaches, with specific emphasis on predictor choice and reporter selection strategies. Using this training-validation protocol, we evaluated different reporter selection strategies and predictors on six gene expression datasets of varying difficulty. We demonstrate that simple reporter selection strategies (forward filtering and shrunken centroids) work surprisingly well and outperform partial least squares in four of the six datasets. Similarly, simple predictors, such as the nearest mean classifier, outperform more complex classifiers. Our training-validation protocol provides a robust methodology for evaluating the performance of new computational approaches and for objectively comparing outcome predictions on different datasets.

Contact: l.f.a.wessels@ewi.tudelft.nl

Supplementary information: http://ict.ewi.tudelft.nl/index.php?option=com_pub&task=view&id=1983
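The central idea behind an objective training-validation protocol is that every choice made while building a predictor, including which reporter genes to keep and how many, must be made using training samples only. The following is a minimal sketch of that idea, not the authors' implementation. It assumes binary labels (0/1), uses a Golub-style signal-to-noise ranking as a stand-in for forward filtering, a nearest mean classifier with squared Euclidean distance, and a double-loop (nested) cross-validation whose fold counts are arbitrary. All function names and the toy data are hypothetical.

```python
import random
import statistics


def snr_scores(X, y):
    """Golub-style signal-to-noise score per gene for labels 0/1 (assumed criterion)."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        diff = abs(statistics.fmean(a) - statistics.fmean(b))
        scores.append(diff / spread if spread > 0 else 0.0)
    return scores


def rank_genes(X, y):
    """Gene indices ordered from most to least discriminative."""
    s = snr_scores(X, y)
    return sorted(range(len(s)), key=lambda g: -s[g])


def nmc_fit(X, y, genes):
    """Nearest mean classifier: per-class mean over the selected genes."""
    means = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        means[c] = [statistics.fmean([r[g] for r in rows]) for g in genes]
    return means


def nmc_predict(means, row, genes):
    """Assign the class whose mean is nearest in squared Euclidean distance."""
    x = [row[g] for g in genes]
    return min(means, key=lambda c: sum((xi - mi) ** 2 for xi, mi in zip(x, means[c])))


def stratified_folds(y, k, rng):
    """Split sample indices into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def splits(y, k, rng):
    """Yield (train, test) index lists for k-fold cross-validation."""
    for test in stratified_folds(y, k, rng):
        held = set(test)
        yield [i for i in range(len(y)) if i not in held], test


def fold_errors(X, y, train, test, n_genes):
    """Select reporters and fit on `train` only, then count errors on `test`."""
    Xtr = [X[i] for i in train]
    ytr = [y[i] for i in train]
    genes = rank_genes(Xtr, ytr)[:n_genes]
    means = nmc_fit(Xtr, ytr, genes)
    return sum(nmc_predict(means, X[i], genes) != y[i] for i in test)


def double_loop_error(X, y, sizes, k_outer=5, k_inner=5, seed=0):
    """Outer loop estimates error; inner loop picks the number of reporters."""
    rng = random.Random(seed)
    total = 0
    for train, test in splits(y, k_outer, rng):
        Xtr = [X[i] for i in train]
        ytr = [y[i] for i in train]
        # Inner folds are fixed once so every candidate size sees the same splits.
        inner = list(splits(ytr, k_inner, rng))
        best = min(sizes, key=lambda m: sum(fold_errors(Xtr, ytr, a, b, m) for a, b in inner))
        # The outer test fold is touched only here, after all choices are made.
        total += fold_errors(X, y, train, test, best)
    return total / len(y)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [0] * 20 + [1] * 20
    # Hypothetical toy data: gene 0 separates the classes, genes 1..49 are noise.
    X = [[rng.gauss(3.0 * lab, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(49)] for lab in y]
    print(double_loop_error(X, y, sizes=[1, 5, 20]))
```

The design choice worth noting is that `rank_genes` is called inside `fold_errors`, so reporter selection is repeated on each training split. Ranking genes once on all samples and only then cross-validating the classifier would let the held-out samples influence which genes are chosen, which tends to make the error estimate optimistic.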