Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data
Open Access
- 15 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (10) , 2394-2402
- https://doi.org/10.1093/bioinformatics/bti319
Abstract
Motivation: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes).

Results: We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data.

Availability: The source codes and datasets used are available from our Supplementary website.

Contact: kayee@u.washington.edu

Supplementary information: http://www.expression.washington.edu/publications/kayee/bma
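
The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, the posterior model probabilities and the per-model class probabilities below are invented for illustration, and the sketch assumes the candidate models have already been selected and fitted. It shows only how a BMA prediction weights each model's class probabilities by that model's posterior probability, and how a gene's posterior probability can be taken as the total posterior probability of the models that contain it.

```python
from typing import Dict, FrozenSet, List

Model = FrozenSet[str]  # a model is a set of gene names (hypothetical names below)


def bma_predict(model_posteriors: Dict[Model, float],
                model_class_probs: Dict[Model, List[float]]) -> List[float]:
    """Average per-model class probabilities, weighted by posterior model probability."""
    n_classes = len(next(iter(model_class_probs.values())))
    averaged = [0.0] * n_classes
    for model, weight in model_posteriors.items():
        for k, prob in enumerate(model_class_probs[model]):
            averaged[k] += weight * prob
    return averaged


def gene_posteriors(model_posteriors: Dict[Model, float]) -> Dict[str, float]:
    """Posterior probability of each gene: sum over the models that include it."""
    totals: Dict[str, float] = {}
    for model, weight in model_posteriors.items():
        for gene in model:
            totals[gene] = totals.get(gene, 0.0) + weight
    return totals


# Illustrative inputs only (assumed values, not results from the paper).
m1 = frozenset({"geneA", "geneB"})
m2 = frozenset({"geneA"})
m3 = frozenset({"geneC"})

# Posterior model probabilities, assumed to sum to 1 over the selected models.
posteriors = {m1: 0.5, m2: 0.3, m3: 0.2}

# Class probabilities each model assigns to one test sample (three classes).
class_probs = {
    m1: [0.7, 0.2, 0.1],
    m2: [0.6, 0.3, 0.1],
    m3: [0.1, 0.1, 0.8],
}

averaged_probs = bma_predict(posteriors, class_probs)
predicted_class = max(range(len(averaged_probs)), key=averaged_probs.__getitem__)
gene_probs = gene_posteriors(posteriors)
```

In this toy setup the overlapping models both contain geneA, so its posterior probability accumulates weight from each of them, while the final class call reflects all three models rather than the single best one.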