Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods
- 23 January 2008
- journal article
- research article
- Published by Walter de Gruyter GmbH in Statistical Applications in Genetics and Molecular Biology
- Vol. 7 (1), Article 7
- https://doi.org/10.2202/1544-6115.1307
Abstract
One application of gene expression arrays is to derive molecular profiles, i.e., sets of genes, which discriminate well between two classes of samples, for example between tumour types. Users are confronted with a multitude of classification methods of varying complexity that can be applied to this task. To help decide which method to use in a given situation, we compare important characteristics of a range of classification methods, including simple univariate filtering, penalised likelihood methods and the random forest. Classification accuracy is an important characteristic, but the biological interpretability of molecular profiles is also important. This implies both parsimony and stability, in the sense that profiles should not vary much when there are slight changes in the training data. We perform a random resampling study to compare these characteristics between the methods and across a range of profile sizes. We measure stability by adopting the Jaccard index to assess the similarity of resampled molecular profiles. We carry out a case study on five well-established cancer microarray data sets, for two of which we have the benefit of being able to validate the results in an independent data set. The study shows that those methods which produce parsimonious profiles generally result in better prediction accuracy than methods which don't include variable selection. For very small profile sizes, the sparse penalised likelihood methods tend to result in more stable profiles than univariate filtering while maintaining similar predictive performance.
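The stability measure mentioned in the abstract can be illustrated with a minimal sketch. The Jaccard index of two gene sets A and B is |A ∩ B| / |A ∪ B|, so identical profiles score 1 and disjoint profiles score 0. The code below is not the authors' implementation: the function names, the gene identifiers, the convention for two empty profiles, and the choice to summarise stability as the mean over all pairs of resampled profiles are assumptions made for illustration only.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard index |A & B| / |A | B| of two gene sets."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        # Assumption: two empty profiles are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(profiles):
    """Average Jaccard index over all pairs of resampled profiles.

    Assumption: pairwise averaging is one plausible way to summarise
    stability; the abstract does not say how the index is aggregated.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical profiles selected from three resampled training sets.
resampled_profiles = [
    {"GENE1", "GENE2", "GENE3"},
    {"GENE2", "GENE3", "GENE4"},
    {"GENE1", "GENE2", "GENE3"},
]

stability = mean_pairwise_jaccard(resampled_profiles)
print(f"mean pairwise Jaccard index: {stability:.3f}")
```

A value close to 1 indicates that nearly the same genes are selected regardless of which samples fall into the training set; comparing this value across methods at a fixed profile size mirrors the kind of comparison the abstract describes.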