Class discovery in gene expression data
- 22 April 2001
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Recent studies (Alizadeh et al. [1]; Bittner et al. [5]; Golub et al. [11]) demonstrate the discovery of putative disease subtypes from gene expression data. The underlying computational problem is to partition the set of sample tissues into statistically meaningful classes. In this paper we present a novel approach to class discovery and develop automatic analysis methods. Our approach is based on statistically scoring candidate partitions according to the overabundance of genes that separate the different classes. Indeed, in biological datasets, an overabundance of genes separating known classes is typically observed. We measure overabundance against a stochastic null model. This allows for highlighting subtle, yet meaningful, partitions that are supported on a small subset of the genes. Using simulated annealing, we explore the space of all possible partitions of the set of samples, seeking partitions with a statistically significant overabundance of differentially expressed genes. We demonstrate the performance of our methods on synthetic data, where we recover planted partitions. Finally, we turn to tumor expression datasets and show that we find several highly pronounced partitions.
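The scoring-and-search idea summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It is restricted to two-class partitions, and the Welch t-statistic, the threshold `thr`, the normal-approximation null probability, the z-score form of the overabundance score, the single-sample flip move, and the geometric cooling schedule are all assumptions made here for brevity. All function names are hypothetical.

```python
import math
import numpy as np


def overabundance_score(X, labels, thr=3.0):
    """Z-score of the count of separating genes versus a simple null (assumed form).

    X is a genes x samples matrix; labels is a 0/1 vector over samples.
    """
    a, b = X[:, labels == 0], X[:, labels == 1]
    if a.shape[1] < 2 or b.shape[1] < 2:
        return -np.inf  # degenerate partition: a class is too small to test
    # Per-gene Welch t-statistic between the two classes.
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    t = (a.mean(axis=1) - b.mean(axis=1)) / (se + 1e-12)
    k = int(np.sum(np.abs(t) > thr))  # observed number of separating genes
    n = X.shape[0]
    # Assumed null: each gene exceeds thr independently with a normal-tail probability.
    p0 = math.erfc(thr / math.sqrt(2))
    return (k - n * p0) / math.sqrt(n * p0 * (1 - p0))


def anneal(X, rng, steps=2000, t0=1.0, cooling=0.995):
    """Simulated annealing over two-class partitions using single-sample flips."""
    n_samples = X.shape[1]
    labels = rng.integers(0, 2, size=n_samples)
    cur = overabundance_score(X, labels)
    best, best_score = labels.copy(), cur
    temp = t0
    for _ in range(steps):
        cand = labels.copy()
        cand[rng.integers(n_samples)] ^= 1  # move one sample to the other class
        s = overabundance_score(X, cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if s >= cur or rng.random() < math.exp((s - cur) / temp):
            labels, cur = cand, s
            if cur > best_score:
                best, best_score = labels.copy(), cur
        temp *= cooling
    return best, best_score


# Synthetic data with a planted partition: 30 of 500 genes are shifted in class 1.
rng = np.random.default_rng(0)
planted = np.repeat([0, 1], 10)
X = rng.normal(size=(500, 20))
X[:30, planted == 1] += 3.0

alternating = np.tile([0, 1], 10)  # a partition unrelated to the planted one
planted_score = overabundance_score(X, planted)
alternating_score = overabundance_score(X, alternating)
best, best_score = anneal(X, rng)
```

In this toy setting the planted partition scores far above an unrelated one, which is the property the search exploits. Whether annealing actually reaches the planted partition depends on the assumed schedule and move set.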
This publication has 13 references indexed in Scilit:
- Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 2000
- Tissue Classification with Gene Expression Profiles. Journal of Computational Biology, 2000
- Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 2000
- Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 1999
- Comparative hybridization of an array of 21 500 ovarian cDNAs for the discovery of genes overexpressed in ovarian carcinomas. Gene, 1999
- Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 1999
- On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery, 1997
- Ratio-based decisions and the quantitative analysis of cDNA microarray images. Journal of Biomedical Optics, 1997
- Multiple Comparisons. SAGE Publications, 1986
- Optimization by Simulated Annealing. Science, 1983