Judging the Quality of Gene Expression-Based Clustering Methods Using Gene Annotation
Open Access
- 1 October 2002
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (10) , 1574-1581
- https://doi.org/10.1101/gr.397002
Abstract
We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results. [The algorithm described is available at http://llama.med.harvard.edu, under Software.]
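To make the idea of a mutual-information figure of merit concrete, the following is a minimal sketch, not the authors' implementation. It computes the plain mutual information (in bits) between a cluster assignment and one categorical gene attribute, then sums it over several attributes. The function names, the input layout (one label per gene, in the same gene order), and the summation over attributes are assumptions made for illustration; the abstract does not specify how the published figure of merit is normalized or compared against random clusterings.

```python
import math
from collections import Counter


def mutual_information(clusters, attribute):
    """Mutual information (bits) between two equal-length label sequences.

    clusters[i] is the cluster of gene i; attribute[i] is that gene's
    annotation value (for example 1 if it carries a given term, else 0).
    Hypothetical helper, written for illustration only.
    """
    if len(clusters) != len(attribute):
        raise ValueError("label sequences must have equal length")
    n = len(clusters)
    if n == 0:
        return 0.0

    joint = Counter(zip(clusters, attribute))  # counts of (cluster, value) pairs
    n_c = Counter(clusters)                    # marginal counts per cluster
    n_a = Counter(attribute)                   # marginal counts per attribute value

    mi = 0.0
    for (c, a), count in joint.items():
        # p(c,a) * log2( p(c,a) / (p(c) * p(a)) ), written in terms of counts
        mi += (count / n) * math.log2(count * n / (n_c[c] * n_a[a]))
    return mi


def total_mutual_information(clusters, annotations):
    """Sum the per-attribute mutual information.

    annotations maps an attribute name to its per-gene label sequence.
    Summing over attributes is an assumption of this sketch.
    """
    return sum(mutual_information(clusters, labels)
               for labels in annotations.values())


if __name__ == "__main__":
    clusters = [0, 0, 1, 1]
    # Attribute perfectly aligned with the clusters: 1 bit of information.
    print(mutual_information(clusters, [1, 1, 0, 0]))  # 1.0
    # Attribute independent of the clusters: no information.
    print(mutual_information(clusters, [1, 0, 1, 0]))  # 0.0
```

A clustering whose groups line up with known annotation scores higher than one whose groups cut across it. Because raw mutual information tends to grow with the number of clusters, a fair comparison across cluster numbers would need some baseline, such as the score of randomly assigned clusters of the same sizes; this sketch leaves that step out.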