Evaluation of gene-expression clustering via mutual information distance measure
Open Access
- 30 March 2007
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 8 (1) , 1-12
- https://doi.org/10.1186/1471-2105-8-111
Abstract
The definition of a distance measure plays a key role in the evaluation of different clustering solutions of gene expression profiles. In this empirical study we compare different clustering solutions when using the Mutual Information (MI) measure versus the well-known Euclidean distance and Pearson correlation coefficient. Relying on several public gene expression datasets, we evaluate the homogeneity and separation scores of different clustering solutions. We found that the MI measure yields a more significant differentiation among erroneous clustering solutions. The proposed measure was also used to analyze the performance of several known clustering algorithms. A comparative study of these algorithms reveals that their "best solutions" are ranked almost oppositely when using different distance measures, despite the correspondence found between these measures when analyzing the averaged scores of groups of solutions. In view of the results, further attention should be paid to the selection of a proper distance measure for analyzing the clustering of gene expression data.
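To make the comparison in the abstract concrete, the following is a minimal Python sketch of the three distance measures and of pairwise homogeneity and separation scores. The abstract does not give the estimator or the exact score definitions, so everything here is an assumption for illustration: MI is estimated from an equal-width 2-D histogram (the `bins` parameter is hypothetical), the MI distance is taken as 1 - I(X;Y)/H(X,Y), the Pearson distance as 1 - r, and homogeneity and separation are computed as mean within-cluster and mean between-cluster pairwise distances.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))


def pearson_distance(x, y):
    """1 - Pearson correlation (assumed convention; 0 for perfectly correlated profiles)."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)


def _entropy(p):
    """Shannon entropy in bits of a probability vector; zero cells are skipped."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mi_distance(x, y, bins=4):
    """Normalized MI distance 1 - I(X;Y)/H(X,Y), estimated from a 2-D histogram.

    Assumption: equal-width binning with a hypothetical default of 4 bins;
    the paper's own estimator may differ.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    hx = _entropy(pxy.sum(axis=1))   # marginal entropy of x
    hy = _entropy(pxy.sum(axis=0))   # marginal entropy of y
    hxy = _entropy(pxy.ravel())      # joint entropy
    if hxy == 0.0:                   # both profiles fall in a single bin
        return 0.0
    return float(1.0 - (hx + hy - hxy) / hxy)


def homogeneity_separation(profiles, labels, dist):
    """Mean within-cluster and mean between-cluster pairwise distance.

    Assumption: pairwise definitions are used so that any distance function
    can be plugged in without needing cluster centroids.
    """
    profiles = np.asarray(profiles, float)
    labels = np.asarray(labels)
    within, between = [], []
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            d = dist(profiles[i], profiles[j])
            (within if labels[i] == labels[j] else between).append(d)
    homogeneity = float(np.mean(within)) if within else float("nan")
    separation = float(np.mean(between)) if between else float("nan")
    return homogeneity, separation


# Toy usage: a symmetric, non-linear relationship between two profiles.
x = np.arange(1.0, 9.0)
y = (x - 4.5) ** 2
d_pearson = pearson_distance(x, y)
d_mi = mi_distance(x, y)

# Toy usage: score one clustering solution under a chosen distance.
toy_profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
toy_labels = [0, 0, 1, 1]
h, s = homogeneity_separation(toy_profiles, toy_labels, euclidean_distance)
```

Under these pairwise definitions, a lower homogeneity score and a higher separation score both indicate a tighter, better separated solution. Passing `mi_distance`, `pearson_distance`, or `euclidean_distance` as `dist` scores the same solution under each measure, which is the kind of comparison the abstract describes.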