Statistical estimation of cluster boundaries in gene expression profile data
Open Access
- 1 December 2001
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 17 (12), 1143-1151
- https://doi.org/10.1093/bioinformatics/17.12.1143
Abstract
Motivation: Gene expression profile data are rapidly accumulating due to advances in microarray techniques. These abundant data are analyzed by clustering procedures to extract useful information about the genes. In clustering analyses, determining the boundaries of gene clusters systematically, rather than by visual inspection and biological knowledge, remains challenging.
Results: We propose a statistical procedure to estimate the number of clusters in the hierarchical clustering of expression profiles. Following the hierarchical clustering, the statistical property of the profiles at each node in the dendrogram is evaluated by a statistics-based value: the variance inflation factor in multiple regression analysis. The evaluation leads to an automatic determination of the cluster boundaries without additional analyses or biological knowledge of the measured genes. The performance of the procedure is demonstrated on the profiles of 2467 yeast genes, with very promising results.
Availability: A set of programs will be sent electronically upon request.
Contact: horimoto@post.saga-med.ac.jp; toh@beri.co.jp
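To illustrate the quantity named in the Results, the following is a minimal Python sketch of the variance inflation factor (VIF) for a set of profiles. It is not the authors' program. It relies on the standard identity that the VIF of variable j, 1 / (1 - R_j^2), equals the j-th diagonal element of the inverse correlation matrix. Treating each profile as a variable and each experimental condition as an observation is an assumption, as are the function name and the toy data. The abstract does not state how VIF values are turned into cluster boundaries, so no cutoff is shown.

```python
import numpy as np


def variance_inflation_factors(profiles):
    """Return one VIF per profile (row) of `profiles`.

    profiles: array-like of shape (n_profiles, n_conditions).
    Assumption: each row is a variable, each column an observation.
    The correlation matrix must be invertible, which requires more
    conditions than profiles and no exactly collinear profiles.
    """
    x = np.asarray(profiles, dtype=float)
    corr = np.corrcoef(x)  # rows are treated as variables
    # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal entry of inv(corr).
    return np.diag(np.linalg.inv(corr))


# Hypothetical toy profiles: the first two are strongly correlated,
# the third is only weakly related to them.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 1.9, 3.2, 3.9, 5.1, 6.2],
    [2.0, -1.0, 0.5, 3.0, -2.0, 1.0],
]
toy_vifs = variance_inflation_factors(toy_profiles)
```

In this sketch, a large VIF indicates that a profile is well predicted by a linear combination of the others in the same group, while values near 1 indicate near independence.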