An Adaptive Strategy for Single- and Multi-Cluster Gene Assignment
- 5 September 2008
- journal article
- research article
- Published by Wiley in Biotechnology Progress
- Vol. 19 (4), 1142-1148
- https://doi.org/10.1021/bp025648p
Abstract
Strict assignment of genes to one class, dimensionality reduction, a priori specification of the number of classes, the need for a training set, nonunique solutions, and complex learning mechanisms are some of the inadequacies of current clustering algorithms. Existing algorithms cluster genes on the basis of high positive correlations between their expression patterns. However, genes with strong negative correlations can also have similar functions and are most likely to have a role in the same pathways. To address some of these issues, we propose the adaptive centroid algorithm (ACA), which employs an analysis of variance (ANOVA)-based performance criterion. The ACA also uses Euclidean distances, the center-of-mass principle for heterogeneously distributed mass elements, and the given data set to give unique solutions. The proposed approach involves three stages. In the first stage, a two-way ANOVA of the gene expression matrix is performed. The two factors in the ANOVA are gene expression and experimental condition. The residual mean squared error (MSE) from the ANOVA is used as a performance criterion in the ACA. Finally, correlated clusters are found based on the Pearson correlation coefficients. To validate the proposed approach, a two-way ANOVA is again performed on the discovered clusters. The results from this last step indicate that the MSEs of the clusters are significantly lower than that of the fibroblast-serum gene expression matrix. The ACA is employed in this study for single- as well as multi-cluster gene assignments.
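The two quantities the abstract relies on, the residual MSE of a two-way ANOVA and the Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the ACA itself: the abstract does not specify the centroid update, so only the performance criterion and the correlation measure are shown. The sketch assumes a two-way ANOVA without replication (rows are genes, columns are experimental conditions); the function names `residual_mse` and `pearson` and the toy matrix `expression` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def residual_mse(matrix):
    """Residual MSE of a two-way ANOVA without replication.

    Assumption: rows are genes, columns are experimental conditions,
    with one observation per cell.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    # Residual for each cell: x_ij - row_mean_i - col_mean_j + grand_mean
    ss_res = sum(
        (matrix[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return ss_res / ((n_rows - 1) * (n_cols - 1))


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy data (made up for illustration): genes 0 and 1 rise together,
# gene 2 falls as they rise.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [4.0, 3.0, 2.0, 1.0],
]

mse_all = residual_mse(expression)          # whole matrix
mse_cluster = residual_mse(expression[:2])  # candidate cluster: genes 0 and 1
r_positive = pearson(expression[0], expression[1])
r_negative = pearson(expression[0], expression[2])
```

In this toy case the two co-varying genes form a cluster whose residual MSE is lower than that of the full matrix, which mirrors the validation step described above. The third gene is strongly negatively correlated with the first, the kind of relationship the abstract argues should not be ignored.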