Clustering Gene Expression Patterns
- 1 October 1999
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 6 (3-4), pp. 281-297
- https://doi.org/10.1089/106652799318274
Abstract
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2 (\log n)^c)$ for a constant $c$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
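The abstract does not spell out the heuristic, so the following is only a minimal sketch of the general idea, written under stated assumptions. The input is a 0/1 similarity matrix produced by a clique-based error model: genes with the same true label are fully connected, and each pair is flipped independently with probability `alpha`. Clusters are then grown one at a time by an affinity threshold `t`, in the spirit of the heuristic the paper calls CAST. The names `corrupted_clique_graph` and `cast`, the parameters `alpha`, `t`, and `max_steps`, and the exact add/remove ordering are all hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import List, Set


def corrupted_clique_graph(labels: List[int], alpha: float, seed: int = 0) -> List[List[int]]:
    """Hypothetical input generator: elements with equal labels form cliques,
    then each off-diagonal pair is flipped independently with probability alpha."""
    rng = random.Random(seed)
    n = len(labels)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        s[i][i] = 1  # self-similarity is 1
        for j in range(i + 1, n):
            edge = 1 if labels[i] == labels[j] else 0
            if rng.random() < alpha:
                edge = 1 - edge  # simulated measurement error
            s[i][j] = s[j][i] = edge
    return s


def cast(s: List[List[int]], t: float, max_steps: int = 10_000) -> List[List[int]]:
    """Greedy affinity clustering (CAST-style sketch), assuming 0 < t <= 1
    and s[i][i] == 1. Returns clusters as sorted lists of element indices."""
    unassigned: Set[int] = set(range(len(s)))
    clusters: List[List[int]] = []
    while unassigned:
        cluster: Set[int] = set()
        # affinity[x]: total similarity of x to the members of the open cluster
        affinity = {x: 0 for x in unassigned}
        for _ in range(max_steps):
            outside = sorted(unassigned - cluster)
            if outside:
                u = max(outside, key=lambda x: affinity[x])
                if affinity[u] >= t * len(cluster):  # ADD step
                    cluster.add(u)
                    for x in affinity:
                        affinity[x] += s[x][u]
                    continue
            if cluster:
                v = min(sorted(cluster), key=lambda x: affinity[x])
                if affinity[v] < t * len(cluster):  # REMOVE step
                    cluster.remove(v)
                    for x in affinity:
                        affinity[x] -= s[x][v]
                    continue
            break  # nothing can be added or removed: the cluster is stable
        if not cluster:  # safeguard; should not happen when 0 < t <= 1
            cluster = {min(unassigned)}
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters


if __name__ == "__main__":
    labels = [0] * 5 + [1] * 5 + [2] * 5
    print(cast(corrupted_clique_graph(labels, alpha=0.0), t=0.5))
    # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]
```

When `alpha = 0`, the three cliques are recovered exactly. When `alpha > 0`, recovery is only probabilistic, and that noisy regime is the one the paper's error-model analysis addresses. Ties are broken by the lowest index, which keeps the sketch deterministic.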