A memetic co-clustering algorithm for gene expression profiles and biological annotation

17 January 2005

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 1631-1638
https://doi.org/10.1109/cec.2004.1331091

Abstract

With the invention of microarrays, researchers are capable of measuring thousands of gene expression levels in parallel at various time points of the biological process. To investigate general regulatory mechanisms, biologists cluster genes based on their expression patterns. In this paper, we propose a new memetic co-clustering algorithm for expression profiles, which incorporates a priori knowledge in the form of Gene Ontology information. Ontologies offer a mechanism to capture knowledge in a shareable form that is also processable by computers. The use of this additional annotation information promises to improve biological data analysis and simplifies the identification of processes that are relevant under the measured conditions.

I. INTRODUCTION

In the past few years, DNA microarrays have become one of the major tools in the field of gene expression analysis. In contrast to traditional methods, this technology enables the monitoring of expression levels of thousands of genes in parallel (36). Thus, microarrays are a powerful tool helping to understand the underlying regulatory mechanisms of a cell. A problem inherent in the use of DNA arrays is the tremendous amount of data produced, whose analysis itself constitutes a challenge. Several approaches have been applied to analyze microarray data including principal component analysis (35) as well as supervised (12) and unsupervised learning (10), (32), (33). In unsupervised learning, clustering techniques are utilized to extract the gene expression patterns inherent in the data and thus find potentially co-regulated genes. Various methods have been applied, such as self-organizing maps (SOMs) (32), k-means (33) and hierarchical clustering (10). Evolutionary approaches have also been applied to gene expression data and were shown to be superior to classical clustering algorithms (23), (30).
Although the results of all these approaches are useful, one basic problem remains: none of these methods incorporates known biological information. Therefore, biologists are still forced to do a sequential analysis of their data by first clustering the expression data alone and afterwards annotating the genes of each cluster by hand and thus incorporating biological information into their models. Such an approach is slow and exhausting and may also result in a suboptimal clustering since information from other resources could often help in resolving ambiguities or avoiding errors caused by linkages based on noisy data or spurious similarities. One major problem of pure clustering methods is that cluster boundaries are often close and may also be arbitrary to some degree. Our work is based on the expectation that the use of the available biological knowledge is essential for the development of powerful automatic methods for the analysis of gene expression data. To our knowledge there are only a few published attempts that make use of additional biological information for the interpretation of gene expression profiles.
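
The idea of letting annotation inform the clustering, rather than adding it by hand afterwards, can be illustrated with a small sketch. This is not the algorithm proposed in the paper: the excerpt above does not specify the distance measure or the memetic search, so the code below simply assumes a weighted sum of an expression distance (one minus Pearson correlation, rescaled to [0, 1]) and an annotation distance (Jaccard distance between sets of annotation terms). The gene names, term labels, and the `weight` parameter are all hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles (range 0..2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a flat profile carries no correlation information.
        return 1.0
    return 1.0 - cov / (norm_x * norm_y)


def jaccard_distance(terms_a, terms_b):
    """Return 1 - |intersection| / |union| for two sets of annotation terms."""
    union = terms_a | terms_b
    if not union:
        # Assumption: two unannotated genes are treated as maximally distant.
        return 1.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def combined_distance(gene_1, gene_2, profiles, annotations, weight=0.5):
    """Blend expression and annotation distance; weight=0 ignores annotation."""
    d_expr = pearson_distance(profiles[gene_1], profiles[gene_2]) / 2.0
    d_ann = jaccard_distance(
        annotations.get(gene_1, set()), annotations.get(gene_2, set())
    )
    return (1.0 - weight) * d_expr + weight * d_ann


# Toy data (hypothetical genes and annotation terms, for illustration only).
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [3.0, 2.0, 1.0],
}
annotations = {
    "geneA": {"term_1"},
    "geneB": {"term_1", "term_2"},
    "geneC": {"term_2"},
}

d_ab = combined_distance("geneA", "geneB", profiles, annotations)
d_ac = combined_distance("geneA", "geneC", profiles, annotations)
```

With these toy values, geneA and geneB are perfectly correlated and share one of two terms, so they end up closer than geneA and geneC, which are anti-correlated and share no terms. Any standard clustering method that accepts a pairwise distance could then be run on `combined_distance`. A semantic similarity measure defined over the ontology graph could replace the Jaccard term in a more faithful treatment.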

This publication has 18 references indexed in Scilit:

Clustering gene expression data with memetic algorithms based on minimum spanning trees
Published by Institute of Electrical and Electronics Engineers (IEEE), 2004
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation
Bioinformatics, 2003
Co-clustering of biological networks and gene expression data
Bioinformatics, 2002
Clustering spatial data using random walks
Published by Association for Computing Machinery (ACM), 2001
Application of Regulatory Sequence Analysis and Metabolic Network Analysis to the Interpretation of Gene Expression Data
Published by Springer Nature, 2001
The Transcriptional Program in the Response of Human Fibroblasts to Serum
Science, 1999
Toward a Theory of Evolution Strategies: On the Benefits of Sex— the (μ/μ, λ) Theory
Evolutionary Computation, 1995
Development and application of a metric on semantic nets
IEEE Transactions on Systems, Man, and Cybernetics, 1989
Shortest Connection Networks And Some Generalizations
Bell System Technical Journal, 1957
On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem
Proceedings of the American Mathematical Society, 1956