Identifying Conserved Gene Clusters in the Presence of Homology Families
- 1 July 2005
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 12 (6) , 638-656
- https://doi.org/10.1089/cmb.2005.12.638
Abstract
The study of conserved gene clusters is important for understanding the forces behind genome organization and evolution, as well as the function of individual genes or gene groups. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. This generalizes a recent model called "gene teams." A gene team is a set of genes that appear homologously in two or more species, possibly in a different order yet with the distance of adjacent genes in the team for each chromosome always no more than a certain threshold. We remove the constraint in the original model that each gene must have a unique occurrence in each chromosome and thus allow the analysis on complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm analyzes a pair of chromosomes in O(mn) time and uses O(m+n) space, where m and n are the number of genes in the respective chromosomes. We demonstrate the utility of our methods by studying two bacterial genomes, E. coli K-12 and B. subtilis. Many of the teams identified by our algorithm correlate with documented E. coli operons, while several others match predicted operons, previously suggested by computational techniques. Our implementation and data are publicly available at euler.slu.edu/~goldwasser/homologyteams/.
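The team definition in the abstract can be illustrated with a small sketch. This is not the paper's O(mn) algorithm; it is a naive refinement loop written only to make the definition concrete. It assumes that each gene is a hypothetical `(position, family)` pair, that distance is the difference of integer positions, and that every gene in a team must have at least one homolog (same family label) in the team on the other chromosome. The names `split_runs`, `homology_teams`, and `delta` are illustrative and do not come from the authors' implementation.

```python
from typing import List, Tuple

Gene = Tuple[int, str]  # assumed encoding: (position on chromosome, family label)
Team = Tuple[List[Gene], List[Gene]]


def split_runs(genes: List[Gene], delta: int) -> List[List[Gene]]:
    """Split position-sorted genes into maximal runs whose neighbours
    are at most delta apart."""
    runs: List[List[Gene]] = []
    for gene in genes:
        if runs and gene[0] - runs[-1][-1][0] <= delta:
            runs[-1].append(gene)
        else:
            runs.append([gene])
    return runs


def homology_teams(chrom_a: List[Gene], chrom_b: List[Gene], delta: int) -> List[Team]:
    """Naive refinement: repeatedly drop genes whose family is missing on
    the other side, then split each side at gaps larger than delta."""
    results: List[Team] = []
    stack = [(sorted(chrom_a), sorted(chrom_b))]
    while stack:
        a, b = stack.pop()
        shared = {fam for _, fam in a} & {fam for _, fam in b}
        a_kept = [g for g in a if g[1] in shared]
        b_kept = [g for g in b if g[1] in shared]
        if not a_kept:  # no shared family, so no team in this candidate
            continue
        runs_a = split_runs(a_kept, delta)
        runs_b = split_runs(b_kept, delta)
        unchanged = len(a_kept) == len(a) and len(b_kept) == len(b)
        if unchanged and len(runs_a) == 1 and len(runs_b) == 1:
            results.append((a, b))  # nothing left to drop or split
        else:
            # Every smaller candidate pairs one run from each chromosome.
            stack.extend((ra, rb) for ra in runs_a for rb in runs_b)
    return sorted(results)


# Toy data: family "x" has two paralogs on chromosome A.
chrom_a = [(1, "x"), (2, "y"), (3, "x"), (10, "z"), (11, "w")]
chrom_b = [(5, "y"), (6, "x"), (30, "z"), (40, "w"), (41, "q")]
teams = homology_teams(chrom_a, chrom_b, delta=2)
```

With `delta=2`, the paralogous `x` genes and `y` form one team across both chromosomes, while `z` and `w` are adjacent on chromosome A but too far apart on chromosome B, so each ends up in a singleton team of its own. A practical analysis would likely filter out such singletons.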