Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping
- 15 October 2000
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 28 (20) , 4029-4036
- https://doi.org/10.1093/nar/28.20.4029
Abstract
We previously reported two graph algorithms for analysis of genomic information: a graph comparison algorithm to detect locally similar regions called correlated clusters and an algorithm to find a graph feature called P-quasi complete linkage. Based on these algorithms we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where the genome is considered as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to grouping of related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established among each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter contained at least two genes that appeared in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
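The grouping step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a group satisfies P-quasi complete linkage when every member is linked to at least a fraction P of the other members, and it uses a simple greedy assignment. The function names, the edge-list input format, and the greedy strategy are all hypothetical choices made for illustration.

```python
def is_p_quasi_complete(members, links, p):
    """Return True if every member is linked to at least a fraction p
    of the other members (assumed reading of P-quasi complete linkage)."""
    members = list(members)
    if len(members) < 2:
        return True
    needed = p * (len(members) - 1)
    for m in members:
        degree = sum(
            1 for other in members
            if other != m and frozenset((m, other)) in links
        )
        if degree < needed:
            return False
    return True


def greedy_p_quasi_groups(nodes, edges, p):
    """Greedily place each node into the first existing group that stays
    P-quasi complete after the addition; otherwise start a new group.
    This greedy order-dependent strategy is an illustrative assumption."""
    links = {frozenset(e) for e in edges}
    groups = []
    for node in nodes:
        for group in groups:
            if is_p_quasi_complete(group | {node}, links, p):
                group.add(node)
                break
        else:
            groups.append({node})
    return groups


# Hypothetical example: nodes stand for correlated clusters found in
# pairwise genome comparisons, edges for similarity between clusters.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
groups_40 = greedy_p_quasi_groups(nodes, edges, 0.4)
groups_30 = greedy_p_quasi_groups(nodes, edges, 0.3)
```

In this toy input, lowering P from 0.4 to 0.3 lets the weakly connected node `d` join the group of `a`, `b`, and `c`, which shows how the completeness parameter trades group size against internal connectivity.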