A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters
- 15 October 2000
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 28 (20) , 4021-4028
- https://doi.org/10.1093/nar/28.20.4021
Abstract
The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes.
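The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of correlated clusters, not the authors' implementation. It assumes two undirected graphs given as adjacency dictionaries and a list of corresponding node pairs (for instance gene and enzyme). Two pairs are linked when their nodes lie within `gap + 1` edges of each other in both graphs, and clusters are formed by single linkage over those links. The function names, the gap parameters, and the single-linkage rule are all illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def undirected(edges):
    """Build a symmetric adjacency dict from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, start, limit):
    """Shortest-path lengths from start, exploring at most `limit` edges away."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the allowed distance
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def correlated_clusters(graph_a, graph_b, pairs, gap_a=0, gap_b=0, min_size=2):
    """Group corresponding node pairs that are close in both graphs.

    Hypothetical rule: pairs (a1, b1) and (a2, b2) are linked when a1 and a2
    are at most gap_a + 1 edges apart in graph_a and b1 and b2 are at most
    gap_b + 1 edges apart in graph_b. Clusters are the connected groups of
    linked pairs (single linkage) with at least min_size members.
    """
    pairs = list(pairs)
    near_a = {a: bfs_distances(graph_a, a, gap_a + 1) for a, _ in pairs}
    near_b = {b: bfs_distances(graph_b, b, gap_b + 1) for _, b in pairs}

    parent = list(range(len(pairs)))  # union-find over pair indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(pairs)), 2):
        (a1, b1), (a2, b2) = pairs[i], pairs[j]
        if a2 in near_a[a1] and b2 in near_b[b1]:
            parent[find(i)] = find(j)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


# Toy data (invented for illustration): six genes in chromosomal order and
# a linear four-enzyme pathway.
genome = undirected([("g1", "g2"), ("g2", "g3"), ("g3", "g4"),
                     ("g4", "g5"), ("g5", "g6")])
pathway = undirected([("eA", "eB"), ("eB", "eC"), ("eC", "eD")])
gene_to_enzyme = [("g1", "eA"), ("g2", "eB"), ("g4", "eC"), ("g6", "eD")]

strict = correlated_clusters(genome, pathway, gene_to_enzyme)
gapped = correlated_clusters(genome, pathway, gene_to_enzyme, gap_a=1)
print(strict)  # one cluster of two pairs: g1/eA and g2/eB
print(gapped)  # one cluster containing all four pairs
```

With no gaps allowed, only the adjacent genes `g1` and `g2` form a cluster; allowing one intervening gene in the genome graph merges all four pairs, which illustrates how tolerance for gaps changes the clusters that are reported.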