Development of a Chemical Structure Comparison Method for Integrated Analysis of Chemical and Genomic Information in the Metabolic Pathways
- 6 September 2003
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of the American Chemical Society
- Vol. 125 (39) , 11853-11865
- https://doi.org/10.1021/ja036030u
Abstract
Cellular functions result from intricate networks of molecular interactions, which involve not only proteins and nucleic acids but also small chemical compounds. Here we present an efficient algorithm for comparing two chemical structures of compounds, where the chemical structure is treated as a graph consisting of atoms as nodes and covalent bonds as edges. On the basis of the concept of functional groups, 68 atom types (node types) are defined for carbon, nitrogen, oxygen, and other atomic species with different environments, which has enabled detection of biochemically meaningful features. Maximal common subgraphs of two graphs can be found by searching for maximal cliques in the association graph, and we have introduced heuristics to accelerate the clique finding and to detect optimal local matches (simply connected common subgraphs). Our procedure was applied to the comparison and clustering of 9383 compounds, mostly metabolic compounds, in the KEGG/LIGAND database. The largest clusters of similar compounds were related to carbohydrates, and the clusters corresponded well to the categorization of pathways as represented by the KEGG pathway map numbers. When each pathway map was examined in more detail, finer clusters could be identified corresponding to subpathways or pathway modules containing continuous sets of reaction steps. Furthermore, it was found that the pathway modules identified by similar compound structures sometimes overlap with the pathway modules identified by genomic contexts, namely, by operon structures of enzyme genes.
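The association-graph idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses plain element symbols in place of the 68 atom types, a textbook Bron-Kerbosch search with pivoting in place of the paper's heuristics, and it does not enforce the "simply connected" condition on matches. The function names (`association_graph`, `maximal_cliques`, `largest_common_subgraph`) and the toy molecules are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def association_graph(labels_a, edges_a, labels_b, edges_b):
    """Build the association graph of two labeled graphs.

    Nodes are pairs (i, j) of atoms with the same label. Two pairs are
    adjacent when they use distinct atoms on both sides and the bond
    status (bonded or not bonded) agrees in both molecules.
    """
    bonds_a = {frozenset(e) for e in edges_a}
    bonds_b = {frozenset(e) for e in edges_b}
    nodes = [(i, j) for i in labels_a for j in labels_b
             if labels_a[i] == labels_b[j]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # one atom cannot be matched twice
        if (frozenset((i, k)) in bonds_a) == (frozenset((j, l)) in bonds_b):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended any further
            return
        # Pivot on the vertex with the most neighbours among candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def largest_common_subgraph(labels_a, edges_a, labels_b, edges_b):
    """Return one largest atom mapping as a sorted list of (a, b) pairs."""
    adj = association_graph(labels_a, edges_a, labels_b, edges_b)
    return sorted(max(maximal_cliques(adj), key=len))


# Toy input: heavy atoms of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol_labels = {0: "C", 1: "C", 2: "O"}
ethanol_bonds = [(0, 1), (1, 2)]
propanol_labels = {0: "C", 1: "C", 2: "C", 3: "O"}
propanol_bonds = [(0, 1), (1, 2), (2, 3)]

mapping = largest_common_subgraph(
    ethanol_labels, ethanol_bonds, propanol_labels, propanol_bonds
)
print(mapping)  # [(0, 1), (1, 2), (2, 3)]
```

Each clique in the association graph is a set of atom pairings that are mutually consistent, so a maximal clique corresponds to a maximal common (induced) subgraph. In the toy case the whole ethanol skeleton maps onto the C-C-O end of 1-propanol. Exhaustive clique enumeration grows quickly with molecule size, which is presumably why the abstract stresses heuristics for accelerating the search.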