An Algorithm for Finding Maximal Common Subtopologies in a Set of Protein Structures
- 1 January 1996
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 3 (2) , 289-306
- https://doi.org/10.1089/cmb.1996.3.289
Abstract
For the comparison and analysis of protein structures, it is of interest to find maximal common substructures in a given set of proteins. This question is also relevant for motif definition and structure classification. In this paper we describe first a new suitable representation of the secondary structure topology of a protein by an undirected labeled graph. Based on this representation we developed a new fast algorithm that finds all common subtopologies in a set of protein structures. Our method is based on the algorithm by Bron and Kerbosch (1973), which enumerates all maximal cliques in a graph. The main improvement of our algorithm is to restrict the search process to cliques that represent connected substructures. This restriction reduces the number of cliques to be considered during the search process and the size of the search tree drastically. Thus we are able to handle large proteins. Experiments show the efficiency and superiority of our algorithm in comparison with other existing algorithms basing on graph-theoretical methods. Key words: secondary structure topology, maximum common subgraph, graph algorithm, structural similarity, maximal common substructure, multiple structure alignment.Keywords
This publication has 27 references indexed in Scilit:
- Identification of common molecular subsequencesPublished by Elsevier ,2004
- Identification of Tertiary Structure Resemblance in Proteins Using a Maximal Common Subgraph Isomorphism AlgorithmJournal of Molecular Biology, 1993
- Knowledge-based prediction of protein structuresJournal of Theoretical Biology, 1990
- Use of techniques derived from graph theory to compare secondary structure motifs in proteinsJournal of Molecular Biology, 1990
- A Simple Algorithm for Finding a Maximum Clique and Its Worst‐Case Time ComplexitySystems and Computers in Japan, 1990
- Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical featuresBiopolymers, 1983
- The protein data bank: A computer-based archival file for macromolecular structuresJournal of Molecular Biology, 1977
- Algorithm 457: finding all cliques of an undirected graphCommunications of the ACM, 1973
- A general method applicable to the search for similarities in the amino acid sequence of two proteinsJournal of Molecular Biology, 1970
- Congruent Graphs and the Connectivity of GraphsAmerican Journal of Mathematics, 1932