An Algorithm for Finding Maximal Common Subtopologies in a Set of Protein Structures

1 January 1996

journal article
research article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 3 (2) , 289-306
https://doi.org/10.1089/cmb.1996.3.289

Abstract

For the comparison and analysis of protein structures, it is of interest to find maximal common substructures in a given set of proteins. This question is also relevant for motif definition and structure classification. In this paper we describe first a new suitable representation of the secondary structure topology of a protein by an undirected labeled graph. Based on this representation we developed a new fast algorithm that finds all common subtopologies in a set of protein structures. Our method is based on the algorithm by Bron and Kerbosch (1973), which enumerates all maximal cliques in a graph. The main improvement of our algorithm is to restrict the search process to cliques that represent connected substructures. This restriction reduces the number of cliques to be considered during the search process and the size of the search tree drastically. Thus we are able to handle large proteins. Experiments show the efficiency and superiority of our algorithm in comparison with other existing algorithms basing on graph-theoretical methods. Key words: secondary structure topology, maximum common subgraph, graph algorithm, structural similarity, maximal common substructure, multiple structure alignment.

Keywords

This publication has 27 references indexed in Scilit:

Identification of common molecular subsequences
Published by Elsevier ,2004
Identification of Tertiary Structure Resemblance in Proteins Using a Maximal Common Subgraph Isomorphism Algorithm
Journal of Molecular Biology, 1993
Knowledge-based prediction of protein structures
Journal of Theoretical Biology, 1990
Use of techniques derived from graph theory to compare secondary structure motifs in proteins
Journal of Molecular Biology, 1990
A Simple Algorithm for Finding a Maximum Clique and Its Worst‐Case Time Complexity
Systems and Computers in Japan, 1990
Dictionary of protein secondary structure: Pattern recognition of hydrogen‐bonded and geometrical features
Biopolymers, 1983
The protein data bank: A computer-based archival file for macromolecular structures
Journal of Molecular Biology, 1977
Algorithm 457: finding all cliques of an undirected graph
Communications of the ACM, 1973
A general method applicable to the search for similarities in the amino acid sequence of two proteins
Journal of Molecular Biology, 1970
Congruent Graphs and the Connectivity of Graphs
American Journal of Mathematics, 1932