An Algorithm for Finding Maximal Common Subtopologies in a Set of Protein Structures

Abstract
For the comparison and analysis of protein structures, it is of interest to find maximal common substructures in a given set of proteins. This question is also relevant for motif definition and structure classification. In this paper we describe first a new suitable representation of the secondary structure topology of a protein by an undirected labeled graph. Based on this representation we developed a new fast algorithm that finds all common subtopologies in a set of protein structures. Our method is based on the algorithm by Bron and Kerbosch (1973), which enumerates all maximal cliques in a graph. The main improvement of our algorithm is to restrict the search process to cliques that represent connected substructures. This restriction reduces the number of cliques to be considered during the search process and the size of the search tree drastically. Thus we are able to handle large proteins. Experiments show the efficiency and superiority of our algorithm in comparison with other existing algorithms basing on graph-theoretical methods. Key words: secondary structure topology, maximum common subgraph, graph algorithm, structural similarity, maximal common substructure, multiple structure alignment.