Pairwise and Multiple Identification of Three-Dimensional Common Substructures in Proteins
- 1 January 1998
- journal article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 5 (1) , 41-56
- https://doi.org/10.1089/cmb.1998.5.41
Abstract
In this paper, we present an algorithm to find three-dimensional substructures common to two or more molecules. The basic algorithm is devoted to pairwise structural comparison. Given two sets of atomic coordinates, it finds the largest subsets of atoms that are "similar" in the sense that all internal distances are approximately conserved. The basic idea of the algorithm is to recursively build subsets of increasing size, combining two sets of size k to build a set of size k + 1. The algorithm can be used "as is" for small molecules or local parts of proteins (about 30 atoms). When a large number of atoms is involved, we use a two-step procedure: first we look for common "local" fragments using the previous algorithm, and then we gather these fragments using a Branch and Bound technique. We also extend the basic algorithm to perform multiple comparisons by using one of the structures as a reference point (pivot) to which all other structures are compared. The solution then consists of the largest subsets of atoms common to the pivot and at least q other structures. Although both algorithms are theoretically exponential in the number of atoms, experiments performed on biological data with realistic parameters show that the solution is obtained within a few minutes. Finally, an application to the determination of the structural core of seven globins is presented.
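The pairwise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `common_substructures`, the tolerance parameter `tol`, and the prefix-join strategy for combining two sets of size k are all assumptions made for illustration. A candidate solution is a set of pairings (i, j) between atom i of the first structure and atom j of the second, and two sets of size k that differ only in their last pairing are merged into a set of size k + 1 when those two pairings are themselves distance-compatible.

```python
from itertools import combinations
from math import dist


def common_substructures(a, b, tol=0.5):
    """Return the largest sets of (i, j) pairings between coordinate lists
    a and b such that all internal distances agree within tol.

    Illustrative sketch only; worst-case cost is exponential in the
    number of atoms, so it is meant for small inputs.
    """
    def compatible(p, q):
        # Two pairings are compatible if they use distinct atoms on both
        # sides and the corresponding distances are approximately equal.
        (i1, j1), (i2, j2) = p, q
        if i1 == i2 or j1 == j2:
            return False
        return abs(dist(a[i1], a[i2]) - dist(b[j1], b[j2])) <= tol

    # Sets of size 1: every possible single pairing, as sorted tuples.
    level = [((i, j),) for i in range(len(a)) for j in range(len(b))]
    best = level
    while level:
        best = level
        # Group sets of size k by their first k - 1 pairings.
        groups = {}
        for s in level:
            groups.setdefault(s[:-1], []).append(s[-1])
        # Merge two sets sharing a prefix into one set of size k + 1.
        # All other pairs were already checked when the size-k sets were
        # built, so only the two differing pairings need testing.
        level = [
            prefix + (p, q)
            for prefix, tails in groups.items()
            for p, q in combinations(sorted(tails), 2)
            if compatible(p, q)
        ]
    return best
```

Calling `common_substructures(a, b, tol)` on two lists of (x, y, z) tuples returns every maximum-size set of pairings found. Because the number of candidate sets can grow exponentially, a sketch like this is only practical for small inputs, which is consistent with the abstract's remark that the basic algorithm is applied directly to about 30 atoms.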