The Multiple Common Point Set Problem and Its Application to Molecule Binding Pattern Detection
- 1 March 2006
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 13 (2) , 407-428
- https://doi.org/10.1089/cmb.2006.13.407
Abstract
Recognition of binding patterns common to a set of protein structures is important for recognition of function, prediction of binding, and drug design. We consider protein binding sites represented by a set of 3D points with assigned physico-chemical and geometrical properties important for protein-ligand interactions. We formulate the multiple binding site alignment problem as detection of the largest common set of such 3D points. We discuss the computational problem of multiple common point set detection and, particularly, the matching problem in K-partite-epsilon graphs, where K partitions are associated with K structures and edges are defined between epsilon-close points. We show that the K-partite-epsilon matching problem is NP-hard in the Euclidean space with dimension larger than one. Consequently, we show that the largest common point set problem between three point sets is NP-hard. On the practical side, we present a novel computational method, MultiBind, for recognition of binding patterns common to a set of protein structures. It performs a multiple alignment between protein binding sites in the absence of overall sequence, fold, or binding partner similarity. Despite the NP-hardness results, in our applications, we practically overcome the exponential number of multiple alignment combinations by applying an efficient branchand- bound filtering procedure. We show applications of MultiBind to several biological targets. The method recognizes patterns which are responsible for binding small molecules, such as estradiol, ATP/ANP, and transition state analogues.Keywords
This publication has 36 references indexed in Scilit:
- The ASTRAL Compendium in 2004Nucleic Acids Research, 2004
- MASS: multiple structural alignment by secondary structuresBioinformatics, 2003
- Adenine recognition: A motif present in ATP‐, CoA‐, NAD‐, NADP‐, and FAD‐dependent proteinsProteins-Structure Function and Bioinformatics, 2001
- On the approximation of largest common subtrees and largest common point setsTheoretical Computer Science, 2000
- The ASTRAL compendium for protein structure and sequence analysisNucleic Acids Research, 2000
- The Protein Data BankNucleic Acids Research, 2000
- Approximation Algorithms for 3-D Common Substructure Identification in Drug and Protein MoleculesPublished by Springer Nature ,1999
- A Graph-theoretic Approach to the Identification of Three-dimensional Patterns of Amino Acid Side-chains in Protein StructuresJournal of Molecular Biology, 1994
- Measurement of protein surface shape by solid anglesJournal of Molecular Graphics, 1986
- Analytical molecular surface calculationJournal of Applied Crystallography, 1983