The Relationship between the Sequence Identities of Alpha Helical Proteins in the PDB and the Molecular Similarities of Their Ligands

1 November 2001

journal article
research article
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 41 (6), 1617-1622
https://doi.org/10.1021/ci010364q

Abstract

This paper considers the relationship between the percentage sequence identities of protein chains and the molecular similarities of the ligands they bind. Among a set of alpha helical proteins from the PDB, it is found that related proteins tend to bind similar ligands. Furthermore, the property of binding similar ligands can be used to define the categories of “like” and “unlike” pairs of protein chains, separated by an approximate cutoff at a sequence identity of, or somewhat above, 45%. Similarly, the property of binding related protein chains can be used to define “low” and “high” similarity pairs of ligand residues, with a cutoff at a Tanimoto score of 0.70. The ligands bound to two “like” protein chains are five times more likely to be of high similarity than would be expected if protein sequence identity and ligand molecular similarity were independent variables. Nonetheless, the nature of the PDB means that it is unclear whether the same conclusions would be reached with a data set representing an unbiased sample of all protein−ligand complexes in a living cell. The construction of an appropriate data set for such a study represents a significant challenge.
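The two cutoffs and the "five times more likely" comparison in the abstract can be illustrated with a short sketch. It is not the authors' method: it assumes ligand fingerprints are given as sets of on-bit indices, that pairs at or above each cutoff fall into the "like" and "high" categories, and that the enrichment is the observed fraction of "like and high" pairs divided by the fraction expected if the two properties were independent. The function names and the toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Cutoffs quoted in the abstract; treating them as inclusive is an assumption.
IDENTITY_CUTOFF = 45.0   # percent sequence identity separating "like" from "unlike"
TANIMOTO_CUTOFF = 0.70   # Tanimoto score separating "high" from "low" similarity


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def enrichment(pairs: Iterable[Tuple[float, float]]) -> float:
    """Observed P(like and high) divided by P(like) * P(high).

    Each pair is (percent sequence identity, ligand Tanimoto score).
    A value of 1.0 corresponds to independence of the two properties.
    """
    pairs = list(pairs)
    n = len(pairs)
    like = sum(1 for ident, _ in pairs if ident >= IDENTITY_CUTOFF)
    high = sum(1 for _, sim in pairs if sim >= TANIMOTO_CUTOFF)
    both = sum(
        1 for ident, sim in pairs
        if ident >= IDENTITY_CUTOFF and sim >= TANIMOTO_CUTOFF
    )
    if n == 0 or like == 0 or high == 0:
        raise ValueError("need at least one 'like' pair and one 'high' pair")
    return (both / n) / ((like / n) * (high / n))


# Toy data for illustration only; these values are not taken from the paper.
toy_pairs = [(90.0, 0.9), (80.0, 0.8), (30.0, 0.2), (20.0, 0.1)]
print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(enrichment(toy_pairs))
```

Under this reading, the abstract's result corresponds to an enrichment of about 5 for the alpha helical data set.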
