Molecular modeling of protein function regions

Abstract
Experimental protein structures often provide extensive insight into the mode and specificity of small-molecule binding, and this information is useful for understanding protein function and for the design of drugs. We have performed an analysis of the reliability with which ligand-binding information can be deduced from computer model structures, as opposed to experimentally derived ones. Models produced as part of the CASP experiments are used. The main criterion is the accuracy of contacts between protein model atoms and experimentally determined ligand atom positions. Only comparative models are included (i.e., models based on a sequence relationship between the protein of interest and a known structure). We find that, as expected, contact errors increase as the sequence identity on which the modeling is based decreases. Analysis of the causes of errors shows that errors in the sequence alignment between the modeled protein and its experimental template have the most deleterious effect. In general, good, but not perfect, insight into ligand binding can be obtained from models based on a sequence relationship, provided there are no alignment errors in the model. The results support a structural genomics strategy based on experimental sampling of structure space so that all protein domains can be modeled on the basis of 30% or higher sequence identity. Proteins 2004.
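
As an illustration of the contact criterion described above, the following is a minimal sketch in Python. It is not the procedure used in the study: the 4.0 Å distance cutoff, the atom-pair definition of a contact, the function names, and the toy coordinates are all assumptions made for the example. It also assumes that the model has already been superposed onto the experimental structure, so that the experimentally determined ligand atom positions can be compared with both sets of protein coordinates.

```python
from math import dist

# Assumed contact threshold in angstroms; not taken from the paper.
CUTOFF = 4.0


def contacts(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return the set of (protein_atom_id, ligand_atom_id) pairs within cutoff."""
    return {
        (p_id, l_id)
        for p_id, p_xyz in protein_atoms.items()
        for l_id, l_xyz in ligand_atoms.items()
        if dist(p_xyz, l_xyz) <= cutoff
    }


def contact_accuracy(model_atoms, experimental_atoms, ligand_atoms, cutoff=CUTOFF):
    """Compare model contacts with experimental contacts to the same ligand atoms."""
    reference = contacts(experimental_atoms, ligand_atoms, cutoff)
    predicted = contacts(model_atoms, ligand_atoms, cutoff)
    correct = reference & predicted
    return {
        "reference": len(reference),
        "correct": len(correct),
        "missed": len(reference - predicted),
        "spurious": len(predicted - reference),
        "fraction_correct": len(correct) / len(reference) if reference else None,
    }


# Hypothetical toy coordinates (angstroms), keyed by an arbitrary atom label.
ligand = {"O1": (0.0, 0.0, 0.0)}
experimental = {
    "A:NZ": (3.0, 0.0, 0.0),
    "B:OG": (0.0, 3.5, 0.0),
    "C:CA": (8.0, 0.0, 0.0),
}
model = {
    "A:NZ": (3.2, 0.0, 0.0),  # contact preserved in the model
    "B:OG": (0.0, 6.0, 0.0),  # contact lost in the model
    "C:CA": (3.9, 0.0, 0.0),  # contact that exists only in the model
}

result = contact_accuracy(model, experimental, ligand)
print(result)
```

In this toy case one of the two experimental contacts is reproduced, one is missed, and one model contact has no experimental counterpart. Counting missed and spurious contacts separately keeps two kinds of error apart: binding-site atoms that the model moves away from the ligand, and atoms that the model places incorrectly close to it.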