A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins

25 February 2008

journal article
research article
Published by Wiley in Proteins-Structure Function and Bioinformatics

Vol. 72 (3), 910-928
https://doi.org/10.1002/prot.21976

Abstract

The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set of tens of millions of pairwise matches, each of which is either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set of PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with MODELLER. The probability that a true match does not yield an acceptable structural model (within 6 Å RMSD from the native structure) decays linearly as a function of the TM structural-alignment score.
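The two quantitative criteria in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names (`rmsd`, `is_acceptable`, `percent_enrichment`) are hypothetical. The sketch assumes the model and native coordinates are already optimally superimposed and matched atom by atom, and it assumes that "enrichment" means the relative increase in true templates found over a baseline such as PSI-BLAST. The abstract does not spell out that definition.

```python
import math

ACCEPTABLE_RMSD = 6.0  # in Å; the acceptability threshold quoted in the abstract


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumption: the two structures are already superimposed and the
    atoms are listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def is_acceptable(model, native, cutoff=ACCEPTABLE_RMSD):
    """True if the model lies within `cutoff` Å RMSD of the native structure."""
    return rmsd(model, native) <= cutoff


def percent_enrichment(found_by_method, found_by_baseline):
    """Relative gain in true templates over a baseline, in percent.

    Assumed definition: 100 * (method - baseline) / baseline.
    """
    if found_by_baseline <= 0:
        raise ValueError("baseline count must be positive")
    return 100.0 * (found_by_method - found_by_baseline) / found_by_baseline
```

Under these assumptions, a method that recovers 150 true templates where the baseline recovers 100 would score an enrichment of 50%, the lower end of the range reported in the abstract.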

Funding Information

NIH (GM067823)
Human Frontier Science Program (LT00469/2007-L)
