Fold assessment for comparative protein structure modeling
Open Access
- 1 November 2007
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 16 (11) , 2412-2426
- https://doi.org/10.1110/ps.072895107
Abstract
Accurate and automated assessment of both geometrical errors and incompleteness of comparative protein structure models is necessary for an adequate use of the models. Here, we describe a composite score for discriminating between models with the correct and incorrect fold. To find an accurate composite score, we designed and applied a genetic algorithm method that searched for a most informative subset of 21 input model features as well as their optimized nonlinear transformation into the composite score. The 21 input features included various statistical potential scores, stereochemistry quality descriptors, sequence alignment scores, geometrical descriptors, and measures of protein packing. The optimized composite score was found to depend on (1) a statistical potential z‐score for residue accessibilities and distances, (2) model compactness, and (3) percentage sequence identity of the alignment used to build the model. The accuracy of the composite score was compared with the accuracy of assessment by single and combined features as well as by other commonly used assessment methods. The testing set was representative of models produced by automated comparative modeling on a genomic scale. The composite score performed better than any other tested score in terms of the maximum correct classification rate (i.e., 3.3% false positives and 2.5% false negatives) as well as the sensitivity and specificity across the whole range of thresholds. The composite score was implemented in our program MODELLER‐8 and was used to assess models in the MODBASE database that contains comparative models for domains in approximately 1.3 million protein sequences.
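The abstract reports accuracy as a maximum correct classification rate together with false positive and false negative percentages. The Python sketch below illustrates one way such numbers could be obtained from any scalar model score by sweeping a threshold. It is not the authors' code or the MODELLER implementation: the function names (`error_counts`, `best_threshold`), the toy scores, and the convention that rates are computed per class (false positives over incorrect-fold models, false negatives over correct-fold models) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def error_counts(scores: Sequence[float], is_correct: Sequence[bool],
                 threshold: float) -> Tuple[int, int]:
    """Count (false_positives, false_negatives) at one threshold.

    Assumption: a model is predicted "correct fold" when score >= threshold.
    """
    fp = sum(1 for s, c in zip(scores, is_correct) if s >= threshold and not c)
    fn = sum(1 for s, c in zip(scores, is_correct) if s < threshold and c)
    return fp, fn


def best_threshold(scores: Sequence[float], is_correct: Sequence[bool]
                   ) -> Tuple[float, float, float, float]:
    """Return (threshold, correct_classification_rate, fp_rate, fn_rate).

    Every observed score is tried as a threshold; the first one that
    maximizes the fraction of correctly classified models is kept.
    """
    n_pos = sum(1 for c in is_correct if c)   # correct-fold models
    n_neg = len(is_correct) - n_pos           # incorrect-fold models
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect model")

    best: Optional[Tuple[float, float, float, float]] = None
    for t in sorted(set(scores)):
        fp, fn = error_counts(scores, is_correct, t)
        ccr = (len(is_correct) - fp - fn) / len(is_correct)
        if best is None or ccr > best[1]:
            best = (t, ccr, fp / n_neg, fn / n_pos)
    return best


# Toy data (hypothetical): eight models with a score and a known fold label.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, False, True, False, True, True, True]

result = best_threshold(scores, labels)
print(result)
```

Recording the two error rates at every threshold, rather than only at the best one, would give the sensitivity and specificity curve "across the whole range of thresholds" that the abstract mentions.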