Evaluating the absolute quality of a single protein model using structural features and support vector machines
- 25 September 2008
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 75 (3) , 638-647
- https://doi.org/10.1002/prot.22275
Abstract
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e., GDT‐TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross‐validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template‐based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top‐ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009.
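The two evaluation measures quoted in the abstract, the correlation between predicted and true scores and the loss of the top-ranked model, can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `ranking_loss` and the GDT-TS values below are hypothetical, chosen only to show how the measures could be computed for the models of a single target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True score of the best model minus the true score of the model
    that the predicted scores rank first (0 means the best was picked)."""
    top_ranked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top_ranked]


# Hypothetical GDT-TS values for five models of one target (made-up data).
predicted_scores = [62.0, 48.5, 71.0, 55.0, 30.0]
true_scores = [65.0, 50.0, 68.0, 70.0, 28.0]

print(f"correlation: {pearson(predicted_scores, true_scores):.2f}")
print(f"loss: {ranking_loss(predicted_scores, true_scores):.2f}")
```

Averaging these per-target values over all targets would give figures of the kind reported for CASP7, assuming the evaluation is carried out target by target as the abstract suggests.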