Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era

5 September 2013

journal article
research article
Published in Proceedings of the National Academy of Sciences

Vol. 110 (39), 15674-15679
https://doi.org/10.1073/pnas.1314045110

Abstract

Recently developed methods have shown considerable promise in predicting residue–residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.

Significance

We develop an improved method for predicting residue–residue contacts in protein structures that achieves higher accuracy than previous methods by integrating structural context and sequence coevolution information. We then determine the conditions under which these predicted contacts are likely to be useful for structure modeling and identify more than 400 protein families where these conditions are currently met.
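The sequence-count rule of thumb stated in the abstract (more aligned sequences, after redundancy reduction to 90%, than five times the protein length) can be illustrated with a minimal sketch. This is not the authors' pipeline: the greedy pairwise-identity filter below is a naive stand-in for whatever redundancy-reduction tool was actually used, and the function names `fraction_identical`, `reduce_redundancy`, and `enough_sequences` are hypothetical, introduced only for illustration.

```python
from typing import List


def fraction_identical(a: str, b: str) -> float:
    """Fraction of aligned columns (neither sequence gapped) where a and b match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def reduce_redundancy(alignment: List[str], max_identity: float = 0.9) -> List[str]:
    """Greedy filter (an assumption, not the paper's procedure): keep a sequence
    only if it is at most max_identity identical to every sequence kept so far."""
    kept: List[str] = []
    for seq in alignment:
        if all(fraction_identical(seq, k) <= max_identity for k in kept):
            kept.append(seq)
    return kept


def enough_sequences(n_nonredundant: int, protein_length: int, factor: float = 5.0) -> bool:
    """Rule of thumb from the abstract: the number of non-redundant aligned
    sequences should be greater than five times the protein length."""
    return n_nonredundant > factor * protein_length


# Toy alignment of length 10; the second row duplicates the first.
toy_alignment = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIVV", "MNPQRSTVWY"]
nonredundant = reduce_redundancy(toy_alignment)
likely_accurate = enough_sequences(len(nonredundant), protein_length=10)
```

The toy alignment is far too small to pass the check; for a real family, the second condition in the abstract (aligned sequences closer to the target than to the nearest homolog of known structure) would still need to be assessed separately.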
