Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era
- 5 September 2013
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 110 (39), 15674-15679
- https://doi.org/10.1073/pnas.1314045110
Abstract
Recently developed methods have shown considerable promise in predicting residue–residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.
Significance
We develop an improved method for predicting residue–residue contacts in protein structures that achieves higher accuracy than previous methods by integrating structural context and sequence coevolution information. We then determine the conditions under which these predicted contacts are likely to be useful for structure modeling and identify more than 400 protein families where these conditions are currently met.
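The alignment-depth rule of thumb stated in the abstract (more redundancy-reduced sequences than five times the protein length) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the greedy filter, the identity measure (which counts gap columns like any other character), and the names `identity`, `reduce_redundancy`, and `meets_depth_criterion` are all assumptions made for illustration. Only the 90% threshold and the factor of five come from the abstract.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two equal-length aligned rows.

    Assumption: gap characters are compared like any other character.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def reduce_redundancy(alignment: list[str], max_identity: float = 0.9) -> list[str]:
    """Greedy filter (hypothetical): keep a row only if its identity to every
    previously kept row is at most max_identity."""
    kept: list[str] = []
    for seq in alignment:
        if all(identity(seq, k) <= max_identity for k in kept):
            kept.append(seq)
    return kept


def meets_depth_criterion(n_nonredundant: int, protein_length: int,
                          factor: float = 5.0) -> bool:
    """True when the sequence count exceeds factor times the protein length."""
    return n_nonredundant > factor * protein_length


# Toy alignment: the second row duplicates the first and is filtered out.
toy = ["AAAAAAAAAA", "AAAAAAAAAA", "AAAAACCCCC", "CCCCCCCCCC"]
nonredundant = reduce_redundancy(toy)
deep_enough = meets_depth_criterion(len(nonredundant), protein_length=10)
```

For a 10-residue toy protein the rule would require more than 50 non-redundant sequences, so this alignment falls far short. In practice a dedicated alignment-filtering tool would replace the quadratic greedy loop shown here.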