Protein 3D Structure Computed from Evolutionary Sequence Variation


Open Access

7 December 2011

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 6 (12), e28766
https://doi.org/10.1371/journal.pone.0028766

Abstract

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues and including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
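The abstract states that residue pair couplings are inferred from a maximum entropy model constrained by alignment statistics, but does not give the fitting procedure. One widely used, cheap approximation is the mean-field inversion of the (pseudocount-regularized) pair covariance matrix; the sketch below uses it to separate direct couplings from merely correlated columns and then ranks residue pairs. It assumes the alignment is already encoded as integers 0..q−1, and the 80% identity reweighting threshold, pseudocount fraction of 0.5, Frobenius-norm score with average product correction, minimum sequence separation of 5, and all function names are illustrative assumptions, not necessarily what EVfold uses.

```python
import numpy as np

def sequence_weights(msa, theta=0.8):
    # Fraction of identical columns between every pair of sequences.
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    # Down-weight each sequence by its number of close neighbours (itself included).
    return 1.0 / (identity >= theta).sum(axis=1)

def mean_field_couplings(msa, q, pseudocount=0.5, theta=0.8):
    """Approximate max-entropy couplings e[i, j, a, b] as minus the inverse covariance
    (mean-field approximation; an assumed choice, not taken from the paper)."""
    n_seq, length = msa.shape
    w = sequence_weights(msa, theta)
    meff = w.sum()
    flat = np.eye(q)[msa].reshape(n_seq, length * q)  # one-hot, (N, L*q)
    fi = (w @ flat) / meff                            # weighted single-site frequencies
    fij = (flat.T * w) @ flat / meff                  # weighted pair frequencies
    fi = (1 - pseudocount) * fi + pseudocount / q     # mix with a uniform distribution
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    for i in range(length):                           # diagonal blocks: f_i(a) * delta_ab
        s = slice(i * q, (i + 1) * q)
        fij[s, s] = np.diag(fi[s])
    cov = fij - np.outer(fi, fi)
    # Drop the last state of every position (gauge fixing) so the matrix is invertible.
    keep = np.array([a < q - 1 for _ in range(length) for a in range(q)])
    e = np.zeros((length * q, length * q))
    e[np.ix_(keep, keep)] = -np.linalg.inv(cov[np.ix_(keep, keep)])
    return e.reshape(length, q, length, q).transpose(0, 2, 1, 3)  # (L, L, q, q)

def coupling_scores(e):
    # Shift each q x q block to the zero-sum gauge before taking its norm.
    e = (e - e.mean(axis=2, keepdims=True) - e.mean(axis=3, keepdims=True)
         + e.mean(axis=(2, 3), keepdims=True))
    fn = np.sqrt((e ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(fn, 0.0)
    length = fn.shape[0]
    # Average product correction removes per-column background coupling strength.
    mean_i = fn.sum(axis=1) / (length - 1)
    mean_all = fn.sum() / (length * (length - 1))
    scores = fn - np.outer(mean_i, mean_i) / mean_all
    np.fill_diagonal(scores, 0.0)
    return scores

def top_pairs(scores, n, min_sep=5):
    # Rank pairs (i, j) with j - i >= min_sep by decreasing score.
    length = scores.shape[0]
    ranked = sorted(((scores[i, j], i, j) for i in range(length)
                     for j in range(i + min_sep, length)), reverse=True)
    return [(i, j) for _, i, j in ranked[:n]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.integers(0, 4, size=(200, 10))  # random toy alignment, q = 4
    toy[:, 7] = toy[:, 1]                     # plant one perfectly co-varying pair
    print(top_pairs(coupling_scores(mean_field_couplings(toy, q=4)), 3))
```

On this synthetic alignment the planted pair (1, 7) should rank first. In a structure-prediction setting the highest-ranked pairs would then be treated as predicted contacts and passed to a folding step as distance restraints.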
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
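The 2.7–4.8 Å accuracy figures above are Cα-RMSD values: the root-mean-square distance between corresponding Cα atoms after the predicted model has been optimally superimposed on the observed structure. A minimal sketch of that measurement follows, assuming both coordinate arrays are already matched residue-by-residue and using the Kabsch algorithm for the rigid superposition; the function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(pred, ref):
    """Cα-RMSD between two (n, 3) coordinate arrays after optimal rigid superposition."""
    p = pred - pred.mean(axis=0)              # remove translation
    r = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ r)         # SVD of the 3 x 3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would mean a reflection; forbid it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T # optimal proper rotation
    aligned = p @ rot.T
    return float(np.sqrt(((aligned - r) ** 2).sum(axis=1).mean()))
```

To report RMSD over only part of the protein (as in the "at least two-thirds of the protein" figure), pass the matching subset of rows from both arrays. The determinant check matters because a predicted mirror-image fold must not be allowed to score well through a reflection.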
