Quantifying the evolutionary divergence of protein structures: The role of function change and function conservation
- 27 November 2009
- Research article
- Published by Wiley in Proteins: Structure, Function, and Bioinformatics
- Vol. 78 (1), pp. 181–196
- https://doi.org/10.1002/prot.22616
Abstract
The molecular clock hypothesis, which states that protein sequences diverge in evolution by accumulating amino acid substitutions at an almost constant rate, played a major role in the development of molecular evolution and boosted quantitative theories of evolutionary change. These studies were extended to protein structures in the seminal paper of Chothia and Lesk, which established the approximate proportionality between structure divergence and sequence divergence. Here we analyse how function influences the relationship between sequence and structure divergence, studying four large superfamilies of evolutionarily related proteins: globins, aldolases, P-loop and NADP-binding. We introduce the contact divergence, which is more consistent with sequence divergence than previously used measures of structure divergence. Our main findings are: (1) Small structure and sequence divergences are proportional, consistent with the molecular clock. The approximate validity of the clock is also supported by the analysis of the clustering coefficient of structure similarity networks. (2) Functional constraints strongly limit the structure divergence of proteins performing the same function and may make it possible to identify incomplete or wrong functional annotations. (3) The rate of structure divergence relative to sequence divergence is larger for proteins performing different functions than for proteins performing the same function. We conjecture that this acceleration is due to positive selection for new functions. Accelerations in structure divergence are also suggested by the analysis of the clustering coefficient. (4) At low sequence identity, structural diversity explodes. We conjecture that this explosion is related to functional diversification. (5) Large indels are almost always associated with function changes. Proteins 2010.
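The abstract refers to the clustering coefficient of structure similarity networks without saying how those networks are built. As an illustration only, the sketch below assumes that nodes are proteins and that an edge joins two proteins whose structure-similarity score reaches a chosen threshold; the score values, the threshold of 0.5 and the helper names (`build_similarity_network`, `local_clustering`, `average_clustering`) are hypothetical and are not taken from the paper. It computes the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked) and its network average, not the authors' actual pipeline.

```python
from itertools import combinations

def build_similarity_network(pairs, threshold):
    """Build an undirected adjacency dict from (a, b, score) triples.

    Hypothetical convention: an edge is added when score >= threshold.
    Every protein that appears in a pair becomes a node, even if isolated.
    """
    adj = {}
    for a, b, score in pairs:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are connected to each other.

    Nodes with fewer than two neighbours get 0.0 by convention.
    """
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy, made-up similarity scores between four hypothetical proteins.
TOY_PAIRS = [
    ("A", "B", 0.8), ("A", "C", 0.7), ("B", "C", 0.6),
    ("C", "D", 0.9), ("A", "D", 0.2), ("B", "D", 0.1),
]

adj = build_similarity_network(TOY_PAIRS, threshold=0.5)
print(round(local_clustering(adj, "C"), 3))  # 0.333
print(round(average_clustering(adj), 3))     # 0.583
```

In this toy network A and B each sit in a closed triangle (coefficient 1.0), C has three neighbours of which only one pair is linked (1/3), and D has a single neighbour (0.0), giving an average of 7/12. How such values relate to clock-like or accelerated divergence depends on the similarity measure and threshold, which the abstract does not specify.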