Accounting for Solvent Accessibility and Secondary Structure in Protein Phylogenetics Is Clearly Beneficial
Open Access
- 10 March 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 59 (3), 277-287
- https://doi.org/10.1093/sysbio/syq002
Abstract
Amino acid substitution models are essential to most methods for inferring phylogenies from protein data. These models represent the ways in which proteins evolve and substitutions accumulate over time. It is widely accepted that substitution processes vary depending on the structural configuration of the protein residues. However, this information is very rarely used in phylogenetic studies, even though the 3-dimensional structure of tens of thousands of proteins has been elucidated. Here, we reinvestigate the question in order to fill this gap. We use an improved estimation methodology and a very large database comprising 1471 nonredundant globular protein alignments with structural annotations to estimate new amino acid substitution models that account for the secondary structure and solvent accessibility of the residues. These models incorporate a confidence coefficient that is estimated from the data and reflects the reliability and usefulness of the structural annotations in the analyzed sequences. Our results with 300 independent test alignments show an impressive likelihood gain compared with standard models such as JTT or WAG. Moreover, the use of these models induces significant topological changes in the inferred trees, which should be of primary interest to phylogeneticists. Our data, models, and software are available for download from http://atgc.lirmm.fr/phyml-structure/.
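As an illustration of what a "likelihood gain" between two substitution models can mean in practice, the following minimal sketch compares two log-likelihoods for the same alignment using the per-site gain and the standard Akaike information criterion (AIC = 2k - 2 ln L). It is not taken from the paper's software. The function names, the log-likelihood values, the alignment length, and the assumption that the structural model adds exactly one free parameter (the confidence coefficient) are all hypothetical.

```python
def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_free_params - 2.0 * log_likelihood


def per_site_gain(lnl_new: float, lnl_base: float, n_sites: int) -> float:
    """Log-likelihood improvement of the new model, averaged over sites."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return (lnl_new - lnl_base) / n_sites


# Hypothetical values for one test alignment (not results from the paper).
lnl_standard = -12500.0    # e.g. a standard model such as JTT or WAG
lnl_structural = -12350.0  # e.g. a structure-aware model
n_sites = 250              # assumed alignment length
extra_params = 1           # assumption: only the confidence coefficient is added

gain = per_site_gain(lnl_structural, lnl_standard, n_sites)
# Parameters shared by both models cancel in the difference,
# so only the extra parameters are counted here.
delta_aic = aic(lnl_standard, 0) - aic(lnl_structural, extra_params)

print(f"per-site log-likelihood gain: {gain:.3f}")
print(f"AIC improvement of the structural model: {delta_aic:.1f}")
```

A positive AIC improvement means the extra parameter is justified by the likelihood gain under this criterion. Comparing models across many alignments, as the abstract describes, would repeat this calculation for each test alignment.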