Assessing the Accuracy of Ancestral Protein Reconstruction Methods

Open Access

23 June 2006

journal article
research article
Published by the Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 2 (6), e69
https://doi.org/10.1371/journal.pcbi.0020069

Abstract

The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method introduces errors into the ancestral protein sequence, which can produce misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence, using an off-lattice protein model in which fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with those of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods that reconstruct a “best guess” amino acid at each position, such as maximum parsimony and maximum likelihood, overestimate thermostability, whereas a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to reach credible conclusions regarding functional evolution, and that inferred functional patterns that mimic reconstruction bias should be reevaluated.

It is now possible to apply computational methods to known present-day protein sequences to recreate the sequences of ancestral proteins. By synthesising these proteins and measuring their properties in the laboratory, we can learn much about the nature of evolution, better understand how proteins change and adapt over time, and gain insight into the environments of ancient organisms. Unfortunately, the accuracy of these reconstructions is difficult to evaluate. We simulate protein evolution using a simplified computational model and apply the various reconstruction methods to the sequences that arise from our simulations. Because we have the complete record of the evolutionary history, we can evaluate reconstruction accuracy directly. We demonstrate that the reconstruction procedures in common use may be biased toward overestimating the properties of these ancestral proteins, the opposite of what has previously been assumed. We also present an alternative method of generating these sequences, Bayesian sampling, which can eliminate this bias and provide more robust conclusions.
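The mechanism behind the bias can be illustrated with a toy calculation. The sketch below is not the paper's off-lattice simulation: it assumes a hypothetical additive stability score (higher means more stable), two candidate residues per site, and a calibrated posterior in which the more frequent residue is also the more stabilizing one, so the true ancestor behaves like a draw from that posterior. The function names (`make_sites`, `argmax_reconstruction`, `sample_reconstruction`, `compare`) and the parameter ranges are illustrative assumptions.

```python
import random
from statistics import mean

# Hypothetical toy model (not the paper's off-lattice protein model).
# Each site has a posterior over two residues: a common residue contributing
# 0.0 to stability and a rarer, slightly destabilizing residue contributing
# a negative amount. Stability is assumed to be additive across sites.


def make_sites(n_sites, rng):
    """Return a list of (posterior probabilities, stability contributions) per site."""
    sites = []
    for _ in range(n_sites):
        p_major = rng.uniform(0.6, 0.95)  # assumed posterior of the common residue
        penalty = rng.uniform(0.2, 1.0)   # assumed destabilization of the rare residue
        sites.append(([p_major, 1.0 - p_major], [0.0, -penalty]))
    return sites


def stability(choice, sites):
    """Sum the stability contribution of the chosen residue index at each site."""
    return sum(contrib[i] for i, (_, contrib) in zip(choice, sites))


def argmax_reconstruction(sites):
    """'Best guess' ancestor (ML/MP-like): the most probable residue at every site."""
    return [max(range(len(probs)), key=probs.__getitem__) for probs, _ in sites]


def sample_reconstruction(sites, rng):
    """Bayesian-sampling ancestor: draw one residue per site from its posterior."""
    return [rng.choices(range(len(probs)), weights=probs)[0] for probs, _ in sites]


def compare(n_sites=100, n_reps=2000, seed=1):
    """Return (mean true stability, best-guess stability, mean sampled stability)."""
    rng = random.Random(seed)
    sites = make_sites(n_sites, rng)
    # Assumption: in a simulation the true ancestor is known; here it is modeled
    # as a draw from the same calibrated posterior the reconstructions use.
    true_vals = [stability(sample_reconstruction(sites, rng), sites) for _ in range(n_reps)]
    sampled_vals = [stability(sample_reconstruction(sites, rng), sites) for _ in range(n_reps)]
    best_guess = stability(argmax_reconstruction(sites), sites)
    return mean(true_vals), best_guess, mean(sampled_vals)


if __name__ == "__main__":
    true_mean, best, sampled_mean = compare()
    print(f"true ancestors (mean):      {true_mean:.2f}")
    print(f"best-guess reconstruction:  {best:.2f}")
    print(f"Bayesian samples (mean):    {sampled_mean:.2f}")
```

Because the common residue wins every site, the best-guess reconstruction scores 0.0 by construction and sits above the average true ancestor, while the sampled reconstructions track the true average. The toy model only reproduces the direction of the effect; the magnitude in real proteins depends on the posterior and on how stability actually combines across sites.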
