The Effects of Nucleotide Substitution Model Assumptions on Estimates of Nonparametric Bootstrap Support
Open Access
- 1 April 2002
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 19 (4) , 394-405
- https://doi.org/10.1093/oxfordjournals.molbev.a004094
Abstract
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the ability to discriminate among competing topologies. We have explored the relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) for six data sets for which the true relationships are known with a high degree of certainty. We also performed equally weighted maximum parsimony analyses in order to assess the effects of ignoring branch length information during tree selection. We observed that maximum parsimony gave the lowest mean estimate of bootstrap support for the correct set of nodes relative to the ML models for every data set except one. For several data sets, we established that the exact distribution used to model among-site rate variation was critical for a successful phylogenetic analysis. Site-specific rate models were shown to perform very poorly relative to gamma and invariable sites models for several of the data sets, most likely because of the gross underestimation of branch lengths. The invariable sites model also performed poorly for several data sets where this model had a poor fit to the data, suggesting that addition of the gamma distribution can be critical. Estimates of bootstrap support for the correct nodes often increased under gamma and invariable sites models relative to equal rates models. Our observations are contrary to the prediction that such models cause reduced confidence in phylogenetic hypotheses. Our results raise several issues regarding the process of model selection, and we briefly discuss model selection uncertainty and the role of sensitivity analyses in molecular phylogenetics.
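For readers unfamiliar with the term, nonparametric bootstrap support can be illustrated with a minimal sketch. It is not the authors' pipeline. It only shows the resampling step (drawing alignment columns with replacement) and the counting of replicates in which a grouping is recovered. The function names, the toy alignment, and the `toy_groups_a_with_b` criterion are all assumptions made for illustration. In a real analysis that criterion would be replaced by ML or parsimony tree inference on each pseudoreplicate, followed by a check for the clade of interest.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by sampling alignment columns with replacement.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    """
    n_sites = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every taxon,
    # so each resampled site is an intact column of the original alignment.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, recovers_group, n_replicates=100, seed=0):
    """Fraction of pseudoreplicates in which `recovers_group` returns True.

    `recovers_group` is a hypothetical callable standing in for
    "infer a tree from this replicate and test whether the clade is present".
    """
    rng = random.Random(seed)
    hits = sum(
        bool(recovers_group(bootstrap_alignment(alignment, rng)))
        for _ in range(n_replicates)
    )
    return hits / n_replicates


def toy_groups_a_with_b(replicate):
    """Toy stand-in for tree inference (an assumption, not ML or parsimony):
    A is said to group with B if A differs from B at fewer sites than from C."""
    def mismatches(x, y):
        return sum(1 for p, q in zip(replicate[x], replicate[y]) if p != q)
    return mismatches("A", "B") < mismatches("A", "C")


# Made-up three-taxon alignment, for illustration only.
toy_alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TTGTCCGA",
}

support = bootstrap_support(toy_alignment, toy_groups_a_with_b, n_replicates=200, seed=1)
print(f"Bootstrap proportion for (A,B): {support:.2f}")
```

Because only the grouping criterion changes between analyses, the same resampling loop could be reused to compare support under different substitution models, which is the kind of comparison the abstract describes.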