Sampling Trees from Evolutionary Models
Open Access
- 28 May 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 59 (4) , 465-476
- https://doi.org/10.1093/sysbio/syq026
Abstract
A wide range of evolutionary models for species-level (and higher) diversification have been developed. These models can be used to test evolutionary hypotheses and provide comparisons with phylogenetic trees constructed from real data. To carry out these tests and comparisons, it is often necessary to sample, or simulate, trees from the evolutionary models. Sampling trees from these models is more complicated than it may appear at first glance, necessitating careful consideration and mathematical rigor. Seemingly straightforward sampling methods may produce trees that have systematically biased shapes or branch lengths. This is particularly problematic as there is no simple method for determining whether the sampled trees are appropriate. In this paper, we show why a commonly used simple sampling approach (SSA), which simulates trees forward in time until n species are first reached, should only be applied to the simplest pure birth model, the Yule model. We provide an alternative general sampling approach (GSA) that can be applied to most other models. Furthermore, we introduce the constant-rate birth–death model sampling approach, which samples trees very efficiently from a widely used class of models. We explore the bias produced by SSA and identify situations in which this bias is particularly pronounced. We show that using SSA can lead to erroneous conclusions: when using the inappropriate SSA, the variance of a gradually evolving trait does not correlate with the age of the tree; when the correct GSA is used, the trait variance correlates with tree age. The algorithms presented here are available in the Perl Bio::Phylo package, as a stand-alone program TreeSample, and in the R TreeSim package.
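The SSA stopping rule described in the abstract (simulate forward in time until n species are first reached) can be illustrated with a minimal Python sketch. This is not the code from Bio::Phylo, TreeSample, or TreeSim. It assumes a constant-rate birth–death process started from a single species, tracks only the number of lineages and the elapsed time rather than a full tree, and uses hypothetical function names (`ssa_tree_age`, `sample_ssa_ages`) chosen here for illustration.

```python
import random


def ssa_tree_age(n, birth_rate, death_rate, rng):
    """Run one forward simulation from a single species.

    Returns the time at which n species are first reached, or None if
    the process goes extinct before that happens.
    Assumption: constant per-lineage birth and death rates.
    """
    lineages = 1
    t = 0.0
    per_lineage_rate = birth_rate + death_rate
    while 0 < lineages < n:
        # Waiting time to the next event is exponential with
        # rate (number of lineages) * (birth_rate + death_rate).
        t += rng.expovariate(lineages * per_lineage_rate)
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth_rate / per_lineage_rate:
            lineages += 1
        else:
            lineages -= 1
    return t if lineages == n else None


def sample_ssa_ages(n, birth_rate, death_rate, replicates, seed=0):
    """Collect tree ages from SSA runs, discarding runs that go extinct.

    Note: this loops forever if n species can never be reached
    (for example birth_rate == 0 with n > 1).
    """
    rng = random.Random(seed)
    ages = []
    while len(ages) < replicates:
        age = ssa_tree_age(n, birth_rate, death_rate, rng)
        if age is not None:
            ages.append(age)
    return ages
```

Calling `sample_ssa_ages(10, 1.0, 0.0, 2000)` draws tree ages under the Yule model, the one case for which the abstract states SSA is appropriate. With a nonzero `death_rate` the same function still runs, but according to the abstract the resulting trees would be biased; the sketch only shows the stopping rule and does not implement GSA or any correction.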