Construction and annotation of large phylogenetic trees
- 1 January 2007
- journal article
- Published by CSIRO Publishing in Australian Systematic Botany
- Vol. 20 (4) , 287-301
- https://doi.org/10.1071/sb07006
Abstract
Broad availability of molecular sequence data allows construction of phylogenetic trees with 1000s or even 10 000s of taxa. This paper reviews methodological, technological and empirical issues raised in phylogenetic inference at this scale. Numerous algorithmic and computational challenges have been identified surrounding the core problem of reconstructing large trees accurately from sequence data, but many other obstacles, both upstream and downstream of this step, are less well understood. Before phylogenetic analysis, data must be generated de novo or extracted from existing databases, compiled into blocks of homologous data with controlled properties, aligned, examined for the presence of gene duplications or other kinds of complicating factors, and finally, combined with other evidence via supermatrix or supertree approaches. After phylogenetic analysis, confidence assessments are usually reported, along with other kinds of annotations, such as clade names, or annotations requiring additional inference procedures, such as trait evolution or divergence time estimates. Prospects for partial automation of large-tree construction are also discussed, as well as risks associated with ‘outsourcing’ phylogenetic inference beyond the systematics community.