FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix
Open Access
- 17 April 2009
- journal article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 26 (7), 1641-1650
- https://doi.org/10.1093/molbev/msp077
Abstract
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and an alphabet of a distinct characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N√N) memory and O(N√N log(N) La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
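The idea of replacing a distance matrix with profiles can be illustrated with a minimal Python sketch. It is not FastTree's implementation. It assumes a gap-free nucleotide alignment (a = 4), uses the uncorrected proportion of differing sites as the distance, and assumes 4 bytes per stored distance in a half matrix; the function names (profile, profile_distance, mean_pairwise_distance, jukes_cantor, distance_matrix_bytes) are hypothetical. The sketch shows that the distance between two profiles equals the average pairwise distance between the sequences they summarize, so a join can be scored from O(La) numbers per node instead of from every pair of leaves.

```python
import math
from itertools import product

ALPHABET = "ACGT"  # a = 4 characters; gaps are not handled in this sketch


def profile(seqs):
    """Per-site character frequencies (L x a) for a group of aligned sequences."""
    length = len(seqs[0])
    prof = []
    for site in range(length):
        column = [s[site] for s in seqs]
        prof.append([column.count(ch) / len(seqs) for ch in ALPHABET])
    return prof


def profile_distance(p, q):
    """Expected fraction of mismatching sites between two profiles, in O(L*a)."""
    total = 0.0
    for freq_p, freq_q in zip(p, q):
        # Probability that a character drawn from each profile agrees at this site.
        match = sum(x * y for x, y in zip(freq_p, freq_q))
        total += 1.0 - match
    return total / len(p)


def mean_pairwise_distance(group_a, group_b):
    """Same quantity computed the slow way, over every pair of sequences."""
    total = 0.0
    for s, t in product(group_a, group_b):
        total += sum(c1 != c2 for c1, c2 in zip(s, t)) / len(s)
    return total / (len(group_a) * len(group_b))


def jukes_cantor(p_diff):
    """Standard Jukes-Cantor correction for a proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)


def distance_matrix_bytes(n, bytes_per_entry=4):
    """Storage for the n*(n-1)/2 distinct pairwise distances (assumed 4 bytes each)."""
    return n * (n - 1) // 2 * bytes_per_entry


if __name__ == "__main__":
    group_a = ["ACGT", "ACGA"]
    group_b = ["ACTT", "TCGA"]
    d_profile = profile_distance(profile(group_a), profile(group_b))
    d_pairs = mean_pairwise_distance(group_a, group_b)
    print(d_profile, d_pairs)  # the two values agree
    print(jukes_cantor(d_profile))
    print(distance_matrix_bytes(158_022) / 1e9, "GB")
```

Under the 4-byte assumption, the half matrix for 158,022 sequences comes to roughly 50 GB, in line with the figure quoted in the abstract, while the profiles grow only linearly with the number of sequences.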