PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences
Open Access
- 21 November 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 23 (3), 372-374
- https://doi.org/10.1093/bioinformatics/btl592
Abstract
Motivation: To construct a multiple sequence alignment (MSA) of a large number (>∼10 000) of sequences, the calculation of a guide tree with a complexity of O(N²) to O(N³), where N is the number of sequences, is the most time-consuming process.
Results: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align ∼60 000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam.
Availability: The present algorithm has been implemented in the MAFFT sequence alignment package.
Contact: katoh@bioreg.kyushu-u.ac.jp
Supplementary information: Supplementary information is available at Bioinformatics online.
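To give a feel for why the guide-tree step dominates at this scale, the following minimal sketch compares the number of pairwise distances in a full distance matrix, N(N-1)/2, with N log2 N. It is a back-of-the-envelope illustration only, not the PartTree algorithm: the function names are hypothetical, constant factors are ignored, and the choice of base-2 logarithm is an assumption made for the comparison.

```python
import math


def full_matrix_pairs(n: int) -> int:
    """Pairwise distances needed for a full N x N distance matrix (upper triangle)."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> float:
    """N * log2(N), ignoring constant factors (assumed base 2 for illustration)."""
    return n * math.log2(n)


if __name__ == "__main__":
    # Sizes chosen for illustration; 60 000 matches the scale quoted in the abstract.
    for n in (1_000, 10_000, 60_000):
        pairs = full_matrix_pairs(n)
        approx = n_log_n(n)
        print(
            f"N={n:>6}: full-matrix pairs={pairs:>13,}  "
            f"N*log2(N)={approx:>12,.0f}  ratio={pairs / approx:,.0f}x"
        )
```

For N = 60 000 the full matrix needs about 1.8 billion pairwise distances, whereas N log2 N is on the order of one million, which is consistent with the abstract's point that an O(N log N) guide tree makes alignments of this size practical on a desktop computer.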