Sequence embedding for fast construction of guide trees for multiple sequence alignment
Open Access
- 14 May 2010
- journal article
- research article
- Published by Springer Nature in Algorithms for Molecular Biology
- Vol. 5 (1), 21
- https://doi.org/10.1186/1748-7188-5-21
Abstract
The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.
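
The quadratic cost described in the abstract can be made concrete with a little arithmetic. The sketch below is an illustration only, not code from the paper: the function name `full_matrix_cost` is hypothetical, and the default of 8 bytes per distance (one double-precision float) is an assumption about how a dense matrix might be stored.

```python
def full_matrix_cost(n_sequences: int, bytes_per_entry: int = 8) -> tuple[int, int]:
    """Estimate the cost of a full pairwise distance matrix.

    Returns (number of unordered sequence pairs, bytes for a dense N x N matrix).
    bytes_per_entry=8 is an assumed storage size (one 64-bit float per distance).
    """
    # Each unordered pair of distinct sequences needs one distance computation.
    n_pairs = n_sequences * (n_sequences - 1) // 2
    # A dense square matrix stores N * N entries, including the diagonal.
    matrix_bytes = n_sequences * n_sequences * bytes_per_entry
    return n_pairs, matrix_bytes


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        pairs, size = full_matrix_cost(n)
        print(f"N={n:,}: {pairs:,} pairs, {size / 1e9:.3f} GB dense matrix")
```

Under these assumptions, N = 10,000 already means roughly 50 million pairwise comparisons and 0.8 GB for the matrix, and each tenfold increase in N multiplies both figures by about one hundred.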