Sequence embedding for fast construction of guide trees for multiple sequence alignment

Open Access

14 May 2010

journal article
research article
Published by Springer Nature in Algorithms for Molecular Biology

Vol. 5 (1), 21
https://doi.org/10.1186/1748-7188-5-21

Abstract

The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows beyond roughly 10,000, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.
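
To make the quadratic growth described above concrete, the following is a minimal sketch of the arithmetic, not code from the paper. It assumes that only the upper triangle of the symmetric distance matrix is stored and that each distance takes 8 bytes (one double-precision float); the function names `pairwise_count` and `matrix_bytes` are hypothetical and chosen for illustration only.

```python
def pairwise_count(n: int) -> int:
    """Number of unordered sequence pairs, i.e. entries in the upper triangle."""
    return n * (n - 1) // 2


def matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Memory for the upper triangle.

    Assumption: one 8-byte float per distance; real tools may store less or more.
    """
    return pairwise_count(n) * bytes_per_entry


# Print the pair count and approximate memory for a few values of N.
for n in (1_000, 10_000, 100_000):
    mib = matrix_bytes(n) / 2**20
    print(f"N={n:>7,}: {pairwise_count(n):>13,} pairs, {mib:>10,.1f} MiB")
```

Under these assumptions, 10,000 sequences already require about 50 million pairwise distances, and each tenfold increase in N multiplies both the number of distance computations and the storage by roughly one hundred.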
