Stability of multiple alignments and phylogenetic trees: an analysis of ABC-transporter proteins family
Open Access
- 6 November 2008
- journal article
- Published by Springer Nature in Algorithms for Molecular Biology
- Vol. 3 (1) , 15
- https://doi.org/10.1186/1748-7188-3-15
Abstract
Background: Sequence-based phylogeny reconstruction is a fundamental task in bioinformatics. Practically all methods for phylogeny reconstruction are based on multiple alignments. The quality and stability of the underlying alignments are therefore crucial for phylogenetic analysis.

Results: In this short report, we investigate alignments and alignment-based phylogenies constructed for a set of 22 ABC transporters using CLUSTAL W and DIALIGN. Comparing the 22 "one-out phylogenies" that can be obtained for this sequence set, we observe some intrinsic phylogenetic instability, even if attention is restricted to branches with high bootstrapping frequencies, the so-called safe branches. We show that this instability arises because both CLUSTAL W and DIALIGN apparently get "confused" by sequence repeats in some of the ABC transporters. To deal with such problems, we introduce two new DIALIGN options that prove helpful in our context: the "exclude-fragment" (or "xfr") option and the "self-comparison" (or "sc") option.

Conclusion: "One-out strategies", known to be a useful tool for testing the stability of all sorts of data-analysis procedures, can also be used successfully to test alignment stability. If instabilities are observed, the sequences under consideration should be carefully checked for putative causes. If sequence repeats are suspected to be the cause, the new "sc" option can be used to detect such repeats, and the "xfr" option can help to resolve the resulting problems.
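The one-out comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the nested-tuple tree representation and the function names (`leaves`, `splits`, `one_out_instability`) are assumptions made for illustration, bootstrap support is not modeled, and the trees are assumed to have been built beforehand (for example from CLUSTAL W or DIALIGN alignments). For each omitted sequence, the sketch counts the branches (bipartitions) of the full tree that are not recovered in the corresponding one-out tree.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf names of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree, taxa=None):
    """Return the non-trivial bipartitions of the tree, restricted to `taxa`.

    The tree is treated as unrooted; a bipartition is kept only if both
    sides contain at least two taxa.
    """
    all_leaves = leaves(tree)
    taxa = all_leaves if taxa is None else frozenset(taxa) & all_leaves
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaves(node) & taxa
        other = taxa - side
        if len(side) >= 2 and len(other) >= 2:
            found.add(frozenset([side, other]))
        for child in node:
            walk(child)

    walk(tree)
    return found


def one_out_instability(full_tree, one_out_trees):
    """Map each omitted taxon to the number of full-tree branches
    (restricted to the remaining taxa) missing from its one-out tree."""
    all_taxa = leaves(full_tree)
    report = {}
    for omitted, tree in one_out_trees.items():
        expected = splits(full_tree, all_taxa - {omitted})
        observed = splits(tree)
        report[omitted] = len(expected - observed)
    return report


# Toy data (hypothetical): a five-taxon tree and two one-out trees.
full_tree = ((("A", "B"), "C"), ("D", "E"))
one_out_trees = {
    "C": (("A", "B"), ("D", "E")),  # agrees with the full tree
    "E": (("A", "C"), ("B", "D")),  # contradicts the A,B grouping
}
report = one_out_instability(full_tree, one_out_trees)
```

A non-zero count for an omitted sequence signals that removing it changes the inferred branching pattern. To mirror the restriction to safe branches, one would first filter the full-tree bipartitions by bootstrap frequency before comparing.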