Ultra-large alignments using phylogeny-aware profiles
Open Access
- 16 June 2015
- journal article
- research article
- Published by Springer Nature in Genome Biology
- Vol. 16 (1), 1-15
- https://doi.org/10.1186/s13059-015-0688-z
Abstract
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.
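The core idea of an ensemble of models, as opposed to a single model, can be illustrated with a toy sketch. Here we assume that each model in the ensemble is built from a different subset of already-aligned sequences, and that a query sequence (possibly a fragment) is assigned to whichever model scores it highest. UPP itself works with full profile hidden Markov models, which include insertion and deletion states. This sketch swaps those in for much simpler ungapped position-specific log-odds profiles, so it is not UPP's implementation. The names `build_profile`, `best_local_score`, and `assign_to_ensemble` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # nucleotide alphabet; gaps ('-') are ignored when counting


def build_profile(aligned_seqs, pseudocount=1.0):
    """Build per-column log-odds scores (vs. a uniform background) from
    equal-length aligned sequences. A toy stand-in for a profile HMM."""
    length = len(aligned_seqs[0])
    background = 1.0 / len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            a: math.log2((counts[a] + pseudocount) / total / background)
            for a in ALPHABET
        })
    return profile


def best_local_score(profile, query):
    """Score the best ungapped placement of a query (e.g. a fragment)
    fully contained in the profile; -inf if the query is too long."""
    best = -math.inf
    for start in range(len(profile) - len(query) + 1):
        score = sum(profile[start + i][c] for i, c in enumerate(query))
        best = max(best, score)
    return best


def assign_to_ensemble(profiles, query):
    """Return (index, score) of the ensemble member scoring the query best."""
    scores = [best_local_score(p, query) for p in profiles]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


if __name__ == "__main__":
    # Two hypothetical subsets of a backbone alignment.
    subset_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
    subset_b = ["TTGCATGC", "TTGCATGG", "TAGCATGC"]
    ensemble = [build_profile(subset_a), build_profile(subset_b)]
    for fragment in ["GTACG", "CATGC"]:
        idx, score = assign_to_ensemble(ensemble, fragment)
        print(fragment, "->", idx, round(score, 2))
```

With these inputs the fragment `GTACG` is assigned to the first profile and `CATGC` to the second. Each fragment is scored against models built from sequences similar to it, rather than against one model averaged over the whole, possibly very diverse, dataset.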