Aligning short reads to reference alignments and trees
Open Access
- 2 June 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 27 (15), 2068-2075
- https://doi.org/10.1093/bioinformatics/btr320
Abstract
Motivation: Likelihood-based methods for placing short read sequences from metagenomic samples into reference phylogenies have been recently introduced. At present, it is unclear how to align those reads with respect to the reference alignment that was deployed to infer the reference phylogeny. Moreover, the adaptability of such alignment methods with respect to the underlying reference alignment strategies/philosophies has not been explored. It has also not been assessed if the reference phylogeny can be deployed in conjunction with the reference alignment to improve alignment accuracy in this context.
Results: We assess different strategies for short read alignment and propose a novel phylogeny-aware alignment procedure. Our alignment method can improve the accuracy of subsequent phylogenetic placement of the reads into a reference phylogeny by up to 5.8 times compared with phylogeny-agnostic methods. It can be deployed to align reads to alignments generated by using fundamentally different alignment strategies (e.g. PRANK+F versus MUSCLE).
Availability: http://www.exelixis-lab.org/software.html
Contact: simon.berger@h-its.org; alexandros.stamatakis@h-its.org
Supplementary information: Supplementary data are available at Bioinformatics online.