Comparative assessment of methods for aligning multiple genome sequences
- 23 May 2010
- journal article
- research article
- Published by Springer Nature in Nature Biotechnology
- Vol. 28 (6), 567-572
- https://doi.org/10.1038/nbt.1637
Abstract
Mining information from genomes often begins by aligning the sequences to identify evolutionarily conserved regions. Chen et al. assess the performance of four commonly used multiple sequence alignment tools. Multiple sequence alignment is a difficult computational problem. There have been compelling pleas for methods to assess whole-genome multiple sequence alignments and compare the alignments produced by different tools. We assess the four ENCODE alignments, each of which aligns 28 vertebrates on 554 Mbp of total input sequence. We measure the level of agreement among the alignments and compare their coverage and accuracy. We find a disturbing lack of agreement among the alignments not only in species distant from human, but even in mouse, a well-studied model organism. Overall, the assessment shows that Pecan produces the most accurate or nearly most accurate alignment in all species and genomic location categories, while still providing coverage comparable to or better than that of the other alignments in the placental mammals. Our assessment reveals that constructing accurate whole-genome multiple sequence alignments remains a significant challenge, particularly for noncoding regions and distantly related species.