Accurate anchoring alignment of divergent sequences
Open Access
- 13 November 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (1) , 29-34
- https://doi.org/10.1093/bioinformatics/bti772
Abstract
Motivation: Obtaining high quality alignments of divergent homologous sequences for cross-species sequence comparison remains a challenge.
Results: We propose a novel pairwise sequence alignment algorithm, ACANA (ACcurate ANchoring Alignment), for aligning biological sequences at both local and global levels. Like many fast heuristic methods, ACANA uses an anchoring strategy. However, unlike others, ACANA uses a Smith–Waterman-like dynamic programming algorithm to recursively identify near-optimal regions as anchors for a global alignment. Performance evaluations using a simulated benchmark dataset and real promoter sequences suggest that ACANA is accurate and consistent, especially for divergent sequences. Specifically, we use a simulated benchmark dataset to show that ACANA has the highest sensitivity to align constrained functional sites compared to BLASTZ, CHAOS and DIALIGN for local alignment and compared to AVID, ClustalW, DIALIGN and LAGAN for global alignment. Applied to 6007 pairs of human-mouse orthologous promoter sequences, ACANA identified the largest number of conserved regions (defined as over 70% identity over 100 bp) compared to AVID, ClustalW, DIALIGN and LAGAN. In addition, the average length of conserved region identified by ACANA was the longest. Thus, we suggest that ACANA is a useful tool for identifying functional elements in cross-species sequence analysis, such as predicting transcription factor binding sites in non-coding DNA.
Availability: ACANA software and test sequence data are publicly available at
Supplementary information: Supplementary materials are available at Bioinformatics online.
Contact: li3@niehs.nih.gov
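The anchoring idea described in the Results can be illustrated with a minimal sketch: find the best local alignment with Smith–Waterman dynamic programming, keep it as an anchor, then repeat the search on the unaligned sequence to the left and right of that anchor. This is not ACANA's actual implementation. The function names (smith_waterman, find_anchors), the linear gap penalty, the scoring values and the min_score cutoff are all assumptions chosen for illustration, and the sketch keeps only the single best region per step rather than a set of near-optimal regions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (illustrative scoring, linear gaps).

    Returns (score, a_start, a_end, b_start, b_end) with half-open intervals.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero
    # to recover where the local alignment starts.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i, j, best_j


def find_anchors(a, b, a_off=0, b_off=0, min_score=6):
    """Recursively collect collinear anchors as (a_start, a_end, b_start, b_end).

    min_score is a hypothetical cutoff below which a region is not kept.
    """
    if not a or not b:
        return []
    score, a0, a1, b0, b1 = smith_waterman(a, b)
    if score < min_score:
        return []
    # Recurse on the flanking regions so that anchors never cross each other.
    left = find_anchors(a[:a0], b[:b0], a_off, b_off, min_score)
    right = find_anchors(a[a1:], b[b1:], a_off + a1, b_off + b1, min_score)
    return left + [(a_off + a0, a_off + a1, b_off + b0, b_off + b1)] + right


# Two toy sequences sharing conserved blocks separated by unrelated spacers.
seq_a = "ACGTACGTAC" + "TTTTTTTTTTTT" + "GGCATGCCAT"
seq_b = "ACGTACGTAC" + "CCCCCCCCCCCCCCC" + "GGCATGCCAT"
anchors = find_anchors(seq_a, seq_b)
```

Because each recursive call only sees the sequence strictly to the left or right of an accepted anchor, the returned anchors are ordered and non-overlapping in both sequences, which is the property a global alignment built on anchors relies on. The regions between anchors would still need to be aligned by some other step, which this sketch leaves out.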