Improved spliced alignment from an information theoretic approach
Open Access
- 2 November 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (1) , 13-20
- https://doi.org/10.1093/bioinformatics/bti748
Abstract
Motivation: mRNA sequences and expressed sequence tags represent some of the most abundant experimental data for identifying genes and alternatively spliced products in metazoans. These transcript sequences are frequently studied by aligning them to a genomic sequence template. For existing programs, error-prone, polymorphic and cross-species data, as well as non-canonical splice sites, still present significant barriers to producing accurate, complete alignments.
Results: We took a novel approach to spliced alignment that meaningfully combined information from sequence similarity with that obtained from PSSM splice site models. Scoring systems were chosen to maximize their power of discrimination, and dynamic programming (DP) was employed to guarantee that optimal solutions would be found. The resultant program, EXALIN, performed better than other popular tools tested under a wide range of conditions that included detection of micro-exons and human–mouse cross-species comparisons. For improved speed with only a marginal decrease in splice site prediction accuracy, EXALIN could perform limited DP guided by a result from BLASTN.
Availability: The source code, binaries, scripts, scoring matrices and splice site models for human, mouse, rice and Caenorhabditis elegans utilized in this study are posted at . The software (scripts, source code and binaries) is copyrighted but free for all to use.
Contact: gish@blast.wustl.edu
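The idea of combining sequence similarity with PSSM splice site models can be illustrated with a minimal sketch, assuming both are expressed as log-odds scores in bits so that they can be added. This is not EXALIN's code or its scoring system: the names `DONOR_FREQS`, `BACKGROUND`, `pssm_bits`, `match_bits` and `combined_bits` are hypothetical, and every frequency below is an invented illustrative value rather than a value from the published models.

```python
import math

# Hypothetical base frequencies for a 4-nt window around a donor splice site.
# Illustrative values only; NOT taken from the EXALIN splice site models.
DONOR_FREQS = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},  # last exon base
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # intron position 1 (G of GT)
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},  # intron position 2 (T of GT)
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},  # intron position 3
]

# Assumed uniform background base composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def pssm_bits(window, freqs=DONOR_FREQS, background=BACKGROUND):
    """Log-odds score (bits) of a window under the site model vs. background."""
    if len(window) != len(freqs):
        raise ValueError("window length must equal the number of PSSM columns")
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(freqs, window)
    )


def match_bits(identity=0.95, base_freq=0.25):
    """Log-odds score (bits) of one matching base, assuming a target identity.

    Under the assumed model, an aligned pair matches with probability
    `identity`; by chance, a given base matches with probability `base_freq`.
    """
    return math.log2(identity / base_freq)


def combined_bits(n_matches, donor_window):
    """Add similarity evidence and splice site evidence on the same bit scale."""
    return n_matches * match_bits() + pssm_bits(donor_window)


if __name__ == "__main__":
    print(f"consensus-like donor GGTA: {pssm_bits('GGTA'):.2f} bits")
    print(f"unlikely donor CCCC:       {pssm_bits('CCCC'):.2f} bits")
    print(f"20 matches + GGTA donor:   {combined_bits(20, 'GGTA'):.2f} bits")
```

Because both terms are log-odds in the same unit, a weak splice site signal can be offset by strong sequence similarity and vice versa. An actual spliced aligner would evaluate such sums inside a dynamic programming recurrence rather than for one fixed candidate.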