Exact Calculation of Distributions on Integers, with Application to Sequence Alignment
- 1 January 2009
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 16 (1) , 1-18
- https://doi.org/10.1089/cmb.2008.0137
Abstract
Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, the score is naturally defined on the integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
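As an illustration of what an exact distribution of an integer score can look like in the simplest setting mentioned in the abstract (significance of motif scores), the following is a minimal sketch. It is not one of the three algorithms of the article, which are not described in this record; it is the standard position-by-position convolution, assuming the score is a sum of independent per-position integer scores. The function names, the two-position score matrix, and the uniform background are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def score_distribution(score_matrix, background):
    """Exact distribution of a sum of per-position integer scores.

    score_matrix: list of dicts, one per position, mapping residue -> integer score.
    background:   dict mapping residue -> probability (positions assumed independent).
    Returns a dict mapping each attainable integer score to its probability.
    """
    dist = {0: 1.0}  # before any position is read, the score is 0 with certainty
    for column in score_matrix:
        nxt = defaultdict(float)
        for total, p in dist.items():
            for residue, q in background.items():
                # Extend every partial score by the score of this residue.
                nxt[total + column[residue]] += p * q
        dist = dict(nxt)
    return dist


def tail_probability(dist, threshold):
    """P(score >= threshold), i.e. an exact p-value under the background model."""
    return sum(p for s, p in dist.items() if s >= threshold)


# Hypothetical two-position integer score matrix and uniform background.
matrix = [
    {"A": 2, "C": -1, "G": -1, "T": 0},
    {"A": 0, "C": 3, "G": -2, "T": -1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

dist = score_distribution(matrix, background)
p_value = tail_probability(dist, 5)
```

Because the scores are integers, the number of distinct totals stays bounded by the score range rather than by the number of sequences, which is what makes an exact calculation of this kind tractable.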