Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times
Open Access
- 1 December 2009
- journal article
- Published by Institute of Mathematical Statistics in The Annals of Statistics
- Vol. 37 (6A) , 3697-3714
- https://doi.org/10.1214/08-aos663
Abstract
The gapped local alignment score of two random sequences follows a Gumbel distribution. If computers could estimate the parameters of the Gumbel distribution within one second, the use of arbitrary alignment scoring schemes could increase the sensitivity of searching biological sequence databases over the web. Accordingly, this article gives a novel equation for the scale parameter of the relevant Gumbel distribution. We speculate that the equation is exact, although present numerical evidence is limited. The equation involves ascending ladder variates in the global alignment of random sequences. In global alignment simulations, the ladder variates yield stopping times specifying random sequence lengths. Because of the random lengths, and because our trial distribution for importance sampling occurs on a different sample space from our target distribution, our study led to a mapping theorem, which led naturally in turn to an efficient dynamic programming algorithm for the importance sampling weights. Numerical studies using several popular alignment scoring schemes then examined the efficiency and accuracy of the resulting simulations.Keywords
All Related Versions
This publication has 27 references indexed in Scilit:
- Accelerated convergence and robust asymptotic regression of the Gumbel scale parameter for gapped sequence alignmentJournal of Physics A: General Physics, 2004
- An improved algorithm for matching biological sequencesPublished by Elsevier ,2004
- Upper bounds and importance sampling of p-values for DNA and protein sequence alignmentsBernoulli, 2003
- Approximate P-Values for Local Sequence Alignments: Numerical StudiesJournal of Computational Biology, 2001
- Accurate formula for P-values of gapped local sequence and profile alignmentsJournal of Molecular Biology, 2000
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- A Phase Transition for the Score in Matching Random Sequences Allowing DeletionsThe Annals of Applied Probability, 1994
- Basic local alignment search toolJournal of Molecular Biology, 1990
- Some biological sequence metricsAdvances in Mathematics, 1976
- A CONVEXITY PROPERTY OF POSITIVE MATRICESThe Quarterly Journal of Mathematics, 1961