Computing expectation values for RNA motifs using discrete convolutions
Open Access
- 13 May 2005
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1) , 118
- https://doi.org/10.1186/1471-2105-6-118
Abstract
Computational biologists use expectation values (E-values) to estimate the number of solutions that can be expected by chance during a database scan. Here we focus on computing E-values for RNA motifs defined by single-strand and helix lod-score profiles with variable helix spans. Such E-values cannot be computed assuming a normal score distribution, and their estimation previously required lengthy simulations. We introduce discrete convolutions as an accurate and fast means of estimating score distributions of lod-score profiles. This method provides excellent score estimates for all single-strand or helical elements tested and also applies to the combination of elements into larger, complex motifs. Further, the estimated distributions remain accurate even when pseudocounts are introduced into the lod-score profiles. Estimated score distributions are then easily converted into E-values. Good agreement was observed between computed E-values and simulations for a number of complete RNA motifs. This method is now implemented in the ERPIN software, but it can equally be applied to any search procedure based on ungapped profiles with statistically independent columns.
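The idea of estimating a profile's score distribution by discrete convolution can be illustrated with a minimal sketch. It is not the ERPIN implementation: it assumes lod scores have already been scaled and rounded to integers, that columns are statistically independent, and that the database is summarized by a single count of scanned positions. The function names (`column_distribution`, `profile_score_distribution`, `e_value`) and the toy profile are hypothetical.

```python
import numpy as np

def column_distribution(scores, background):
    """Probability mass over integer score bins for one profile column.

    scores: integer lod scores, one per residue (assumed pre-rounded).
    background: background probability of each residue, same order.
    Returns (lowest_score, pmf) where pmf[i] is P(score == lowest_score + i).
    """
    lo, hi = min(scores), max(scores)
    pmf = np.zeros(hi - lo + 1)
    for s, p in zip(scores, background):
        pmf[s - lo] += p
    return lo, pmf

def profile_score_distribution(profile, background):
    """Convolve the per-column distributions of an ungapped profile.

    With independent columns, the distribution of the summed score is
    the discrete convolution of the column distributions.
    """
    total_lo = 0
    pmf = np.array([1.0])  # distribution of an empty sum: score 0 with probability 1
    for column in profile:
        lo, col_pmf = column_distribution(column, background)
        pmf = np.convolve(pmf, col_pmf)
        total_lo += lo
    return total_lo, pmf

def e_value(profile, background, cutoff, n_positions):
    """Expected number of chance hits scoring >= cutoff over n_positions."""
    lo, pmf = profile_score_distribution(profile, background)
    start = max(cutoff - lo, 0)
    tail = pmf[start:].sum()  # empty slice sums to 0 when cutoff exceeds the maximum score
    return n_positions * tail

# Toy two-column profile over (A, C, G, U) with a uniform background (illustrative values).
background = [0.25, 0.25, 0.25, 0.25]
profile = [[2, -1, -1, -1], [1, 1, -2, -2]]
lowest, pmf = profile_score_distribution(profile, background)
expected_hits = e_value(profile, background, cutoff=3, n_positions=1000)
```

In this toy case the only way to reach the cutoff of 3 is to take the top score in both columns, so the tail probability is 0.25 × 0.5 and the resulting E-value scales linearly with the number of positions scanned. Real lod scores are real-valued, so the bin width chosen when rounding trades accuracy against the length of the convolved array.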