Method for Calculation of Probability of Matching a Bounded Regular Expression in a Random Data String

1 January 1995

journal article
research article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 2 (1), 25-31
https://doi.org/10.1089/cmb.1995.2.25

Abstract

A method is presented for determining within strict bounds the probability of matching a regular expression with a match start point in a given section of a random data string. The method in general requires time and space exponential in the number of optional characters in the regular expression, but in practice was used to determine bounds for probabilities of matching all the ProSite patterns without difficulty.
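The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one simple way to bracket such a probability, not the paper's method. It assumes an independent, identically distributed random string with known letter frequencies, a fixed match start point, and a pattern made of elements with bounded repeat counts (as in a ProSite-style gap such as x(1,2)). Every name in it (`elements`, `freq`, `match_probability_bounds`) is hypothetical. Each choice of repeat counts gives one fixed-length expansion; the largest single expansion probability is a lower bound on the match probability, and the sum over all expansions (the union bound) is an upper bound. The number of expansions is the product of the repeat ranges, which illustrates why the cost can grow exponentially with the number of variable-length elements.

```python
from itertools import product


def expansion_probability(elements, reps, freq):
    """Probability that one fixed-length expansion matches at a fixed start.

    elements: list of (allowed_letters, min_repeat, max_repeat) tuples.
    reps:     one repeat count per element, chosen within its bounds.
    freq:     dict mapping each letter to its probability (assumed i.i.d.).
    """
    p = 1.0
    for (allowed, _lo, _hi), r in zip(elements, reps):
        p_position = sum(freq[c] for c in allowed)  # chance one position matches
        p *= p_position ** r
    return p


def match_probability_bounds(elements, freq):
    """Return (lower, upper) bounds on the probability of a match at a fixed start.

    Enumerates every combination of repeat counts, so the work is the product
    of (max_repeat - min_repeat + 1) over all elements.
    """
    ranges = [range(lo, hi + 1) for _allowed, lo, hi in elements]
    probs = [expansion_probability(elements, reps, freq)
             for reps in product(*ranges)]
    lower = max(probs)            # the union is at least as likely as any one event
    upper = min(1.0, sum(probs))  # union bound, capped at 1
    return lower, upper


# Hypothetical example: uniform 4-letter alphabet, pattern A-x(1,2)-C.
freq = {c: 0.25 for c in "ACGT"}
pattern = [("A", 1, 1), ("ACGT", 1, 2), ("C", 1, 1)]
lower, upper = match_probability_bounds(pattern, freq)
print(lower, upper)
```

For this toy pattern the two expansions overlap (both require the leading A), so the exact probability, 0.25 × (1 − 0.75²) = 0.109375, lies strictly between the two bounds. Tighter bounds would need the overlap between expansions to be accounted for, for example by inclusion-exclusion.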

References

The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research, 1993.
Automated assembly of protein blocks for database searching. Nucleic Acids Research, 1991.
Linguistics of Nucleotide Sequences I: The Significance of Deviations from Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence of Words. Journal of Biomolecular Structure and Dynamics, 1989.