Approximate Matching of Network Expressions with Spacers
- 1 January 1996
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 3 (1) , 33-51
- https://doi.org/10.1089/cmb.1996.3.33
Abstract
Two algorithmic results are presented that are pertinent to the matching of patterns typically used by biologists to describe regions of macromolecular sequences that encode a given function. The first result is a threshold-sensitive algorithm for approximately matching both network and regular expressions. Network expressions are regular expressions that can be composed only from union and concatenation operators. Kleene closure (i.e., unbounded repetition) is not permitted. The algorithm is threshold-sensitive in that its performance depends on the threshold, k, of the number of differences allowed in an approximate match. This result generalizes the 0(kn) expected-time algorithm of Ukkonen for approximately matching keywords. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network or regular expressions interspersed with specifiable distance ranges. For this class of patterns, it is shown how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.Keywords
This publication has 10 references indexed in Scilit:
- An algorithm for string matching with a sequence of don't caresInformation Processing Letters, 1991
- Predictive motifs derived from cytosine methyltransferasesNucleic Acids Research, 1989
- Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes.The EMBO Journal, 1985
- Finding approximate patterns in stringsJournal of Algorithms, 1985
- Rapid searches for complex patterns in biological moleculesNucleic Acids Research, 1984
- Fast optimal alignmentNucleic Acids Research, 1984
- The theory and computation of evolutionary distances: Pattern recognitionJournal of Algorithms, 1980
- The String-to-String Correction ProblemJournal of the ACM, 1974
- A general method applicable to the search for similarities in the amino acid sequence of two proteinsJournal of Molecular Biology, 1970