Using sequence compression to speedup probabilistic profile matching
Open Access
- 15 February 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (10) , 2225-2229
- https://doi.org/10.1093/bioinformatics/bti323
Abstract
Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P ≪ N. Results: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding. Contact:bogliolo@sti.uniurb.itKeywords
This publication has 10 references indexed in Scilit:
- Some string matching problems from Bioinformatics which still need better solutionsJournal of Discrete Algorithms, 2004
- A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring MatricesSIAM Journal on Computing, 2003
- The Efficient Computation of Position-Specific Match Scores with the Fast Fourier TransformJournal of Computational Biology, 2002
- Fast probabilistic analysis of sequence function using scoring matricesBioinformatics, 2000
- The role of pattern databases in sequence analysis.Briefings in Bioinformatics, 2000
- IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matricesBioinformatics, 1999
- Minimal-Risk Scoring Matrices for Sequence AnalysisJournal of Computational Biology, 1999
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programsNucleic Acids Research, 1997
- Let Sleeping Files Lie: Pattern Matching in Z-Compressed FilesJournal of Computer and System Sciences, 1996
- Compression of individual sequences via variable-rate codingIEEE Transactions on Information Theory, 1978