Stochastic motif extraction using a genetic algorithm with the MDL principle
- 31 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- pp. 746-755, vol. 1
- https://doi.org/10.1109/hicss.1993.270666
Abstract
A new methodology for extracting stochastic motifs from protein sequences is proposed. Instead of pursuing precise motifs, the authors extract stochastic motifs, which inherently allow exceptions and are better suited to representing important regions. J. Rissanen's (1978) minimum description length (MDL) principle serves as the quantitative criterion for avoiding overfitting to the sample sequences. To avoid a combinatorial explosion in motif extraction, a genetic algorithm is used: a probabilistic search algorithm modeled on the process of biological evolution. The experimental results demonstrate that the MDL principle greatly increases the convergence speed of the genetic algorithm when extracting stochastic motifs.
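The abstract combines two ideas: an MDL score that trades the cost of describing a motif against the cost of describing the sequences given that motif, and a genetic algorithm that searches the space of motifs using that score as fitness. The Python below is a minimal sketch under stated assumptions, not the authors' formulation. It assumes a stochastic motif is a fixed-length list of allowed residue sets, where residues inside a set share a per-position match probability q and exceptions share 1 - q. The coding scheme (set size plus an enumerative code for the subset, and (1/2) log2 n bits per probability), the GA operators, all function names, and the planted demo motif `WHKDC` are hypothetical choices made for illustration.

```python
# Hypothetical sketch: MDL-scored stochastic motifs searched by a simple GA.
# The motif model, code lengths and GA operators are illustrative assumptions.
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
A = len(ALPHABET)


def best_window(motif, seq):
    """Offset of the window matching the most motif positions (first on ties)."""
    L = len(motif)
    scores = [sum(seq[o + i] in motif[i] for i in range(L))
              for o in range(len(seq) - L + 1)]
    return scores.index(max(scores))


def description_length(motif, seqs):
    """Two-part MDL code length in bits: motif cost plus data cost given the motif."""
    L, n = len(motif), len(seqs)
    windows = []
    for s in seqs:
        o = best_window(motif, s)
        windows.append(s[o:o + L])
    bits = 0.0
    for i, allowed in enumerate(motif):
        k = len(allowed)
        hits = sum(w[i] in allowed for w in windows)
        q = hits / n
        # Model cost: the set size, which k-subset it is, and (1/2) log2 n
        # bits for the real-valued match probability q.
        bits += math.log2(A) + math.log2(math.comb(A, k)) + 0.5 * math.log2(n)
        # Data cost: residues in the set share probability q uniformly; the
        # exceptions share 1 - q uniformly over the remaining residues.
        if hits:
            bits -= hits * math.log2(q / k)
        if n - hits:
            bits -= (n - hits) * math.log2((1 - q) / (A - k))
    return bits


def random_motif(rng, L):
    return [frozenset(rng.sample(ALPHABET, rng.randint(1, A))) for _ in range(L)]


def mutate(rng, motif):
    """Toggle one residue in one position's set, never leaving a set empty."""
    m = list(motif)
    i = rng.randrange(len(m))
    toggled = m[i] ^ frozenset((rng.choice(ALPHABET),))
    if toggled:
        m[i] = toggled
    return m


def crossover(rng, a, b):
    """One-point crossover on motif positions."""
    cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
    return a[:cut] + b[cut:]


def tournament(rng, scored, k=3):
    """Pick the lowest-description-length motif among k random candidates."""
    return min(rng.sample(scored, k), key=lambda t: t[0])[1]


def evolve(seqs, L, pop_size=30, generations=40, p_mut=0.3, seed=0):
    """GA minimizing description length; returns (motif, bits, best-per-generation)."""
    rng = random.Random(seed)
    pop = [random_motif(rng, L) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((description_length(m, seqs), m) for m in pop),
                        key=lambda t: t[0])
        history.append(scored[0][0])
        nxt = [scored[0][1]]  # elitism: the best motif survives unchanged
        while len(nxt) < pop_size:
            child = crossover(rng, tournament(rng, scored), tournament(rng, scored))
            if rng.random() < p_mut:
                child = mutate(rng, child)
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda m: description_length(m, seqs))
    return best, description_length(best, seqs), history


def make_demo_sequences(seed=1, n=8, length=30, planted="WHKDC"):
    """Random sequences, each containing one copy of a hypothetical planted motif."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n):
        s = [rng.choice(ALPHABET) for _ in range(length)]
        o = rng.randrange(length - len(planted) + 1)
        s[o:o + len(planted)] = planted
        seqs.append("".join(s))
    return seqs


if __name__ == "__main__":
    seqs = make_demo_sequences()
    motif, bits, history = evolve(seqs, L=5)
    print(["".join(sorted(p)) for p in motif], round(bits, 1))
```

In this sketch the exception term is what makes a motif "stochastic": a position can keep a small residue set and pay a few bits for the occasional mismatch instead of enlarging the set to cover it. The enumerative set code charges more for mid-sized sets than for singletons or a full wildcard, so the score penalizes motifs that simply enumerate every residue seen in the sample. Elitism keeps the best description length non-increasing across generations; the sketch makes no claim to reproduce the convergence-speed result reported in the abstract.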
References
- A Learning Criterion for Stochastic Rules. Machine Learning, 1992.
- Learning Stochastic Motifs from Genetic Sequences. Elsevier, 1991.
- A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics, 1983.
- Modeling by Shortest Data Description. Automatica, 1978.