Abstract
A new methodology for extracting stochastic motifs from protein sequences is proposed. Instead of pursuing precise motifs, the authors extract stochastic motifs, which inherently allow exceptions and are more suitable for representing important regions. J. Rissanen's (1978) minimum description length (MDL) principle is used as the quantitative criterion to avoid overfitting to the sample sequences. To avoid combinatorial explosion in motif extraction, a genetic algorithm is used, which is a kind of probabilistic search algorithm based on the biological evolution process. The experimental results demonstrate that the MDL principle greatly increases the convergence speed of the genetic algorithm when extracting stochastic motifs.
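
To make the combination of the MDL criterion and a genetic search concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the motif representation, the encoding scheme, or the genetic operators, so everything below is an illustrative assumption. In particular, the fixed-length motif with "." wildcards, the two-part code (bits for the motif plus bits for the class labels given match or no match), and the names `entropy_bits`, `description_length`, `mutate`, and `evolve` are hypothetical. The cost of encoding the probability parameters themselves is omitted for brevity.

```python
import math
import random
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO + "."  # "." is a wildcard position (assumed representation)


def entropy_bits(pos, neg):
    """Bits to encode pos/neg labels using their empirical frequencies."""
    n = pos + neg
    bits = 0.0
    for k in (pos, neg):
        if k:
            bits -= k * math.log2(k / n)
    return bits


def description_length(motif, positives, negatives):
    """Two-part code length: motif cost plus label cost given the motif."""
    # Assumed model cost: 1 bit per position (wildcard or residue),
    # plus log2(20) bits to name the residue when it is not a wildcard.
    model_bits = sum(1 + (math.log2(len(AMINO)) if c != "." else 0) for c in motif)
    rx = re.compile(motif)
    hit_pos = sum(1 for s in positives if rx.search(s))
    hit_neg = sum(1 for s in negatives if rx.search(s))
    # Labels are encoded separately for matching and non-matching sequences,
    # so a motif with a few exceptions still compresses the data well.
    data_bits = entropy_bits(hit_pos, hit_neg) + entropy_bits(
        len(positives) - hit_pos, len(negatives) - hit_neg
    )
    return model_bits + data_bits


def mutate(motif, rng):
    """Replace one randomly chosen position with a random symbol."""
    i = rng.randrange(len(motif))
    return motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]


def evolve(positives, negatives, length=4, pop_size=30, generations=40, seed=0):
    """Elitist genetic search that minimizes description length."""
    rng = random.Random(seed)

    def score(m):
        return description_length(m, positives, negatives)

    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            children.append(mutate(a[:cut] + b[cut:], rng))
        pop = parents + children
    return min(pop, key=score)
```

Under these assumptions, `evolve(positives, negatives)` returns the lowest-cost motif found, where `positives` and `negatives` are lists of sequences inside and outside the protein family of interest. Because the score charges for every non-wildcard position, a more specific motif is preferred only when it saves more bits in encoding the labels than it costs to describe, which is how the criterion discourages overfitting.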
