Mining Protein Sequences for Motifs
- 1 October 2002
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 9 (5), 707-720
- https://doi.org/10.1089/106652702761034145
Abstract
We use methods from Data Mining and Knowledge Discovery to design an algorithm for detecting motifs in protein sequences. The algorithm assumes that a motif is constituted by the presence of a "good" combination of residues in appropriate locations of the motif. The algorithm attempts to compile such good combinations into a "pattern dictionary" by processing an aligned training set of protein sequences. The dictionary is subsequently used to detect motifs in new protein sequences. Statistical significance of the detection results is ensured by statistically determining the various parameters of the algorithm. Based on this approach, we have implemented a program called GYM. The Helix-Turn-Helix motif was used as a model system on which to test our program. The program was also extended to detect Homeodomain motifs. The detection results for the two motifs compare favorably with those of existing programs. In addition, the GYM program provides a lot of useful information about a given protein sequence.
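The dictionary idea described in the abstract can be illustrated with a minimal sketch. This is not the GYM implementation: the function names, the toy sequences, the fixed `min_support` and `threshold` values, and the simple count-based score are all assumptions made for illustration. The abstract states that the real parameters are determined statistically, which this sketch does not attempt.

```python
from collections import Counter
from itertools import combinations


def build_pattern_dictionary(aligned, min_support, max_size=2):
    """Collect (position, residue) combinations that occur in at least
    `min_support` of the aligned, equal-length training sequences.

    Hypothetical helper: a plain frequent-itemset count, not GYM's method.
    """
    counts = Counter()
    for seq in aligned:
        items = list(enumerate(seq))  # [(0, 'Q'), (1, 'A'), ...]
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo for combo, n in counts.items() if n >= min_support}


def score_window(window, dictionary):
    """Count how many dictionary patterns the window matches."""
    return sum(
        all(window[pos] == res for pos, res in combo) for combo in dictionary
    )


def detect_motifs(sequence, dictionary, motif_len, threshold):
    """Slide a motif-length window over the sequence and report
    (start, score) for every window scoring at least `threshold`."""
    hits = []
    for start in range(len(sequence) - motif_len + 1):
        window = sequence[start:start + motif_len]
        score = score_window(window, dictionary)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Made-up aligned "training set" of length-5 motif instances (illustrative only).
toy_training = ["QAELA", "QSELG", "QAKLA", "QTELA"]
toy_dictionary = build_pattern_dictionary(toy_training, min_support=3)

# Scan a made-up sequence; the threshold of 5 is an arbitrary assumption.
toy_hits = detect_motifs("MKQAELAGG", toy_dictionary, motif_len=5, threshold=5)
```

In this toy setup, a window scores highly only when it matches many residue combinations that recur across the training alignment, which is the intuition behind a "good" combination of residues in appropriate locations. A real detector would replace the fixed thresholds with statistically derived ones.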