Computing motif correlations in proteins
- 30 September 2003
- journal article
- research article
- Published by Wiley in Journal of Computational Chemistry
- Vol. 24 (16) , 2032-2043
- https://doi.org/10.1002/jcc.10332
Abstract
Protein motifs are specific, conserved regions found by comparing multiple protein sequences. These conserved regions generally play an important role in protein function and folding, for example through their binding properties or enzymatic activities. The aim here is to find correlations in the occurrence of protein motifs. Knowledge of motif/domain sharing among proteins should shed new light on the biological functions of proteins and offer a basis for analyzing evolution in the human genome and other genomes. The protein sequences used here are obtained from the PIR-NREF database, and the protein motifs are retrieved from the PROSITE database. We apply a data mining approach to discover occurrence correlations of motifs in protein sequences. The mined motif correlations can be used in evolutionary analyses and in protein structure prediction; this study discusses the latter. The mined correlations are stored and maintained in a database system, which is available at http://bioinfo.csie.ncu.edu.tw/ProMotif/. © 2003 Wiley Periodicals, Inc. J Comput Chem 24: 2032–2043, 2003
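As an illustration of what mining occurrence correlations of motifs can look like, the following minimal Python sketch counts how often pairs of motifs appear in the same protein and reports pairwise rules with their support and confidence. The abstract does not state which measures or thresholds the study uses, so the support/confidence formulation, the function name `motif_pair_rules`, the threshold values, and the placeholder protein and motif identifiers are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def motif_pair_rules(protein_motifs, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for motif pairs.

    protein_motifs maps a protein identifier to the motifs found in it.
    support    = fraction of proteins that contain both motifs
    confidence = proteins containing both / proteins containing the antecedent
    """
    n_proteins = len(protein_motifs)
    single_counts = Counter()
    pair_counts = Counter()

    for motifs in protein_motifs.values():
        unique = sorted(set(motifs))  # count each motif once per protein
        single_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n_proteins
        if support < min_support:
            continue
        # Confidence is directional, so test both a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = both / single_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical toy data; the identifiers are placeholders, not real entries.
example = {
    "protein_1": ["MOTIF_A", "MOTIF_B"],
    "protein_2": ["MOTIF_A", "MOTIF_B", "MOTIF_C"],
    "protein_3": ["MOTIF_A"],
    "protein_4": ["MOTIF_B", "MOTIF_C"],
}

for antecedent, consequent, support, confidence in motif_pair_rules(example):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every protein that contains MOTIF_C also contains MOTIF_B, so that rule has confidence 1.0, while the pair MOTIF_A/MOTIF_C falls below the support threshold and is dropped. A real analysis would likely consider larger motif sets and tune the thresholds to the size of the sequence database.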