Functional Representation of Enzymes by Specific Peptides
Open Access
- 24 August 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 3 (8) , e167
- https://doi.org/10.1371/journal.pcbi.0030167
Abstract
Predicting the function of a protein from its sequence is a long-standing goal of bioinformatic research. While sequence similarity is the most popular tool used for this purpose, sequence motifs may also serve this goal. Here we develop a motif-based method that applies an unsupervised motif extraction algorithm (MEX) to all enzyme sequences and filters the results by the four-level classification hierarchy of the Enzyme Commission (EC). The resulting motifs serve as specific peptides (SPs), each appearing on a single branch of the EC hierarchy. In contrast to previous motif-based methods, the new method requires no preprocessing by multiple sequence alignment and does not rely on over-representation of motifs within EC branches. The SPs obtained comprise on average 8.4 ± 4.5 amino acids and specify the functions of 93% of all enzymes, much higher than the 63% coverage provided by ProSite motifs. The SP classification thus compares favorably with previous function annotation methods and demonstrates added value in extreme cases where sequence similarity fails. Interestingly, SPs cover most of the annotated active-site and binding-site amino acids, and they occur in 3-D pockets neighboring the active site in a highly statistically significant manner. Such pockets are assumed to be strongly relevant to the activity of the enzyme. Further filtering of SPs by biological functional annotations yields small subsets of SPs that still possess very large enzyme coverage. Overall, SPs both form a very useful tool for enzyme functional classification and bear responsibility for the catalytic function carried out by enzymes.
Author Summary
Sequence motifs are known to provide information about functional properties of proteins. In the past, many approaches have looked for deterministic motifs in protein sequences by searching for functionally over-represented k-mers, with moderate success.
Here we revisit and renew the utility of deterministic motifs by searching for them in a partially unsupervised, context-dependent manner. Using a novel motif extraction algorithm, MEX, deterministic sequence motifs are extracted from Swiss-Prot data containing more than 50,000 enzymes. They are then filtered by the Enzyme Commission classification hierarchy to produce sets of specific peptides (SPs). These SPs specify enzyme function for 93% of the data, comparing well with existing approaches for enzyme classification. Importantly, SPs are found to have biological significance. A majority of all known active and binding sites of enzymes are covered by SPs, and many SPs lie within spatial pockets in the neighborhood of the active sites. Both results have extremely high statistical significance. A user-friendly tool that displays the SP hits for any query protein sequence, together with the EC assignments implied by these SPs, is available at http://adios.tau.ac.il/SPSearch.
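The EC filtering step and the resulting coverage figure can be illustrated with a small sketch. This is a minimal Python example under stated assumptions, not the authors' implementation: MEX itself is not reproduced (the candidate motifs are simply given as input), each enzyme is assumed to carry a single EC number, and a motif is assumed to count as an SP at level k when every enzyme containing it shares the same first k EC fields. The sequences, motifs, and function names (`specific_peptides`, `annotate`, `coverage`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

# Hypothetical toy data: (sequence, EC number) pairs and candidate motifs.
# In the paper the motifs come from running MEX over Swiss-Prot enzymes;
# everything below is invented for illustration.
ENZYMES = [
    ("MKTAYIAKQRQISFVKSHFSRQ", "1.1.1.1"),
    ("MKTAYIAKGGHWLLPEEQ", "1.1.1.2"),
    ("MSTNPKPQRKTKRNTNRR", "2.7.11.1"),
    ("GGHWLLPNTNRRAAS", "2.7.11.1"),
]
MOTIFS = ["TAYIAK", "GGHWLL", "NTNRR", "QISFV"]


def ec_prefix(ec: str, level: int) -> str:
    """First `level` fields of an EC number, e.g. ("2.7.11.1", 2) -> "2.7"."""
    return ".".join(ec.split(".")[:level])


def specific_peptides(
    motifs: list[str], enzymes: list[tuple[str, str]], level: int
) -> dict[str, str]:
    """Keep motifs whose occurrences all fall on one EC branch at `level`.

    Returns {motif: EC branch}. Motifs that occur in no enzyme, or in enzymes
    from two or more branches at this level, are discarded.
    """
    sps = {}
    for motif in motifs:
        # Collect the EC branches (at this level) of every enzyme containing the motif.
        branches = {ec_prefix(ec, level) for seq, ec in enzymes if motif in seq}
        if len(branches) == 1:
            sps[motif] = branches.pop()
    return sps


def annotate(query: str, sps: dict[str, str]) -> set[str]:
    """EC branches assigned to `query` by the SPs it contains (may be empty)."""
    return {branch for motif, branch in sps.items() if motif in query}


def coverage(enzymes: list[tuple[str, str]], sps: dict[str, str]) -> float:
    """Fraction of enzymes whose sequence contains at least one SP."""
    hits = sum(1 for seq, _ in enzymes if any(m in seq for m in sps))
    return hits / len(enzymes)


if __name__ == "__main__":
    for level in (4, 3):
        sps = specific_peptides(MOTIFS, ENZYMES, level)
        print(level, sps, coverage(ENZYMES, sps))
    print(annotate("AAQISFVKK", specific_peptides(MOTIFS, ENZYMES, 4)))
```

On this toy data the motif `TAYIAK` occurs in enzymes from two level-4 branches (1.1.1.1 and 1.1.1.2), so it is discarded at level 4 but kept as an SP for branch 1.1.1 at level 3; in this sketch, coarser levels therefore yield more SPs and higher coverage. Exact substring matching is used here purely for simplicity.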