Predicting protein function from sequence and structure
- 1 December 2007
- journal article
- review article
- Published by Springer Nature in Nature Reviews Molecular Cell Biology
- Vol. 8 (12), 995-1005
- https://doi.org/10.1038/nrm2281
Abstract
'Inheritance through homology' is the most common and generally more accessible approach to function prediction, but orthology should be established where possible to improve confidence in predictions. The body of functional annotations of proteins is becoming increasingly computer-readable and is being organized in ways that can enhance the scope of in silico prediction methods. Significant advances in complete genome sequencing have resulted in a new generation of methods that exploit sequence analysis on the genome level. Curated protein family resources can often guide the assignment of protein functions and the detection of motifs or sequence patterns. New approaches are being developed to identify functional residues in proteins; these can then be applied to divide larger protein families into more specific functional subfamilies. There have been exciting new developments in databases of experimentally determined protein–protein interactions, as well as genomic inference methods for predicting these interactions. Non-homology-based function prediction methods that exploit the properties of sequences and not their evolutionary history are also becoming more successful. Recent Structural Genomics Initiatives (SGIs) are attempting to target functionally diverse relatives within protein families. Function prediction from structure can be achieved by global comparison of protein structures to detect homology or through the use of structural templates derived from the active sites of enzymes. It is also possible to explore the protein surface for sequence-conserved patches, clefts and electrostatic potentials. In general terms, it is best to seek and compare the results of several methods to predict the function of novel proteins. Meta-servers simplify this by providing easy access to a range of the best-performing methods. 
Future developments will see more efficient integration of prediction methods and experimental data; for example, microarrays, yeast two-hybrid screens and tandem affinity purification. Better understanding of the diversification of function in protein families will permit more sophisticated means of predicting function and functional networks.