Predicting protein function from sequence and structure


1 December 2007

journal article
review article
Published by Springer Nature in Nature Reviews Molecular Cell Biology

Vol. 8 (12), 995–1005
https://doi.org/10.1038/nrm2281

Abstract

'Inheritance through homology' is the most common and generally the most accessible approach to function prediction, but orthology should be established wherever possible to increase confidence in the predictions. Functional annotations of proteins are increasingly computer-readable and are being organized in ways that extend the scope of in silico prediction methods. Advances in complete genome sequencing have produced a new generation of methods that exploit sequence analysis at the genome level. Curated protein family resources can often guide the assignment of protein functions and the detection of motifs or sequence patterns. New approaches are being developed to identify functional residues in proteins; these can then be used to divide large protein families into more specific functional subfamilies. Databases of experimentally determined protein–protein interactions have developed considerably, as have genomic inference methods for predicting these interactions. Non-homology-based prediction methods, which exploit the properties of sequences rather than their evolutionary history, are also becoming more successful.

Recent Structural Genomics Initiatives (SGIs) are attempting to target functionally diverse relatives within protein families. Function can be predicted from structure either by global comparison of protein structures to detect homology or by using structural templates derived from the active sites of enzymes. The protein surface can also be searched for patches of conserved sequence, clefts and electrostatic potentials.

In general, the function of a novel protein is best predicted by seeking and comparing the results of several methods. Meta-servers simplify this by providing easy access to a range of the best-performing methods.
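The advice to establish orthology before inheriting a function is often approximated in practice with reciprocal best hits (RBH): proteins a (genome A) and b (genome B) are treated as putative orthologs only if each is the other's top-scoring match. The sketch below is a minimal illustration of that heuristic, not the method of this review. It assumes a precomputed table of pairwise similarity scores (for example alignment bit scores) and uses hypothetical function names and toy identifiers; RBH is only one heuristic for orthology and can miss or mis-assign orthologs in families with many paralogs.

```python
from typing import Dict, Tuple

# Hypothetical input layout: similarity scores keyed by (protein_in_A, protein_in_B).
# Higher scores mean more similar (e.g. alignment bit scores).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, from_a: bool) -> Dict[str, str]:
    """Return the best-scoring partner for each query protein.

    from_a=True maps A proteins to their best hit in B; False maps B to A.
    On ties, the first pair encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        query, target = (a, b) if from_a else (b, a)
        if query not in best or s > best[query][1]:
            best[query] = (target, s)
    return {q: t for q, (t, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores, from_a=True)
    b_to_a = best_hits(scores, from_a=False)
    return {a: b for a, b in a_to_b.items() if b_to_a.get(b) == a}


def transfer_annotation(rbh: Dict[str, str], annotations: Dict[str, str]) -> Dict[str, str]:
    """Copy the known function of each B protein onto its putative A ortholog."""
    return {a: annotations[b] for a, b in rbh.items() if b in annotations}


if __name__ == "__main__":
    # Toy data: a2 resembles b1, but b1's best hit is a1, so a2 gets no annotation.
    toy = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
           ("a2", "b1"): 240.0, ("a2", "b2"): 60.0}
    known = {"b1": "hexokinase", "b2": "glucokinase"}
    print(transfer_annotation(reciprocal_best_hits(toy), known))  # prints {'a1': 'hexokinase'}
```

In the toy data, a2 is clearly homologous to b1 yet is not its reciprocal best hit, which is exactly the situation in which simple 'inheritance through homology' would transfer a function with less justification.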
Future developments will bring more efficient integration of prediction methods with experimental data, for example from microarrays, yeast two-hybrid screens and tandem affinity purification. A better understanding of how function diversifies within protein families will permit more sophisticated prediction of function and of functional networks.
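One simple way to picture integrating several lines of evidence (a sequence-based prediction, co-expression in microarray data, a yeast two-hybrid hit) into a single confidence value is a noisy-OR: if each source gives a calibrated probability and the sources are assumed independent, the combined score is the probability that at least one of them is right. This is a minimal sketch under those assumptions only; the function name is hypothetical, and it is not presented as the scoring scheme of any particular method or database. Real evidence sources are often correlated, which a noisy-OR ignores and which would inflate the combined score.

```python
from math import prod
from typing import Iterable


def combine_independent(scores: Iterable[float]) -> float:
    """Combine per-source confidence scores in [0, 1] with a noisy-OR.

    Assumes each score is a calibrated probability and that the sources are
    independent; returns 1 - product of (1 - s_i). No evidence gives 0.0.
    """
    values = list(scores)
    for s in values:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in values)


if __name__ == "__main__":
    # Hypothetical per-source scores for one candidate function or interaction.
    evidence = {"homology": 0.6, "co_expression": 0.3, "two_hybrid": 0.0}
    print(round(combine_independent(evidence.values()), 2))  # prints 0.72
```

A source with score 0.0 leaves the result unchanged, so adding uninformative evidence does no harm, whereas two moderately confident independent sources reinforce each other (0.5 and 0.5 combine to 0.75).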

