Protein Meta-Functional Signatures from Combining Sequence, Structure, Evolution, and Amino Acid Property Information

Abstract
Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples that apply MFS in a wide range of settings to elucidate protein sequence–structure–function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.

Author Summary
Proteins are the main building blocks and functional molecules of the cell. Function is mediated by specific amino acid residues in a protein sequence, in a manner dependent on both their positions and types. Proteins are traditionally described as a sequence of amino acids and, when known, the experimentally determined coordinates of this covalently linked chain. Here we propose to expand the description of a protein to include a quantitative measure of the functional importance of each constituent amino acid. The resulting signature for a protein sequence or structure is referred to as its meta-functional signature (MFS). We present an ensemble of knowledge- and biophysics-based methods, which exploit different types of evidence for functional importance, as an automated, publicly available tool to build such an MFS. We use two benchmark datasets to show that MFS can be used to identify functionally important residues from protein structure or sequence alone. Finally, we assess four diverse real-world biological questions to demonstrate the ability of MFS to give insight into the structural and functional roles of individual residues and positions, by exploiting protein sequence–structure–function relationships.
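
To make the idea of a per-residue signature concrete, the following minimal Python sketch represents a signature as one continuous value per residue, obtained by combining several per-residue component scores. It is an illustration only and not the method described in this paper: the component names (`conservation`, `structure`), the weights, the toy scores, and the weighted-average combination rule are all assumptions chosen for simplicity.

```python
from typing import Dict, List, Tuple


def combine_scores(
    components: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-residue component scores into one value per residue.

    Assumption: a plain weighted average stands in for whatever
    combination scheme an actual predictor would use.
    """
    lengths = {len(scores) for scores in components.values()}
    if len(lengths) != 1:
        raise ValueError("all components must score the same, non-empty set of residues")
    n_residues = lengths.pop()
    total_weight = sum(weights[name] for name in components)
    return [
        sum(weights[name] * scores[i] for name, scores in components.items())
        / total_weight
        for i in range(n_residues)
    ]


def top_residues(
    sequence: str, signature: List[float], k: int
) -> List[Tuple[int, str, float]]:
    """Return the k highest-scoring residues as (1-based position, residue, score)."""
    ranked = sorted(range(len(signature)), key=lambda i: signature[i], reverse=True)
    return [(i + 1, sequence[i], signature[i]) for i in ranked[:k]]


# Hypothetical toy input: a three-residue sequence with two evidence types.
sequence = "ACD"
components = {
    "conservation": [0.0, 1.0, 0.5],  # hypothetical evolutionary evidence
    "structure": [1.0, 1.0, 0.0],     # hypothetical structure-based evidence
}
weights = {"conservation": 1.0, "structure": 1.0}  # hypothetical equal weights

signature = combine_scores(components, weights)
best = top_residues(sequence, signature, k=1)
```

Keeping each component as a separate named score list mirrors the point made above that every source of evidence can be inspected and interpreted on its own before it is merged into the final signature.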