INTREPID—INformation-theoretic TREe traversal for Protein functional site IDentification
Open Access
- 6 September 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (21) , 2445-2452
- https://doi.org/10.1093/bioinformatics/btn474
Abstract
Motivation: Identification of functionally important residues in proteins plays a significant role in biological discovery. Here, we present INTREPID, an information-theoretic approach for functional site identification that exploits the information in large, diverse multiple sequence alignments (MSAs). INTREPID combines a traversal of the phylogeny with a positional conservation score based on Jensen–Shannon divergence to rank positions in an MSA. Although knowledge of protein 3D structure can significantly improve the accuracy of functional site identification, structural information is not available for the majority of proteins, so INTREPID relies solely on sequence information. We evaluated INTREPID on two tasks: predicting catalytic residues and predicting specificity determinants.

Results: In catalytic residue prediction, INTREPID provides significant improvements over Evolutionary Trace and ConSurf, as well as over a baseline global conservation method, on a set of 100 manually curated enzymes from the Catalytic Site Atlas. In particular, INTREPID is better able to predict catalytic positions that are not globally conserved and hence attains improved sensitivity at high values of specificity. We also investigated the performance of INTREPID as a function of the evolutionary divergence of the protein family. We found that INTREPID is better able to exploit the diversity in highly divergent families and that accuracy improves when homologs with very low sequence identity are included in an alignment. In specificity determinant prediction, when subtype information is known, INTREPID-SPEC, a variant of INTREPID, attains accuracies that are competitive with other approaches for this task.

Availability: INTREPID is available for 16919 families in the PhyloFacts resource (http://phylogenomics.berkeley.edu/phylofacts).
Contact: sriram_s@cs.berkeley.edu

Supplementary information: Relevant online supplementary material is available at http://phylogenomics.berkeley.edu/INTREPID.
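The abstract names two ingredients: a Jensen–Shannon divergence conservation score for each MSA column, and a traversal of the phylogeny that ranks positions. The Python sketch below illustrates how these could fit together. It is not the authors' implementation, and several choices in it are assumptions: a uniform background distribution over the 20 amino acids (a fuller score would typically use a background derived from a substitution matrix such as BLOSUM62), no sequence weighting, a tree supplied as a hypothetical list of subtree sequence-index sets along the root-to-query path, and max aggregation across nodes. The function names `column_jsd`, `intrepid_like_scores` and `rank_positions` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; real scorers usually use a BLOSUM62-derived one.
BACKGROUND = 1.0 / len(AMINO_ACIDS)


def column_jsd(column, pseudocount=1e-6):
    """Base-2 Jensen-Shannon divergence between one MSA column and the background.

    Characters outside the 20 standard amino acids (gaps, X, ...) are ignored when
    estimating frequencies, and the score is scaled by the fraction of standard
    residues, so gappy columns are penalised. No sequence weighting is applied.
    """
    residues = [c for c in column if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    jsd = 0.0
    for aa in AMINO_ACIDS:
        p = (counts[aa] + pseudocount) / total  # column frequency, always > 0
        q = BACKGROUND
        m = 0.5 * (p + q)  # mixture distribution
        jsd += 0.5 * p * math.log2(p / m) + 0.5 * q * math.log2(q / m)
    return jsd * len(residues) / len(column)


def intrepid_like_scores(msa, path_to_query, min_subtree_size=2):
    """Score each alignment column by visiting nodes on a root-to-query path.

    msa: list of equal-length aligned sequences.
    path_to_query: hypothetical input format; one entry per node from the root
        down to the query leaf, each entry listing the indices of the sequences
        in that node's subtree.
    Aggregation by max over nodes is an illustrative assumption.
    """
    n_cols = len(msa[0])
    scores = [0.0] * n_cols
    for node in path_to_query:
        if len(node) < min_subtree_size:
            continue  # tiny subtrees look perfectly conserved; skip them
        for i in range(n_cols):
            column = [msa[s][i] for s in node]
            scores[i] = max(scores[i], column_jsd(column))
    return scores


def rank_positions(scores):
    """Return column indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For example, with `msa = ["AA", "AC", "AD", "AE"]` and `path = [[0, 1, 2, 3], [0, 1], [0]]`, `rank_positions(intrepid_like_scores(msa, path))` places the fully conserved column 0 ahead of the variable column 1. Taking the max over nodes lets a position that is conserved only within a subtree still score highly, which is the kind of non-globally-conserved signal the abstract emphasises. The catch is that small subtrees near the leaf saturate, so the `min_subtree_size` cutoff (or a per-node normalisation) matters. The sketch also scores every column. A practical version would likely restrict ranking to columns where the query sequence has a residue.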