ESG: extended similarity group method for automated protein function prediction
Open Access
- 12 May 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 25 (14), 1739-1745
- https://doi.org/10.1093/bioinformatics/btp309
Abstract
Motivation: The importance of accurate automatic protein function prediction is ever increasing in the face of the large number of newly sequenced genomes and proteomics data awaiting biological interpretation. Conventional methods have focused on annotation transfer based on high sequence similarity, which relies on the concept of homology. However, many cases have been reported in which simple transfer of function from the top hits of a homology search causes erroneous annotation. New methods are required that handle sequence similarity more robustly, combining signals from strongly and weakly similar proteins to predict function for unknown proteins with high reliability.
Results: We present the extended similarity group (ESG) method, which performs iterative sequence database searches and annotates a query sequence with Gene Ontology terms. Each annotation is assigned a probability based on its relative similarity score with the multiple-level neighbors in the protein similarity graph. We show how the statistical framework of ESG improves prediction accuracy by iteratively taking into account the neighborhood of the query protein in sequence similarity space. ESG outperforms conventional PSI-BLAST and the protein function prediction (PFP) algorithm. The iterative search is found to be effective in capturing multiple domains in a query protein, enabling accurate prediction of several functions that originate from different domains.
Availability: The ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/
Contact: cspark@cau.ac.kr; dkihara@purdue.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
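The Results paragraph describes scoring each Gene Ontology term by its relative similarity score across multiple levels of neighbors. The abstract does not give the formula, so the following is only a minimal sketch under stated assumptions: the weight form `-log10(E) + shift`, the `mix` parameter, the function names, and the toy search results are all hypothetical and are not taken from the paper. A real run would obtain the hits from a sequence database search rather than from a hard-coded dictionary.

```python
import math

def hit_weights(hits, shift=2.0):
    """Convert E-values into weights that sum to 1 (assumed weighting form)."""
    raw = {name: max(-math.log10(e) + shift, 0.0) for name, e in hits.items()}
    total = sum(raw.values())
    return {name: s / total for name, s in raw.items()} if total > 0 else {}

def esg_like_scores(query, search, annotations, depth=2, mix=0.5):
    """Score GO terms for `query` from its hits and, recursively, their hits.

    search:      protein -> {hit: E-value}, standing in for a database search
    annotations: protein -> set of GO term IDs
    depth:       number of neighbor levels to explore
    mix:         share given to a hit's own annotations versus its neighbors'
    """
    scores = {}
    for hit, w in hit_weights(search.get(query, {})).items():
        direct = {term: 1.0 for term in annotations.get(hit, ())}
        if depth > 1:
            # Second-level neighborhood: search again from the hit itself.
            nested = esg_like_scores(hit, search, annotations, depth - 1, mix)
            terms = set(direct) | set(nested)
            contrib = {
                t: mix * direct.get(t, 0.0) + (1 - mix) * nested.get(t, 0.0)
                for t in terms
            }
        else:
            contrib = direct
        for term, value in contrib.items():
            scores[term] = scores.get(term, 0.0) + w * value
    return scores

# Toy data (hypothetical): Q hits A strongly and B weakly.
search = {
    "Q": {"A": 1e-30, "B": 1e-3},
    "A": {"C": 1e-20},
    "B": {"D": 1e-10},
}
annotations = {
    "A": {"GO:0003677"},
    "B": {"GO:0016301"},
    "C": {"GO:0003677"},
    "D": {"GO:0016301", "GO:0005524"},
}

scores = esg_like_scores("Q", search, annotations)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.3f}")
```

In this toy case the term `GO:0005524` is never attached to a direct hit of the query, yet it still receives a small score through the second-level neighbor D. That is the kind of weak signal the abstract says is combined with strong ones. Because the weights at each level sum to 1, every score stays between 0 and 1.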
This publication has 32 references indexed in Scilit:
- PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins-Structure Function and Bioinformatics, 2008
- Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins. PLoS Computational Biology, 2008
- Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Research, 2007
- KEGG for linking genomes to life and the environment. Nucleic Acids Research, 2007
- InterPro and InterProScan. Published by Springer Nature, 2007
- The relationship between protein sequences and their gene ontology functions. BMC Bioinformatics, 2006
- New avenues in protein function prediction. Protein Science, 2006
- Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Science, 2006
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- Basic local alignment search tool. Journal of Molecular Biology, 1990