ConFunc—functional annotation in the twilight zone
Open Access
- 8 February 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (6) , 798-806
- https://doi.org/10.1093/bioinformatics/btn037
Abstract
Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits the sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified within each GO term sub-alignment, and a position-specific scoring matrix is generated for that sub-alignment. This combination of steps produces a set of feature (GO annotation) derived profiles from which protein function is predicted.

Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that neither method achieves and a maximum precision 24% greater than BLAST. Furthermore, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline.

Availability: http://www.sbg.bio.ic.ac.uk/confunc

Contact: m.sternberg@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
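The steps named in the abstract (split hits by GO term, find conserved residues per sub-alignment, build a profile, score the query) can be illustrated with a minimal Python sketch. This is not the ConFunc implementation: the conservation threshold, the uniform background, the pseudocount, the function names and the toy sequences with placeholder labels `GO:A` and `GO:B` are all assumptions made for illustration, and the input is assumed to be gapped sequences of equal length that are already aligned to the query.

```python
from collections import Counter, defaultdict
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_by_go_term(hits):
    """Group aligned sequences into one sub-alignment per GO term."""
    sub_alignments = defaultdict(list)
    for aligned_seq, go_terms in hits:
        for term in go_terms:
            sub_alignments[term].append(aligned_seq)
    return dict(sub_alignments)


def conserved_columns(alignment, min_fraction=0.8):
    """Indices of columns where a single residue reaches min_fraction.

    The 0.8 cut-off is an assumed value, not one taken from the paper.
    """
    columns = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if not counts:
            continue  # column contains only gaps
        _, count = counts.most_common(1)[0]
        if count / len(alignment) >= min_fraction:
            columns.append(i)
    return columns


def build_profile(alignment, columns, pseudocount=1.0):
    """Log-odds scores versus a uniform background, conserved columns only."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = {}
    for i in columns:
        counts = Counter(seq[i] for seq in alignment if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile[i] = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
    return profile


def score_query(aligned_query, profile):
    """Sum profile scores at the conserved columns; gaps contribute zero."""
    return sum(
        scores.get(aligned_query[i], 0.0) for i, scores in profile.items()
    )


def predict_go_terms(aligned_query, hits, min_fraction=0.8):
    """Rank GO terms by how well the query matches each term's profile."""
    ranked = []
    for term, alignment in split_by_go_term(hits).items():
        columns = conserved_columns(alignment, min_fraction)
        profile = build_profile(alignment, columns)
        ranked.append((term, score_query(aligned_query, profile)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy data: (aligned sequence, set of GO labels). Labels are placeholders.
hits = [
    ("MKHAC", {"GO:A"}),
    ("MKHGC", {"GO:A"}),
    ("MRDAC", {"GO:B"}),
    ("MRDGC", {"GO:B"}),
]
query = "MKHSC"
ranking = predict_go_terms(query, hits)
```

In this toy case the query shares all four conserved residues of the `GO:A` sub-alignment but only two of `GO:B`, so `GO:A` is ranked first. A real pipeline would obtain the hits and their alignment from PSI-BLAST and the annotations from a GO annotation source.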