A categorization approach to automated ontological function annotation
- 1 June 2006
- journal article
- Published by Wiley in Protein Science
- Vol. 15 (6), 1544-1549
- https://doi.org/10.1110/ps.062184006
Abstract
Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there is a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We first present an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (a measure of how many of the predictions made are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhood's annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, this increase in hierarchical precision is purchased at a modest expense of hierarchical recall (a measure of how many of the known annotations are predicted at all).
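The abstract names hierarchical precision and hierarchical recall but does not state their formulas. As a rough illustration only, the sketch below assumes one common set-based formulation: both the predicted and the known annotation sets are expanded to include all of their ancestors in the ontology, and precision and recall are then computed on the expanded sets. This may differ from the exact definitions used in the paper. The toy ontology, its node labels, and the function names `up_set` and `hierarchical_pr` are hypothetical and are not real GO identifiers or part of POSOC.

```python
from typing import Dict, Iterable, Set, Tuple

# Toy ontology as a child -> parents map (hypothetical labels, not real GO terms).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}


def up_set(nodes: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given nodes together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def hierarchical_pr(
    predicted: Iterable[str],
    actual: Iterable[str],
    parents: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Ancestor-expanded precision and recall (an assumed formulation)."""
    pred_up = up_set(predicted, parents)
    act_up = up_set(actual, parents)
    overlap = len(pred_up & act_up)
    precision = overlap / len(pred_up) if pred_up else 0.0
    recall = overlap / len(act_up) if act_up else 0.0
    return precision, recall


# A near miss: the prediction shares the "binding" branch with one true annotation.
p, r = hierarchical_pr({"dna_binding"}, {"rna_binding", "kinase"}, PARENTS)
print(round(p, 3), round(r, 3))  # prints: 0.667 0.4
```

Under this formulation a prediction in the right branch of the ontology earns partial credit, whereas a flat exact-match score would count it as entirely wrong.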