Predicting Gene Function From Patterns of Annotation
Open Access
- 14 April 2003
- journal article
- research article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 13 (5), 896-904
- https://doi.org/10.1101/gr.440803
Abstract
The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases. This allows the prediction of gene function based on patterns of annotation. For example, if annotations for two attributes tend to occur together in a database, then a gene holding one attribute is likely to hold the other as well. We modeled the relationships among GO attributes with decision trees and Bayesian networks, using the annotations in the Saccharomyces Genome Database (SGD) and in FlyBase as training data. We tested the models using cross-validation, and we manually assessed 100 gene–attribute associations that were predicted by the models but that were not present in the SGD or FlyBase databases. Of the 100 manually assessed associations, 41 were judged to be true, and another 42 were judged to be plausible. [Detailed lists of hypotheses, including the curators' comments on each hypothesis, are available at http://llama.med.harvard.edu/~king/predictions.html.]
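The co-occurrence idea in the abstract can be illustrated with a deliberately simplified sketch. It does not reproduce the paper's decision trees or Bayesian networks. Instead, it estimates P(b | a) directly from pair counts in a gene-by-attribute annotation table and proposes a gene–attribute association whenever a gene holds attribute a, lacks attribute b, and the estimate passes a threshold. The gene names, attribute labels, and the 0.7 threshold are hypothetical placeholders, not SGD or FlyBase data.

```python
from itertools import permutations

# Hypothetical toy annotation table: gene -> set of GO-style attribute labels.
# Illustrative placeholders only, not real SGD/FlyBase annotations.
ANNOTATIONS = {
    "gene1": {"A", "B"},
    "gene2": {"A", "B"},
    "gene3": {"A", "B", "C"},
    "gene4": {"A"},
    "gene5": {"C"},
}


def conditional_probabilities(annotations):
    """Estimate P(b | a) as (#genes holding a and b) / (#genes holding a)."""
    attrs = set().union(*annotations.values())
    count = {a: 0 for a in attrs}  # genes annotated with each attribute
    pair = {}                      # genes annotated with both a and b (ordered)
    for held in annotations.values():
        for a in held:
            count[a] += 1
        for a, b in permutations(held, 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return {(a, b): n / count[a] for (a, b), n in pair.items()}


def predict(annotations, threshold=0.7):
    """Return sorted (gene, b) hypotheses: gene holds a, lacks b, P(b | a) >= threshold.

    A simple co-occurrence rule standing in for the paper's learned models.
    """
    probs = conditional_probabilities(annotations)
    hypotheses = set()
    for gene, held in annotations.items():
        for (a, b), p in probs.items():
            if a in held and b not in held and p >= threshold:
                hypotheses.add((gene, b))
    return sorted(hypotheses)
```

On this toy table, `predict(ANNOTATIONS)` returns `[("gene4", "B")]`: three of the four genes annotated with A also carry B, so P(B | A) = 0.75, and gene4 is the only A-annotated gene missing B. A pairwise rule like this ignores interactions among three or more attributes, which is one motivation for richer models such as the decision trees and Bayesian networks named above.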