The lexical properties of the gene ontology.
- 1 January 2002
- journal article
- p. 504-8
Abstract
The Gene Ontology (GO) is a construct developed for the purpose of annotating molecular information about genes and their products. The ontology is a shared resource developed by the GO Consortium, a group of scientists who work on a variety of model organisms. In this paper we investigate the nature of the strings found in the Gene Ontology and evaluate them for their usefulness in natural language processing (NLP). We extend previous work that identified a set of properties that reliably identifies natural language phrases in the Unified Medical Language System (UMLS). The results indicate that a large percentage (79%) of GO terms are potentially useful for NLP applications. Some 35% of the GO terms were found in a corpus derived from the MEDLINE bibliographic database, and 27% of the terms were found in the current edition of the UMLS.
This publication has 7 references indexed in Scilit:
- Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Research, 2002
- Creating the Gene Ontology Resource: Design and Implementation. Genome Research, 2001
- Evaluating UMLS strings for natural language processing. 2001
- A knowledge model for analysis and simulation of regulatory networks. Bioinformatics, 2000
- Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, 2000
- How knowledge drives understanding—matching medical ontologies with the needs of medical language processing. Artificial Intelligence in Medicine, 1998
- The Nature of Lexical Knowledge. Methods of Information in Medicine, 1998