Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies
Open Access
- 1 June 2004
- journal article
- Published by Oxford University Press (OUP) in Plant Physiology
- Vol. 135 (2) , 745-755
- https://doi.org/10.1104/pp.104.040071
Abstract
Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.