Large-Scale Protein Annotation through Gene Ontology
Open Access
- 1 May 2002
- journal article
- Published by Cold Spring Harbor Laboratory in Genome Research
- Vol. 12 (5) , 785-794
- https://doi.org/10.1101/gr.86902
Abstract
Recent progress in genomic sequencing, computational biology, and ontology development has presented an opportunity to investigate biological systems from a unique perspective, that is, examining genomes and transcriptomes through the multiple and hierarchical structure of Gene Ontology (GO). We report here our development of GO Engine, a computational platform for GO annotation, and analysis of the resultant GO annotations of human proteins. Protein annotation was centered on sequence homology with GO-annotated proteins and protein domain analysis. Text information analysis and a multiparameter cellular localization predictive tool were also used to increase the annotation accuracy, and to predict novel annotations. The majority of proteins corresponding to full-length mRNA in GenBank, and the majority of proteins in the NR database (nonredundant database of proteins) were annotated with one or more GO nodes in each of the three GO categories. The annotations of GenBank and SWISS-PROT proteins are available to the public at the GO Consortium web site.
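The homology-centered annotation described in the abstract can be illustrated with a minimal sketch. This is not the GO Engine implementation, whose details the abstract does not give. It assumes a toy ontology fragment, a hypothetical list of similarity-search hits given as (subject identifier, E-value) pairs, and an arbitrary E-value cutoff. Terms are copied from sufficiently similar annotated proteins and then propagated up the hierarchy, so that a protein annotated with a specific node is also annotated with that node's ancestors.

```python
from typing import Dict, Iterable, Set, Tuple

# Toy ontology fragment (assumption): child term -> parent terms.
# Labels are illustrative and are not real GO identifiers.
TOY_PARENTS: Dict[str, Set[str]] = {
    "kinase activity": {"catalytic activity"},
    "catalytic activity": {"molecular function"},
    "transporter activity": {"molecular function"},
    "molecular function": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def annotate(
    hits: Iterable[Tuple[str, float]],
    known: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
    max_evalue: float = 1e-10,  # hypothetical cutoff, not taken from the paper
) -> Set[str]:
    """Transfer terms from annotated hits that pass the E-value cutoff."""
    terms: Set[str] = set()
    for subject_id, evalue in hits:
        if evalue <= max_evalue:
            for term in known.get(subject_id, ()):
                terms |= ancestors(term, parents)
    return terms


# Hypothetical query with one strong hit and one weak hit.
example_hits = [("P1", 1e-50), ("P2", 0.5)]
example_known = {"P1": {"kinase activity"}, "P2": {"transporter activity"}}
example_terms = annotate(example_hits, example_known, TOY_PARENTS)
```

In this sketch only the strong hit contributes terms. The weak hit is ignored because its E-value exceeds the assumed cutoff. Domain analysis, text analysis, and localization prediction, which the abstract also mentions, are outside the scope of the example.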