The Universal Protein Resource (UniProt): an expanding universe of protein information
Open Access
- 1 January 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (90001), D187-D191
- https://doi.org/10.1093/nar/gkj161
Abstract
The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
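The sequence space compression that the abstract attributes to UniRef can be illustrated with a toy sketch. This is not the procedure UniProt uses to build UniRef. It assumes a deliberately crude, ungapped identity measure and a simple greedy clustering rule, and the names `identity`, `greedy_cluster` and the five toy sequences are hypothetical, introduced only for illustration. It shows how lowering the identity threshold from 100% to 90% to 50% merges more sequences into fewer clusters.

```python
def identity(a: str, b: str) -> float:
    """Toy identity: matching positions in a left-aligned, ungapped
    comparison, divided by the length of the longer sequence.
    (Assumption for illustration; real pipelines use proper alignment.)"""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy clustering: each sequence joins the first existing
    representative it matches at >= threshold, else founds a new cluster."""
    clusters: dict[str, list[str]] = {}
    # Visit longest sequences first so representatives are the longest members;
    # ties are broken by identifier to keep the result deterministic.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


# Hypothetical toy sequences, not real UniProt entries.
toy = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQL",  # 9 of 10 positions match P1
    "P4": "MKTAYGGGGG",  # 5 of 10 positions match P1
    "P5": "WWWWWWWWWW",  # no positions match P1
}

for threshold in (1.0, 0.9, 0.5):
    clusters = greedy_cluster(toy, threshold)
    print(f"{threshold:.0%}: {len(clusters)} clusters -> {clusters}")
```

With these toy inputs the number of clusters shrinks as the threshold drops, which is the sense in which a clustered set is smaller to search than the full sequence collection.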