NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy
Open Access
- 24 November 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 40 (D1), D130-D135
- https://doi.org/10.1093/nar/gkr1079
Abstract
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10⁶ genomic records, 13 × 10⁶ proteins and 2 × 10⁶ RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).
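The abstract groups RefSeq into genomic, RNA (transcript) and protein records, and mentions eukaryotic annotation produced by the NCBI annotation pipeline. In practice these classes are distinguished by a two-letter accession prefix (for example NC_ for genomic molecules, NM_ for mRNA, NP_ for protein), with X-prefixed "model" records (XM_, XR_, XP_) coming from computational annotation. The sketch below is an illustrative parser, not an NCBI tool: the prefix table is a deliberately small, hypothetical subset, the names `parse_accession` and `RefSeqAccession` are invented for this example, and the accession strings in the usage note only illustrate the format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative (NOT exhaustive or official) subset of RefSeq accession
# prefixes, mapped to the three record classes named in the abstract.
_PREFIX_CLASSES = {
    "NC": "genomic", "NG": "genomic", "NT": "genomic", "NW": "genomic",
    "NM": "RNA", "NR": "RNA", "XM": "RNA", "XR": "RNA",
    "NP": "protein", "XP": "protein",
}

# X-prefixed records are model records from computational annotation.
_MODEL_PREFIXES = {"XM", "XR", "XP"}

# Two capital letters, an underscore, digits, and an optional ".version".
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


@dataclass(frozen=True)
class RefSeqAccession:
    prefix: str
    number: str
    version: Optional[int]
    molecule: str
    is_model: bool


def parse_accession(text: str) -> RefSeqAccession:
    """Split a RefSeq-style accession such as 'NM_000546.5' into parts."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {text!r}")
    prefix, number, version = match.groups()
    if prefix not in _PREFIX_CLASSES:
        raise ValueError(f"prefix {prefix!r} is not in this illustrative table")
    return RefSeqAccession(
        prefix=prefix,
        number=number,
        version=int(version) if version is not None else None,
        molecule=_PREFIX_CLASSES[prefix],
        is_model=prefix in _MODEL_PREFIXES,
    )
```

For example, `parse_accession("NM_000546.5")` yields an RNA record with version 5 that is not a model record, while `parse_accession("XP_000001.1")` is classified as a model protein record. Unknown prefixes raise `ValueError` rather than guessing, since the table above is intentionally incomplete.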