NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Open Access

24 November 2011

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 40 (D1), D130-D135
https://doi.org/10.1093/nar/gkr1079

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 10⁶ genomic records, 13 × 10⁶ proteins and 2 × 10⁶ RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).
