DDBJ new system and service refactoring
Open Access
- 23 November 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 41 (D1), D25-D29
- https://doi.org/10.1093/nar/gks1152
Abstract
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) maintains a primary nucleotide sequence database and provides analytical resources for biological information to researchers. The database content is exchanged with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Resources provided by the DDBJ include traditional nucleotide sequence data released in the form of 27 316 452 entries or 16 876 791 557 base pairs (as of June 2012), and raw reads from new-generation sequencers in the Sequence Read Archive (SRA). A Japanese researcher published his own genome sequence via DDBJ-SRA on 31 July 2012. To cope with the ongoing genomic data deluge, in March 2012 our previous computer system was completely replaced by a commodity cluster-based system that boasts 122.5 TFlops of CPU capacity and 5 PB of storage space. During this upgrade, it was considered crucial to replace and refactor substantial portions of the DDBJ software systems as well. As a result of the replacement process, which took more than 2 years, we have achieved significant improvements in system performance.