PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan

Open Access

1 January 2010

journal article
research article
Published by Oxford University Press (OUP) in Database: The Journal of Biological Databases and Curation

Vol. 2010, baq021
https://doi.org/10.1093/database/baq021

Abstract

This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations) and are then served to PDBj users via the World Wide Web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in a way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into a corresponding relational table that is named after the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform a simple keyword search or use 'Advanced Search', which can specify various conditions on the entries. More experienced users can query the database using SQL statements, which can be constructed in a uniform manner. Thus, PDBj Mine achieves a combination of the flexibility of XML documents and the robustness of the RDB. Database URL: http://www.pdbj.org/
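The XPath-to-table mapping described in the abstract can be illustrated with a minimal sketch. It is not PDBj Mine's actual implementation: the toy XML document, its element names, the `parent_no` column used to record hierarchy, and the helper functions `shred` and `polymer_descriptions` are all assumptions made for illustration, and an in-memory SQLite database stands in for the real RDB. Each leaf element's value is stored in a table named after the element's XPath, and a uniform SQL join then recombines sibling values.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy document. Element names are illustrative only and do not follow
# the real PDBMLplus schema.
TOY_XML = """
<datablock id="1ABC">
  <entityCategory>
    <entity id="1"><type>polymer</type><description>lysozyme</description></entity>
    <entity id="2"><type>water</type><description>water</description></entity>
  </entityCategory>
</datablock>
"""


def shred(xml_text, conn):
    """Store every leaf value in a table named after the leaf's XPath."""
    root = ET.fromstring(xml_text)
    entry_id = root.get("id")
    counter = 0  # document-order number assigned to each element

    def walk(elem, path, parent_no):
        nonlocal counter
        counter += 1
        node_no = counter
        children = list(elem)
        if not children:
            # Double quotes let SQLite accept '/' inside a table name.
            table = f'"{path}"'
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} "
                "(entry_id TEXT, parent_no INTEGER, value TEXT)"
            )
            conn.execute(
                f"INSERT INTO {table} VALUES (?, ?, ?)",
                (entry_id, parent_no, (elem.text or "").strip()),
            )
        for child in children:
            walk(child, f"{path}/{child.tag}", node_no)

    walk(root, f"/{root.tag}", 0)
    conn.commit()


def polymer_descriptions(conn):
    """Join two XPath-named tables on their shared parent element."""
    t = '"/datablock/entityCategory/entity/type"'
    d = '"/datablock/entityCategory/entity/description"'
    rows = conn.execute(
        f"SELECT d.value FROM {t} AS t JOIN {d} AS d "
        "ON t.entry_id = d.entry_id AND t.parent_no = d.parent_no "
        "WHERE t.value = ?",
        ("polymer",),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred(TOY_XML, conn)
    print(polymer_descriptions(conn))
```

Because every table shares the same column layout in this sketch, queries against different XPaths differ only in the table names, which is one way to read the abstract's remark that SQL statements "can be constructed in a uniform manner".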
