How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry

Open Access

5 May 2009

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 4 (5), e5440
https://doi.org/10.1371/journal.pone.0005440

Abstract

Calculating the metabolome size of a species by genome-guided reconstruction of metabolic pathways misses all products of orphan genes and of enzymes that lack annotated genes. Hence, metabolomes need to be determined experimentally. Annotation by mass spectrometry would benefit greatly if peer-reviewed public databases could be queried to compile target lists of structures that have already been reported for a given species. We detail current obstacles to compiling such a knowledge base of metabolites. As an example, results are presented for rice. Two rice (Oryza sativa) subspecies have been fully sequenced, japonica and indica. Several major small-molecule databases were compared for their listings of known rice metabolites: PubChem, Chemical Abstracts, Beilstein, patent databases, Dictionary of Natural Products, SetupX/BinBase, KNApSAcK DB, and finally databases obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, genuine rice metabolites were most often retrieved together with non-metabolite database entries such as pesticides. Compound lists from different databases were very difficult to compare for overlaps, either because structures were not encoded in machine-readable format or because compound identifiers were not cross-referenced between databases. We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are still mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases cross-reference their database identifiers to PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of a species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
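
To illustrate why shared identifiers matter for the cross-database queries suggested above, here is a minimal Python sketch of comparing compound lists once every entry carries an InChIKey. It is not the authors' method. The database names, local identifiers, and InChIKey strings are hypothetical placeholders, not real records.

```python
from itertools import combinations

# Hypothetical compound lists: database name -> {local identifier: InChIKey}.
# All identifiers and InChIKeys below are made-up placeholders.
database_entries = {
    "db_a": {
        "A001": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
        "A002": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
        "A003": "CCCCCCCCCCCCCC-CCCCCCCCCC-N",
    },
    "db_b": {
        "B17": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
        "B42": "CCCCCCCCCCCCCC-CCCCCCCCCC-N",
    },
    "db_c": {
        "C9": "CCCCCCCCCCCCCC-CCCCCCCCCC-N",
        "C10": "DDDDDDDDDDDDDD-DDDDDDDDDD-N",
    },
}


def inchikey_sets(entries):
    """Reduce each database to the set of InChIKeys it lists."""
    return {db: set(records.values()) for db, records in entries.items()}


def pairwise_overlaps(key_sets):
    """Return the InChIKeys shared by each pair of databases."""
    return {
        (a, b): key_sets[a] & key_sets[b]
        for a, b in combinations(sorted(key_sets), 2)
    }


def unique_to_each(key_sets):
    """Return the InChIKeys that appear in exactly one database."""
    result = {}
    for db, keys in key_sets.items():
        others = set().union(
            *(k for name, k in key_sets.items() if name != db)
        )
        result[db] = keys - others
    return result


key_sets = inchikey_sets(database_entries)
for pair, shared in pairwise_overlaps(key_sets).items():
    print(pair, len(shared))
for db, only_here in unique_to_each(key_sets).items():
    print(db, sorted(only_here))
```

With a common key, overlaps reduce to plain set operations. Without one, each pair of databases would need its own identifier mapping or a structure-based comparison.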
