The PSI semantic validator: A framework to check MIAPE compliance of proteomics data
Open Access
- 17 November 2009
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 9 (22) , 5112-5119
- https://doi.org/10.1002/pmic.200900189
Abstract
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard-compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio‐ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but also enforces rules regarding the use of ontology classes or CV terms, by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to a common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
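The two semantic checks named in the abstract (the term exists in the resource, and it is used in an allowed location) can be illustrated with a minimal Python sketch. This is not the PSI validator's code or API: the `EX:` accessions, the `cvParam`/`instrument`/`source` element names, and the dictionary form of the CV and of the mapping rules are all hypothetical stand-ins chosen for illustration, and the XML Schema Definition step is left out because the Python standard library does not provide it.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: accession -> (name, parent accession or None).
CV = {
    "EX:0001": ("instrument model", None),
    "EX:0002": ("example instrument A", "EX:0001"),
    "EX:0010": ("ionization type", None),
    "EX:0011": ("example ionization B", "EX:0010"),
}

# Hypothetical mapping rules: element path -> root terms allowed at that location
# (a root term or any of its descendants is accepted).
MAPPING = {
    "./instrument/cvParam": {"EX:0001"},
    "./source/cvParam": {"EX:0010"},
}


def is_under(accession, root_accession):
    """Return True if the term is the root term or one of its descendants."""
    current = accession
    while current is not None:
        if current == root_accession:
            return True
        current = CV[current][1]  # move up to the parent term
    return False


def validate(xml_text):
    """Collect messages for unknown terms and for terms used in the wrong place."""
    document = ET.fromstring(xml_text)
    messages = []
    for path, allowed_roots in MAPPING.items():
        for element in document.findall(path):
            accession = element.get("accession")
            if accession not in CV:
                # Check 1: the term must exist in the resource.
                messages.append(f"{path}: unknown term {accession}")
            elif not any(is_under(accession, root) for root in allowed_roots):
                # Check 2: the term must be allowed at this location.
                messages.append(f"{path}: term {accession} not allowed here")
    return messages


# Hypothetical document: one valid term, one misplaced term, one unknown term.
DOC = """
<run>
  <instrument><cvParam accession="EX:0002"/></instrument>
  <source><cvParam accession="EX:0002"/></source>
  <source><cvParam accession="EX:9999"/></source>
</run>
"""

if __name__ == "__main__":
    for message in validate(DOC):
        print(message)
```

On this sample document the sketch reports two problems: an instrument term placed under `source`, and an accession that is absent from the vocabulary. Keeping the rules in a separate data structure, rather than in the checking code, mirrors the abstract's point that the framework is adapted to a new standard by swapping its parameters.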