MEMOPS: Data modelling and automatic code generation
Open Access
- 1 December 2010
- journal article
- research article
- Vol. 7 (3), 112-134
- https://doi.org/10.2390/biecoll-jib-2010-123
Abstract
In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability, whereas science is constantly moving, with new methods being developed and old ones modified. Maintaining both the metadata standards and all the code required to make them useful is therefore a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces, or APIs, currently in Python, C and Java) and data storage (in XML files or databases). For the individual project, these libraries obviate the need to write code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain, a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
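The general idea of deriving validated data-access code from an abstract model can be illustrated with a minimal Python sketch. This is not the Memops API or its generator: the `MODEL` dictionary, the `make_class` helper and the `Peak` class are all hypothetical names invented for illustration, and the sketch builds a class at runtime rather than emitting source files from a UML model, which is a deliberate simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical abstract model (a stand-in for a UML description):
# one class with typed attributes, some of them mandatory.
MODEL = {
    "name": "Peak",
    "attributes": {
        "serial": {"type": int, "required": True},
        "height": {"type": float, "required": False},
    },
}


def make_class(model):
    """Build a class whose attributes are validated against the model."""
    attrs = model["attributes"]

    def __init__(self, **values):
        # Validity check 1: every mandatory attribute must be supplied.
        missing = [name for name, spec in attrs.items()
                   if spec["required"] and name not in values]
        if missing:
            raise ValueError(f"missing required attributes: {missing}")
        for name, value in values.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # Validity check 2: only modelled attributes, with the modelled type.
        if name not in attrs:
            raise AttributeError(f"{model['name']} has no attribute {name!r}")
        expected = attrs[name]["type"]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be of type {expected.__name__}")
        object.__setattr__(self, name, value)

    def to_xml(self):
        # Storage: serialise the attributes that are set to one XML element.
        element = ET.Element(model["name"])
        for name in attrs:
            if hasattr(self, name):
                element.set(name, repr(getattr(self, name)))
        return ET.tostring(element, encoding="unicode")

    return type(model["name"], (object,), {
        "__init__": __init__,
        "__setattr__": __setattr__,
        "to_xml": to_xml,
    })


Peak = make_class(MODEL)
peak = Peak(serial=1, height=2.5)
print(peak.to_xml())
```

Because the checks and the XML output are both derived from the same model, changing an attribute in `MODEL` changes the validation and the storage format together, which is the kind of internal consistency the abstract attributes to generated code.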