Tracking provenance in a virtual data grid
Open Access
- 21 August 2007
- journal article
- research article
- Published by Wiley in Concurrency and Computation: Practice and Experience
- Vol. 20 (5), 565-575
- https://doi.org/10.1002/cpe.1256
Abstract
The virtual data model allows data sets to be described prior to, and separately from, their physical materialization. We have implemented this model in a Virtual Data Language (VDL) and associated supporting tools, which provide for both the storage, query, and retrieval of virtual data set descriptions, and the automated, on-demand materialization of virtual data sets. We use a standardized data provenance challenge exercise to illustrate the powerful queries that can be performed on the data maintained by these tools, which for a single virtual data set can include three elements: the computational procedure(s) that must be executed to materialize the data set, the runtime log(s) produced by the execution of the computation(s), and optional metadata annotation(s) that associate application semantics with data and procedures. Copyright © 2007 John Wiley & Sons, Ltd.
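The three elements named in the abstract can be pictured with a small data-structure sketch. This is not VDL syntax and not the authors' tooling; the class names (`Procedure`, `VirtualDataSet`), the `lineage` helper, and the file names are all hypothetical, chosen only to show how a description that exists before materialization might support a simple provenance query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Procedure:
    """Hypothetical record of the computation that materializes a data set."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class VirtualDataSet:
    """Hypothetical description of a data set, independent of any physical copy."""
    name: str
    procedure: Optional[Procedure] = None                      # how to materialize it
    run_logs: list[str] = field(default_factory=list)          # filled in after execution
    annotations: dict[str, str] = field(default_factory=dict)  # application semantics


def lineage(catalog: dict[str, VirtualDataSet], name: str) -> list[str]:
    """Return every data set that `name` is derived from, directly or indirectly."""
    seen: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        ds = catalog.get(current)
        if ds is None or ds.procedure is None:
            continue  # primary input: nothing further to trace
        for parent in ds.procedure.inputs:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


# Illustrative catalog; all names are made up for this sketch.
catalog = {
    "raw.img": VirtualDataSet("raw.img", annotations={"modality": "MRI"}),
    "aligned.img": VirtualDataSet(
        "aligned.img",
        procedure=Procedure("align", ["raw.img"], ["aligned.img"]),
    ),
    "atlas.img": VirtualDataSet(
        "atlas.img",
        procedure=Procedure("average", ["aligned.img"], ["atlas.img"]),
    ),
}

ancestors = lineage(catalog, "atlas.img")
mri_sets = [d.name for d in catalog.values() if d.annotations.get("modality") == "MRI"]
```

In this sketch `atlas.img` is fully described although nothing has been executed: its `run_logs` list stays empty until a materialization actually runs, which is the separation of description from physical data that the abstract refers to.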