Applying content management to automated provenance capture

Open Access

2 August 2007

journal article
research article
Published by Wiley in Concurrency and Computation: Practice and Experience

Vol. 20 (5) , 541-554
https://doi.org/10.1002/cpe.1230

Abstract

Workflows and data pipelines are becoming increasingly valuable to computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time compared to their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows are critical for data management, verification, and dissemination. We have been prototyping a workflow provenance system, targeted at biological workflows, that extends our content management technologies and other open source tools. We applied this prototype to the provenance challenge to demonstrate an end‐to‐end system that supports dynamic provenance capture, persistent content management, and dynamic searches of both provenance and metadata. We describe our prototype, which extends the Kepler system for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP‐based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to provide access to the provenance record with a variety of commonly available client tools. Copyright © 2007 John Wiley & Sons, Ltd.
