Applying content management to automated provenance capture
Open Access
- 2 August 2007
- journal article
- research article
- Published by Wiley in Concurrency and Computation: Practice and Experience
- Vol. 20 (5) , 541-554
- https://doi.org/10.1002/cpe.1230
Abstract
Workflows and data pipelines are becoming increasingly valuable to computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time compared to their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows are critical for data management, verification, and dissemination. We have been prototyping a workflow provenance system, targeted at biological workflows, that extends our content management technologies and other open source tools. We applied this prototype to the provenance challenge to demonstrate an end‐to‐end system that supports dynamic provenance capture, persistent content management, and dynamic searches of both provenance and metadata. We describe our prototype, which extends the Kepler system for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP‐based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to provide access to the provenance record with a variety of commonly available client tools. Copyright © 2007 John Wiley & Sons, Ltd.