An approach for pipelining nested collections in scientific workflows
- 1 September 2005
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 34 (3) , 12-17
- https://doi.org/10.1145/1084805.1084809
Abstract
We describe an approach for pipelining nested data collections in scientific workflows. Our approach logically delimits arbitrarily nested collections of data tokens using special, paired control tokens inserted into token streams, and provides workflow components with high-level operations for managing these collections. Our framework provides new capabilities for: (1) concurrent operation on collections; (2) on-the-fly customization of workflow component behavior; (3) improved handling of exceptions and faults; and (4) transparent passing of provenance and metadata within token streams. We demonstrate our approach using a workflow for inferring phylogenetic trees. We also describe future extensions to support richer typing mechanisms for facilitating sharing and reuse of workflow components between disciplines. This work represents a step towards our larger goal of exploiting collection-oriented dataflow programming as a new paradigm for scientific workflow systems, an approach we believe will significantly reduce the complexity of creating and reusing workflows and workflow components.
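The core idea in the abstract, delimiting nested collections with paired control tokens inside a flat token stream, can be illustrated with a minimal sketch. This is not the authors' framework or its API: the names Open, Close, Collection, to_stream, from_stream and map_in, the labels, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Open:
    """Hypothetical control token: start of a collection."""
    label: str


@dataclass(frozen=True)
class Close:
    """Hypothetical control token: end of a collection."""
    label: str


@dataclass
class Collection:
    """In-memory nested collection; items are data tokens or Collections."""
    label: str
    items: List[Any] = field(default_factory=list)


def to_stream(coll: Collection) -> Iterator[Any]:
    """Flatten a nested collection into a stream with paired delimiters."""
    yield Open(coll.label)
    for item in coll.items:
        if isinstance(item, Collection):
            yield from to_stream(item)
        else:
            yield item
    yield Close(coll.label)


def map_in(label: str, fn: Callable[[Any], Any],
           tokens: Iterable[Any]) -> Iterator[Any]:
    """Streaming component: apply fn to data tokens that sit directly inside
    a collection with the given label; pass every other token through."""
    open_labels: List[str] = []
    for tok in tokens:
        if isinstance(tok, Open):
            open_labels.append(tok.label)
            yield tok
        elif isinstance(tok, Close):
            open_labels.pop()
            yield tok
        elif open_labels and open_labels[-1] == label:
            yield fn(tok)
        else:
            yield tok


def from_stream(tokens: Iterable[Any]) -> Collection:
    """Rebuild the nested collection, checking that delimiters pair up."""
    stack: List[Collection] = []
    root = None
    for tok in tokens:
        if isinstance(tok, Open):
            stack.append(Collection(tok.label))
        elif isinstance(tok, Close):
            if not stack or stack[-1].label != tok.label:
                raise ValueError("mismatched control tokens")
            done = stack.pop()
            if stack:
                stack[-1].items.append(done)
            else:
                root = done
        else:
            if not stack:
                raise ValueError("data token outside any collection")
            stack[-1].items.append(tok)
    if stack or root is None:
        raise ValueError("unclosed or empty stream")
    return root


# Illustrative data only: a project holding a note and a nested set of sequences.
project = Collection("project", ["note", Collection("seqs", ["acgt", "ttga"])])
stream = list(to_stream(project))
rebuilt = from_stream(map_in("seqs", str.upper, stream))
```

Because map_in is a generator, it handles each token as it arrives and never needs the whole collection in memory, which is one simple reading of "pipelining" here. Tokens outside the targeted collection, including the delimiters themselves, pass through unchanged, so downstream components still see the original nesting.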