A MapReduce-Enabled Scientific Workflow Composition Framework
- 1 July 2009
- conference paper
- Published by the Institute of Electrical and Electronics Engineers (IEEE)
- pp. 663-670
- https://doi.org/10.1109/icws.2009.90
Abstract
MapReduce has recently gained a lot of attention as a parallel programming model for scalable, data-intensive business and scientific analysis. To benefit from this powerful programming model in a scientific workflow environment, we propose a MapReduce-enabled scientific workflow composition framework consisting of: i) a dataflow-based scientific workflow model that separates the declaration of the workflow interface from the definition of its functional body; ii) a set of dataflow constructs, including Map, Reduce, Loop, and Conditional, and their composition semantics, which enable MapReduce-style scientific workflows; iii) an XML-based scientific workflow specification language, called WSL, in which both Map and Reduce are fully composable with other dataflow constructs in both flat and hierarchical fashion. Besides bringing the power of MapReduce to the workflow level, our workflow composition framework is unique in that workflows are the only operands for composition; in this way, our approach elegantly solves the two-world problem of existing composition frameworks, in which composition must deal with both the world of tasks and the world of workflows. We implemented the proposed framework and conducted a case study to validate our techniques.
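To make the two central ideas of the abstract concrete (an interface declared separately from the functional body, and Map/Reduce constructs whose only operands are workflows), here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's WSL syntax or implementation: the names `Workflow`, `primitive`, `map_wf`, `reduce_wf`, and `seq` are hypothetical, a plain function is wrapped as a one-output workflow so that tasks never appear as composition operands, and the Loop and Conditional constructs are omitted for brevity.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Dict, Tuple

Ports = Dict[str, Any]


@dataclass(frozen=True)
class Workflow:
    """Hypothetical workflow: a declared interface plus a separate functional body."""
    name: str
    inputs: Tuple[str, ...]         # interface: input port names
    outputs: Tuple[str, ...]        # interface: output port names
    body: Callable[[Ports], Ports]  # definition: maps input ports to output ports

    def run(self, **ports: Any) -> Ports:
        missing = [p for p in self.inputs if p not in ports]
        if missing:
            raise ValueError(f"{self.name}: missing input ports {missing}")
        produced = self.body({p: ports[p] for p in self.inputs})
        # Only ports declared in the interface are visible to the caller.
        return {p: produced[p] for p in self.outputs}


def primitive(name: str, fn: Callable[..., Any], inputs: Tuple[str, ...], output: str) -> Workflow:
    """Wrap a plain function as a one-output workflow (assumption: tasks become workflows)."""
    return Workflow(name, inputs, (output,),
                    lambda ps: {output: fn(*(ps[p] for p in inputs))})


def map_wf(w: Workflow) -> Workflow:
    """Hypothetical Map: apply a one-in/one-out workflow to every element of a list."""
    (i,), (o,) = w.inputs, w.outputs  # raises ValueError if the interface does not fit
    return Workflow(f"Map({w.name})", (i,), (o,),
                    lambda ps: {o: [w.run(**{i: x})[o] for x in ps[i]]})


def reduce_wf(w: Workflow, port: str = "items") -> Workflow:
    """Hypothetical Reduce: fold a non-empty list with a two-in/one-out workflow."""
    (a, b), (o,) = w.inputs, w.outputs
    return Workflow(f"Reduce({w.name})", (port,), (o,),
                    lambda ps: {o: reduce(lambda acc, x: w.run(**{a: acc, b: x})[o], ps[port])})


def seq(w1: Workflow, w2: Workflow) -> Workflow:
    """Sequential composition: w1's single output feeds w2's single input."""
    (o1,), (i2,) = w1.outputs, w2.inputs
    return Workflow(f"{w1.name};{w2.name}", w1.inputs, w2.outputs,
                    lambda ps: w2.run(**{i2: w1.run(**ps)[o1]}))


# Flat composition: Map(square) followed by Reduce(add).
square = primitive("square", lambda x: x * x, ("x",), "y")
add = primitive("add", lambda a, b: a + b, ("a", "b"), "y")
sum_of_squares = seq(map_wf(square), reduce_wf(add))
print(sum_of_squares.run(x=[1, 2, 3]))  # {'y': 14}

# Hierarchical composition: the composite workflow is itself an operand of Map.
print(map_wf(sum_of_squares).run(x=[[1, 2], [3]]))  # {'y': [5, 9]}
```

Because every constructor accepts and returns a `Workflow`, composites can be nested to any depth; this mirrors, in a simplified form, the abstract's claim that treating workflows as the only operands avoids handling tasks and workflows as two separate worlds.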