Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator
Open Access
- 7 April 2005
- journal article
- research article
- Published by Springer Nature in BMC Bioinformatics
- Vol. 6 (1) , 87
- https://doi.org/10.1186/1471-2105-6-87
Abstract
Background: Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported.
Results: We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users.
Availability: http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download).
Conclusion: From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.
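The abstract's central idea, that iteration and conditional statements can be defined once as generic, parameterizable components and that the whole protocol can be stored as XML, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names (`Step`, `Sequence`, `Iterate`, `Conditional`), the toy sequence functions, and the XML element names are hypothetical and do not reflect GPIPE's or PISE's actual code or schema. Recursion and suspend/resume are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One method call plus its parameters (hypothetical structure)."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)

    def to_xml(self) -> ET.Element:
        el = ET.Element("step", name=self.name)
        for key, value in self.params.items():
            ET.SubElement(el, "param", name=key, value=str(value))
        return el


@dataclass
class Sequence:
    """Run components in order, feeding each output to the next."""
    steps: List[Any]

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data

    def to_xml(self) -> ET.Element:
        el = ET.Element("sequence")
        for step in self.steps:
            el.append(step.to_xml())
        return el


@dataclass
class Iterate:
    """Apply the body component to every item of a list input."""
    body: Any

    def run(self, data: List[Any]) -> List[Any]:
        return [self.body.run(item) for item in data]

    def to_xml(self) -> ET.Element:
        el = ET.Element("iterate")
        el.append(self.body.to_xml())
        return el


@dataclass
class Conditional:
    """Choose a branch by testing the input with a named predicate."""
    name: str
    predicate: Callable[[Any], bool]
    then: Any
    otherwise: Any

    def run(self, data: Any) -> Any:
        branch = self.then if self.predicate(data) else self.otherwise
        return branch.run(data)

    def to_xml(self) -> ET.Element:
        el = ET.Element("conditional", test=self.name)
        ET.SubElement(el, "then").append(self.then.to_xml())
        ET.SubElement(el, "else").append(self.otherwise.to_xml())
        return el


# Toy analysis methods standing in for real bioinformatics tools.
def truncate(seq: str, length: int) -> str:
    return seq[:length]


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def identity(seq: str) -> str:
    return seq


# For each sequence: if longer than 4 bases, truncate then reverse-complement.
workflow = Iterate(Conditional(
    name="longer_than_4",
    predicate=lambda s: len(s) > 4,
    then=Sequence([
        Step("truncate", truncate, {"length": 4}),
        Step("revcomp", reverse_complement),
    ]),
    otherwise=Step("identity", identity),
))

results = workflow.run(["ACGTAC", "GGA"])
workflow_xml = ET.tostring(workflow.to_xml(), encoding="unicode")
```

Because each operator exposes the same `run` and `to_xml` methods, operators nest freely, and the serialized description records the methods and parameters of the protocol independently of the code that executes it. A loader that rebuilds the workflow from the XML would be needed to reproduce or share an experiment in full; it is not shown here.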