Exploiting data-flow for fault-tolerance in a wide-area parallel system
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.
This publication has 16 references indexed in Scilit:
- HeNCE: graphical development tools for network-based concurrent computing. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- PLinda 2.0: a transactional/checkpointing approach to fault tolerant Linda. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Replicated K-resilient objects in Arjuna. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- CALYPSO: a novel software system for fault-tolerant parallel processing on distributed platforms. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Campus-Wide Computing: Early Results Using Legion At the University of Virginia. The International Journal of Supercomputer Applications and High Performance Computing, 1997
- Portable run-time support for dynamic object-oriented parallel processing. ACM Transactions on Computer Systems, 1996
- Fault tolerance via replication in coarse grain data-flow. Published by Springer Nature, 1996
- Experimental evaluation of a reusability-oriented parallel programming environment. IEEE Transactions on Software Engineering, 1990
- Implementing Fault-Tolerant Distributed Objects. IEEE Transactions on Software Engineering, 1985
- Parallel Processing with Large-Grain Data Flow Techniques. Computer, 1984