CALYPSO: a novel software system for fault-tolerant parallel processing on distributed platforms
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10828907, pp. 122-129
- https://doi.org/10.1109/hpdc.1995.518702
Abstract
The importance of adapting networks of workstations for use as parallel processing platforms is well established. However, current solutions do not always address important issues that exist in real networks. External factors such as resource sharing, unpredictable network behavior, and failures are present in multiuser networks and must be addressed. CALYPSO is a prototype software system for writing and executing parallel programs on non-dedicated platforms, based on COTS networked workstations, operating systems, and compilers. Notable properties of the system include: (1) a simple programming paradigm that incorporates shared-memory constructs and separates programming parallelism from execution parallelism, (2) transparent utilization of unreliable shared resources through dynamic load balancing and fault tolerance, and (3) effective performance for large classes of coarse-grained computations. We present the system and report our initial experiments and performance results in settings that closely resemble the dynamic behavior of a "real" network. Under varying workload conditions, resource availability, and process failures, the efficiency of the test program we present ranged from 84% to 94%, benchmarked against a sequential program.
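The abstract does not say how CALYPSO achieves load balancing and fault tolerance on unreliable, shared machines. One technique that provides both in this setting is eager scheduling: any idle worker takes any unfinished task, so work held by a slow or crashed machine is simply re-executed elsewhere, and only the first completion of each task is kept. The sketch below is a minimal, hypothetical Python simulation of that idea, not CALYPSO's implementation or API. The names `Worker` and `eager_schedule`, the tick-based timing, and the crash model are all assumptions made for illustration. It also mirrors the separation of programming and execution parallelism in one respect: the number of routines in a parallel step is fixed by the programmer and does not depend on how many workers happen to be available.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Worker:
    """Hypothetical model of a possibly slow or unreliable workstation."""
    name: str
    ticks_per_routine: int           # simulated time to finish one routine
    fails_at: Optional[int] = None   # tick at which the worker crashes; None = never
    current: Optional[int] = None    # index of the routine being executed, if any
    remaining: int = 0               # ticks left on the current routine


def eager_schedule(routines: List[Callable[[], int]],
                   workers: List[Worker],
                   max_ticks: int = 10_000) -> Dict[int, int]:
    """Simulate one parallel step under eager scheduling; return {index: result}.

    Any idle worker takes an unfinished routine, preferring the one handed out
    least often, so a routine stuck on a slow or crashed worker is re-executed
    by another worker. Only the first completion of each routine is recorded,
    which assumes routines are idempotent (re-running them is harmless).
    """
    results: Dict[int, int] = {}
    attempts = [0] * len(routines)  # times each routine has been handed out
    tick = 0
    while len(results) < len(routines):
        if tick >= max_ticks:
            raise RuntimeError("parallel step did not finish; workers may all have failed")
        for w in workers:
            if w.fails_at is not None and tick >= w.fails_at:
                w.current = None  # crashed: any in-progress work is lost
                continue
            if w.current is None:
                pending = [i for i in range(len(routines)) if i not in results]
                if not pending:
                    continue
                # Eager choice: least-attempted unfinished routine, ties by index.
                w.current = min(pending, key=lambda i: (attempts[i], i))
                attempts[w.current] += 1
                w.remaining = w.ticks_per_routine
            w.remaining -= 1
            if w.remaining == 0:
                if w.current not in results:  # first finisher wins; duplicates ignored
                    results[w.current] = routines[w.current]()
                w.current = None
        tick += 1
    return results


if __name__ == "__main__":
    # Eight routines, three workers: one fast, one slow, one that crashes at tick 1.
    routines = [lambda i=i: i * i for i in range(8)]
    workers = [Worker("fast", 1), Worker("slow", 5), Worker("flaky", 2, fails_at=1)]
    result = eager_schedule(routines, workers)
    print(dict(sorted(result.items())))  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}
```

In this run the routine lost when `flaky` crashes is picked up again by `fast`, and the step completes without the program being aware of the failure. The price of this design is duplicated work, which is why it suits the coarse-grained computations the abstract targets rather than fine-grained ones.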