An experimental evaluation of the REE SIFT environment for spaceborne applications
- 25 June 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 585-594
- https://doi.org/10.1109/dsn.2002.1029004
Abstract
Presents an experimental evaluation of a software-implemented fault tolerance (SIFT) environment built around a set of self-checking processes, called ARMORs, that run on different machines and provide error detection and recovery services both to themselves and to spaceborne scientific applications. The experiments are split into three groups of error injections, each stressing the SIFT error detection and recovery mechanisms more than the previous group. The results show that the SIFT environment adds negligible overhead to the application during failure-free runs. Only 11 cases were observed in which either the application failed to start or the SIFT environment failed to recognize that the application had completed. Further investigation showed that assertions within the SIFT processes, coupled with object-based incremental checkpointing, were effective in preventing system failures by protecting dynamic data within the SIFT processes.
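The abstract does not describe how the ARMOR assertions and checkpoints are implemented. What follows is a minimal Python sketch of the general idea under stated assumptions: a process's dynamic data is split into independent objects ("elements"), each guarded by an assertion; only objects modified since the last checkpoint are re-saved (incremental checkpointing); and a failed assertion rolls the modified objects back to their last checkpointed state. All names (`Element`, `SelfCheckingProcess`, `update`, `commit`, `rollback`) are hypothetical and do not come from the paper or the REE software.

```python
import copy


class Element:
    """Hypothetical unit of process state plus an assertion over that state."""

    def __init__(self, name, state, invariant):
        self.name = name
        self.state = state          # mutable dynamic data owned by this element
        self.invariant = invariant  # callable: state -> bool (the assertion)

    def check(self):
        return self.invariant(self.state)


class SelfCheckingProcess:
    """Sketch of object-based incremental checkpointing guarded by assertions."""

    def __init__(self, elements):
        self.elements = {e.name: e for e in elements}
        # Start from a full checkpoint of every element's state.
        self.checkpoint = {n: copy.deepcopy(e.state) for n, e in self.elements.items()}
        self.dirty = set()  # names of elements modified since the last checkpoint

    def update(self, name, mutate):
        """Apply mutate() to one element in place, then run its assertion."""
        elem = self.elements[name]
        mutate(elem.state)
        self.dirty.add(name)
        if not elem.check():
            # Assertion failed: discard all changes made since the last checkpoint.
            self.rollback()
            return False
        return True

    def commit(self):
        """Incremental checkpoint: copy only the elements modified since the last commit."""
        saved = len(self.dirty)
        for name in self.dirty:
            self.checkpoint[name] = copy.deepcopy(self.elements[name].state)
        self.dirty.clear()
        return saved

    def rollback(self):
        """Restore every modified element from its last checkpointed copy."""
        for name in self.dirty:
            self.elements[name].state = copy.deepcopy(self.checkpoint[name])
        self.dirty.clear()
```

For example, an element tracking a (hypothetical) application PID and restart count could use the invariant `lambda s: s["restarts"] >= 0 and s["app_pid"] > 0`. A corrupted write such as setting `app_pid` to `-5` fails the assertion and restores the last committed values. Checkpointing per object rather than per process keeps each commit proportional to the amount of changed state. This is the property the abstract credits, together with assertions, for protecting dynamic data.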