State restoration in a COTS-based N-modular architecture
- 27 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Mechanisms for restoring the state of a channel in an N-modular redundant architecture are necessary to prevent redundancy attrition due to transient faults and to allow failed channels to be brought back on line after repair. This paper considers software-implemented mechanisms for state restoration (SR) in a generic fault-tolerant architecture in which both the underlying hardware and operating system are commercial off-the-shelf (COTS) components. State restoration involves copying the values of state variables from the active channel(s) across to the joining channel. Concurrent updating of state variables by application tasks is considered. Two state restoration schemes are considered: Running SR and Recursive SR. In the former, each state variable is copied exactly once while concurrent updates are written through to the joining channel. In the latter, state variables are copied once and then recopied recursively until no concurrent updates are detected.
Author(s)
- Bondavalli, A. (Istituto CNUCE, CNR, Pisa, Italy)
- Di Giandomenico, F.
- Grandoni, F.
- Powell, D.
- Rabejac, C.
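The difference between the two schemes named in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: the function names `running_sr` and `recursive_sr`, the `schedule` iterator (the batches of application updates that arrive just before each copy step), and the use of dictionaries to stand in for channel state are all hypothetical, and no real inter-channel communication or COTS operating system service is modelled.

```python
from typing import Dict, Iterator, List, Tuple

Update = Tuple[str, int]  # (state variable name, new value); hypothetical format


def running_sr(active: Dict[str, int],
               schedule: Iterator[List[Update]]) -> Dict[str, int]:
    """Running SR sketch: copy each variable exactly once and write
    concurrent updates through to the joining channel."""
    joining: Dict[str, int] = {}
    for name in list(active):
        # Application updates that arrive before this copy step.
        for var, value in next(schedule, []):
            active[var] = value
            joining[var] = value  # write-through to the joining channel
        joining[name] = active[name]  # the single copy of this variable
    return joining


def recursive_sr(active: Dict[str, int],
                 schedule: Iterator[List[Update]]) -> Tuple[Dict[str, int], int]:
    """Recursive SR sketch: copy everything once, then recopy the variables
    updated during the previous pass until a pass sees no updates."""
    joining: Dict[str, int] = {}
    pending = list(active)  # the first pass copies every variable
    passes = 0
    while pending:
        dirty = set()  # variables updated during this pass
        for name in pending:
            for var, value in next(schedule, []):
                active[var] = value  # no write-through in this scheme
                dirty.add(var)
            joining[name] = active[name]
        passes += 1
        # Simplification: every variable updated during the pass is recopied,
        # even if it happened to be copied after the update.
        pending = sorted(dirty)
    return joining, passes
```

In this sketch, Running SR finishes after one pass over the state at the cost of intercepting every application write, whereas Recursive SR leaves application writes untouched but only terminates once a pass completes without any update being detected.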