Closure and convergence: a foundation of fault-tolerant computing

1 November 1993

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering

Vol. 19 (11) , 1015-1027
https://doi.org/10.1109/32.256850

Abstract

The authors formally define what it means for a system to tolerate a class of faults. The definition consists of two conditions. The first is that if a fault occurs when the system state is within the set of legal states, the resulting state is within some larger set and, if faults continue to occur, the system state remains within that larger set (closure). The second is that if faults stop occurring, the system eventually reaches a state within the legal set (convergence). The applicability of the definition for specifying and verifying the fault-tolerance properties of a variety of digital and computer systems is demonstrated. Using the definition, the authors obtain a simple classification of fault-tolerant systems. Methods for the systematic design of such systems are discussed.
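The two conditions in the abstract can be illustrated on a small finite-state system. The sketch below is not taken from the paper: the names `is_closed`, `converges`, `tolerates`, `repair`, and `corrupt` are hypothetical, the state space is assumed finite, and fairness between program actions is ignored. It checks that the legal set is closed under program steps, that the larger set is closed under both program and fault steps, and that every fault-free run from the larger set reaches the legal set.

```python
from typing import Callable, Hashable, Iterable, Set

State = Hashable
Step = Callable[[State], Iterable[State]]  # maps a state to its successors


def is_closed(states: Set[State], step: Step) -> bool:
    """True if every successor of a state in `states` is also in `states`."""
    return all(nxt in states for s in states for nxt in step(s))


def converges(span: Set[State], legal: Set[State], program: Step) -> bool:
    """True if every fault-free run starting in `span` reaches `legal`.

    A run fails to converge if it deadlocks outside `legal` or follows a
    cycle that never enters `legal`.
    """
    done: Set[State] = set()     # states already shown to converge
    on_path: Set[State] = set()  # states on the current depth-first path

    def visit(s: State) -> bool:
        if s in legal or s in done:
            return True
        if s in on_path:         # cycle that avoids the legal set
            return False
        successors = list(program(s))
        if not successors:       # deadlock outside the legal set
            return False
        on_path.add(s)
        ok = all(visit(n) for n in successors)
        on_path.discard(s)
        if ok:
            done.add(s)
        return ok

    return all(visit(s) for s in span)


def tolerates(legal: Set[State], span: Set[State],
              program: Step, fault: Step) -> bool:
    """Closure of both sets plus convergence from `span` to `legal`."""
    def program_or_fault(s: State) -> Iterable[State]:
        return list(program(s)) + list(fault(s))

    return (legal <= span
            and is_closed(legal, program)
            and is_closed(span, program_or_fault)
            and converges(span, legal, program))


# Hypothetical example: a counter over 0..3 whose only legal value is 0.
def repair(s: int) -> Iterable[int]:
    """Program step: move one step toward 0 and stay there."""
    return [max(s - 1, 0)]


def corrupt(s: int) -> Iterable[int]:
    """Fault step: bump the counter, saturating at 3."""
    return [s + 1] if s < 3 else []


if __name__ == "__main__":
    print(tolerates({0}, {0, 1, 2, 3}, repair, corrupt))
```

In this toy system the larger set must be `{0, 1, 2, 3}`: a smaller candidate such as `{0, 1}` is not closed, because a fault can move the counter from 1 to 2.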

This publication has 25 references indexed in Scilit:

Self-stabilization
ACM Computing Surveys, 1993
Stabilizing communication protocols
IEEE Transactions on Computers, 1991
Understanding fault-tolerant distributed systems
Communications of the ACM, 1991
Uniform self-stabilizing rings
ACM Transactions on Programming Languages and Systems, 1989
Simulating authenticated broadcasts to derive simple fault-tolerant algorithms
Distributed Computing, 1987
Impossibility of distributed consensus with one faulty process
Journal of the ACM, 1985
Fault Tolerance Terminology Proposals
Published by Springer Nature, 1985
Fail-stop processors
ACM Transactions on Computer Systems, 1983
Self-stabilizing systems in spite of distributed control
Communications of the ACM, 1974
Solution of a problem in concurrent programming control
Communications of the ACM, 1965