Fault-tolerant wait-free shared objects
- 1 May 1998
- journal article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 45 (3) , 451-500
- https://doi.org/10.1145/278298.278305
Abstract
Wait-free implementations of shared objects tolerate the failure of processes, but not the failure of base objects from which they are implemented. We consider the problem of implementing shared objects that tolerate the failure of both processes and base objects.We identify two classes of object failures:responsiveandnonresponsive. With responsive failures, a faulty object responds to every operation, but its responses may be incorrect. With nonresponsive failures, a faulty object may also “hang” without responding. In each class, we definecrash, omission,andarbitrarymodes of failure.We show that all responsive failure modes can be tolerated. More precisely, for all responsive failure modes ℱ, object typesT, andt≥ 0, we show how to implement a shared object of typeTwhich ist-tolerant for ℱ. Such an object remains correct and wait-free even if up totbase objects fail according to ℱ. In contrast to responsive failures, we show that even the most benign non-responsive failure mode cannot be tolerated. We also show that randomization can be used to circumvent this impossibility result.Graceful degradationis a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure modes, we show wheter this property can be achieved, and, if so, how.Keywords
This publication has 27 references indexed in Scilit:
- Computing with faulty shared objectsJournal of the ACM, 1995
- Implementing fault-tolerant services using the state machine approach: a tutorialACM Computing Surveys, 1990
- Simulating authenticated broadcasts to derive simple fault-tolerant algorithmsDistributed Computing, 1987
- On the minimal synchronism needed for distributed consensusJournal of the ACM, 1987
- On interprocess communicationDistributed Computing, 1986
- Easy impossibility proofs for distributed consensus problemsDistributed Computing, 1986
- Impossibility of distributed consensus with one faulty processJournal of the ACM, 1985
- A New Solution to Lamport's Concurrent Programming Problem Using Small Shared VariablesACM Transactions on Programming Languages and Systems, 1983
- Concurrent reading and writingCommunications of the ACM, 1977
- Concurrent control with “readers” and “writers”Communications of the ACM, 1971