On the reliability of consensus-based fault-tolerant distributed computing systems
- 1 October 1987
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 5 (4), 394-416
- https://doi.org/10.1145/29868.31332
Abstract
The designer of a fault-tolerant distributed system faces numerous alternatives. Using a stochastic model of processor failure times, we investigate design choices such as replication level, protocol running time, randomized versus deterministic protocols, fault detection, and authentication. We use the probability with which a system produces the correct output as our evaluation criterion. This contrasts with previous fault-tolerance results that guarantee correctness only if the percentage of faulty processors in the system can be bounded. Our results reveal some subtle and counterintuitive interactions between the design parameters and system reliability.
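As a rough illustration of the evaluation criterion described in the abstract, the sketch below computes the probability that a replicated system stays within its fault bound. It is not the article's model. It assumes independent, exponentially distributed processor failure times, treats every failed processor as arbitrarily faulty, and uses the classical requirement n > 3t for agreement without authentication. The names `max_tolerated_faults`, `reliability`, `failure_rate`, and `run_time` are hypothetical.

```python
from math import comb, exp


def max_tolerated_faults(n: int) -> int:
    """Largest t with n > 3t (bound for agreement without authentication)."""
    return (n - 1) // 3


def reliability(n: int, failure_rate: float, run_time: float) -> float:
    """Probability that at most t of n processors fail before run_time.

    Assumption: failure times are independent and exponential with the
    given rate; this is an illustrative choice, not taken from the article.
    """
    # Probability that one processor has failed by run_time.
    p_fail = 1.0 - exp(-failure_rate * run_time)
    t = max_tolerated_faults(n)
    # Binomial tail: sum over 0..t failed processors.
    return sum(
        comb(n, k) * p_fail**k * (1.0 - p_fail) ** (n - k)
        for k in range(t + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 4, 7):
        print(n, reliability(n, failure_rate=0.01, run_time=1.0))
```

Under these assumptions, three processors come out less reliable than one, because three replicas tolerate no faults yet offer three chances for a failure. Four processors tolerate one fault and do better. This is one simple way that replication level and reliability can interact counterintuitively.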
This publication has 14 references indexed in Scilit:
- Stopping times of distributed consensus protocols: A probabilistic analysis. Information Processing Letters, 1987
- Knowledge and Common Knowledge in a Byzantine Environment I: Crash failures. Published by Elsevier, 1986
- A Simple and Efficient Randomized Byzantine Agreement Algorithm. IEEE Transactions on Software Engineering, 1985
- Synchronizing clocks in the presence of faults. Journal of the ACM, 1985
- Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. ACM Transactions on Programming Languages and Systems, 1984
- Authenticated Algorithms for Byzantine Agreement. SIAM Journal on Computing, 1983
- The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems, 1982
- A lower bound for the time to assure interactive consistency. Information Processing Letters, 1982
- Reaching Agreement in the Presence of Faults. Journal of the ACM, 1980
- Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978