A fault-tolerant protocol for atomic broadcast
- 1 July 1990
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 1 (3) , 271-285
- https://doi.org/10.1109/71.80156
Abstract
A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, delay of messages, and network partitioning in an arbitrary network of fail-stop sites (i.e. no Byzantine site behavior is tolerated). The protocol is based on majority-concensus decisions to commit on unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages where N is the number of sites. This overhead is distributed among the messages of which the delivery decision is made and the heavier the broadcast message traffic, the lower the overhead per broadcast message becomes. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis of this protocol is presented, showing that this protocol commits with high probability under realistic operating conditions without invoking termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable.<>Keywords
This publication has 14 references indexed in Scilit:
- Message ordering in a multicast environmentPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Broadcast protocols for distributed systemsIEEE Transactions on Parallel and Distributed Systems, 1990
- Probabilistic clock synchronizationDistributed Computing, 1989
- Maintaining availability in partitioned replicated databasesACM Transactions on Database Systems, 1989
- Reliable communication in the presence of failuresACM Transactions on Computer Systems, 1987
- Impossibility of distributed consensus with one faulty processJournal of the ACM, 1985
- Reliable broadcast protocolsACM Transactions on Computer Systems, 1984
- Multiversion concurrency control—theory and algorithmsACM Transactions on Database Systems, 1983
- A Formal Model of Crash Recovery in a Distributed SystemIEEE Transactions on Software Engineering, 1983
- Transactions and consistency in distributed database systemsACM Transactions on Database Systems, 1982