A fault-tolerant protocol for atomic broadcast

1 July 1990

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 1 (3) , 271-285
https://doi.org/10.1109/71.80156

Abstract

A general protocol for atomic broadcast in networks is presented. The protocol tolerates loss, duplication, reordering, delay of messages, and network partitioning in an arbitrary network of fail-stop sites (i.e. no Byzantine site behavior is tolerated). The protocol is based on majority-concensus decisions to commit on unique ordering of received broadcast messages. Under normal operating conditions, the protocol requires three phases to complete and approximately 4N/V messages where N is the number of sites. This overhead is distributed among the messages of which the delivery decision is made and the heavier the broadcast message traffic, the lower the overhead per broadcast message becomes. Under abnormal operating conditions, a decentralized termination protocol (also presented) is invoked. A performance analysis of this protocol is presented, showing that this protocol commits with high probability under realistic operating conditions without invoking termination protocol if N is sufficiently large. The protocol retains its efficiency in wide-area networks where broadcast communication media are unavailable.<>

Keywords

This publication has 14 references indexed in Scilit:

Message ordering in a multicast environment
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2003
Broadcast protocols for distributed systems
IEEE Transactions on Parallel and Distributed Systems, 1990
Probabilistic clock synchronization
Distributed Computing, 1989
Maintaining availability in partitioned replicated databases
ACM Transactions on Database Systems, 1989
Reliable communication in the presence of failures
ACM Transactions on Computer Systems, 1987
Impossibility of distributed consensus with one faulty process
Journal of the ACM, 1985
Reliable broadcast protocols
ACM Transactions on Computer Systems, 1984
Multiversion concurrency control—theory and algorithms
ACM Transactions on Database Systems, 1983
A Formal Model of Crash Recovery in a Distributed System
IEEE Transactions on Software Engineering, 1983
Transactions and consistency in distributed database systems
ACM Transactions on Database Systems, 1982