Fault-tolerant distributed systems based on broadcast communication

Abstract
Distributed systems present problems of maintaining consistency of distributed data in the presence of faults. These problems are currently solved by agreement protocols that require many messages to be exchanged between processors with adverse effects on system performance. An approach is presented to the design of fault-tolerant distributed systems that avoids this message exchange, resulting in systems that are substantially more efficient. This approach is based on broadcast communication over a local area network such as the Ethernet, and on two novel protocols: the Trans protocol which provides efficient reliable broadcast communication, and the Total protocol which, with high probability, promptly takes a total order on messages and achieves distributed agreement even in the presence of a fault. Reliable distributed operations, such as locking, update, and commitment, require only a single broadcast message rather than the several tens of messages required by current algorithms.

This publication has 13 references indexed in Scilit: