Specifying and using a partitionable group communication service
- 1 May 2001
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 19 (2) , 171-216
- https://doi.org/10.1145/377769.377776
Abstract
Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement about what these specifications should say, especially in cases where the services are partitionable , i.e., where communication failures may lead to simultaneous creation of groups with disjoint memberships, such that each group is unware of the existence of any other group. In this paper, we present a new, succinct specification for a view-oriented partitionable group communication service. The service associates each message with a particular view of the group membership. All send and receive events for a message occur within the associated view. The service provides a total order on the messages within each view, and each processor receives a prefix of this order. Our specification separates safety requirements from performance and fault-tolerance requirements. The safety requirements are expressed by an abstract, global state machine . To present the performance and fault-tolerance requirements, we include failure-status input actions in the specification; we then give properties saying that consensus on the view and timely message delivery are guaranteed in an execution provided that the execution stabilizes to a situation in which the failure-status stops changing and corresponds to consistently partioned system. Because consensus is not required in every execution, the specification is not subject to the existing impossibility results for partionable systems. Our specification has a simple implementation, based on the membership algorithm of Christian and Schmuck. We show the utility of the specification by constructing an ordered-broadcast application, using an algorithm (based on algorithms of Amir, Dolev, Keidar, and others) that reconciles information derived from different instantiations of the group. The application manages the view-change activity to build a shared sequence of messages, i.e., the per-view total orders of the group service are combined to give a universal total order. We prove the correctness and analyze the performance and fault-tolerance of the resulting application.Keywords
This publication has 22 references indexed in Scilit:
- Synchronous and asynchronousCommunications of the ACM, 1996
- The Transis approach to high availability cluster communicationCommunications of the ACM, 1996
- HorusCommunications of the ACM, 1996
- TotemCommunications of the ACM, 1996
- The Totem single-ring ordering and membership protocolACM Transactions on Computer Systems, 1995
- Forward and Backward SimulationsInformation and Computation, 1995
- The inherent cost of strong-partial view-synchronous communicationPublished by Springer Nature ,1995
- Lightweight causal and atomic group multicastACM Transactions on Computer Systems, 1991
- Implementing fault-tolerant services using the state machine approach: a tutorialACM Computing Surveys, 1990
- Time, clocks, and the ordering of events in a distributed systemCommunications of the ACM, 1978