Design and implementation of a consistent time service for fault-tolerant distributed systems
- 22 June 2004
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 19 (5) , 341-350
- https://doi.org/10.1109/dsn.2003.1209945
Abstract
Clock-related operations are one of the many sources of replica non-determinism and of replica inconsistency in fault-tolerant distributed systems. In passive replication, if the primary server crashes, the next clock value returned by the new primary server might have actually rolled back in time, which can lead to undesirable consequences for the replicated application. The same problem can happen for active replication when the result of the first replica to respond is taken as the next clock value. In this paper we describe the design and implementation of a Consistent Time Service for fault-tolerant distributed systems. The Consistent Time Service introduces a group clock that is consistent across the replicas and that ensures the determinism of the replicas with respect to clock-related operations. The group clock is monotonically increasing, is transparent to the application, and is fault-tolerant. The Consistent Time Service guarantees the consistency of the group clock even when faults occur, when new replicas are added into the group, and when failed replicas recover.
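The rollback problem and the monotonic group clock described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not spell out the mechanism, so the `Replica` class, the per-replica `offset`, and the `group_read` function are all hypothetical names introduced here. The sketch assumes that every replica sees the same winning proposal for each clock read (for example, through a totally ordered multicast, which is simulated here by passing the winner explicitly).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    local_clock: float      # hypothetical physical clock reading, in seconds
    offset: float = 0.0     # correction that maps local time onto group time

    def propose(self) -> float:
        # Value this replica would offer as the next group clock value.
        return self.local_clock + self.offset

    def adopt(self, group_value: float) -> None:
        # Re-align the offset so later proposals continue from the agreed value.
        self.offset = group_value - self.local_clock

    def tick(self, seconds: float) -> None:
        # Advance the simulated physical clock.
        self.local_clock += seconds


def group_read(replicas, winner, last_value):
    """One clock-read round: the winner's proposal becomes the group value."""
    value = max(winner.propose(), last_value)   # never move backwards
    for r in replicas:
        r.adopt(value)                          # every live replica agrees
    return value


a = Replica("A", local_clock=100.0)
b = Replica("B", local_clock=95.0)    # B's physical clock lags A's by 5 s

# Round 1: A is the primary, so its proposal is the one that wins.
t1 = group_read([a, b], winner=a, last_value=float("-inf"))

b.tick(1.0)                           # one second passes, then A crashes
naive = b.local_clock                 # what B's raw clock would report

# Round 2: B has taken over as primary and is the only live replica.
t2 = group_read([b], winner=b, last_value=t1)
```

In this run the raw clock of the new primary (`naive`) is earlier than the value the old primary already returned (`t1`), which is the rollback the abstract warns about. Because B adopted the agreed value in the first round, its next proposal (`t2`) continues forward from it instead.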