Enforcing perfect failure detection
- 13 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Perfect failure detectors can correctly decide whether a computer is crashed. However, it is impossible to implement a perfect failure detector in purely asynchronous systems. We show how to enforce perfect failure detection in timed distributed systems with hardware watchdogs. The two main system model assumptions are (1) each computer can measure time intervals with a known maximum error, and (2) each computer has a watchdog that crashes the computer unless the watchdog is periodically updated. We have implemented a system that satisfies both assumptions using a combination of off-the-shelf software and hardware.
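To illustrate how the two assumptions can combine, the following is a minimal sketch and not the protocol from the paper. It assumes a hypothetical lease-style rule: the monitored computer updates its watchdog only for a nominal `watchdog_timeout` seconds measured from the moment it sent its latest renewal request, and every clock runs at a rate within a factor of `1 ± max_drift` of real time. All names (`safe_wait`, `Monitor`, `on_renewal_request`, `is_certainly_crashed`) are invented for this example.

```python
def safe_wait(watchdog_timeout: float, max_drift: float) -> float:
    """Local waiting time after which the remote watchdog must have fired.

    Assumption: the remote watchdog may take up to
    watchdog_timeout / (1 - max_drift) real seconds to expire (slow remote
    clock), while a local wait of D covers at least D / (1 + max_drift)
    real seconds (fast local clock).
    """
    if not 0 <= max_drift < 1:
        raise ValueError("max_drift must be in [0, 1)")
    return watchdog_timeout * (1 + max_drift) / (1 - max_drift)


class Monitor:
    """Hypothetical monitor that declares a crash only when it is certain."""

    def __init__(self, watchdog_timeout: float, max_drift: float) -> None:
        self.wait = safe_wait(watchdog_timeout, max_drift)
        self.last_request_at = None  # local receive time of the latest request

    def on_renewal_request(self, local_now: float) -> None:
        # The request was sent no later than it was received, so counting
        # from the receive time can only delay the verdict, never hasten it.
        self.last_request_at = local_now

    def is_certainly_crashed(self, local_now: float) -> bool:
        if self.last_request_at is None:
            return False  # nothing heard yet, so no conclusion is possible
        return local_now - self.last_request_at > self.wait


monitor = Monitor(watchdog_timeout=10.0, max_drift=0.01)
monitor.on_renewal_request(local_now=100.0)
early = monitor.is_certainly_crashed(local_now=105.0)  # still inside the wait
late = monitor.is_certainly_crashed(local_now=111.0)   # past the safe wait
```

Under these assumptions, the monitor never reports a computer as crashed while that computer can still be running, because the watchdog is guaranteed to have fired before the safe wait elapses. Message delay only makes the verdict later, not wrong.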