Abstract
Perfect failure detectors can correctly decide whether a computer has crashed. However, it is impossible to implement a perfect failure detector in purely asynchronous systems. We show how to enforce perfect failure detection in timed distributed systems with hardware watchdogs. The two main system model assumptions are (1) each computer can measure time intervals with a known maximum error, and (2) each computer has a watchdog that crashes the computer unless the watchdog is periodically updated. We have implemented a system that satisfies both assumptions using a combination of off-the-shelf software and hardware.
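The two assumptions can be illustrated with a minimal simulation. This is only a sketch and not the protocol of the paper: the names `Watchdog`, `safe_wait`, and `max_drift` are hypothetical, time is passed in explicitly as plain numbers, and the drift bound is assumed to be a relative error on the observer's clock. An observer that knows an upper bound on a node's last watchdog update could wait the drift-corrected interval before concluding that the watchdog has fired.

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Hypothetical model of assumption (2): crash unless updated in time."""
    timeout: float            # real-time seconds allowed between updates
    last_update: float = 0.0  # real time of the most recent update
    crashed: bool = False

    def tick(self, now: float) -> bool:
        """Check the watchdog at real time `now`; return True once crashed."""
        if now - self.last_update >= self.timeout:
            self.crashed = True
        return self.crashed

    def update(self, now: float) -> None:
        """Refresh the watchdog; a crashed computer cannot refresh it."""
        self.tick(now)
        if not self.crashed:
            self.last_update = now


def safe_wait(timeout: float, max_drift: float) -> float:
    """Hypothetical use of assumption (1).

    If the observer's clock may run fast by a factor of at most
    (1 + max_drift), measuring this interval locally guarantees that at
    least `timeout` seconds of real time have passed.
    """
    return timeout * (1.0 + max_drift)


wd = Watchdog(timeout=2.0)
wd.update(1.0)
wd.update(2.5)                 # last update; the node then stops updating
alive_at_4 = wd.tick(4.0)      # only 1.5 s since the last update
crashed_at_4_5 = wd.tick(4.5)  # 2.0 s since the last update
wait = safe_wait(2.0, 0.01)    # local interval the observer should measure
```

The sketch deliberately leaves out how the observer learns a bound on the last update time; in a real system that would require a message protocol, which is not modeled here.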
