A distributed system-level diagnosis algorithm for arbitrary network topologies
- 1 January 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 44 (2) , 312-334
- https://doi.org/10.1109/12.364542
Abstract
In this paper, a distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly-repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct, and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.
Index Terms: Computer fault diagnosis, computer fault tolerance, computer networks, distributed computing, system-level fault diagnosis, distributed algorithm, fault detection.
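The abstract does not give the algorithm itself, so the following is only a rough sketch of the general idea of parallel dissemination, not the paper's method. It assumes a simple flooding model in which each fault-free processor forwards a new event to all of its neighbors in one round and faulty processors forward nothing. The function name `disseminate`, the adjacency-dictionary representation, and the five-node ring are all hypothetical choices made for illustration.

```python
from collections import deque


def disseminate(adjacency, faulty, origin):
    """Flood an event from `origin` through fault-free nodes only.

    Returns a dict mapping each fault-free node that learns of the
    event to the round in which it learns it (origin is round 0).
    This is an illustrative breadth-first flooding model, assumed
    here; it is not taken from the paper.
    """
    if origin in faulty:
        raise ValueError("origin must be fault-free")
    learned = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            # Faulty nodes neither learn nor forward; skip known nodes.
            if neighbor in faulty or neighbor in learned:
                continue
            learned[neighbor] = learned[node] + 1
            queue.append(neighbor)
    return learned


# Hypothetical 5-node ring: 0-1-2-3-4-0. Node 1 detects that node 2 is faulty.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
rounds = disseminate(ring, faulty={2}, origin=1)
worst_case = max(rounds.values())
```

In this toy model the number of rounds needed for every fault-free processor to learn of the event equals the largest hop distance from the detecting node through fault-free processors, which is the quantity any dissemination scheme would have to match to be time-optimal under these assumptions.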