Efficient distributed diagnosis in the presence of random faults
- 30 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 07313071,p. 462-469
- https://doi.org/10.1109/ftcs.1993.627349
Abstract
A probabilistic setting for distributed system-level diagnosis is studied. Processors in a multiprocessor system fail independently with constant probability 0 < p < 1/2. They can test their neighbors and a fault-free processor has probability 0 < q < 1 of detecting a fault of a failed processor in a single test. The aim of distributed diagnosis is for every fault-free processor to know the status of every other processor. This is achieved by communicating test results throughout the system. Faulty processors can behave arbitrarily, even maliciously, both as testers and as message transmitters. The author proposes a diagnosis scheme using O(n) time, O(n log n) tests, O(n/sup 2/) message bits and working correctly with probability converging to 1, as n grows. It is shown that each of these three characteristics has the lowest possible order of magnitude.Keywords
This publication has 11 references indexed in Scilit:
- Almost certain diagnosis for intermittently faulty systemsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- A probabilistic method for fault diagnosis of multiprocessor systemsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- On minimizing testing rounds for fault identificationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2003
- Distributed probabilistic fault diagnosis for multiprocessor systemsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- A guided tour of chernoff boundsInformation Processing Letters, 1990
- Almost Sure Fault Tolerance in Random GraphsSIAM Journal on Computing, 1987
- The Comparison Approach to Multiprocessor Fault DiagnosisIEEE Transactions on Computers, 1987
- Distributed fault-tolerance for large multiprocessor systemsPublished by Association for Computing Machinery (ACM) ,1980
- On Models for Diagnosable Systems and Probabilistic Fault DiagnosisIEEE Transactions on Computers, 1976
- On the Connection Assignment Problem of Diagnosable SystemsIEEE Transactions on Electronic Computers, 1967