Abstractions for Node Level Passive Fault Detection in Distributed Systems

1 June 1983

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. C-32 (6) , 543-550
https://doi.org/10.1109/tc.1983.1676276

Abstract

We introduce a scheme for passive node-level fault detection in a distributed system. With each system node associate a low-cost, low-complexity observer which monitors the pattern of incoming and outgoing messages and compares it against an abstracted model of the node's behavior. We develop a fault detection procedure, which is probabilistic because of nondeterminism in the simplified node model. Abstraction reduces model complexity, but renders some errors undetectable by the observer. In the paper we characterize these undetectable errors. Succeeding studies show how to select model abstractions to lower the number of undetectable errors.

Keywords

This publication has 0 references indexed in Scilit: