Spare capacity as a means of fault detection and diagnosis in multiprocessor systems

1 June 1989

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. 38 (6), 881-891
https://doi.org/10.1109/12.24300

Abstract

A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. A process is assigned whenever possible to two processors: the processor to which it would normally be assigned (primary) and an additional processor that would otherwise be idle (secondary). Two strategies are described and analyzed: one that is preemptive and another that is nonpreemptive. It is shown that, for moderately loaded systems, a sufficient percentage of processes can be performed redundantly using the system's spare capacity to provide a basis for fault detection and diagnosis with virtually no degradation of response time. A multiprocessor that uses the approach for detecting faults at the processor level is described.
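
The primary/secondary idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Processor` class, the `schedule_round` function, the single-round scheduling model, and the fault model (a faulty processor returns a result that is off by one) are all assumptions made for illustration. Each job gets a primary processor, any processor left idle is used as a secondary for one job, and a disagreement between the two results flags a possible fault in one of the pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Processor:
    pid: int
    faulty: bool = False  # hypothetical fault model, not taken from the paper

    def run(self, job: Callable[[], int]) -> int:
        result = job()
        # Assumption: a faulty processor corrupts the result by adding 1.
        return result + 1 if self.faulty else result


# (primary pid, secondary pid or None, primary result, agreement or None)
Report = Tuple[int, Optional[int], int, Optional[bool]]


def schedule_round(jobs: List[Callable[[], int]],
                   processors: List[Processor]) -> List[Report]:
    """Give every job a primary; hand spare processors out as secondaries."""
    if len(jobs) > len(processors):
        raise ValueError("more jobs than processors in this round")
    primaries = processors[:len(jobs)]
    spares = processors[len(jobs):]  # capacity that would otherwise sit idle
    reports: List[Report] = []
    for i, (job, primary) in enumerate(zip(jobs, primaries)):
        result = primary.run(job)
        if i < len(spares):
            secondary = spares[i]
            agree = secondary.run(job) == result
            reports.append((primary.pid, secondary.pid, result, agree))
        else:
            # No spare capacity left: the job runs without a redundant copy.
            reports.append((primary.pid, None, result, None))
    return reports


# Usage: four processors, processor 3 is faulty, two jobs in the round.
procs = [Processor(0), Processor(1), Processor(2), Processor(3, faulty=True)]
reports = schedule_round([lambda: 10, lambda: 20], procs)
for report in reports:
    print(report)
```

In this sketch a mismatch only says that one of the two processors in the pair is suspect. Deciding which one would need further comparisons, and that step is left out here.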
