Spare capacity as a means of fault detection and diagnosis in multiprocessor systems
- 1 June 1989
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 38 (6), 881-891
- https://doi.org/10.1109/12.24300
Abstract
A technique for detecting and diagnosing faults at the processor level in a multiprocessor system is described. A process is assigned whenever possible to two processors: the processor to which it would normally be assigned (primary) and an additional processor that would otherwise be idle (secondary). Two strategies are described and analyzed: one that is preemptive and another that is nonpreemptive. It is shown that, for moderately loaded systems, a sufficient percentage of processes can be performed redundantly using the system's spare capacity to provide a basis for fault detection and diagnosis with virtually no degradation of response time. A multiprocessor that uses the approach for detecting faults at the processor level is described.
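The primary/secondary assignment described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Processor` class, the `faulty` flag, and the `run_with_spare` function are hypothetical names introduced here, the comparison is a simple equality check on results, and neither the preemptive nor the nonpreemptive scheduling strategy is modeled. A mismatch only signals that one of the two processors produced a wrong result; deciding which one is the diagnosis step, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Processor:
    pid: int
    busy: bool = False
    faulty: bool = False  # hypothetical flag, used only to simulate a bad result

    def run(self, task: Callable[[], int]) -> int:
        result = task()
        # A faulty processor is simulated by corrupting the result.
        return result + 1 if self.faulty else result


def pick_idle(processors: List[Processor]) -> Optional[Processor]:
    """Return the first idle processor, or None if all are busy."""
    return next((p for p in processors if not p.busy), None)


def run_with_spare(
    task: Callable[[], int], processors: List[Processor]
) -> Tuple[int, bool, bool]:
    """Run task on a primary processor and, if one is idle, on a secondary.

    Returns (primary_result, was_checked, mismatch_detected).
    Assumption: a process that finds no idle processor is rejected here;
    a real system would queue it instead.
    """
    primary = pick_idle(processors)
    if primary is None:
        raise RuntimeError("no processor available")
    primary.busy = True

    # Use spare capacity only: the secondary must otherwise be idle.
    secondary = pick_idle(processors)
    if secondary is not None:
        secondary.busy = True

    try:
        result = primary.run(task)
        if secondary is None:
            return result, False, False  # no redundancy, nothing to compare
        shadow = secondary.run(task)
        return result, True, result != shadow
    finally:
        primary.busy = False
        if secondary is not None:
            secondary.busy = False


cpus = [Processor(0), Processor(1, faulty=True)]
print(run_with_spare(lambda: 6 * 7, cpus))  # (42, True, True)
```

With a single processor in the list, the same call returns `(42, False, False)`: the process still runs, but without spare capacity there is no redundant copy to compare against.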
This publication has 5 references indexed in Scilit:
- The Comparison Approach to Multiprocessor Fault Diagnosis. IEEE Transactions on Computers, 1987
- Roving Emulation as a Fault Detection Mechanism. IEEE Transactions on Computers, 1986
- Greedy Diagnosis as the Basis of an Intermittent-Fault/Transient-Upset Tolerant System Design. IEEE Transactions on Computers, 1983
- Schemes for fault-tolerant computing: A comparison of modularly redundant and t-diagnosable systems. Information and Control, 1981
- A comparison connection assignment for diagnosis of multiprocessor systems. Published by Association for Computing Machinery (ACM), 1980