Fault tolerance in multiprocessor systems without dedicated redundancy
- 1 March 1988
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. 37 (3), 358-362
- https://doi.org/10.1109/12.2174
Abstract
An algorithm called RAFT (recursive algorithm for fault tolerance) for achieving fault tolerance in multiprocessor systems is described. Through the use of a combination of dynamic space- and time-redundancy techniques, RAFT achieves fault tolerance in the presence of permanent as well as intermittent faults. Performance and reliability of multiprocessor systems using RAFT are determined as a function of individual processor reliability and the total number of fault modes in a processor. RAFT-based systems are superior to triple modular redundancy (TMR) systems in hardware economy and provide comparable reliability. A multiprocessor architecture adopting RAFT is given.
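The abstract compares RAFT against TMR as a function of individual processor reliability. As a point of reference only, the textbook TMR baseline can be sketched as below. This is not taken from the paper and does not model RAFT: it assumes three identical modules that fail independently, a perfect majority voter, and a hypothetical function name `tmr_reliability`.

```python
def tmr_reliability(r: float) -> float:
    """Reliability of a 2-out-of-3 majority system.

    Assumptions (not from the paper): three identical modules with
    reliability r, independent failures, and a perfect voter.
    The system works when all three modules work (r**3) or when exactly
    two work (3 * r**2 * (1 - r)), which simplifies to 3r^2 - 2r^3.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("module reliability must lie in [0, 1]")
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    # Compare a single module with the TMR arrangement at a few sample values.
    for r in (0.5, 0.9, 0.99):
        print(f"module r={r:.2f}  TMR={tmr_reliability(r):.6f}")
```

Under these assumptions TMR improves on a single module only when r exceeds 0.5, and it spends three processors per task to do so, which is the hardware cost the abstract contrasts with RAFT.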
This publication has 4 references indexed in Scilit:
- Software implementation of a recursive fault tolerance algorithm on a network of computers. ACM SIGARCH Computer Architecture News, 1986
- Derivation and Calibration of a Transient Error Reliability Model. IEEE Transactions on Computers, 1982
- Schemes for fault-tolerant computing: A comparison of modularly redundant and t-diagnosable systems. Information and Control, 1981
- Reliability analysis and architecture of a hybrid-redundant digital system. Published by Association for Computing Machinery (ACM), 1970