Software implementation of a recursive fault tolerance algorithm on a network of computers
- 1 May 1986
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 14 (2), 65-72
- https://doi.org/10.1145/17356.17364
Abstract
RAFT is a recursive algorithm for fault tolerance that combines dynamic space and time redundancy techniques to detect faulty processors and recover from errors. U* is a multicomputer testbed consisting of a network of AT&T 3B2 computers running a network operating system based on the UNIX system. This paper describes a software implementation of RAFT on U* and demonstrates the effectiveness of a RAFT-like scheme for designing fault-tolerant multicomputer systems. Results of Monte Carlo experiments conducted on this system, which validated the theoretical basis of RAFT, are presented. The experimentally observed performance penalty incurred due to fault tolerance is also presented.