Abstract
In this paper we consider rollback propagation and the performance of a fault-tolerant multiprocessor with a rollback recovery mechanism (FTMR²M) [1], which was designed to tolerate hardware failures with minimum time overhead. Rollback propagation between cooperating processes is usually required to ensure correct recovery from a failure. To minimize the wasted processor time and the storage overhead required for handling sophisticated rollback propagations, the FTMR²M always keeps one recoverable state. Approaches for evaluating the recovery overhead and analyzing the performance of the FTMR²M are presented. Two methods for detecting rollback propagations and multi-step rollbacks between cooperating processes are also proposed.
