Evaluation of Error Recovery Blocks Used for Cooperating Processes

1 November 1984

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering

Vol. SE-10 (6), 692-700
https://doi.org/10.1109/tse.1984.5010298

Abstract

Three alternatives for implementing recovery blocks (RB's) are conceivable for backward error recovery in concurrent processing: the asynchronous, the synchronous, and the pseudorecovery point implementations. Asynchronous RB's are based on the concept of maximum autonomy for each of the concurrent processes. Consequently, each process establishes its RB's independently of the others, and unbounded rollback propagation becomes a serious problem. To avoid unbounded rollback propagation completely, it is necessary to synchronize the establishment of recovery blocks in all cooperating processes. Process autonomy is then sacrificed, and processes are forced to wait for commitments from others to establish a recovery line, which leads to inefficient use of time. As a compromise between asynchronous and synchronous RB's, we propose inserting pseudorecovery points (PRP's) so that unbounded rollback propagation may be avoided while process autonomy is maintained. We developed probabilistic models for analyzing these three methods under standard assumptions in computer performance analysis, i.e., exponential distributions for the related random variables. With these models we have estimated 1) the interval between two successive recovery lines for asynchronous RB's, 2) the mean loss in computation power for the synchronized method, and 3) the additional overhead and rollback distance when PRP's are used.
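
The waiting cost of the synchronized method can be illustrated with a minimal sketch. It is not the paper's model: it assumes, purely for illustration, that n processes each reach their synchronization point after an independent exponential time with a common rate, and that each then idles until the slowest one arrives. Under that assumption the expected total idle time is n(H_n - 1)/rate, where H_n is the n-th harmonic number. The function names, parameters, and the loss measure itself are hypothetical.

```python
import random


def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def expected_sync_loss(n, rate):
    """Closed-form expected total idle time under the illustrative assumption.

    Each of n processes reaches the synchronization point after an i.i.d.
    Exp(rate) time and waits for the slowest one. E[max] = H_n / rate and
    E[X_i] = 1 / rate, so the summed wait is n * (H_n - 1) / rate.
    """
    return n * (harmonic(n) - 1.0) / rate


def simulated_sync_loss(n, rate, trials=50_000, seed=0):
    """Monte Carlo estimate of the same quantity, for cross-checking."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Arrival time of each process at its synchronization point.
        times = [rng.expovariate(rate) for _ in range(n)]
        latest = max(times)
        # Every process idles until the slowest one has arrived.
        total += sum(latest - t for t in times)
    return total / trials


if __name__ == "__main__":
    n, rate = 3, 1.0
    print("closed form:", expected_sync_loss(n, rate))
    print("simulated:  ", simulated_sync_loss(n, rate))
```

Because H_n grows roughly like ln n, the idle time per process in this toy model grows slowly but without bound as more processes must commit to the same recovery line, which is the inefficiency the abstract attributes to the synchronous approach.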
