Evaluation of Error Recovery Blocks Used for Cooperating Processes
- 1 November 1984
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. SE-10 (6) , 692-700
- https://doi.org/10.1109/tse.1984.5010298
Abstract
Three alternatives for implementing recovery blocks (RB's) are conceivable for backward error recovery in concurrent processing: the asynchronous, the synchronous, and the pseudorecovery point implementations. Asynchronous RB's are based on the concept of maximum autonomy in each of the concurrent processes. Consequently, RB's are established in each process independently of the others, and unbounded rollback propagation becomes a serious problem. To avoid unbounded rollback propagation completely, the establishment of recovery blocks must be synchronized across all cooperating processes. Process autonomy is then sacrificed, and processes are forced to wait for commitments from the others to establish a recovery line, which leads to inefficient use of time. As a compromise between asynchronous and synchronous RB's, we propose inserting pseudorecovery points (PRP's) so that unbounded rollback propagation can be avoided while process autonomy is maintained. We developed probabilistic models for analyzing these three methods under standard assumptions in computer performance analysis, i.e., exponential distributions for the related random variables. With these models we estimated 1) the interval between two successive recovery lines for asynchronous RB's, 2) the mean loss in computation power for the synchronized method, and 3) the additional overhead and rollback distance when PRP's are used.
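The abstract does not reproduce the paper's models, so the following is only a minimal sketch of why synchronized RB's lose computation power, under assumptions of our own rather than the authors' exact model: each of n cooperating processes needs an independent, exponentially distributed amount of time (rate λ) to reach the next synchronized recovery line, every process idles until the slowest one arrives, and commit overhead is ignored. Under these assumptions the expected length of a round is H_n/λ (H_n is the n-th harmonic number), the mean idle time per process is (H_n - 1)/λ, and the long-run fraction of computation power lost is 1 - 1/H_n. The function names `harmonic`, `sync_loss_closed_form`, and `sync_loss_simulated` are hypothetical; the Monte Carlo routine is included only to check the closed form.

```python
import random
from math import fsum


def harmonic(n: int) -> float:
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return fsum(1.0 / k for k in range(1, n + 1))


def sync_loss_closed_form(n: int, rate: float) -> tuple:
    """Return (mean idle time per process, fraction of computation power lost).

    Assumption (hypothetical model): each of n processes needs i.i.d.
    Exp(rate) work to reach the synchronized recovery line, and all
    processes wait for the slowest one before continuing.
    """
    mean_work = 1.0 / rate               # E[T_i]
    mean_round = harmonic(n) / rate      # E[max of n i.i.d. Exp(rate)]
    idle = mean_round - mean_work        # E[max - T_i], waiting per process
    return idle, idle / mean_round       # lost fraction = 1 - 1/H_n


def sync_loss_simulated(n: int, rate: float, rounds: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the lost fraction under the same assumptions."""
    rng = random.Random(seed)
    total_idle = 0.0
    total_time = 0.0
    for _ in range(rounds):
        work = [rng.expovariate(rate) for _ in range(n)]
        line = max(work)                 # recovery line forms when the slowest arrives
        total_idle += sum(line - w for w in work)
        total_time += n * line           # processor time consumed by all n processes
    return total_idle / total_time


if __name__ == "__main__":
    for n in (2, 4, 8):
        idle, frac = sync_loss_closed_form(n, rate=1.0)
        sim = sync_loss_simulated(n, rate=1.0, rounds=50_000)
        print(f"n={n}: idle/process={idle:.3f}, lost={frac:.3f}, simulated={sim:.3f}")
```

Because H_n grows like ln n, the lost fraction in this simplified model keeps rising as more processes must synchronize, which is consistent with the abstract's motivation for the pseudorecovery point compromise.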