Optimal Checkpointing of Real-Time Tasks

1 November 1987

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers

Vol. C-36 (11), 1328-1341
https://doi.org/10.1109/tc.1987.5009472

Abstract

Analytical models for the design and evaluation of checkpointing of real-time tasks are developed. First, the execution of a real-time task is modeled under the common assumption of perfect coverage of on-line detection mechanisms (termed the basic model). Then, the model is generalized (to an extended model) to include more realistic cases, namely imperfect coverage of both on-line detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints that minimizes the mean task execution time while keeping the probability of an unreliable result (or lack of confidence) below a specified level. In the basic model, equidistant intercheckpoint intervals are shown to be optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented, with numerical examples for the extended model.
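
The abstract does not give the paper's equations, so the following is only an illustrative sketch of why equidistant checkpoints can be optimal under perfect detection. It assumes a textbook rollback model that is not taken from the paper: failures arrive as a Poisson process with rate `rate`, a failure is detected immediately, the task restarts from the last checkpoint at no extra cost, and each checkpoint costs a fixed `overhead`. Under those assumptions the expected time to finish a segment of useful length t is (exp(rate * (t + overhead)) - 1) / rate, which is convex in t, so splitting the work evenly minimizes the sum. All function and parameter names are hypothetical, and the reliability constraint from the abstract is not modeled.

```python
import math


def expected_segment_time(length, overhead, rate):
    """Expected time to complete one segment plus its checkpoint.

    Assumed model (not from the paper): Poisson failures with the given
    rate, instant detection, free restart from the previous checkpoint.
    """
    return math.expm1(rate * (length + overhead)) / rate


def expected_total_time(intervals, overhead, rate):
    """Sum the expected completion time over all intercheckpoint intervals."""
    return sum(expected_segment_time(t, overhead, rate) for t in intervals)


def best_equidistant(work, overhead, rate, max_segments=1000):
    """Search for the number of equal segments with the lowest expected time."""
    best_n = 1
    best_time = expected_total_time([work], overhead, rate)
    for n in range(2, max_segments + 1):
        candidate = expected_total_time([work / n] * n, overhead, rate)
        if candidate < best_time:
            best_n, best_time = n, candidate
    return best_n, best_time


if __name__ == "__main__":
    # Hypothetical numbers: 100 time units of work, checkpoint cost 0.5,
    # failure rate 0.01 per time unit.
    n, t = best_equidistant(work=100.0, overhead=0.5, rate=0.01)
    print(f"segments: {n}, expected time: {t:.2f}")
    # Equal split versus an unequal split of the same total work.
    print(expected_total_time([5.0, 5.0], 0.1, 0.05))
    print(expected_total_time([2.0, 8.0], 0.1, 0.05))
```

In this simplified setting the even split always has the lower expected time for a fixed number of checkpoints. Once detection coverage is imperfect, as in the extended model, that convexity argument no longer applies directly, which is consistent with the abstract's statement that equidistant intervals need not be optimal there.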
