Optimal Checkpointing of Real-Time Tasks
- 1 November 1987
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. C-36 (11), 1328-1341
- https://doi.org/10.1109/tc.1987.5009472
Abstract
Analytical models for the design and evaluation of checkpointing of real-time tasks are developed. First, the execution of a real-time task is modeled under a common assumption of perfect coverage of on-line detection mechanisms (which is termed a basic model). Then, the model is generalized (to an extended model) to include more realistic cases, i.e., imperfect coverages of on-line detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints to minimize the mean task execution time while the probability of an unreliable result (or lack of confidence) is kept below a specified level. In the basic model, it is shown that equidistant intercheckpoint intervals are optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented with some numerical examples for the extended model.
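The equidistant result stated for the basic model can be illustrated with a small sketch. This is not the paper's model. It assumes a textbook-style setting: Poisson failures at a hypothetical rate `lam`, immediate detection (perfect coverage), zero-cost rollback to the last checkpoint, and a fixed checkpoint cost `ckpt_cost` paid at the end of every segment. Under those assumptions the mean time to complete `w` units of uninterrupted work is `(exp(lam * w) - 1) / lam`. The reliability constraint from the abstract is left out, and all function names and parameters are illustrative.

```python
import math


def mean_segment_time(work, lam):
    """Mean time to complete `work` units without interruption.

    Assumes Poisson failures with rate `lam`, immediate detection,
    and zero-cost rollback to the start of the segment.
    """
    return math.expm1(lam * work) / lam


def mean_task_time(intervals, lam, ckpt_cost):
    """Mean task time when every interval ends with one checkpoint."""
    return sum(mean_segment_time(s + ckpt_cost, lam) for s in intervals)


def best_equidistant(total_work, lam, ckpt_cost, max_segments=1000):
    """Search for the number of equal segments that minimizes mean time."""
    best_n, best_t = 1, float("inf")
    for n in range(1, max_segments + 1):
        t = mean_task_time([total_work / n] * n, lam, ckpt_cost)
        if t < best_t:
            best_n, best_t = n, t
    return best_n, best_t


if __name__ == "__main__":
    lam, cost = 0.01, 1.0  # assumed failure rate and checkpoint cost
    print("equal split  :", mean_task_time([50.0, 50.0], lam, cost))
    print("unequal split:", mean_task_time([30.0, 70.0], lam, cost))
    print("best (n, mean time):", best_equidistant(100.0, lam, cost))
```

Because the per-segment cost in this assumed model is convex in the segment length, any unequal split of the same total work costs more than the equal split, which matches the abstract's claim for the basic model. Once coverage is imperfect, as in the extended model, this simple argument no longer applies directly.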