On the Execution of Large Batch Programs in Unreliable Computing Systems
- 1 July 1984
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. SE-10 (4), 444-450
- https://doi.org/10.1109/tse.1984.5010258
Abstract
The execution of long-running batch programs imposes severe reliability constraints on a computing system: a failure is more likely to occur during a long execution, and once it occurs it destroys all the processing performed thus far. This paper studies the execution delay and machine resources consumed in supporting the running of large batch programs in a computing environment interrupted by failures. The effect of checkpoints and their optimal insertion are also considered. The results are applicable to an arbitrary law of failure.
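To make the role of checkpoints concrete, the following is a minimal sketch and not the paper's own model. It assumes failures arrive as a Poisson process (an exponential failure law, which is only one special case of the arbitrary law the abstract refers to), that recovery after a failure is instantaneous, and that checkpoints are equally spaced with a fixed cost. The function names and parameters are hypothetical. Under those assumptions, a segment needing s units of uninterrupted work at failure rate λ takes (e^{λs} - 1)/λ on average.

```python
import math


def expected_segment_time(length, failure_rate):
    """Expected wall-clock time to finish `length` units of uninterrupted work.

    Assumption: Poisson failures at `failure_rate`; each failure restarts
    the segment from its beginning, with zero recovery time.
    """
    if failure_rate == 0:
        return length
    # (e^(rate * length) - 1) / rate, computed stably for small arguments.
    return math.expm1(failure_rate * length) / failure_rate


def expected_run_time(total_work, failure_rate, segments, checkpoint_cost):
    """Expected time when the work is split into `segments` equal parts,
    each followed by a checkpoint costing `checkpoint_cost` (hypothetical
    parameter). The checkpoint is redone if a failure strikes during it."""
    segment_length = total_work / segments + checkpoint_cost
    return segments * expected_segment_time(segment_length, failure_rate)


def best_segment_count(total_work, failure_rate, checkpoint_cost,
                       max_segments=1000):
    """Brute-force search for the segment count with the lowest expected time."""
    return min(
        range(1, max_segments + 1),
        key=lambda n: expected_run_time(
            total_work, failure_rate, n, checkpoint_cost
        ),
    )


if __name__ == "__main__":
    # Illustrative numbers only: 100 hours of work, 0.05 failures per hour,
    # half an hour per checkpoint.
    work, rate, cost = 100.0, 0.05, 0.5
    no_checkpoints = expected_segment_time(work, rate)
    n = best_segment_count(work, rate, cost)
    print("without checkpoints:", no_checkpoints)
    print("best segment count:", n)
    print("with checkpoints:", expected_run_time(work, rate, n, cost))
```

Because the expected time of an unprotected run grows exponentially with program length, splitting the work trades a linear checkpointing overhead against that exponential penalty. A different failure law would change the per-segment formula but not the structure of the search.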