Checkpointing and its applications
- 19 November 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and identifies two concepts that have proven to be the key to making checkpointing a powerful tool. First, including all persistent state, i.e., user files, as part of the process state that can be checkpointed and recovered provides a truly transparent and consistent rollback. Second, excluding part of the persistent state from the process state allows user programs to process future inputs from a desirable state, which leads to interesting new applications of checkpointing. We use real-life examples to demonstrate the use of libckp for bypassing premature software exits, for fast initialization and for memory rejuvenation.
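The two concepts in the abstract can be illustrated with a small sketch. This is not the libckp API, which the abstract does not show; the class `ToyCheckpoint`, its `checkpoint` and `rollback` methods, the `exclude` parameter, and the file names are all hypothetical, chosen only to show the idea. The sketch treats a dictionary as the in-memory process state, saves tracked files together with it, and on rollback restores everything except the files explicitly excluded.

```python
import copy
import os
import tempfile


class ToyCheckpoint:
    """Hypothetical illustration only; not the libckp interface."""

    def __init__(self, state, files):
        self.state = state          # in-memory "process state" (a dict)
        self.files = list(files)    # files treated as part of the process state
        self._saved_state = None
        self._saved_files = {}

    def checkpoint(self):
        # Save memory and file contents together so they stay consistent.
        self._saved_state = copy.deepcopy(self.state)
        self._saved_files = {}
        for path in self.files:
            with open(path, "rb") as f:
                self._saved_files[path] = f.read()

    def rollback(self, exclude=()):
        # Restore memory, then every tracked file that is not excluded.
        self.state.clear()
        self.state.update(copy.deepcopy(self._saved_state))
        for path, data in self._saved_files.items():
            if path in exclude:
                continue  # excluded persistent state keeps its newer contents
            with open(path, "wb") as f:
                f.write(data)


def demo():
    workdir = tempfile.mkdtemp()
    out_path = os.path.join(workdir, "output.log")
    in_path = os.path.join(workdir, "input.txt")
    with open(out_path, "w") as f:
        f.write("header\n")
    with open(in_path, "w") as f:
        f.write("first input\n")

    state = {"records": 0}
    ckp = ToyCheckpoint(state, [out_path, in_path])
    ckp.checkpoint()

    # Work done after the checkpoint: memory and both files change.
    state["records"] = 5
    with open(out_path, "a") as f:
        f.write("partial result\n")
    with open(in_path, "w") as f:
        f.write("second input\n")

    # Roll back memory and the output file, but keep the new input file.
    ckp.rollback(exclude={in_path})

    with open(out_path) as f:
        out_text = f.read()
    with open(in_path) as f:
        in_text = f.read()
    return state, out_text, in_text
```

Calling `demo()` rolls the counter and the output file back to their checkpointed contents while the excluded input file keeps what was written after the checkpoint, so the restored program would see a new input from a known state. A real checkpointing library works at the level of the process image and open file descriptors rather than copying whole files; the sketch only mirrors the include/exclude distinction.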
This publication has 12 references indexed in Scilit:
- A recoverable object store. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Efficient and effective placement for very large circuits. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Lazy checkpoint coordination for bounding rollback propagation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Pattern independent maximum current estimation in power and ground buses of CMOS VLSI circuits: Algorithms, signal correlations, and their resolution. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1995
- Transparent process migration: Design alternatives and the Sprite implementation. Software: Practice and Experience, 1991
- Data diversity: an approach to software fault tolerance. IEEE Transactions on Computers, 1988
- Checkpointing and Rollback-Recovery for Distributed Systems. IEEE Transactions on Software Engineering, 1987
- The N-Version Approach to Fault-Tolerant Software. IEEE Transactions on Software Engineering, 1985
- Distributed snapshots. ACM Transactions on Computer Systems, 1985
- System structure for software fault tolerance. IEEE Transactions on Software Engineering, 1975