ickp: a consistent checkpointer for multicomputers
- 1 January 1994
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Parallel & Distributed Technology: Systems & Applications
- Vol. 2 (2), 62-67
- https://doi.org/10.1109/88.311574
Abstract
There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
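The abstract does not define "consistent". In the usual distributed-systems sense, a set of per-process checkpoints is consistent when no message is recorded as received by one process without also being recorded as sent by its sender. The Python sketch below illustrates only that definition. It is not ickp's code or any of its three algorithms, and the names `Message`, `is_consistent`, and the event-index model are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int      # process that sent the message
    send_event: int  # index of the send in the sender's local event order
    receiver: int    # process that received the message
    recv_event: int  # index of the receive in the receiver's local event order


def is_consistent(cut, messages):
    """Return True if the global checkpoint described by `cut` is consistent.

    cut[p] is the number of local events of process p that happened before
    p saved its checkpoint (hypothetical model, not taken from the article).
    """
    for m in messages:
        received_in_cut = m.recv_event < cut[m.receiver]
        sent_in_cut = m.send_event < cut[m.sender]
        if received_in_cut and not sent_in_cut:
            # Orphan message: the receive is saved but the send is not,
            # so a restart from this cut could never reproduce the receive.
            return False
    return True


# One message: process 0 sends at its event 1, process 1 receives at its event 2.
messages = [Message(sender=0, send_event=1, receiver=1, recv_event=2)]

both_recorded = is_consistent([2, 3], messages)  # send and receive both saved
orphan = is_consistent([1, 3], messages)         # receive saved, send not saved
in_transit = is_consistent([2, 2], messages)     # send saved, receive not yet saved
```

In this model a message that is sent but not yet received at checkpoint time (the `in_transit` case) does not make the cut inconsistent; a real checkpointer must still save or replay such a message so that it is not lost on recovery.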