Using logging and asynchronous checkpointing to implement recoverable distributed shared memory
- 30 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Distributed shared memory provides a useful paradigm for developing distributed applications. As the number of processors in the system and running time of distributed applications increase, the likelihood of processor failure increases. A method of recovering processes running in a distributed shared memory environment which minimizes lost work and the cost of recovery is desirable so that long-running applications are not adversely affected by processor failure. A technique for achieving recoverable distributed shared memory which utilizes asynchronous process checkpoints and logging of pages accessed via read operations on the shared address space is presented. The scheme supports independent process recovery without forcing rollback of operational processes during recovery. The method is particularly useful in environments where taking process checkpoints is expensive.
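As a rough illustration of the idea in the abstract, the Python sketch below models one process that logs every page it reads from shared memory and takes checkpoints on its own schedule. It is not the paper's algorithm: the class `RecoverableProcess`, its methods, and the toy additive computation are hypothetical. The sketch assumes that the checkpoint and the read log survive a crash (for example, on stable storage) and that the computation is deterministic given the pages read.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RecoverableProcess:
    """Hypothetical toy process: its state depends only on its last
    checkpoint and on the pages it has read since that checkpoint."""
    state: int = 0
    checkpoint: int = 0  # assumed to be kept on stable storage
    # (page_id, contents) for each read since the last checkpoint;
    # also assumed to survive a crash
    read_log: List[Tuple[int, int]] = field(default_factory=list)

    def read(self, memory: Dict[int, int], page_id: int) -> int:
        value = memory[page_id]
        # Log the page contents exactly as seen at read time.
        self.read_log.append((page_id, value))
        self.state += value  # stand-in for a deterministic computation
        return value

    def take_checkpoint(self) -> None:
        # Taken independently of other processes; the log can be
        # truncated because earlier reads are captured in the checkpoint.
        self.checkpoint = self.state
        self.read_log.clear()

    def recover(self) -> None:
        # Restore the checkpoint, then replay logged reads instead of
        # re-reading shared memory, so no other process must roll back.
        self.state = self.checkpoint
        for _page_id, value in self.read_log:
            self.state += value


def demo() -> Tuple[int, int, int]:
    memory = {0: 5, 1: 7}        # shared pages
    p = RecoverableProcess()
    p.read(memory, 0)
    p.take_checkpoint()
    p.read(memory, 1)
    memory[1] = 100              # another process overwrites the page
    before_crash = p.state
    p.state = 0                  # simulate losing volatile state
    p.recover()
    return before_crash, p.state, memory[1]
```

In this sketch, recovery reproduces the pre-crash state from the logged page contents even though the shared page has since been overwritten, and the shared memory itself is left untouched, which loosely mirrors the abstract's claim that operational processes need not roll back.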
This publication has 14 references indexed in Scilit:
- Crash recovery with little overhead. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Fast recovery in distributed shared virtual memory systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Fault tolerant distributed shared memory algorithms. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 1990
- Algorithms implementing distributed shared memory. Computer, 1990
- Recoverable distributed shared virtual memory. IEEE Transactions on Computers, 1990
- Distributed Checkpointing for Globally Consistent States of Databases. IEEE Transactions on Software Engineering, 1989
- Efficient distributed recovery using message logging. Published by Association for Computing Machinery (ACM), 1989
- Optimistic recovery in distributed systems. ACM Transactions on Computer Systems, 1985
- A Majority consensus approach to concurrency control for multiple copy databases. ACM Transactions on Database Systems, 1979