Checkpointing distributed applications on mobile computers
- 17 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
The integration of mobile/portable computing devices within existing data networks can be expected to spawn distributed applications that execute on mobile hosts (MHs). For reliability, it is vital that the global state of such applications be checkpointed from time to time. A global checkpoint consists of a set of local checkpoints, one per participant. This paper first identifies the problems in recording a consistent global state of mobile distributed applications. The location of a MH within the static network varies with time and therefore, a MH will first need to be located ("searched") in order to obtain its local checkpoint. Moreover, MHs often (voluntarily) disconnect from the network; a disconnected MH is not reachable from the rest of the network. This means that a (disconnected) MH may not be available to provide its local checkpoint. Lastly, a MH is not equipped with stable storage; disk space at a MH is not considered stable due to vulnerability of MHs to loss, theft and physical damage. Therefore, an alternative stable repository is required to save local checkpoints of MHs. This paper presents a checkpointing algorithm for MHs that satisfies these constraints.
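As an illustration only, and not the algorithm presented in the paper, the idea of a global checkpoint as one local checkpoint per participant, held in a repository away from the MH, can be sketched as follows. The names `LocalCheckpoint`, `StableRepository`, `collect_global_checkpoint` and `is_consistent` are hypothetical, and the consistency test assumes FIFO channels so that per-peer message counts are enough to detect a message recorded as received but not as sent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class LocalCheckpoint:
    """State of one participant (hypothetical layout, not from the paper)."""
    host: str
    # Per-peer message counts at the moment the checkpoint was taken.
    sent: Dict[str, int] = field(default_factory=dict)
    received: Dict[str, int] = field(default_factory=dict)


class StableRepository:
    """Assumed stand-in for stable storage kept off the mobile host."""

    def __init__(self) -> None:
        self._latest: Dict[str, LocalCheckpoint] = {}

    def save(self, checkpoint: LocalCheckpoint) -> None:
        # Only the most recent checkpoint per host is kept in this sketch.
        self._latest[checkpoint.host] = checkpoint

    def latest(self, host: str) -> Optional[LocalCheckpoint]:
        # Works even while the host is disconnected: nothing is fetched from it.
        return self._latest.get(host)


def collect_global_checkpoint(
    hosts: Iterable[str], repo: StableRepository
) -> Optional[Dict[str, LocalCheckpoint]]:
    """Return one local checkpoint per host, or None if any host has none."""
    result: Dict[str, LocalCheckpoint] = {}
    for host in hosts:
        checkpoint = repo.latest(host)
        if checkpoint is None:
            return None
        result[host] = checkpoint
    return result


def is_consistent(global_checkpoint: Dict[str, LocalCheckpoint]) -> bool:
    """True if no receiver has recorded more messages than the sender recorded sending."""
    for checkpoint in global_checkpoint.values():
        for sender, n_received in checkpoint.received.items():
            sender_cp = global_checkpoint.get(sender)
            n_sent = sender_cp.sent.get(checkpoint.host, 0) if sender_cp else 0
            if n_received > n_sent:
                return False
    return True


repo = StableRepository()
repo.save(LocalCheckpoint("mh1", sent={"mh2": 2}))
repo.save(LocalCheckpoint("mh2", received={"mh1": 1}))
consistent_state = collect_global_checkpoint(["mh1", "mh2"], repo)

# mh2 now records a third message that mh1's checkpoint never recorded sending.
repo.save(LocalCheckpoint("mh2", received={"mh1": 3}))
inconsistent_state = collect_global_checkpoint(["mh1", "mh2"], repo)
```

In this sketch a disconnected MH is represented by whatever checkpoint it last deposited in the repository, which is one possible way to meet the constraints listed in the abstract; how the paper's algorithm actually handles search and disconnection is not shown here.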