An introduction to the analysis and debug of distributed computations
- 19 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2, pp. 545-553
- https://doi.org/10.1109/icapp.1995.472239
Abstract
Distributed programs are much more difficult to design, understand and implement than sequential or parallel ones. This is mainly due to the uncertainty created by the asynchrony inherent in distributed machines. Appropriate concepts and tools therefore have to be devised to help the programmer of distributed applications. This paper is motivated by the practical problem of distributed debugging. It presents concepts and tools that help the programmer analyze distributed executions. Two basic problems are addressed: the replay of a distributed execution (how to reproduce an equivalent execution despite asynchrony) and the detection of a stable or unstable property of a distributed execution. The concepts and tools presented are fundamental when designing an environment for distributed program development. This paper is essentially a survey presenting the state of the art in replay mechanisms and in the detection of unstable properties on global states of distributed executions.
This publication has 14 references indexed in Scilit:
- Inevitable global states: a concept to detect unstable properties of distributed computations in an observer independent way. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Characterizing and detecting the set of global states seen by all observers of a distributed computation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- On the Fly Testing of Regular Patterns in Distributed Computations. Published by Institute of Electrical and Electronics Engineers (IEEE), 1994
- Local states in distributed computations. ACM SIGOPS Operating Systems Review, 1994
- Debugging tool for distributed Estelle programs. Computer Communications, 1993
- Detecting atomic sequences of predicates in distributed computations. Published by Association for Computing Machinery (ACM), 1993
- Efficient execution replay technique for distributed memory architectures. Published by Springer Nature, 1991
- Debugging Parallel Programs with Instant Replay. IEEE Transactions on Computers, 1987
- Distributed snapshots. ACM Transactions on Computer Systems, 1985
- Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 1978