BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
- 27 July 2005
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 284-295
- https://doi.org/10.1109/isca.2005.16
Abstract
Significant time is spent by companies trying to reproduce and fix the bugs that occur for released code. To assist developers, we propose the BugNet architecture to continuously record information on production runs. The information collected before the crash of a program can be used by the developers working in their execution environment to deterministically replay the last several million instructions executed before the crash. BugNet is based on the insight that recording the register file contents at any point in time, and then recording the load values that occur after that point can enable deterministic replaying of a program's execution. BugNet focuses on being able to replay the application's execution and the libraries it uses, but not the operating system. But our approach provides the ability to replay an application's execution across context switches and interrupts. Hence, BugNet obviates the need for tracking program I/O, interrupts and DMA transfers, which would have otherwise required more complex hardware support. In addition, BugNet does not require a final core dump of the system state for replaying, which significantly reduces the amount of data that must be sent back to the developer.
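The central insight of the abstract (a register snapshot plus the values of subsequent loads is enough to replay user-level execution) can be illustrated with a toy software model. The sketch below is not the BugNet hardware design: the three-instruction ISA, the function names `record` and `replay`, and the `external_writes` parameter (standing in for DMA or operating system activity that changes memory behind the program's back) are all assumptions made for illustration, and every load is logged for simplicity.

```python
# Toy ISA (hypothetical, for illustration only):
#   ("load",  rd, addr)      rd <- mem[addr]
#   ("store", rs, addr)      mem[addr] <- rs
#   ("add",   rd, rs1, rs2)  rd <- rs1 + rs2

def record(program, regs, mem, external_writes=None):
    """Run `program` on real memory, logging the initial registers and load values.

    `external_writes` maps an instruction index to an (addr, value) pair that is
    written to memory just before that instruction runs (assumed stand-in for
    DMA or OS activity that the recorder does not track directly).
    """
    external_writes = external_writes or {}
    regs = dict(regs)
    log = {"regs": dict(regs), "loads": []}  # register snapshot + load values
    for index, ins in enumerate(program):
        if index in external_writes:
            addr, value = external_writes[index]
            mem[addr] = value
        op = ins[0]
        if op == "load":
            _, rd, addr = ins
            regs[rd] = mem[addr]
            log["loads"].append(regs[rd])
        elif op == "store":
            _, rs, addr = ins
            mem[addr] = regs[rs]
        elif op == "add":
            _, rd, rs1, rs2 = ins
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, log

def replay(program, log):
    """Re-execute `program` using only the log: no memory image, no I/O trace."""
    regs = dict(log["regs"])
    loads = iter(log["loads"])
    for ins in program:
        op = ins[0]
        if op == "load":
            regs[ins[1]] = next(loads)  # value comes from the log, not memory
        elif op == "store":
            pass  # stores need no replay memory: later loads are read from the log
        elif op == "add":
            _, rd, rs1, rs2 = ins
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

program = [
    ("load", "r1", 0),
    ("add", "r2", "r1", "r1"),
    ("store", "r2", 1),
    ("load", "r3", 1),   # sees the external write, not the program's own store
    ("add", "r0", "r2", "r3"),
]
initial_regs = {"r0": 0, "r1": 0, "r2": 0, "r3": 0}
final_regs, log = record(program, initial_regs, {0: 5, 1: 0}, external_writes={3: (1, 100)})
replayed_regs = replay(program, log)
```

In this run the external write changes the value seen by the second load, yet `replayed_regs` equals `final_regs` because the changed value was captured in the load log. This mirrors, in a very simplified way, why logging load values removes the need to track I/O, interrupts and DMA separately.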