Recovering device drivers
- 1 November 2006
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer Systems
- Vol. 24 (4), 333-360
- https://doi.org/10.1145/1189256.1189257
Abstract
This article presents a new mechanism that enables applications to run correctly when device drivers fail. Because device drivers are the principal failing component in most systems, reducing driver-induced failures greatly improves overall reliability. Earlier work has shown that an operating system can survive driver failures [Swift et al. 2005], but the applications that depend on them cannot. Thus, while operating system reliability was greatly improved, application reliability generally was not.

To remedy this situation, we introduce a new operating system mechanism called a shadow driver. A shadow driver monitors device drivers and transparently recovers from driver failures. Moreover, it assumes the role of the failed driver during recovery. In this way, applications using the failed driver, as well as the kernel itself, continue to function as expected.

We implemented shadow drivers for the Linux operating system and tested them on over a dozen device drivers. Our results show that applications and the OS can indeed survive the failure of a variety of device drivers. Moreover, shadow drivers impose minimal performance overhead. Lastly, they can be introduced with only modest changes to the OS kernel and with no changes at all to existing device drivers.
This publication has 6 references indexed in Scilit:
- Improving the reliability of commodity operating systems. ACM Transactions on Computer Systems, 2005
- Making a Case for Efficient Supercomputing. Queue, 2003
- How fail-stop are faulty programs? Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Software's invisible users. IEEE Software, 2001
- Hypervisor-based fault tolerance. ACM Transactions on Computer Systems, 1996
- Fault tolerance under UNIX. ACM Transactions on Computer Systems, 1989