Duplex: a reusable fault tolerance extension framework for network access devices
- 22 June 2004
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 501-510
- https://doi.org/10.1109/dsn.2003.1209960
Abstract
A growing variety of edge network access devices appear on the marketplace that perform various functions which are meant to complement generic routers' capabilities, such as firewalling, intrusion detection, virus scanning, network address translation, traffic shaping and route optimization. Because these edge network access devices are deployed on the critical path between a user site and its Internet service provider, high availability is crucial to their design. This paper describes the design, construction and evaluation of a general implementation framework for supporting fault tolerance on edge network devices. This implementation framework, called Duplex, is designed to be independent of the functionality of the hosting edge network access device, such that only a minimal amount of programming is required to tailor this framework to a specific edge network access device implementation. Duplex can tolerate power failure, hardware failure, and software failure by supporting device mirroring and watchdog timer-based link bypassing. Empirical performance measurements of an instance of Duplex that is embedded in a commercial bandwidth management device show that the run-time overhead of its fault tolerance mechanisms is less than 1 msec 90% of the time, and the failure detection and recovery period is less than 1.3 sec when running at 100 Mbps.
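The abstract names watchdog timer-based link bypassing without describing its mechanics. As a rough illustration only, the general idea can be modeled as follows: healthy device software periodically "kicks" a timer, and if the kicks stop, traffic is passed through unprocessed so the link stays up. This is a minimal sketch and not the Duplex implementation; the names `WatchdogBypass`, `kick`, `poll`, `forward` and the timeout value are all hypothetical, and the latching behavior (bypass stays engaged once triggered) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WatchdogBypass:
    """Hypothetical model of a watchdog timer controlling a link-bypass switch."""
    timeout_s: float           # assumed heartbeat deadline, not a figure from the paper
    last_kick_s: float = 0.0   # time of the most recent heartbeat
    bypassed: bool = False     # True once traffic is routed around the device

    def kick(self, now_s: float) -> None:
        """Heartbeat from healthy device software; records the time it arrived."""
        self.last_kick_s = now_s

    def poll(self, now_s: float) -> bool:
        """Engage bypass if the heartbeat deadline has passed; return the bypass state."""
        if not self.bypassed and now_s - self.last_kick_s > self.timeout_s:
            self.bypassed = True  # assumption: latched until an operator resets it
        return self.bypassed


def forward(wd: WatchdogBypass, now_s: float, packet: str,
            process: Callable[[str], str]) -> str:
    """Apply the device function while healthy; pass the packet through once bypassed."""
    if wd.poll(now_s):
        return packet
    return process(packet)
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic; a real device would typically implement the timer and bypass switch in hardware so that they keep working when the software or power fails.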