Error recovery in critical infrastructure systems
- 28 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Critical infrastructure applications provide services upon which society depends heavily; such applications require survivability in the face of faults that might cause a loss of service. These applications are themselves dependent on distributed information systems for all aspects of their operation and so survivability of the information systems is an important issue. Fault tolerance is a key mechanism by which survivability can be achieved in these information systems. Much of the literature on fault-tolerant distributed systems focuses on local error recovery by masking the effects of faults. We describe a direction for error recovery in the face of catastrophic faults, where the effects of the faults cannot be masked using available resources. The goal is to provide continued service that is either an alternate or degraded service by reconfiguring the system rather than masking faults. We outline the requirements for a reconfigurable system architecture and present an error recovery system that enables systematic structuring of error recovery specifications and implementations.
Author(s)
Knight, J.C., Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA; Elder, M.C.; Xing Du