Principles of fault tolerance

Abstract
The demand for continuously available electronic systems grows every day. Transaction processing, communications systems, and critical processes all require nonstop, fault-tolerant operation. A fault-tolerant or highly available system can be built by following four basic principles: redundancy, fault isolation, fault detection and annunciation, and on-line repair. This paper is a tutorial that first reviews some fundamentals of reliability and availability and then presents those four principles. It concludes with an expanded discussion of implementing redundancy. The paper also highlights special considerations for high availability and fault tolerance in distributed power systems.