Definition and analysis of hardware- and software-fault-tolerant architectures

Abstract
A structured definition of hardware- and software-fault-tolerant architectures is presented. Software-fault-tolerance methods are discussed, resulting in definitions for soft and solid faults. A soft software fault has a negligible likelihood or recurrence and is recoverable, whereas a solid software fault is recurrent under normal operations or cannot be recovered. A set of hardware- and software-fault-tolerant architectures is presented, and three of them are analyzed and evaluated. Architectures tolerating a single fault and architectures tolerating two consecutive faults are discussed separately. A sidebar addresses the cost issues related to software fault tolerance. The approach taken throughout is as general as possible, dealing with specific classes of faults or techniques only when necessary.

This publication has 6 references indexed in Scilit: