Simple models of hardware and software fault tolerance

Abstract
This paper presents a quantitative analysis of three different architectural approaches to the integration of hardware and software fault tolerance. Using a common set of assumptions, and hypothetical parameter values, the authors compare the reliability of DRB (Distributed Recovery Blocks), NVP (N-version programming) and NSCP (N self-checking Programming). A combination of fault trees and Markov reward models is used to consider transient and permanent physical faults, and independent and related software faults. The fault tree models capture the combinations of software faults and hardware transients that can upset a single task computation. The structure states of the Markov reward process captures the longer term behavior of the system as it is reconfigured in response to permanent faults. In addition to a base case, several different scenarios are considered, including perfect specifications, independent versions, perfect decider and perfect coverage. For most cases, DRB is found to be the most reliable.

This publication has 13 references indexed in Scilit: