Simple models of hardware and software fault tolerance

17 December 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. se 13, pp. 124-129
https://doi.org/10.1109/rams.1994.291094

Abstract

This paper presents a quantitative analysis of three architectural approaches to integrating hardware and software fault tolerance. Using a common set of assumptions and hypothetical parameter values, the authors compare the reliability of DRB (distributed recovery blocks), NVP (N-version programming) and NSCP (N self-checking programming). A combination of fault trees and Markov reward models is used to account for transient and permanent physical faults, as well as independent and related software faults. The fault tree models capture the combinations of software faults and hardware transients that can upset a single task computation. The structure states of the Markov reward process capture the longer-term behavior of the system as it is reconfigured in response to permanent faults. In addition to a base case, several scenarios are considered, including perfect specifications, independent versions, a perfect decider and perfect coverage. In most cases, DRB is found to be the most reliable.
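The abstract describes a two-level model: fault trees give the probability that a single task computation is upset, and a Markov reward process over structure states tracks how the system degrades as permanent faults are handled. A minimal Python sketch of that combination follows, for a hypothetical two-processor, two-version configuration. The fault tree, the three-state chain, the parameter names (`p_v`, `p_rv`, `p_d`, `p_h`, `lam`, `c`) and every numeric value are assumptions for illustration only; they are not the paper's models of DRB, NVP or NSCP, and basic events are assumed independent.

```python
import math
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical parameters (illustrative only, not the paper's values)."""
    p_v: float = 1e-4    # independent software fault in one version, per task
    p_rv: float = 1e-6   # related fault shared by two versions, per task
    p_d: float = 1e-7    # decider (acceptance test / voter) fault, per task
    p_h: float = 1e-5    # hardware transient in one processor, per task
    lam: float = 1e-4    # permanent fault rate per processor, per hour
    c: float = 0.99      # coverage: probability a permanent fault is handled


def channel_fail(p: Params) -> float:
    """OR gate: a channel fails if its version has a fault or its processor has a transient."""
    return 1.0 - (1.0 - p.p_v) * (1.0 - p.p_h)


def task_fail(p: Params, channels: int) -> float:
    """Fault-tree probability that a single task computation is upset.

    With independent basic events, OR gates combine as 1 - prod(1 - q_i)
    and AND gates as prod(q_i).
    """
    if channels == 0:
        return 1.0
    q = channel_fail(p)
    all_channels_fail = q ** channels            # AND gate over the channels
    related = p.p_rv if channels >= 2 else 0.0   # a related fault needs two versions
    return 1.0 - (1.0 - related) * (1.0 - p.p_d) * (1.0 - all_channels_fail)


def state_probs(p: Params, t: float) -> tuple[float, float, float]:
    """Closed-form transient solution of the structure-state chain 2 -> 1 -> 0.

    2 -> 1 at rate 2*lam*c (covered fault), 2 -> 0 at rate 2*lam*(1 - c)
    (uncovered fault), 1 -> 0 at rate lam.
    """
    p2 = math.exp(-2.0 * p.lam * t)
    p1 = 2.0 * p.c * (math.exp(-p.lam * t) - math.exp(-2.0 * p.lam * t))
    return p2, p1, 1.0 - p2 - p1


def task_reliability(p: Params, t: float) -> float:
    """Expected reward at time t; the reward in each state is P(task succeeds)."""
    p2, p1, _ = state_probs(p, t)
    return p2 * (1.0 - task_fail(p, 2)) + p1 * (1.0 - task_fail(p, 1))


if __name__ == "__main__":
    scenarios = {
        "base case": Params(),
        "independent versions": Params(p_rv=0.0),
        "perfect decider": Params(p_d=0.0),
        "perfect coverage": Params(c=1.0),
    }
    for name, params in scenarios.items():
        print(f"{name:22s} R(10000 h) = {task_reliability(params, 1e4):.9f}")
```

Each what-if case zeroes one parameter or sets coverage to 1, in the spirit of the scenarios the abstract lists; the printed numbers come from invented parameters and say nothing about the ranking the paper reports.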
