An Optimal Approach to Fault Tolerant Software Systems Design

Abstract
A systematic method is presented for providing software system fault recovery with maximal fault coverage, subject to resource constraints on overall recovery cost and additional fault rate. The method is based on a model of software systems that provides a measure of the system's fault coverage properties in the presence of computer hardware faults. Techniques for measuring the system parameters are given. The resulting optimization problem is a doubly constrained 0-1 knapsack problem. Quantitative results are presented that demonstrate the effectiveness of the approach.
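
To make the shape of the optimization problem concrete, the following is a minimal sketch of a 0-1 knapsack with two constraints, solved by a generic dynamic program. It is not the solution method of this paper, which the abstract does not specify. The function name `select_recovery_points`, the parameter names, and the assumption that both resources (recovery cost and additional fault rate) have been scaled to non-negative integers are hypothetical choices made only for illustration.

```python
from typing import List, Tuple


def select_recovery_points(
    coverage: List[float],
    cost: List[int],
    fault_rate: List[int],
    cost_budget: int,
    fault_budget: int,
) -> Tuple[float, List[int]]:
    """Pick a subset of candidates maximizing total coverage under two budgets.

    Assumption (not from the paper): cost and fault_rate are non-negative
    integers, e.g. obtained by scaling measured values to a fixed resolution.
    """
    # best[c][f] = (best coverage, chosen indices) using at most
    # c units of cost and f units of additional fault rate.
    best = [
        [(0.0, []) for _ in range(fault_budget + 1)]
        for _ in range(cost_budget + 1)
    ]
    for i in range(len(coverage)):
        # Sweep both budgets downward so each candidate is used at most once.
        for c in range(cost_budget, cost[i] - 1, -1):
            for f in range(fault_budget, fault_rate[i] - 1, -1):
                prev_value, prev_items = best[c - cost[i]][f - fault_rate[i]]
                candidate = prev_value + coverage[i]
                if candidate > best[c][f][0]:
                    best[c][f] = (candidate, prev_items + [i])
    return best[cost_budget][fault_budget]


# Hypothetical data: four candidate recovery mechanisms.
value, chosen = select_recovery_points(
    coverage=[0.30, 0.25, 0.20, 0.10],
    cost=[4, 3, 2, 1],
    fault_rate=[3, 2, 2, 1],
    cost_budget=5,
    fault_budget=4,
)
print(chosen, round(value, 2))
```

The table has one entry per pair of remaining budgets, so the running time grows with the product of the number of candidates and the two budget sizes. This is practical only when both budgets can be discretized coarsely; for finer resolutions an integer programming solver or branch and bound would be the usual alternative.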