Abstract
The design of programs that are tolerant of hardware fault occurrences and processor crashes is investigated. Using a stable storage management system as a running example, a new approach is suggested for specifying, understanding, and verifying the correctness of fault-tolerant software. The approach extends previously developed axiomatic reasoning methods to the design of fault-tolerant systems by modeling faults as being operations that are performed at random time intervals on any computing system by the system's adverse environment.