Abstract
The Sequoia computer is a tightly coupled multiprocessor that avoids most of the fault-tolerance disadvantages of tight coupling by using a fault-tolerant hardware-design approach. An overview is given of how the hardware architecture and operating system (OS) work together to provide a high degree of fault tolerance with good system performance. A description of the hardware is followed by a discussion of the multiprocessor synchronization problem. Kernel support for fault recovery and the recovery process itself are examined. It is shown that the kernel, through a combination of locking, shadowed memory, and controlled flushing of non-write-through cache, maintains a consistent main memory state recoverable from any single-point failure. User shared memory is also discussed.
