Detailed design and evaluation of redundant multithreading alternatives
- 1 May 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 30 (2), 99-110
- https://doi.org/10.1145/545214.545227
Abstract
Exponential growth in the number of on-chip transistors, coupled with reductions in voltage levels, makes each generation of microprocessors increasingly vulnerable to transient faults. In a multithreaded environment, we can detect these faults by running two copies of the same program as separate threads, feeding them identical inputs, and comparing their outputs, a technique we call Redundant Multithreading (RMT). This paper studies RMT techniques in the context of both single- and dual-processor simultaneous multithreaded (SMT) single-chip devices. Using a detailed, commercial-grade SMT processor design, we uncover subtle RMT implementation complexities and find that RMT can be a more significant burden for single-processor devices than prior studies indicate. However, a novel application of RMT techniques in a dual-processor device, which we term chip-level redundant threading (CRT), shows higher performance than lockstepping the two cores, especially on multithreaded workloads.
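
The core idea stated in the abstract (two copies of the same program, identical inputs, compared outputs) can be illustrated with a small software analogy. This is only a sketch under stated assumptions: the paper concerns hardware threads inside an SMT processor, not operating-system threads, and the names `run_redundant`, `FaultDetected`, `FlipOnce`, and `checksum` are hypothetical helpers introduced here for illustration, not anything from the paper.

```python
import threading


class FaultDetected(Exception):
    """Raised when the two redundant copies disagree (hypothetical name)."""


def run_redundant(fn, inputs):
    """Run fn twice on identical inputs in separate threads and compare outputs."""
    results = [None, None]

    def worker(slot):
        # Both copies receive the same inputs; each writes to its own slot.
        results[slot] = fn(*inputs)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Output comparison: a mismatch signals that one copy was corrupted.
    if results[0] != results[1]:
        raise FaultDetected(f"outputs differ: {results[0]!r} vs {results[1]!r}")
    return results[0]


class FlipOnce:
    """Wraps fn and flips the low bit of exactly one result.

    This is an illustrative stand-in for a transient fault.
    """

    def __init__(self, fn):
        self.fn = fn
        self.lock = threading.Lock()
        self.fired = False

    def __call__(self, *args):
        value = self.fn(*args)
        with self.lock:
            if not self.fired:
                self.fired = True
                return value ^ 1  # inject a single-bit error
        return value


def checksum(values):
    """Toy workload: sum a list of integers."""
    return sum(values)
```

In this sketch, `run_redundant(checksum, ([1, 2, 3],))` returns the agreed result, while wrapping the workload in `FlipOnce` makes exactly one of the two copies return a corrupted value, so the comparison raises `FaultDetected`. Detection only works here because both copies see the same inputs, which mirrors the requirement named in the abstract.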