Abstract
To achieve high performance on a single process, superscalar processors now rely on very complex out-of-order execution, and further improvements will require increasingly aggressive speculative execution (e.g., value prediction). At the same time, most operating systems now offer time-shared multiprocess environments. Although most execution time is currently spent in a single thread, this is expected to change as computers perform more and more independent tasks. Moreover, desktop applications tend to be multithreaded. Many users may therefore be more concerned with throughput on their overall workload than with the processor's performance on a single process. Simultaneous multithreading (SMT) is a promising approach to delivering high throughput from superscalar pipelines. In this paper, we show that when an SMT processor executes 4 threads, out-of-order execution yields only small performance benefits over in-order execution. Therefore, for application domains where throughput matters more than peak performance on a single application, SMT combined with in-order execution may be a more cost-effective alternative than aggressive out-of-order superscalar processors or out-of-order SMT processors.
