Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors

Abstract
We investigate instruction distribution methods for quad-cluster, dynamically-scheduled superscalar processors. We study a variety of methods with different cost, performance and complexity characteristics. We investigate both non-adaptive and adaptive methods and their sensitivity to both inter-cluster communication latency and pipeline depth. Furthermore, we develop a set of models that allow us to identify how well each method attacks issue-bandwidth and inter-cluster communication restrictions. We find that a relatively simple method that switches to a different cluster after every three instructions incurs only a 17% performance slowdown compared to a non-clustered configuration operating at the same frequency. Moreover, we show that adaptive methods can reduce this gap further, to about 14%. Performance also appears to be more sensitive to inter-cluster communication latency than to pipeline depth. The best performing method incurs a slowdown of about 24% when the inter-cluster communication latency is two cycles, but only 20% when two additional stages are introduced in the front-end pipeline.
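
The simple non-adaptive method mentioned above can be illustrated with a minimal sketch. It assumes that instructions are assigned in program order and that clusters are visited round-robin; the function name `distribute_every_n` and its parameters are hypothetical and do not come from the paper, which may choose the next cluster differently.

```python
def distribute_every_n(num_instructions, num_clusters=4, group_size=3):
    """Return a cluster index for each instruction in program order.

    Assumption: the next cluster is chosen round-robin. The abstract only
    states that the method changes clusters after a group of three
    instructions, not how the next cluster is picked.
    """
    # Instruction i belongs to group i // group_size; groups rotate over clusters.
    return [(i // group_size) % num_clusters for i in range(num_instructions)]


if __name__ == "__main__":
    # Show the assignment for the first 14 instructions of a stream.
    for index, cluster in enumerate(distribute_every_n(14)):
        print(f"instruction {index:2d} -> cluster {cluster}")
```

Keeping short runs of consecutive instructions in one cluster is one plausible way to keep closely dependent instructions together while still spreading work across all four clusters; this rationale is an interpretation of the abstract, not a result stated in it.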
