Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors

Abstract
Recently, we introduced a novel framework for speculative parallelization in hardware (Y. Zhang et al., 1998). The scheme is based on a software based run time parallelization scheme that we proposed earlier (L. Rauchwerger and D. Padue, 1995). The idea is to execute the code (loops) speculatively in parallel. As parallel execution proceeds, extra hardware added to the directory based cache coherence of the DSM machine detects if there is a dependence violation. If such a violation occurs, execution is interrupted, the state is rolled back in software to the most recent safe state, and the code is re-executed serially from that point. The safe state is typically established at the beginning of the loop. Such a scheme is somewhat related to speculative parallelization inside a multiprocessor chip, which also relies on extending the cache coherence protocol to detect dependence violations. Our scheme, however, is targeted to large scale DSM parallelism. In addition, it does not have some of the limitations of the proposed chip-multiprocessor schemes. Such limitations include the need to bound the size of the speculative state to fit in a buffer or L1 cache, and a strict in-order task commit policy that may result in load imbalance among processors. Unfortunately, our scheme has higher recovery costs if a dependence violation is detected, because execution has to backtrack to a safe state that is usually the beginning of the loop. Therefore, the aim of the paper is to extend our previous hardware scheme to effectively handle codes (loops) with a modest number of cross-iteration dependences.
Keywords

This publication has 5 references indexed in Scilit: