The combined effectiveness of unimodular transformations, tiling, and software prefetching
- 23 December 2002
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Unimodular transformations, tiling, and software prefetching are loop optimizations known to be effective in increasing parallelism, reducing cache miss rates, and eliminating processor stall time. Although each of these optimizations is quite effective on its own, there is an expectation that even better improvements can be obtained by combining them. In this paper we show that this is indeed the case when unimodular transformations are combined with either tiling or software prefetching. However, our results also show that, although combining tiling with prefetching tends to improve on the performance of tiling alone, in some situations tiling can degrade the cache performance of software prefetching. The reasons for this unexpected behavior are threefold: 1) tiling introduces interference misses inside the localized space, which are difficult to characterize with current techniques based on locality analysis; 2) prefetch predicates are computed using only estimates of the number of capacity misses, so the latency induced by cache interference is not completely covered; and 3) tiling limits the maximum amount of latency that can be masked with prefetching.
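As a rough illustration of the three optimizations named in the abstract, the sketch below applies them to a small matrix-vector product. It is not taken from the paper: the function names `matvec_baseline` and `matvec_optimized`, the parameters `tile` and `dist`, and the choice of kernel are all hypothetical, and the prefetch relies on the GCC/Clang builtin `__builtin_prefetch` (it compiles to a no-op elsewhere). The baseline walks the row-major matrix by columns; the optimized version interchanges the loops (a simple unimodular transformation), tiles the `j` loop, and prefetches `dist` elements ahead within each tile.

```c
#include <assert.h>
#include <stddef.h>

/* Software prefetch hint; assumes the GCC/Clang builtin, no-op otherwise. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

/* Baseline: y = A * x for a row-major n x n matrix.
 * The inner loop walks down a column of A (stride n), so spatial
 * locality on A is poor. */
void matvec_baseline(const double *a, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            y[i] += a[i * n + j] * x[j];
}

/* Same computation after three illustrative steps:
 *   1. loop interchange (a unimodular transformation): i outside, j inside,
 *      so A is read with stride 1;
 *   2. tiling of the j loop with width `tile` (hypothetical parameter),
 *      so a block of x is reused across all rows;
 *   3. software prefetching of A, `dist` elements ahead (hypothetical
 *      parameter), restricted to the current tile. */
void matvec_optimized(const double *a, const double *x, double *y,
                      size_t n, size_t tile, size_t dist) {
    assert(tile > 0);
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    for (size_t jj = 0; jj < n; jj += tile) {
        size_t jend = (jj + tile < n) ? jj + tile : n;
        for (size_t i = 0; i < n; i++) {
            const double *row = a + i * n;
            for (size_t j = jj; j < jend; j++) {
                /* Prefetch only inside the tile: the last `dist` iterations
                 * of each tile issue no prefetch at all. */
                if (j + dist < jend)
                    PREFETCH_READ(&row[j + dist]);
                y[i] += row[j] * x[j];
            }
        }
    }
}
```

Both functions accumulate each `y[i]` over `j` in ascending order, so they compute the same sums. Note how the guard `j + dist < jend` caps the usable prefetch distance at the tile width; this is one simplified way to picture why a tiled inner loop leaves less room for prefetching ahead, though the paper's own analysis may rest on a different mechanism.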