Cache-oblivious algorithms

Abstract
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $Z$ and cache-line length $L$ where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. We also give an $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/L + mnp/(L\sqrt{Z}))$ cache faults. We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
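As an illustration of what "cache oblivious" can look like in code, the following is a minimal Python sketch of a divide-and-conquer transpose for an $m \times n$ row-major matrix. The abstract does not spell out the algorithm, so the details here are assumptions: the function name `co_transpose`, the rule of halving the larger dimension, and the `base` cutoff are all hypothetical choices made for this sketch. The cutoff only coarsens the recursion to reduce call overhead; it is not derived from $Z$ or $L$, and the code never refers to either.

```python
def co_transpose(src, dst, m, n, base=8):
    """Write the transpose of the m x n row-major matrix `src`
    into the n x m row-major matrix `dst` (both flat lists).

    Illustrative sketch only: `base` is a recursion cutoff (must be >= 1),
    not a cache size or cache-line length.
    """
    if base < 1:
        raise ValueError("base must be at least 1")

    def rec(r0, r1, c0, c1):
        rows, cols = r1 - r0, c1 - c0
        if rows <= base and cols <= base:
            # Small block: transpose it directly.
            for i in range(r0, r1):
                for j in range(c0, c1):
                    dst[j * m + i] = src[i * n + j]
        elif rows >= cols:
            # Halve the longer dimension (rows) and recurse on each half.
            mid = r0 + rows // 2
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            # Halve the longer dimension (columns) and recurse on each half.
            mid = c0 + cols // 2
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, m, 0, n)
    return dst
```

A call such as `co_transpose(a, [None] * (m * n), m, n)` returns the transposed matrix as a flat list. The intuition, under the assumptions above, is that the recursion eventually reaches sub-blocks small enough to fit in whatever cache is present, without the code ever being told how large that cache is.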
