Cache-oblivious algorithms
Top Cited Papers
- 20 January 2003
- proceedings article
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 3, 285-297
- https://doi.org/10.1109/sffcs.1999.814600
Abstract
This paper presents asymptotically optimal algo- rithms for rectangular matrix transpose, FFT, and sorting o n computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve opti- mality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L where Z Ω L2 the number of cache misses for an m n ma- trix transpose is Θ 1 mn L. The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ 1 n L 1 logZ n . We also give an Θ mnp -work al- gorithm to multiply an m n matrix by an n p matrix that incurs Θ 1 mn np mp L mnp L Z cache faults. We introduce an "ideal-cache" model to analyze our algo- rithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multi- ple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU re- placement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practi ce.Keywords
This publication has 27 references indexed in Scilit:
- Towards an optimal bit-reversal permutation programPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Recursive array layouts and fast parallel matrix multiplicationPublished by Association for Computing Machinery (ACM) ,1999
- Nonlinear array layouts for hierarchical memory systemsPublished by Association for Computing Machinery (ACM) ,1999
- Locality of Reference in LU Decomposition with Partial PivotingSIAM Journal on Matrix Analysis and Applications, 1997
- Algorithms for parallel memory, II: Hierarchical multilevel memoriesAlgorithmica, 1994
- Algorithms for parallel memory, I: Two-level memoriesAlgorithmica, 1994
- Large-Scale Sorting in Uniform Memory HierarchiesJournal of Parallel and Distributed Computing, 1993
- Amortized efficiency of list update and paging rulesCommunications of the ACM, 1985
- Gaussian elimination is not optimalNumerische Mathematik, 1969
- An algorithm for the machine calculation of complex Fourier seriesMathematics of Computation, 1965