Implicit and explicit optimizations for stencil computations

22 October 2006

conference paper
Published by Association for Computing Machinery (ACM)

p. 51-60
https://doi.org/10.1145/1178597.1178605

Abstract

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5 and the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster.
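As a rough illustration of what a stencil sweep and a cache-aware (blocked) traversal look like, the following is a minimal sketch and not the paper's implementation. It assumes a 2D 5-point Jacobi stencil and tiles a single sweep in space, which is simpler than the blocking across sweeps that the abstract describes. The function names, the grid size, and the tile sizes `bi` and `bj` are all hypothetical, and Python is used for readability rather than performance.

```python
def sweep_naive(src, dst, n):
    """One Jacobi sweep of a 5-point stencil over the interior of an n x n grid."""
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                + src[i][j - 1] + src[i][j + 1])


def sweep_blocked(src, dst, n, bi, bj):
    """Same sweep, visiting the interior in bi x bj tiles.

    In a compiled language the tile sizes would be chosen so that a tile's
    working set fits in cache; here they are arbitrary illustrative values.
    """
    for ii in range(1, n - 1, bi):
        for jj in range(1, n - 1, bj):
            # Clamp each tile to the interior so boundary cells are never written.
            for i in range(ii, min(ii + bi, n - 1)):
                for j in range(jj, min(jj + bj, n - 1)):
                    dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                        + src[i][j - 1] + src[i][j + 1])


def make_grid(n):
    """Deterministic test grid (values are arbitrary)."""
    return [[float((i * 31 + j * 17) % 7) for j in range(n)] for i in range(n)]


n = 10
src = make_grid(n)
out_naive = [row[:] for row in src]
out_blocked = [row[:] for row in src]
sweep_naive(src, out_naive, n)
sweep_blocked(src, out_blocked, n, 4, 3)
print(out_naive == out_blocked)
```

Both traversals perform the same arithmetic on each point and differ only in visiting order, so they produce identical grids; any performance difference in a real implementation comes from how that order interacts with the memory hierarchy.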
