Implicit and explicit optimizations for stencil computations
- 22 October 2006
- conference paper
- Published by Association for Computing Machinery (ACM)
Abstract
Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as on the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps and include both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on the Cell processor, whose memory hierarchy is explicitly managed. Overall, results show that the cache-aware approach is significantly faster than the cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, Cell has almost 2x more memory bandwidth and is 3.7x faster.
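To make "cache reuse across stencil sweeps" concrete, the following is a minimal C sketch of a 1D three-point Jacobi stencil traversed two ways: plain full sweeps, and the recursive space-time trapezoid decomposition of Frigo and Strumpen, a standard cache-oblivious stencil scheme. The grid, the stencil weights, the fixed-boundary treatment and every function name (`kernel`, `walk`, `results_match`, and so on) are illustrative assumptions rather than code from the paper, and the sketch does not attempt the paper's cache-aware blocking or its Cell implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative 1D grid (hypothetical, not from the paper): two time planes,
   plane t & 1 holds time step t. Points 0 and n-1 are fixed boundary values
   stored in both planes and never updated. */
typedef struct {
    int n;
    double *u[2];
} Grid;

static Grid grid_new(int n) {
    Grid g;
    g.n = n;
    g.u[0] = malloc((size_t)n * sizeof(double));
    g.u[1] = malloc((size_t)n * sizeof(double));
    if (!g.u[0] || !g.u[1]) abort();
    for (int x = 0; x < n; x++)
        g.u[0][x] = (double)((x * 37) % 11);            /* arbitrary deterministic data */
    memcpy(g.u[1], g.u[0], (size_t)n * sizeof(double)); /* boundaries in both planes */
    return g;
}

static void grid_free(Grid *g) { free(g->u[0]); free(g->u[1]); }

/* 3-point Jacobi update: value at (t+1, x) from values at time t. */
static void kernel(Grid *g, int t, int x) {
    const double *in = g->u[t & 1];
    double *out = g->u[(t + 1) & 1];
    out[x] = 0.25 * in[x - 1] + 0.5 * in[x] + 0.25 * in[x + 1];
}

/* Baseline: one full sweep over the interior per time step. */
static void naive_sweeps(Grid *g, int steps) {
    for (int t = 0; t < steps; t++)
        for (int x = 1; x < g->n - 1; x++)
            kernel(g, t, x);
}

/* Trapezoid with time range [t0, t1); at time t it covers
   x in [x0 + dx0*(t-t0), x1 + dx1*(t-t0)). Wide trapezoids are split by a
   slope -1 space cut, tall ones by halving time, so at some recursion depth
   the pieces fit in each cache level without the code knowing cache sizes. */
static void walk(Grid *g, int t0, int t1, int x0, int dx0, int x1, int dx1) {
    int dt = t1 - t0;
    if (dt == 1) {
        for (int x = x0; x < x1; x++) kernel(g, t0, x);
    } else if (dt > 1) {
        if (2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt) {
            int xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) / 4; /* space cut */
            walk(g, t0, t1, x0, dx0, xm, -1);
            walk(g, t0, t1, xm, -1, x1, dx1);
        } else {
            int s = dt / 2;                                      /* time cut */
            walk(g, t0, t0 + s, x0, dx0, x1, dx1);
            walk(g, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1);
        }
    }
}

/* Cache-oblivious traversal of the whole interior for `steps` time steps. */
static void cache_oblivious_sweeps(Grid *g, int steps) {
    assert(g->n >= 3 && steps >= 0);
    walk(g, 0, steps, 1, 0, g->n - 1, 0);
}

/* Returns 1 if both traversal orders give bit-identical results. */
static int results_match(int n, int steps) {
    Grid a = grid_new(n), b = grid_new(n);
    naive_sweeps(&a, steps);
    cache_oblivious_sweeps(&b, steps);
    int same = memcmp(a.u[steps & 1], b.u[steps & 1],
                      (size_t)n * sizeof(double)) == 0;
    grid_free(&a);
    grid_free(&b);
    return same;
}
```

For example, `results_match(1000, 250)` should return 1: both versions evaluate every interior (t, x) point exactly once from the same inputs, so only the traversal order, and with it the reuse of cached data across time steps, differs. A cache-aware variant would instead fix tile sizes from the known cache capacities.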