Cache-Efficient Multigrid Algorithms
- 1 February 2004
- journal article
- other
- Published by SAGE Publications in The International Journal of High Performance Computing Applications
- Vol. 18 (1) , 115-133
- https://doi.org/10.1177/1094342004041295
Abstract
Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods such as Gauss–Seidel and Red–Black Gauss–Seidel form the principal computational component of multigrid, and thus affect its efficiency. In the context of multigrid, these iterative solvers are executed for a small number of iterations (2–8). We exploit this property of the algorithm to develop a cache-efficient multigrid method, by focusing on improving the memory behavior of the linear relaxation methods. The efficiency in our cache-efficient linear relaxation algorithm comes from two sources: reducing the number of data cache and TLB misses, and reducing the number of memory references by keeping values registerresident. Our optimizations are applicable to multigrid applied to linear systems arising from constant coefficient elliptic PDEs on structured grids. Experiments on five modern computing platforms show a performance improvement of 1.15–2.7 times over a standard implementation of Full Multigrid V-Cycle.Keywords
This publication has 12 references indexed in Scilit:
- Nonlinear array layouts for hierarchical memory systemsPublished by Association for Computing Machinery (ACM) ,1999
- Optimizing Transformations of Stencil Operations for Parallel Object-Oriented Scientific Frameworks on Cache-Based ArchitecturesPublished by Springer Nature ,1998
- Quantifying the multi-level nature of tiling interactionsPublished by Springer Nature ,1998
- Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking CoversJournal of Computer and System Sciences, 1997
- Performance analysis using the MIPS R10000 performance countersPublished by Association for Computing Machinery (ACM) ,1996
- Cache profiling and the SPEC benchmarks: a case studyComputer, 1994
- Fortran at ten gigaflopsPublished by Association for Computing Machinery (ACM) ,1991
- A data locality optimizing algorithmPublished by Association for Computing Machinery (ACM) ,1991
- The cache performance and optimizations of blocked algorithmsPublished by Association for Computing Machinery (ACM) ,1991
- Evaluating associativity in CPU cachesIEEE Transactions on Computers, 1989