Abstract
Multigrid methods combine a number of standard sparse matrix techniques. Usual implementations separate the individual components (e.g., an iterative method, residual computation, and interpolation between grids) into nicely structured routines. However, many computers today employ quite sophisticated and potentially large caches whose correct use is instrumental in attaining much of the peak performance of the processors. We investigate when it makes sense to combine several of the multigrid components into one, using block-oriented algorithms. We determine how large (or small) the blocks must be in order for the data in the block to just fit into the processor's primary cache. By reusing the data in cache several times, potential savings in run time can be predicted. This is analyzed for a set of examples.
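To illustrate the kind of fusion the abstract describes, here is a minimal sketch (not the authors' code) in C: a Jacobi-style smoothing sweep and a residual computation for a 1D Poisson model problem are performed block by block, so that the portion of each array touched by both passes stays resident in the primary cache. All names, the block size, and the choice of smoother are illustrative assumptions; for simplicity the residual of the pre-smoothing iterate is formed, since both passes then read only the old values and no dependency crosses block seams.

```c
/* Hypothetical sketch: fused, cache-blocked smoothing + residual
 * for the 1D Poisson problem  -u'' = f  with fixed boundary values.
 * BLOCK is chosen so the slices of u, u_new, f, and r used per block
 * fit in the primary cache; the value here is illustrative only. */
#include <stddef.h>

#define BLOCK 1024  /* assumed block size, tuned to the L1 cache */

void smooth_and_residual(size_t n, double h,
                         const double *f, const double *u,
                         double *u_new, double *r)
{
    double h2 = h * h;
    /* Interior points 1 .. n-2; boundary values are left untouched. */
    for (size_t lo = 1; lo + 1 < n; lo += BLOCK) {
        size_t hi = (lo + BLOCK < n - 1) ? lo + BLOCK : n - 1;

        /* Pass 1: Jacobi update -- loads this block's data into cache. */
        for (size_t i = lo; i < hi; i++)
            u_new[i] = 0.5 * (u[i-1] + u[i+1] + h2 * f[i]);

        /* Pass 2: residual r = f - A*u of the old iterate -- reuses
         * the same u and f values while they are still cache-resident. */
        for (size_t i = lo; i < hi; i++)
            r[i] = f[i] - (2.0 * u[i] - u[i-1] - u[i+1]) / h2;
    }
}
```

In a conventional implementation the two loops would live in separate routines and each would sweep the full grid, so for grids larger than the cache every array element is fetched from memory twice; the blocked, fused version fetches each element roughly once, which is the source of the predicted run-time savings.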
