Matrix multiplication

Abstract

Modern machines present two challenges to algorithm engineers and compiler writers: they have a superscalar, super-pipelined structure, and they have elaborate memory subsystems specifically designed to reduce latency and increase bandwidth. Matrix multiplication is a classical benchmark for experimenting with techniques used to exploit machine architecture and to overcome the limitations of contemporary memory subsystems. This research aims at advancing the state of the art of algorithm engineering by balancing instruction-level parallelism, two levels of data tiling, copying to provably avoid any cache conflicts, and prefetching in parallel with computational operations, in order to fully exploit the memory bandwidth. Measurements on IBM's RS/6000 43P workstation show that the resultant matrix multiplication algorithm outperforms IBM's ESSL by 6.8% to 31.8%, is less sensitive to the size of the input data, and scales better. In this paper we introduce a cache-aware algorithm for matrix multiplication. We also suggest generic guidelines that may be applied to compute-intensive algorithms so that they use the data cache efficiently. We believe that some of our concepts may be embodied in compilers.
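As a rough illustration of two of the ideas named in the abstract, data tiling and copying tiles into contiguous buffers, the loop structure can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: it uses a single level of tiling rather than two, it omits prefetching and instruction scheduling entirely, and the names `matmul_tiled`, `matmul_naive`, and the `tile` parameter are hypothetical. Python is used only to show the structure; any cache benefit would appear in a compiled language, not here.

```python
def matmul_naive(a, b, n):
    """Reference product of two n x n matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_tiled(a, b, n, tile=4):
    """Product of two n x n matrices using one level of tiling.

    Each tile of A and B is copied into its own small buffer before use,
    mimicking the 'copy to avoid cache conflicts' idea. The tile size is
    an illustrative assumption, not a tuned value.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        i_end = min(ii + tile, n)
        for kk in range(0, n, tile):
            k_end = min(kk + tile, n)
            # Copy the current tile of A into a contiguous buffer.
            a_tile = [a[i][kk:k_end] for i in range(ii, i_end)]
            for jj in range(0, n, tile):
                j_end = min(jj + tile, n)
                # Copy the current tile of B into a contiguous buffer.
                b_tile = [b[k][jj:j_end] for k in range(kk, k_end)]
                # Multiply the two tiles and accumulate into C.
                for i in range(ii, i_end):
                    a_row = a_tile[i - ii]
                    c_row = c[i]
                    for k in range(kk, k_end):
                        a_ik = a_row[k - kk]
                        b_row = b_tile[k - kk]
                        for j in range(jj, j_end):
                            c_row[j] += a_ik * b_row[j - jj]
    return c
```

In a real implementation the tile size would be chosen so that the copied tiles fit in the targeted cache level, and the result can be checked against `matmul_naive` on small inputs.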
