Anatomy of high-performance matrix multiplication

16 May 2008

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software

Vol. 34 (3), 1-25
https://doi.org/10.1145/1356052.1356053

Abstract

We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
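
The abstract's idea of organizing the computation around multilevel memories can be illustrated with a generic loop-blocking sketch. This is not the GotoBLAS implementation and is not taken from the paper: the function name `blocked_matmul` and the block sizes `mc`, `kc`, `nc` are hypothetical, chosen only to show how a matrix product can be partitioned so that each inner step touches a small working set that could fit in a fast memory level.

```python
def blocked_matmul(A, B, mc=2, kc=2, nc=2):
    """Compute C = A * B for list-of-lists matrices using loop blocking.

    mc, kc, nc are illustrative block sizes (assumptions, not tuned values).
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions do not match")
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]

    # Outer loops walk over blocks; each (ic, pc, jc) triple selects an
    # mc x kc block of A and a kc x nc block of B.
    for jc in range(0, n, nc):
        for pc in range(0, k, kc):
            for ic in range(0, m, mc):
                # Inner loops multiply the two blocks and accumulate into
                # the matching mc x nc block of C.
                for i in range(ic, min(ic + mc, m)):
                    row_a, row_c = A[i], C[i]
                    for p in range(pc, min(pc + kc, k)):
                        a_ip, row_b = row_a[p], B[p]
                        for j in range(jc, min(jc + nc, n)):
                            row_c[j] += a_ip * row_b[j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(A, B))
```

In a real high-performance library the block sizes would be chosen per architecture and the innermost block product would be a hand-tuned kernel; the sketch only shows the loop structure that blocking imposes.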

Funding Information

Advanced Cyberinfrastructure (ACI-0305163, CCF-0342369, CCF-0540926)
Lawrence Livermore National Laboratory, Office of Science (B546489)
Division of Computing and Communication Foundations (ACI-0305163, CCF-0342369, CCF-0540926)
