High-performance implementation of the level-3 BLAS

22 July 2008

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software

Vol. 35 (1), 1-14
https://doi.org/10.1145/1377603.1377607

Abstract

A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.
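
As an illustration only, and not taken from the paper itself, the following minimal sketch shows one way another level-3 BLAS operation can be cast in terms of matrix-matrix multiplication. It computes a symmetric matrix product (the operation the BLAS calls SYMM) by first copying the stored lower triangle into a full buffer and then calling a general multiply. The function names `gemm`, `pack_symmetric`, and `symm` are hypothetical, the loops are deliberately naive, and no cache blocking or tuning is attempted.

```python
def gemm(a, b):
    """Plain matrix-matrix product C = A * B on lists of lists (untuned)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c


def pack_symmetric(a_lower):
    """Copy a symmetric matrix into a full buffer, reading only the
    lower triangle (entries with column index <= row index)."""
    n = len(a_lower)
    return [[a_lower[i][j] if j <= i else a_lower[j][i] for j in range(n)]
            for i in range(n)]


def symm(a_lower, b):
    """Symmetric product C = A * B, expressed as a pack step plus gemm."""
    return gemm(pack_symmetric(a_lower), b)


# Hypothetical example data: None marks upper-triangle entries that are
# never read, since only the lower triangle of A is stored.
a_lower = [[2.0, None, None],
           [1.0, 3.0, None],
           [4.0, 5.0, 6.0]]
b = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

print(symm(a_lower, b))  # [[6.0, 5.0], [6.0, 8.0], [10.0, 11.0]]
```

In this sketch all of the arithmetic happens inside `gemm`, so any effort spent tuning that one routine would carry over to the symmetric case; only the copy step knows about the symmetric storage.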

Funding Information

Division of Computing and Communication Foundations (CCF-0540926)
