High-performance implementation of the level-3 BLAS
- 22 July 2008
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software
- Vol. 35 (1), 1-14
- https://doi.org/10.1145/1377603.1377607
Abstract
A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.
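As a rough illustration of the idea in the abstract, the sketch below casts one level-3 BLAS operation, the symmetric matrix-matrix product (SYMM, C := A B with A symmetric and only its lower triangle stored), in terms of a general matrix-matrix multiply. It is not taken from the paper: the names `gemm_acc`, `dense_block`, and `symm_via_gemm`, the block size `nb`, and the pure-Python list-of-rows storage are all assumptions made for illustration, and a real implementation would hand the blocks to an optimized GEMM kernel rather than to a triple loop.

```python
def gemm_acc(A, B, C):
    """General matrix multiply-accumulate: C += A @ B (lists of rows).

    Stand-in for an optimized GEMM kernel; hypothetical helper.
    """
    for i, a_row in enumerate(A):
        c_row = C[i]
        for p, a_ip in enumerate(a_row):
            b_row = B[p]
            for j, b_pj in enumerate(b_row):
                c_row[j] += a_ip * b_pj


def dense_block(A_lower, i0, i1, j0, j1):
    """Copy block [i0:i1, j0:j1] of a symmetric matrix into a dense block.

    Only the lower triangle of A_lower is read; entries above the
    diagonal are reconstructed from their mirror image.
    """
    return [
        [A_lower[i][j] if i >= j else A_lower[j][i] for j in range(j0, j1)]
        for i in range(i0, i1)
    ]


def symm_via_gemm(A_lower, B, nb=2):
    """Compute C = A @ B for symmetric A using only GEMM on dense blocks."""
    n = len(A_lower)
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, nb):
        i1 = min(i0 + nb, n)
        # The slice shares the row lists with C, so updates land in C.
        C_panel = C[i0:i1]
        for p0 in range(0, n, nb):
            p1 = min(p0 + nb, n)
            A_blk = dense_block(A_lower, i0, i1, p0, p1)
            gemm_acc(A_blk, B[p0:p1], C_panel)
    return C


if __name__ == "__main__":
    # Upper-triangle entries are placeholders that must never be read.
    A_lower = [[2.0, None, None],
               [1.0, 3.0, None],
               [4.0, 5.0, 6.0]]
    B = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]
    for row in symm_via_gemm(A_lower, B):
        print(row)
```

In this sketch the symmetry is handled entirely while a block is copied into dense form, so the multiplication routine itself never needs to know that A is symmetric.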
Funding Information
- Division of Computing and Communication Foundations (CCF-0540926)
This publication has 5 references indexed in Scilit:
- Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software, 2008
- Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software. SIAM Review, 2004
- FLAME. ACM Transactions on Mathematical Software, 2001
- GEMM-based level 3 BLAS. ACM Transactions on Mathematical Software, 1998
- A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 1990