A parallel block implementation of Level-3 BLAS for MIMD vector processors

Abstract
We describe an implementation of the Level-3 BLAS (Basic Linear Algebra Subprograms) based on the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations on triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM, so our implementation can capture a significant fraction of the machine's performance. A parameter controlling the blocking allows efficient exploitation of the memory hierarchy of each target computer. Furthermore, this blocked version of the Level-3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied version. For the operations on triangular blocks, we use assembler or tuned Fortran (with loop unrolling) code, depending on the efficiency of the available libraries.
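To make the blocking idea concrete, the following is a minimal C sketch (not the authors' Fortran/assembler code) of a left-sided unit lower triangular solve, L*X = B, partitioned into nb-by-nb blocks: each diagonal block is handled by a small triangular kernel, and all trailing updates are cast as matrix-matrix multiplications. The names blocked_trsm, trsm_block, and gemm_update are illustrative; in a real implementation the GEMM update would be the vendor-supplied kernel.

```c
#include <stddef.h>

/* C := C - A*B, with A (m x k), B (k x n), C (m x n), row-major storage.
   Stand-in for the manufacturer-supplied GEMM, which does the bulk of the work. */
static void gemm_update(size_t m, size_t n, size_t k,
                        const double *A, size_t lda,
                        const double *B, size_t ldb,
                        double *C, size_t ldc)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (size_t p = 0; p < k; ++p)
                s += A[i*lda + p] * B[p*ldb + j];
            C[i*ldc + j] -= s;
        }
}

/* Solve L*X = B in place for one nb-by-nb unit lower triangular diagonal block. */
static void trsm_block(size_t nb, size_t ncols,
                       const double *L, size_t ldl,
                       double *B, size_t ldb)
{
    for (size_t i = 0; i < nb; ++i)
        for (size_t j = 0; j < ncols; ++j) {
            double s = B[i*ldb + j];
            for (size_t p = 0; p < i; ++p)
                s -= L[i*ldl + p] * B[p*ldb + j];
            B[i*ldb + j] = s;  /* unit diagonal: no division needed */
        }
}

/* Blocked triangular solve: L is n x n unit lower triangular, B is n x ncols.
   Each block row is solved with the small triangular kernel, then the trailing
   block rows are updated with a single GEMM call; in the parallel version these
   GEMM updates are the operations distributed across processors. */
void blocked_trsm(size_t n, size_t ncols, size_t nb,
                  const double *L, double *B)
{
    for (size_t i = 0; i < n; i += nb) {
        size_t ib = (n - i < nb) ? n - i : nb;
        /* operation on the triangular diagonal block */
        trsm_block(ib, ncols, &L[i*n + i], n, &B[i*ncols], ncols);
        /* rank-ib update of the remaining block rows via GEMM */
        if (i + ib < n)
            gemm_update(n - i - ib, ncols, ib,
                        &L[(i + ib)*n + i], n,
                        &B[i*ncols], ncols,
                        &B[(i + ib)*ncols], ncols);
    }
}
```

The block size nb plays the role of the blocking parameter mentioned in the abstract: it is chosen so that the blocks involved in each GEMM call fit the cache or local memory of the target machine.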
