A parallel block implementation of Level-3 BLAS for MIMD vector processors
- 1 June 1994
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Mathematical Software
- Vol. 20 (2) , 178-193
- https://doi.org/10.1145/178365.174413
Abstract
We describe an implementation of Level-3 BLAS (Basic Linear Algebra Subprograms) based on the use of the matrix-matrix multiplication kernel (GEMM). Blocking techniques are used to express the BLAS in terms of operations involving triangular blocks and calls to GEMM. A principal advantage of this approach is that most manufacturers provide at least an efficient serial version of GEMM so that our implementation can capture a significant percentage of the computer performance. A parameter which controls the blocking allows an efficient exploitation of the memory hierarchy of the various target computers. Furthermore, this blocked version of Level-3 BLAS is naturally parallel. We present results on the ALLIANT FX/80, the CONVEX C220, the CRAY-2, and the IBM 3090/VF. For GEMM, we always use the manufacturer-supplied versions. For the operations dealing with triangular blocks, we use assembler or tuned Fortran (using loop-unrolling) codes, depending on the efficiency of the available libraries.
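The blocking idea in the abstract can be illustrated with a triangular solve, L X = B with L lower triangular. The following is a minimal sketch in pure Python, not the authors' code (the paper's kernels are Fortran or assembler). The names `gemm_update`, `trsm_small`, `blocked_trsm`, and the block-size parameter `nb` are hypothetical, and `gemm_update` merely stands in for a manufacturer-supplied GEMM.

```python
def gemm_update(C, A, B, alpha=1.0):
    """C += alpha * A @ B on list-of-lists matrices.

    Hypothetical stand-in for a vendor-tuned GEMM kernel.
    """
    for i in range(len(A)):
        for k in range(len(B)):
            a = alpha * A[i][k]
            for j in range(len(B[0])):
                C[i][j] += a * B[k][j]


def trsm_small(L, B):
    """Solve L X = B in place (B becomes X) by forward substitution.

    L is a small lower-triangular block; this plays the role of the
    tuned triangular-block kernel.
    """
    for i in range(len(L)):
        for j in range(len(B[0])):
            s = B[i][j]
            for k in range(i):
                s -= L[i][k] * B[k][j]
            B[i][j] = s / L[i][i]


def blocked_trsm(L, B, nb):
    """Solve L X = B, overwriting B with X, using blocks of order nb.

    Each step solves with one triangular diagonal block, then updates
    all remaining rows of B with a single GEMM-style call.
    """
    n = len(L)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Triangular diagonal block L[k0:k1, k0:k1].
        L_kk = [row[k0:k1] for row in L[k0:k1]]
        # Slicing B copies the outer list only; the rows are shared,
        # so in-place updates below modify B itself.
        B_k = B[k0:k1]
        trsm_small(L_kk, B_k)
        if k1 < n:
            # B[k1:, :] -= L[k1:, k0:k1] @ X[k0:k1, :]
            L_ik = [row[k0:k1] for row in L[k1:]]
            gemm_update(B[k1:], L_ik, B_k, alpha=-1.0)
```

With `nb = 1` the sketch degenerates to row-by-row forward substitution; with `nb` equal to the matrix order it is a single triangular solve. Intermediate values shift most of the arithmetic into `gemm_update`, which is the part a tuned GEMM would accelerate, and the row updates performed there are independent of one another.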
This publication has 11 references indexed in Scilit:
- Linear algebra calculations on the BBN TC2000. Published by Springer Nature, 1992
- Parallel Algorithms for Dense Linear Algebra Computations. SIAM Review, 1990
- A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 1990
- Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs. ACM Transactions on Mathematical Software, 1990
- Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF. The International Journal of Supercomputing Applications, 1989
- An extended set of FORTRAN basic linear algebra subprograms. ACM Transactions on Mathematical Software, 1988
- The Use of BLAS3 in Linear Algebra on a Parallel Processor with a Hierarchical Memory. SIAM Journal on Scientific and Statistical Computing, 1987
- The WY Representation for Products of Householder Matrices. SIAM Journal on Scientific and Statistical Computing, 1987
- Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine. SIAM Review, 1984
- Basic Linear Algebra Subprograms for Fortran Usage. ACM Transactions on Mathematical Software, 1979