Abstract
We study various implementations of block Gaussian elimination on full matrices and examine their performance on three parallel computers: the Alliant FX/80, the CRAY-2, and the IBM 3090-400/VF. These implementations are expressed in terms of Level 3 BLAS matrix-matrix kernels. We consider the use of parallel Level 3 BLAS kernels and compare the parallelism obtained within the computational kernels with that obtained when parallelizing over the kernels. We show that the use of parallel Level 3 BLAS allows portability without sacrificing efficiency, even in a parallel environment, and that high speeds can be obtained if tuned versions of the kernels are available.
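To illustrate what "block Gaussian elimination expressed in terms of Level 3 BLAS" means, here is a minimal pure-Python sketch of a right-looking blocked LU factorization. It is not the authors' code. It assumes no pivoting, so it should only be applied to matrices where that is safe (e.g. strictly diagonally dominant ones). The names `lu_blocked`, `unpack`, and `matmul` are hypothetical helpers. Each block step factors a narrow panel, then applies a triangular solve (the role played by the Level 3 BLAS routine DTRSM) and a matrix-matrix update of the trailing submatrix (the role played by DGEMM). The explicit loops stand in for those kernels.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_blocked(a: Matrix, nb: int) -> Matrix:
    """Return a copy of `a` overwritten with its LU factors.

    Unit-lower L is stored below the diagonal, U on and above it.
    Assumption: no pivoting (safe e.g. for strictly diagonally dominant a).
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy; the input is not mutated
    for k in range(0, n, nb):
        kend = min(k + nb, n)  # last block may be narrower than nb
        # Panel factorization: unblocked elimination on columns k..kend-1,
        # rows k..n-1 (yields L11, U11 and L21).
        for j in range(k, kend):
            pivot = lu[j][j]
            for i in range(j + 1, n):
                lu[i][j] /= pivot
                mult = lu[i][j]
                for c in range(j + 1, kend):
                    lu[i][c] -= mult * lu[j][c]
        # TRSM-like step: U12 = inv(L11) @ A12 by forward substitution
        # (L11 has a unit diagonal, so no division is needed).
        for r in range(k, kend):
            for p in range(k, r):
                mult = lu[r][p]
                for c in range(kend, n):
                    lu[r][c] -= mult * lu[p][c]
        # GEMM-like step: A22 -= L21 @ U12, the matrix-matrix update
        # that carries most of the floating-point work.
        for i in range(kend, n):
            for p in range(k, kend):
                mult = lu[i][p]
                for c in range(kend, n):
                    lu[i][c] -= mult * lu[p][c]
    return lu


def unpack(lu: Matrix) -> Tuple[Matrix, Matrix]:
    """Split packed LU storage into explicit L (unit lower) and U (upper)."""
    n = len(lu)
    lower = [[lu[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[lu[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix product, used here only for checking."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


if __name__ == "__main__":
    A = [[10.0, 2.0, 1.0, 3.0],
         [1.0, 12.0, 4.0, 2.0],
         [2.0, 3.0, 15.0, 1.0],
         [4.0, 1.0, 2.0, 11.0]]
    L, U = unpack(lu_blocked(A, nb=2))
    residual = max(abs(p - q) for rp, rq in zip(matmul(L, U), A)
                   for p, q in zip(rp, rq))
    print("max |LU - A| =", residual)
```

The block size `nb` only changes how the work is grouped: `nb = 1` degenerates to ordinary (Level 2 style) elimination, while larger blocks shift the bulk of the arithmetic into the matrix-matrix update, which is what lets tuned or parallel Level 3 kernels deliver high speed.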
