Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF

Abstract
We study various implementations of block Gaussian elimination on full matrices and examine their performance on three vector supercomputers, the CRAY-2, the ETA-10P, and the IBM 3090-200/VF. We show that the use of Level 3 BLAS kernels allows portability without sacrifice of efficiency and that good speeds can be obtained if tuned versions of the kernels are available. Indeed, our results show that, without using any assembler language outside the kernels, we can approach the performance of assembler-coded routines on all machines.