PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers

Abstract
The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also the transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. Together, the PUMMA routines provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
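To make the xGEMM functionality referenced above concrete, the sketch below shows the four transpose cases expressed through a single GEMM-style interface. This is a minimal serial illustration in NumPy, assuming a hypothetical helper named `gemm`; it restates C ← α·op(A)·op(B) + β·C and does not reflect PUMMA's distributed, block-cyclic parallel algorithm.

```python
import numpy as np

def gemm(a, b, c, alpha=1.0, beta=0.0, trans_a=False, trans_b=False):
    """Serial xGEMM-style update: C <- alpha * op(A) @ op(B) + beta * C.
    (Hypothetical helper for illustration, not a PUMMA routine.)"""
    op_a = a.T if trans_a else a
    op_b = b.T if trans_b else b
    return alpha * (op_a @ op_b) + beta * c

rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# The four cases handled by the PUMMA package, through one interface:
c_nn = gemm(A, B, C)                               # C = A  . B
c_tn = gemm(A, B, C, trans_a=True)                 # C = A^T . B
c_nt = gemm(A, B, C, trans_b=True)                 # C = A  . B^T
c_tt = gemm(A, B, C, trans_a=True, trans_b=True)   # C = A^T . B^T
```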
