PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
- 1 October 1994
- journal article
- research article
- Published by Wiley in Concurrency: Practice and Experience
- Vol. 6 (7) , 543-570
- https://doi.org/10.1002/cpe.4330060702
Abstract
The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also the transposed multiplication routines C = Aᵀ ⋅ B, C = A ⋅ Bᵀ, and C = Aᵀ ⋅ Bᵀ, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. Together, the PUMMA routines provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta computer.
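The block cyclic data distribution mentioned in the abstract can be illustrated with a short sketch. The code below is not taken from the PUMMA package; it assumes the common two-dimensional block cyclic convention in which a matrix is cut into mb × nb blocks that are dealt out cyclically over a P × Q process grid. The names `owner_and_local` and `distribute`, and the example sizes, are hypothetical and chosen only for illustration.

```python
def owner_and_local(i, j, mb, nb, P, Q):
    """Map global entry (i, j) to ((p, q), (li, lj)).

    Assumes a 2D block cyclic layout: blocks of size mb x nb are
    assigned round-robin to a P x Q grid of processes.
    """
    bi, bj = i // mb, j // nb        # global block coordinates
    p, q = bi % P, bj % Q            # grid position of the owning process
    li = (bi // P) * mb + i % mb     # row index in the owner's local array
    lj = (bj // Q) * nb + j % nb     # column index in the owner's local array
    return (p, q), (li, lj)


def distribute(matrix, mb, nb, P, Q):
    """Split a list-of-lists matrix into per-process dicts {(li, lj): value}."""
    local = {(p, q): {} for p in range(P) for q in range(Q)}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            proc, idx = owner_and_local(i, j, mb, nb, P, Q)
            local[proc][idx] = value
    return local


if __name__ == "__main__":
    # Hypothetical example: an 8 x 8 matrix, 2 x 2 blocks, 2 x 2 process grid.
    M, N = 8, 8
    A = [[i * N + j for j in range(N)] for i in range(M)]
    pieces = distribute(A, mb=2, nb=2, P=2, Q=2)
    for proc in sorted(pieces):
        print(proc, len(pieces[proc]), "entries")
```

In this layout, varying the block size and the grid shape changes how evenly the work is spread and how much data each process must exchange, which is the parameter space the abstract refers to when it mentions processor configurations and block sizes.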