Floating-Point Matrix Multiplication in a Polymorphic Processor

1 December 2007

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 249-252
https://doi.org/10.1109/fpt.2007.4439258

Abstract

We consider 64-bit floating-point matrix multiplication in the context of polymorphic processor architectures. Our proposal provides a complete, performance-efficient solution to the matrix multiplication problem, including the hardware design and the software interface. We adopt previous ideas [1], originally proposed for loosely coupled processors with message-passing communication, and apply them to a tightly coupled custom computing unit (CCU) in the Molen polymorphic processor. Furthermore, we introduce a controller that facilitates the efficient operation of the multiplier processing elements (PEs) in a polymorphic environment. The design is evaluated both theoretically and through experiments on real hardware. More precisely, we fit 9 processing elements in an XC2VP30-6 device; this configuration gives a theoretical peak performance of 1.80 GFLOPS. In practice, we measured sustained performance of up to 1.79 GFLOPS for matrix multiplication on real hardware, including the software overhead. Theoretical analysis and experimental results suggest that the design's efficiency improves for large problem sizes.
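The relationship between the figures quoted in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes that each PE completes one multiplication and one addition per clock cycle (2 FLOPs per cycle), and it models software overhead as a fixed, hypothetical cost per call. All function names and the overhead value are illustrative assumptions; only the PE count and the two GFLOPS figures come from the abstract.

```python
# Illustrative arithmetic only. Assumption (not stated in the abstract):
# each PE performs one multiply and one add per cycle, i.e. 2 FLOPs/cycle.

NUM_PES = 9              # from the abstract
PEAK_GFLOPS = 1.80       # from the abstract
SUSTAINED_GFLOPS = 1.79  # from the abstract


def peak_gflops(num_pes, clock_hz, flops_per_pe_per_cycle=2):
    """Theoretical peak throughput in GFLOPS for a given PE count and clock."""
    return num_pes * flops_per_pe_per_cycle * clock_hz / 1e9


def implied_clock_hz(peak, num_pes, flops_per_pe_per_cycle=2):
    """Clock frequency implied by a peak figure under the 2 FLOPs/cycle assumption."""
    return peak * 1e9 / (num_pes * flops_per_pe_per_cycle)


def efficiency(sustained, peak):
    """Fraction of the theoretical peak that is actually sustained."""
    return sustained / peak


def modeled_efficiency(n, peak_flops, overhead_s):
    """Hypothetical model: an n x n multiplication needs 2*n**3 FLOPs at peak
    rate, plus a fixed overhead (in seconds) that does not grow with n."""
    compute_s = 2 * n**3 / peak_flops
    return compute_s / (compute_s + overhead_s)


if __name__ == "__main__":
    clock = implied_clock_hz(PEAK_GFLOPS, NUM_PES)
    print(f"implied clock: {clock / 1e6:.1f} MHz")
    print(f"sustained / peak: {efficiency(SUSTAINED_GFLOPS, PEAK_GFLOPS):.4f}")
    # The overhead value is a made-up placeholder, used only to show the trend.
    for n in (64, 256, 1024):
        print(n, round(modeled_efficiency(n, PEAK_GFLOPS * 1e9, 1e-3), 4))
```

Under this simple model the computation grows as n^3 while the overhead stays constant, so the modeled efficiency approaches 1 as n increases. This agrees with the abstract's qualitative claim, although the paper's own analysis may use a different overhead model.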

This publication has 5 references indexed in Scilit:

Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems
IEEE Transactions on Parallel and Distributed Systems, 2007
64-bit floating-point FPGA matrix multiplication
Published by Association for Computing Machinery (ACM), 2005
The MOLEN polymorphic processor
IEEE Transactions on Computers, 2004
GEMM-based level 3 BLAS
ACM Transactions on Mathematical Software, 1998
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software, 1990