Hardware Acceleration for Finite Element Electromagnetics: Efficient Sparse Matrix Floating-Point Computations with Field Programmable Gate Arrays

proceedings article
Published by Institute of Electrical and Electronics Engineers (IEEE)

Abstract

Custom hardware acceleration of electromagnetics computations leverages favorable industry trends, which indicate that reconfigurable hardware devices such as field programmable gate arrays (FPGAs) may soon outperform general-purpose CPUs. We present a new striping method for efficient sparse matrix-vector multiplication implemented in a deeply pipelined FPGA design. The effectiveness of the new method is illustrated for a representative set of finite element matrices.

I. INTRODUCTION

Fueled by continual CPU performance improvements, finite element (FE) practitioners perpetually strive to simulate increasingly complex electromagnetic systems. Solution via serial processing on current PCs can yield impractical run times due to the large number of degrees of freedom involved. Various approaches, such as parallel processing, hold promise for overcoming this barrier. Lower-cost alternatives that have recently gained attention include solution acceleration via implementation in custom hardware such as FPGAs (1). However, to realize the full potential of such approaches, the underlying algorithms must be inherently parallelizable.

Sparse matrix-vector multiplication (SMVM) is a kernel for many iterative numerical techniques, such as the conjugate gradient (CG) method, used to solve the large, sparse linear systems arising in FE formulations. In fact, SMVM can be a dominant cost in obtaining FE solutions, so implementing it in custom hardware may lead to significant run-time reductions. However, sparse matrix storage schemes that are effective for software-based implementations are not 'regular' enough to allow for efficient parallel manipulation in FPGA-based designs. To overcome this, various so-called striping algorithms have been proposed. The purpose of this contribution is to introduce a new striping scheme for FE matrices that improves the parallel speed-up possible with an FPGA-based SMVM design we have implemented.
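To make the SMVM kernel concrete, the following is a minimal software sketch, assuming the matrix is held in compressed sparse row (CSR) form, a common software storage scheme that the text does not name. The function `csr_matvec` and the small 3x3 test matrix are hypothetical and chosen only for illustration. The varying row lengths and the indirect access `x[col_idx[k]]` are examples of the kind of irregularity that plausibly makes such schemes awkward to pipeline in hardware.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative helper)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row i owns the nonzeros values[row_ptr[i] : row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent read of x: the irregular part.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 tridiagonal matrix:
# [[ 4, -1,  0],
#  [-1,  4, -1],
#  [ 0, -1,  4]]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [2.0, 4.0, 10.0]
```

In an iterative solver such as CG, a product of this form is evaluated once per iteration, which is why it can dominate the overall run time.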
The design comprises a pipeline of 8 processing elements (PEs). Each PE contains a cascade of deeply pipelined floating-point arithmetic units (FPUs). To increase processor efficiency and maintain the peak floating-point performance of the PE pipeline, the sparse matrix should be represented in the fewest stripes possible.
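The role of the stripe count can be illustrated with a small sketch. This excerpt does not define the new striping scheme, so the code below substitutes the simplest stand-in: each stripe is one diagonal of the matrix (all entries sharing the same offset `j - i`). The helpers `diagonal_stripes` and `striped_matvec` are hypothetical and are not the authors' method; they show only that a striped product makes one regular pass per stripe, so fewer stripes means fewer passes.

```python
from collections import defaultdict


def diagonal_stripes(entries):
    """Group (row, col, value) nonzeros into stripes keyed by diagonal offset.

    Assumption: a stripe is a single diagonal. This is a stand-in, not the
    striping scheme introduced in the paper.
    """
    stripes = defaultdict(list)
    for i, j, v in entries:
        stripes[j - i].append((i, v))
    return dict(stripes)


def striped_matvec(stripes, x, n_rows):
    """Compute y = A @ x by making one pass over the data per stripe."""
    y = [0.0] * n_rows
    for offset, elems in stripes.items():
        # Within a stripe, the column index is always row + offset, so the
        # access pattern into x is regular.
        for i, v in elems:
            y[i] += v * x[i + offset]
    return y


# Same hypothetical 3x3 tridiagonal matrix, as (row, col, value) triples.
entries = [
    (0, 0, 4.0), (0, 1, -1.0),
    (1, 0, -1.0), (1, 1, 4.0), (1, 2, -1.0),
    (2, 1, -1.0), (2, 2, 4.0),
]

stripes = diagonal_stripes(entries)
y = striped_matvec(stripes, [1.0, 2.0, 3.0], 3)
print(len(stripes))  # 3
print(y)             # [2.0, 4.0, 10.0]
```

For this tridiagonal example, three stripes cover all seven nonzeros. An unstructured FE matrix would generally need many more diagonal stripes, each sparsely filled, which is the kind of inefficiency a better striping scheme would aim to reduce.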

References

Design study of ultrahigh-speed microwave simulator engine
IEEE Transactions on Magnetics, 2002