Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations

1 August 2007

journal article
research article
Published by Taylor & Francis in International Journal of Parallel, Emergent and Distributed Systems

Vol. 22 (4) , 221-256
https://doi.org/10.1080/17445760601122076

Abstract

In this survey paper, we compare native double precision solvers with emulated- and mixed-precision solvers of linear systems of equations as they typically arise in finite element discretisations. The emulation utilises two single float numbers to achieve higher precision, while the mixed precision iterative refinement computes residuals and updates the solution vector in double precision but solves the residual systems in single precision. Both techniques have been known since the 1960s, but little attention has been devoted to their performance aspects. Motivated by changing paradigms in processor technology and the emergence of highly-parallel devices with outstanding single float performance, we adapt the emulation and mixed precision techniques to coupled hardware configurations, where the parallel devices serve as scientific co-processors. The performance advantages are examined with respect to speedups over a native double precision implementation (time aspect) and reduced area requirements for a chip (space aspect). The paper begins with an overview of the theoretical background, algorithmic approaches and suitable hardware architectures. We then employ several conjugate gradient (CG) and multigrid solvers and study their behaviour for different parameter settings of the iterative refinement technique. Concrete speedup factors are evaluated on the coupled hardware configuration of a general-purpose CPU and a graphics processor. The dual performance aspect of potential area savings is assessed on a field programmable gate array (FPGA). In the last part, we test the applicability of the proposed mixed precision schemes with ill-conditioned matrices. We conclude that the mixed precision approach works very well with the parallel co-processors gaining speedup factors of four to five, and area savings of three to four, while maintaining the same accuracy as a reference solver executing everything in double precision.
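The two techniques named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: NumPy is assumed to be available, the names `two_sum` and `mixed_precision_refine` are hypothetical, and a direct single-precision solve (`np.linalg.solve` on `float32` data) stands in for the CG and multigrid inner solvers the paper studies. `two_sum` shows the error-free addition that underlies emulating higher precision with a pair of single floats; `mixed_precision_refine` computes residuals and updates the solution in double precision while solving each residual system in single precision.

```python
import numpy as np


def two_sum(a, b):
    """Error-free addition of two float32 values.

    Returns (s, e), where s is the rounded float32 sum and e is the
    rounding error, so that s + e equals a + b exactly. The pair (s, e)
    is the building block for emulating higher precision with two
    single floats.
    """
    a = np.float32(a)
    b = np.float32(b)
    s = a + b                      # rounded sum in single precision
    bb = s - a                     # part of b that made it into s
    e = (a - (s - bb)) + (b - bb)  # what was lost to rounding
    return s, e


def mixed_precision_refine(A, b, tol=1e-10, max_outer=50):
    """Mixed-precision iterative refinement for A x = b.

    Residual and solution update use float64; the correction system
    A c = r is solved in float32. The direct solve is an assumption made
    for brevity (the paper uses CG and multigrid as inner solvers).
    Returns (x, number_of_outer_iterations).
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)   # low-precision copy for the inner solve
    x = np.zeros_like(b64)
    b_norm = np.linalg.norm(b64)

    for k in range(max_outer):
        r = b64 - A64 @ x                               # double-precision residual
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        c = np.linalg.solve(A32, r.astype(np.float32))  # single-precision solve
        x += c.astype(np.float64)                       # double-precision update
    return x, max_outer


if __name__ == "__main__":
    # 1D Poisson matrix tridiag(-1, 2, -1), a typical FEM-style test case.
    n = 32
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x, iters = mixed_precision_refine(A, b)
    print("outer iterations:", iters)
    print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The outer loop only converges when the single-precision solve reduces the error at each step, which is why the abstract's final test against ill-conditioned matrices matters: as the condition number grows, the single-precision correction becomes less reliable and more outer iterations are needed.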

This publication has 26 references indexed in Scilit:

Error bounds from extra-precise iterative refinement
ACM Transactions on Mathematical Software, 2006
Scientific computation for simulations on programmable graphics hardware
Simulation Modelling Practice and Theory, 2005
Lightweight Floating-Point Arithmetic: Case Study of Inverse Discrete Cosine Transform
EURASIP Journal on Advances in Signal Processing, 2002
The Raw microprocessor: a computational fabric for software circuits and general-purpose programs
IEEE Micro, 2002
Reconfigurable computing
ACM Computing Surveys, 2002
Design, implementation and testing of extended and mixed precision BLAS
ACM Transactions on Mathematical Software, 2002
Efficient High Accuracy Solutions with GMRES(m)
SIAM Journal on Scientific and Statistical Computing, 1992
A floating-point technique for extending the available precision
Numerische Mathematik, 1971
Iterative refinement of the solution of a positive definite system of equations
Numerische Mathematik, 1966
Quasi double-precision in floating point addition
BIT Numerical Mathematics, 1965