Concurrent number cruncher: a GPU implementation of a general sparse linear solver
- 2 June 2009
- journal article
- research article
- Published by Taylor & Francis in International Journal of Parallel, Emergent and Distributed Systems
- Vol. 24 (3) , 205-223
- https://doi.org/10.1080/17445760802337010
Abstract
A wide class of numerical methods needs to solve a linear system, where the matrix pattern of non-zero coefficients can be arbitrary. These problems can greatly benefit from highly multithreaded computational power and large memory bandwidth available on graphics processor units (GPUs), especially since dedicated general purpose APIs such as close-to-metal (CTM) (AMD–ATI) and compute unified device architecture (CUDA) (NVIDIA) have appeared. CUDA even provides a BLAS implementation, but only for dense matrices (CuBLAS). Other existing linear solvers for the GPU are also limited by their internal matrix representation. This paper describes how to combine recent GPU programming techniques and new GPU dedicated APIs with high performance computing strategies (namely block compressed row storage (BCRS), register blocking and vectorization), to implement a sparse general-purpose linear solver. Our implementation of the Jacobi-preconditioned conjugate gradient algorithm outperforms by up to a factor of 6.0 × leading-edge CPU counterparts, making it attractive for applications which are content with single precision.Keywords
This publication has 12 references indexed in Scilit:
- Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulationsInternational Journal of Parallel, Emergent and Distributed Systems, 2007
- Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPUPublished by Springer Nature ,2007
- Surface Parameterization: a Tutorial and SurveyPublished by Springer Nature ,2005
- Brook for GPUsACM Transactions on Graphics, 2004
- Linear algebra operators for GPU implementation of numerical algorithmsACM Transactions on Graphics, 2003
- Sparse matrix solvers on the GPUACM Transactions on Graphics, 2003
- Discrete smooth interpolation in geometric modellingPublished by Elsevier ,2003
- Least squares conformal maps for automatic texture atlas generationPublished by Association for Computing Machinery (ACM) ,2002
- Templates for the Solution of Linear Systems: Building Blocks for Iterative MethodsPublished by Society for Industrial & Applied Mathematics (SIAM) ,1994
- Methods of conjugate gradients for solving linear systemsJournal of Research of the National Bureau of Standards, 1952