Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations
- 1 January 2002
- journal article
- Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Review
- Vol. 44 (3), 373-393
- https://doi.org/10.1137/s00361445003820
Abstract
The conjugate gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For systems that are ill conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems; that cache reuse may be more important than reducing communication; that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and that a hybrid MPI + OpenMP paradigm increases programming complexity with little performance gain. A multithreaded implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
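For readers unfamiliar with the solver discussed in the abstract, the following is a minimal serial sketch of preconditioned CG on a matrix in compressed sparse row (CSR) form. It is not the paper's implementation: the function names (`csr_matvec`, `pcg`, `jacobi_preconditioner`, `laplacian_1d`) are hypothetical, the test matrix is an assumed 1D Laplacian, and a simple Jacobi (diagonal) preconditioner is swapped in for ILU(0) to keep the example short.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            # Indirect access x[indices[k]]: its locality depends on row ordering.
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_preconditioner(indptr, indices, data):
    """Assumed stand-in for ILU(0): divide the residual by diag(A)."""
    n = len(indptr) - 1
    diag = [1.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(indptr, indices, data, b, apply_minv=None, tol=1e-10, max_iter=1000):
    """Preconditioned CG for symmetric positive definite A, starting from x = 0."""
    if apply_minv is None:
        apply_minv = lambda r: list(r)  # identity preconditioner: plain CG
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    z = apply_minv(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0  # zero right-hand side: x = 0 is already the solution
    for it in range(1, max_iter + 1):
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplacian_1d(n):
    """CSR arrays for the tridiagonal SPD matrix tridiag(-1, 2, -1)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    indptr, indices, data = laplacian_1d(20)
    b = csr_matvec(indptr, indices, data, [1.0] * 20)  # exact solution is all ones
    m_inv = jacobi_preconditioner(indptr, indices, data)
    x, iters = pcg(indptr, indices, data, b, apply_minv=m_inv)
    print(iters, max(abs(xi - 1.0) for xi in x))
```

The sparse matrix-vector product dominates each iteration, and its indirect reads of `x[indices[k]]` are where a row and column ordering can plausibly affect cache reuse, which is the kind of effect the abstract reports.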