Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

1 January 2002

journal article
Published by Society for Industrial & Applied Mathematics (SIAM) in SIAM Review

Vol. 44 (3), 373-393
https://doi.org/10.1137/s00361445003820

Abstract

The conjugate gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. For systems that are ill conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, (1) ordering significantly improves overall performance on both distributed and distributed shared-memory systems; (2) cache reuse may be more important than reducing communication; (3) it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and (4) a hybrid MPI + OpenMP paradigm increases programming complexity with little performance gain. A multithreaded implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
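
As a rough illustration of the kind of computation the abstract refers to, the following is a minimal serial sketch of unpreconditioned CG on a matrix in compressed sparse row (CSR) form. It is not the paper's implementation: the function names, the CSR layout, the tolerance, and the small tridiagonal test matrix are all assumptions made for this example. The indirect read `x[indices[k]]` in the matrix-vector product is the access whose memory locality a row/column reordering would change.

```python
# Minimal sketch of unpreconditioned conjugate gradient (CG) on a CSR matrix.
# All names and the test matrix below are hypothetical, not taken from the paper.

def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for A stored in compressed sparse row (CSR) form."""
    n = len(indptr) - 1
    y = [0.0] * n
    for row in range(n):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Indirect access into x: its locality depends on the matrix ordering.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Returns (x, iterations). Starts from x = 0 and stops when the
    2-norm of the residual drops to tol or below.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 <= tol:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 <= tol:
            return x, it
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


def tridiagonal_csr(n):
    """Build the n x n matrix tridiag(-1, 2, -1) in CSR form (test input only)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    indptr, indices, data = tridiagonal_csr(5)
    # Right-hand side chosen so that the exact solution is the all-ones vector.
    b = csr_matvec(indptr, indices, data, [1.0] * 5)
    x, iters = conjugate_gradient(indptr, indices, data, b)
    print(x, iters)
```

A preconditioned variant such as ILU(0) PCG would additionally apply sparse triangular solves to the residual in each iteration; that step is omitted here.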
