Efficient sparse LU factorization with partial pivoting on distributed memory architectures

1 January 1998

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems

Vol. 9 (2) , 109-125
https://doi.org/10.1109/71.663864

Abstract

A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach adopts static symbolic factorization to avoid run-time control overhead, incorporates 2D L/U supemode partitioning and amalgamation strategies to improve caching performance, and exploits irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares the algorithms using 1D and 2D data mapping schemes, and presents experimental studies on Cray-T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging, and S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance ever achieved for this challenging problem and the previous record was 2.583 GFLOPS on shared memory machines

Keywords

This publication has 26 references indexed in Scilit:

Efficient run-time support for irregular task computations with mixed granularities
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
Improved load distribution in parallel sparse Cholesky factorization
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2002
Synchronization and communication in the T3E multiprocessor
Published by Association for Computing Machinery (ACM) ,1996
Parallel Sparse Orthogonal Factorization on Distributed-Memory Multiprocessors
SIAM Journal on Scientific Computing, 1996
Decoupling synchronization and data transfer in message passing systems of parallel computers
Published by Association for Computing Machinery (ACM) ,1995
Structural Representations of Schur Complements in Sparse Matrices
Published by Springer Nature ,1993
PYRROS
Published by Association for Computing Machinery (ACM) ,1992
Computational models and task scheduling for parallel sparse Cholesky factorization
Parallel Computing, 1986
The Multifrontal Solution of Indefinite Sparse Symmetric Linear
ACM Transactions on Mathematical Software, 1983
On Algorithms for Obtaining a Maximum Transversal
ACM Transactions on Mathematical Software, 1981