Efficient sparse LU factorization with partial pivoting on distributed memory architectures
- 1 January 1998
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 9 (2) , 109-125
- https://doi.org/10.1109/71.663864
Abstract
A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach adopts static symbolic factorization to avoid run-time control overhead, incorporates 2D L/U supemode partitioning and amalgamation strategies to improve caching performance, and exploits irregular task parallelism embedded in sparse LU using asynchronous computation scheduling. The paper discusses and compares the algorithms using 1D and 2D data mapping schemes, and presents experimental studies on Cray-T3D and T3E. The performance results for a set of nonsymmetric benchmark matrices are very encouraging, and S* has achieved up to 6.878 GFLOPS on 128 T3E nodes. To the best of our knowledge, this is the highest performance ever achieved for this challenging problem and the previous record was 2.583 GFLOPS on shared memory machinesKeywords
This publication has 26 references indexed in Scilit:
- Efficient run-time support for irregular task computations with mixed granularitiesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Improved load distribution in parallel sparse Cholesky factorizationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Synchronization and communication in the T3E multiprocessorPublished by Association for Computing Machinery (ACM) ,1996
- Parallel Sparse Orthogonal Factorization on Distributed-Memory MultiprocessorsSIAM Journal on Scientific Computing, 1996
- Decoupling synchronization and data transfer in message passing systems of parallel computersPublished by Association for Computing Machinery (ACM) ,1995
- Structural Representations of Schur Complements in Sparse MatricesPublished by Springer Nature ,1993
- PYRROSPublished by Association for Computing Machinery (ACM) ,1992
- Computational models and task scheduling for parallel sparse Cholesky factorizationParallel Computing, 1986
- The Multifrontal Solution of Indefinite Sparse Symmetric LinearACM Transactions on Mathematical Software, 1983
- On Algorithms for Obtaining a Maximum TransversalACM Transactions on Mathematical Software, 1981