Highly Parallel Sparse Cholesky Factorization

Abstract
This paper develops and compares several fine-grained parallel algorithms to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, this paper focuses on matrices with arbitrary sparsity structure. The most promising alternative is a supernodal, multifrontal algorithm whose inner loop performs several dense factorizations simultaneously on a two-dimensional grid of processors. The key subroutine is a fine-grained parallel, dense-factorization algorithm. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze these algorithms. Asymptotic analysis combined with experimental measurement of parameters is accurate enough to be useful in choosing among alternative algorithms for a complicated problem.

This publication has 18 references indexed in Scilit: