Applications of Parallel Computation to Statistical Inference
- 1 December 1988
- journal article
- research article
- Published by JSTOR in Journal of the American Statistical Association
- Vol. 83 (404), 976
- https://doi.org/10.2307/2290123
Abstract
Recent advances in parallel computation (e.g., Eddy and Schervish 1986; Gardner, Gerard, Mowers, Nemeth, and Schnabel 1986) have made it possible for a network of microcomputers to act together as a parallel processor using data-flow algorithms (see O'Leary and Stewart 1985). A data-flow algorithm is one in which the sequence of computations is not scheduled a priori, but rather is determined by the order in which computations are completed. Many statistical problems can be adapted to data-flow algorithms. One requirement is that the computation can be broken into independent parts, that is, parts that can be performed in any order. For example, calculating the average of a large number of data values requires adding them up, which can be done in any order.

Suppose that an application can be broken into independent parts and one has p processors available to perform the computations. Then each of the p processors can work on one of the parts at a time. When one processor finishes, it can work on the next part of the computation. Under the right conditions, this can allow a speedup of as much as a factor of p in the computation. The problems that must be solved are approximating the right conditions and breaking the application up accordingly.

This article discusses the problem of subdividing a large task in such a way that it can run efficiently on a network of processors communicating over an Ethernet. The discussion centers on several examples. The first example concerns a mode of inference called discrete-finite inference (see Eddy and Schervish 1986). The second example is large-sample data analysis (see Kim and Schervish, in press). The third example concerns multiprocess time series models (see Schervish and Tsay 1988). The fourth example (Lehoczky and Schervish 1987) is a hierarchical model for the responses to a particular item in a national victimization survey.
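The averaging example above can be sketched in modern terms. The following is an illustrative sketch only, not the paper's implementation: the data are split into independent chunks, each worker sums one chunk, and the partial sums are consumed in whatever order they complete, as in a data-flow algorithm. The function `parallel_average` and the choice of `concurrent.futures` are assumptions for illustration.

```python
# Data-flow style parallel average (illustrative sketch).
# Each chunk is an independent part of the computation; partial sums
# are combined in completion order, not submission order.
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_average(data, p=4):
    """Average `data` by summing independent chunks on p workers."""
    size = max(1, len(data) // p)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    total = 0.0
    with ThreadPoolExecutor(max_workers=p) as pool:
        futures = [pool.submit(sum, c) for c in chunks]
        for fut in as_completed(futures):  # order of completion, a priori unknown
            total += fut.result()
    return total / len(data)
```

Because addition is commutative and associative, the result does not depend on the order in which the partial sums arrive, which is exactly the property that makes the task suitable for a data-flow algorithm.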
Each of the applications has a different degree of coarseness or “granularity” to the subdivision of the work. Some problems offer several choices of different granularities. For example, in a multiple integration, each of the integrations (or all of them) could be performed in parallel. These choices and differences lead to different levels of performance of the network of processors.
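The granularity choice can be made concrete with a small sketch (hypothetical, not from the paper): the same summation split into coarse or fine independent tasks. The result is identical either way; only the number of tasks handed to the processors, and hence the scheduling and communication overhead, changes. The helper `make_tasks` and the chunk sizes are assumptions for illustration.

```python
# Granularity sketch: one workload, two subdivisions.
def make_tasks(data, chunk_size):
    """Split `data` into independent partial-sum tasks of the given grain."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

data = list(range(1000))
coarse = make_tasks(data, 500)  # 2 large tasks: little overhead, poor balance
fine = make_tasks(data, 10)     # 100 small tasks: good balance, more overhead
assert sum(map(sum, coarse)) == sum(map(sum, fine)) == sum(data)
```

On a network of processors communicating over an Ethernet, coarse tasks minimize message traffic while fine tasks keep all p processors busy; the best grain lies between these extremes and depends on the application.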