A localized algorithm for parallel association mining

Abstract
Discovery of association rules is an important database mining problem. Mining for association rules involves extracting pat- terns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algo- rithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (set of items), thus in- curring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association min- ing algorithm. Our algorithm is a result of detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets to- gether, and to partition them among the processors. At the same time it also uses a different database layout which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communi- cation or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O over- heads. Unlike previous approaches, the algorithms uses simple intersection operations to compute frequent itemsets and doesn't have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha clus- ter inter-connected by the Memory Channel network. We present results on the performance of our algorithm on various databases, and compare it against a well known parallel algorithm. Our al- gorithm outperforms it by an more than an order of magnitude. provide a host of useful information on customer groups, buying patterns, stock trends, etc. This process of automatic informa- tion inferencing is commonly known as Knowledge Discovery and Data mining (KDD). We look at one aspect of this process — mining for associations. Discovery of association rules is an important problem in database mining. The prototypical appli- cation is the analysis of sales or basket data (2). Basket data consists of items bought by a customer along with the transac- tion identifier. Association rules have been shown to be useful in domains that range from decision support to telecommunications alarm diagnosis, and prediction.

This publication has 0 references indexed in Scilit: