Hash based parallel algorithms for mining association rules
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
We propose four parallel algorithms (NPA, SPA, HPA and HPA-ELD) for mining association rules on shared nothing parallel machines to improve its performance. In NPA, candidate itemsets are just copied amongst all the processors, which can lead to memory overflow for large transaction databases. The remaining three algorithms partition the candidate itemsets over the processors. If it is partitioned simply (SPA), transaction data has to be broadcast to all processors. HPA partitions the candidate itemsets using a hash function to eliminate broadcasting, which also reduces the comparison workload significantly. HPA-ELD fully utilizes the available memory space by detecting the extremely large itemsets and copying them, which is also very effective at flattering the load over the processors. We implemented these algorithms in a shared nothing environment. Performance evaluations show that the best algorithm, HPA-ELD, attains good linearity on speedup ratio and is effective for handling skew.Keywords
This publication has 4 references indexed in Scilit:
- Mining generalized association rulesFuture Generation Computer Systems, 1997
- An effective hash-based algorithm for mining association rulesACM SIGMOD Record, 1995
- Efficient parallel data mining for association rulesPublished by Association for Computing Machinery (ACM) ,1995
- Mining association rules between sets of items in large databasesPublished by Association for Computing Machinery (ACM) ,1993