Efficient mining of association rules in text databases

1 November 1999

Conference paper
Published by Association for Computing Machinery (ACM)

pp. 234-242
https://doi.org/10.1145/319950.319981

Abstract

In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases differ considerably from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases and compared with the newly proposed algorithms, Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). The proposed algorithms are shown to perform better on large text databases.
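
To make the setting concrete, the following is a minimal sketch of the baseline Apriori approach applied to text, under the assumption that each document is treated as one transaction whose items are its distinct words. It is not the paper's M-Apriori or M-DHP, whose details the abstract does not give. The function names `apriori` and `rules`, the parameters `min_support` and `min_confidence`, and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def apriori(documents, min_support):
    """Return {frozenset of words: support count} for all frequent itemsets."""
    # Assumption: one document = one transaction of distinct, lowercased words.
    transactions = [frozenset(doc.lower().split()) for doc in documents]

    # Pass 1: count single words and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for word in t:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # k-itemsets whose every (k-1)-subset is frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Pass k: count each surviving candidate against every transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for qualifying rules."""
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is always present in `frequent`.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence
```

Even in this small sketch the first pass keeps one counter per distinct word, and candidate generation compares frequent itemsets pairwise. This illustrates why a large vocabulary makes the number of itemsets to count the limiting factor that the abstract describes.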
