Dynamic miss-counting algorithms: finding implication and similarity rules with confidence pruning
- 7 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 10636382,p. 501-511
- https://doi.org/10.1109/icde.2000.839449
Abstract
Dynamic miss-counting (DMC) algorithms are proposed which find all implication and similarity rules with confidence pruning but without support pruning. To handle data sets with a large number of columns, we propose dynamic pruning techniques that can be applied during data scanning. DMC counts the numbers of rows in which each pair of columns disagree instead of counting the number of hits. DMC deletes a candidate as soon as the number of misses exceeds the maximum number of misses allowed for that pair. We also propose several optimization techniques that reduce the required memory size significantly. We evaluated our algorithms by using four data sets, viz. Web access logs, a Web page-link graph, news documents and a dictionary. These data sets have between 74,000 and 700,000 items. Experiments show that DMC can find high-confidence rules for such large data sets efficiently.Keywords
This publication has 12 references indexed in Scilit:
- On the resemblance and containment of documentsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Finding interesting associations without support pruningPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- Cure: an efficient clustering algorithm for large databasesInformation Systems, 2001
- Constraint-based rule mining in large, dense databasesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1999
- Dynamic itemset counting and implication rules for market basket dataPublished by Association for Computing Machinery (ACM) ,1997
- A fast distributed algorithm for mining association rulesPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1996
- Building a scalable and accurate copy detection mechanismPublished by Association for Computing Machinery (ACM) ,1996
- Randomized AlgorithmsPublished by Cambridge University Press (CUP) ,1995
- An effective hash-based algorithm for mining association rulesACM SIGMOD Record, 1995
- Using collaborative filtering to weave an information tapestryCommunications of the ACM, 1992