Feasible itemset distributions in data mining
- 9 June 2003
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 284-295
- https://doi.org/10.1145/773153.773181
Abstract
Computing frequent itemsets and maximally frequent item-sets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.Keywords
This publication has 5 references indexed in Scilit:
- Real world performance of association rule algorithmsPublished by Association for Computing Machinery (ACM) ,2001
- Depth first generation of long patternsPublished by Association for Computing Machinery (ACM) ,2000
- Mining frequent patterns without candidate generationPublished by Association for Computing Machinery (ACM) ,2000
- Efficiently mining long patterns from databasesPublished by Association for Computing Machinery (ACM) ,1998
- Data mining, hypergraph transversals, and machine learning (extended abstract)Published by Association for Computing Machinery (ACM) ,1997