Feasible itemset distributions in data mining

9 June 2003

proceedings article
Published by Association for Computing Machinery (ACM)

p. 284-295
https://doi.org/10.1145/773153.773181

Abstract

Computing frequent itemsets and maximally frequent item-sets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.

Keywords

This publication has 5 references indexed in Scilit:

Real world performance of association rule algorithms
Published by Association for Computing Machinery (ACM) ,2001
Depth first generation of long patterns
Published by Association for Computing Machinery (ACM) ,2000
Mining frequent patterns without candidate generation
Published by Association for Computing Machinery (ACM) ,2000
Efficiently mining long patterns from databases
Published by Association for Computing Machinery (ACM) ,1998
Data mining, hypergraph transversals, and machine learning (extended abstract)
Published by Association for Computing Machinery (ACM) ,1997