Mining frequent patterns without candidate generation
Top Cited Papers
- 16 May 2000
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 29 (2) , 1-12
- https://doi.org/10.1145/335191.335372
Abstract
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.Keywords
This publication has 10 references indexed in Scilit:
- Efficient mining of constrained correlated setsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2002
- A Tree Projection Algorithm for Generation of Frequent Item SetsJournal of Parallel and Distributed Computing, 2001
- Efficient mining of emerging patternsPublished by Association for Computing Machinery (ACM) ,1999
- Efficiently mining long patterns from databasesPublished by Association for Computing Machinery (ACM) ,1998
- Integrating association rule mining with relational database systemsPublished by Association for Computing Machinery (ACM) ,1998
- Exploratory mining and pruning optimizations of constrained associations rulesPublished by Association for Computing Machinery (ACM) ,1998
- Discovery of Frequent Episodes in Event SequencesData Mining and Knowledge Discovery, 1997
- Beyond market basketsPublished by Association for Computing Machinery (ACM) ,1997
- An effective hash-based algorithm for mining association rulesPublished by Association for Computing Machinery (ACM) ,1995
- Finding interesting rules from large sets of discovered association rulesPublished by Association for Computing Machinery (ACM) ,1994