Data organization and access for efficient data mining
- 1 January 1999
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 522-529
- https://doi.org/10.1109/icde.1999.754968
Abstract
Efficient mining of data presents a significant challenge, due to problems of combinatorial explosion in the space and time often required for such processing. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. By a simple analysis and comparison of the counting stage for the Apriori association rules algorithm, we show that a "column-wise" approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database, primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
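The contrast between row-wise and column-wise counting can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the abstract does not specify the storage format, so the column layout here (one set of transaction ids per item) is an assumption, and the toy transactions and all function names are hypothetical. The sketch only shows why a column-wise count for a candidate itemset touches just the columns of that candidate's items, whereas a row-wise count revisits every candidate counter for every row.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count_row_wise(transactions, candidates):
    """Scan each transaction (row) and bump the counter of every candidate it contains."""
    counts = {c: 0 for c in candidates}
    for row in transactions:
        for c in candidates:
            if c <= row:  # candidate is a subset of this transaction
                counts[c] += 1
    return counts


def build_columns(transactions):
    """Assumed vertical layout: map each item to the set of transaction ids containing it."""
    columns = {}
    for tid, row in enumerate(transactions):
        for item in row:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_column_wise(columns, candidates):
    """Count each candidate by intersecting only the columns of its own items."""
    counts = {}
    for c in candidates:
        tids = set.intersection(*(columns.get(item, set()) for item in c))
        counts[c] = len(tids)
    return counts


# Candidate 2-itemsets over all items, as in the second pass of Apriori.
items = sorted(set().union(*TRANSACTIONS))
candidates = [frozenset(pair) for pair in combinations(items, 2)]

row_counts = count_row_wise(TRANSACTIONS, candidates)
col_counts = count_column_wise(build_columns(TRANSACTIONS), candidates)

# Keep itemsets meeting a minimum support of 3 transactions (threshold chosen for illustration).
MIN_SUPPORT = 3
frequent = {c for c, n in col_counts.items() if n >= MIN_SUPPORT}
```

Both functions produce the same support counts. In a disk-resident setting, which this sketch does not model, the column-wise version would need to read only the columns named by the current candidate, which is the kind of reduction in repeated data and counter loading that the abstract describes.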