Expressive power of an algebra for data mining
- 1 December 2006
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 31 (4) , 1169-1214
- https://doi.org/10.1145/1189769.1189770
Abstract
The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is: what's an appropriate foundation for data mining, which can accommodate disparate mining tasks? We address this problem by presenting a database model and an algebra for data mining. The database model is based on the 3W-model introduced by Johnson et al. [2000]. This model relied on black box mining operators. A main contribution of this article is to open up these black boxes, by using generic operators in a data mining algebra. Two key operators in this algebra are regionize , which creates regions (or models) from data tuples, and a restricted form of looping called mining loop . Then the resulting data mining algebra MA is studied and properties concerning expressive power and complexity are established. We present results in three directions: (1) expressiveness of the mining algebra; (2) relations with alternative frameworks, and (3) interactions between regionize and mining loop.Keywords
This publication has 21 references indexed in Scilit:
- Mining Databases and Data Streams with Query Languages and RulesPublished by Springer Nature ,2006
- Query Languages and Data Models for Database Sequences and Data StreamsPublished by Elsevier ,2004
- Simulation of the nested relational algebra by the flat relational algebra, with an application to the complexity of evaluating powerset algebra expressionsTheoretical Computer Science, 2001
- Nonmonotonic Reasoning in LDL++Published by Springer Nature ,2000
- Modeling KDD Processes within the Inductive Database FrameworkPublished by Springer Nature ,1999
- On the power of aggregation in relational query languagesPublished by Springer Nature ,1998
- A database perspective on knowledge discoveryCommunications of the ACM, 1996
- Space usage in functional query languagesPublished by Springer Nature ,1995
- Maximizing loop parallelism and improving data locality via loop fusion and distributionPublished by Springer Nature ,1994
- Low-complexity aggregation in GraphLog and DatalogTheoretical Computer Science, 1993