Data Mining for Fun and Profit
Open Access
- 1 May 2000
- journal article
- Published by Institute of Mathematical Statistics in Statistical Science
- Vol. 15 (2), 111-131
- https://doi.org/10.1214/ss/1009212753
Abstract
Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.