MAD skills
- 1 August 2009
- journal article
- Published by Association for Computing Machinery (ACM) in Proceedings of the VLDB Endowment
- Vol. 2 (2), 1481-1492
- https://doi.org/10.14778/1687553.1687576
Abstract
As massive data acquisition and storage becomes increasingly affordable, a wide variety of enterprises are employing statisticians to engage in sophisticated data analysis. In this paper we highlight the emerging practice of Magnetic, Agile, Deep (MAD) data analysis as a radical departure from traditional Enterprise Data Warehouses and Business Intelligence. We present our design philosophy, techniques and experience providing MAD analytics for one of the world's largest advertising networks at Fox Audience Network, using the Greenplum parallel database system. We describe database design methodologies that support the agile working style of analysts in these settings. We present data-parallel algorithms for sophisticated statistical techniques, with a focus on density methods. Finally, we reflect on database system features that enable agile design and flexible algorithm development using both SQL and MapReduce interfaces over a variety of storage mechanisms.
This publication has 11 references indexed in Scilit:
- From databases to dataspaces. ACM SIGMOD Record, 2005
- OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series, 2005
- Optimization. Published by Springer Nature, 2004
- Designing and mining multi-terabyte astronomy archives. ACM SIGMOD Record, 2000
- Large-Scale Parallel Data Mining. Published by Springer Nature, 2000
- Online aggregation. Published by Association for Computing Machinery (ACM), 1997
- ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance. Computer Physics Communications, 1996
- Loading databases using dataflow parallelism. ACM SIGMOD Record, 1994
- Encapsulation of parallelism in the Volcano query processing system. ACM SIGMOD Record, 1990
- Inclusion of new types in relational data base systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 1986