Bayesian Data Mining in Large Frequency Tables, with an Application to the FDA Spontaneous Reporting System
- 1 August 1999
- journal article
- research article
- Published by Taylor & Francis in The American Statistician
- Vol. 53 (3) , 177-190
- https://doi.org/10.1080/00031305.1999.10474456
Abstract
A common data mining task is the search for associations in large databases. Here we consider the search for “interestingly large” counts in a large frequency table, having millions of cells, most of which have an observed frequency of 0 or 1. We first construct a baseline or null hypothesis expected frequency for each cell, and then suggest and compare screening criteria for ranking the cell deviations of observed from expected count. A criterion based on the results of fitting an empirical Bayes model to the cell counts is recommended. An example compares these criteria for searching the FDA Spontaneous Reporting System database maintained by the Division of Pharmacovigilance and Epidemiology. In the example, each cell count is the number of reports combining one of 1,398 drugs with one of 952 adverse events (total of cell counts = 4.9 million), and the problem is to screen the drug-event combinations for possible further investigation.
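The screening idea in the abstract (observed count versus a baseline expected count per cell, then a ranking of the deviations) can be illustrated with a minimal Python sketch. Several pieces here are assumptions and are not taken from the abstract: the baseline is computed under simple drug-event independence, and the ranking score is the posterior mean of a rate ratio under a single gamma prior with fixed hyperparameters `alpha` and `beta`, a simplified stand-in for an empirical Bayes fit, which would estimate the prior from the data. The function names are hypothetical.

```python
from collections import Counter

def expected_counts(reports):
    """Map each observed (drug, event) cell to (N, E).

    N is the observed count; E is the baseline expected count under
    independence (an assumption): drug total * event total / grand total.
    """
    cell = Counter(reports)
    drug_tot = Counter()
    event_tot = Counter()
    for (drug, event), n in cell.items():
        drug_tot[drug] += n
        event_tot[event] += n
    total = sum(cell.values())
    return {
        (drug, event): (n, drug_tot[drug] * event_tot[event] / total)
        for (drug, event), n in cell.items()
    }

def shrunk_ratio(n, e, alpha=1.0, beta=1.0):
    """Posterior mean of lam when N ~ Poisson(lam * E) and
    lam ~ Gamma(shape=alpha, rate=beta).

    alpha and beta are fixed illustrative values, not fitted ones.
    """
    return (n + alpha) / (e + beta)

def rank_cells(reports, alpha=1.0, beta=1.0):
    """Return (cell, N, E, score) tuples sorted by descending score."""
    table = expected_counts(reports)
    scored = [
        (cell, n, e, shrunk_ratio(n, e, alpha, beta))
        for cell, (n, e) in table.items()
    ]
    return sorted(scored, key=lambda row: row[3], reverse=True)

# Toy data: each tuple stands for one report of a drug with an adverse event.
reports = (
    [("A", "x")] * 3 + [("A", "y")] * 1 + [("B", "x")] * 1 + [("B", "y")] * 5
)
ranking = rank_cells(reports)
```

Shrinking toward the prior keeps cells with very small expected counts from dominating the ranking, which is the practical concern when most cells hold a count of 0 or 1.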
This publication has 13 references indexed in Scilit:
- A Bayesian neural network method for adverse drug reaction signal generation. European Journal of Clinical Pharmacology, 1998
- Triple-goal Estimates in Two-stage Hierarchical Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1998
- Data Mining: Statistics and More? The American Statistician, 1998
- Natural language processing in an operational clinical information system. Natural Language Engineering, 1995
- Introduction to Graphical Modelling. Published by Springer Nature, 1995
- Graphical Belief Modeling. Published by Springer Nature, 1995
- Empirical Bayes Methods for Stabilizing Incidence Rates before Mapping. Epidemiology, 1994
- Empirical Bayes Ranking Methods. Journal of Educational Statistics, 1989
- The 1982 Massachusetts Automobile Insurance Classification Scheme. Journal of the Royal Statistical Society: Series D (The Statistician), 1983
- Some Methods for Strengthening the Common χ² Tests. Published by JSTOR, 1954