An Application of retrospective sampling in the analysis of a very large clustered data set
- 1 August 1997
- journal article
- research article
- Published by Taylor & Francis in Journal of Statistical Computation and Simulation
- Vol. 59 (1) , 63-81
- https://doi.org/10.1080/00949659708811847
Abstract
Analysts may wish to expedite the analysis of a very large data set by examining a subsample of it. Such an analysis may seem relatively straightforward even when the data set consists of clusters of individuals with polytomous responses, together with covariates measured on the individuals and on the clusters. However, complications arise when a small fraction of these clusters contains rare, but important, responses. In this paper, we use retrospective sampling of the clusters to facilitate the analysis of the large data set while at the same time obtaining considerable information about rare outcomes. The data are then modelled employing weighted generalized estimating equations and a correspondingly weighted robust covariance structure. The analysis of a large set of data containing information on individuals in car accidents is used to demonstrate these techniques.
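The abstract names weighted generalized estimating equations and a weighted robust covariance but does not state them. As a rough sketch only, assuming each sampled cluster i is weighted by the inverse of a known retrospective sampling probability, the usual textbook form is shown below. The symbols \pi_i, w_i, D_i, V_i, and r_i are notation assumed here for illustration and are not taken from the paper, whose exact formulation may differ.

```latex
% Assumed notation: S = set of sampled clusters, \pi_i = sampling probability of cluster i,
% \mu_i(\beta) = mean response vector, D_i = \partial\mu_i/\partial\beta^{T},
% V_i = working covariance matrix of y_i.
\begin{align*}
  w_i &= \frac{1}{\pi_i}, \\
  U_w(\beta) &= \sum_{i \in S} w_i \, D_i^{T} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0, \\
  \widehat{\operatorname{Cov}}(\hat\beta) &= A^{-1} B A^{-1}, \\
  A &= \sum_{i \in S} w_i \, D_i^{T} V_i^{-1} D_i, \\
  B &= \sum_{i \in S} w_i^{2} \, D_i^{T} V_i^{-1} r_i r_i^{T} V_i^{-1} D_i,
  \qquad r_i = y_i - \mu_i(\hat\beta).
\end{align*}
```

Under this sketch, clusters containing rare responses can be sampled with high probability and given small weights, while the common clusters are sampled sparsely and given large weights, so the weighted equations still target the parameters of the full data set.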
This publication has 23 references indexed in Scilit:
- Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data. Journal of the American Statistical Association, 1995
- Analysing Categorical Responses Obtained from Large Clusters. Journal of the Royal Statistical Society Series C: Applied Statistics, 1995
- Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine, 1994
- Regression analysis with clustered data. Statistics in Medicine, 1994
- On Consistency and Inconsistency of Estimating Equations. Econometric Theory, 1986
- Longitudinal data analysis using generalized linear models. Biometrika, 1986
- Maximum Likelihood Estimation of Misspecified Models. Econometrica, 1982
- Logistic disease incidence models and case-control studies. Biometrika, 1979
- Asymptotic relations between the likelihood estimating function and the maximum likelihood estimator. Annals of the Institute of Statistical Mathematics, 1973
- Separate sample logistic discrimination. Biometrika, 1972