• 1 January 2006
    • preprint
    • Published in RePEc
Abstract
Many large data sets are created using clustered, rather than random, sampling schemes. Clustered data arise when multiple observations exist on the same respondent, as in panel data, and when respondents share a common factor, such as a neighborhood or family. In the presence of clustered data, methods that rely on random sampling to measure the precision of an estimator may be incorrect. Many researchers, however, continue to treat respondents from the same sampling cluster as independent observations and thus implicitly ignore the potential intracluster correlation. In this paper, I use a robust inference method and data from the Panel Study of Income Dynamics to examine the implications of clustered samples for inference. Consistent with the previous survey sampling literature, comparisons of the estimated asymptotic variances derived under random and clustered sampling reveal important differences, even when there are only a few observations per cluster. The variance estimates derived under random sampling are generally biased downward.
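The downward bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or data: it uses simulated observations rather than the survey data, estimates only a sample mean, and applies a standard sandwich-style cluster-robust variance with a simple G/(G-1) correction. The function names, the intracluster correlation `rho`, and the cluster sizes are all assumptions chosen for illustration.

```python
import numpy as np


def mean_standard_errors(y, cluster_ids):
    """Return (naive_se, cluster_se) for the sample mean of y.

    naive_se assumes independent observations; cluster_se allows
    arbitrary correlation within each cluster (illustrative only).
    """
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n = y.size
    resid = y - y.mean()

    # Naive: treats all n observations as independent draws.
    naive_se = np.sqrt(resid.var(ddof=1) / n)

    # Cluster-robust: sum residuals within each cluster, square the
    # sums, and apply a G/(G-1) small-sample correction.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse, weights=resid)
    g = cluster_sums.size
    cluster_se = np.sqrt(g / (g - 1) * np.sum(cluster_sums ** 2)) / n
    return naive_se, cluster_se


def simulate(n_clusters=500, per_cluster=3, rho=0.3, seed=0):
    """Simulate unit-variance data with intracluster correlation rho."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_clusters), per_cluster)
    cluster_effect = rng.normal(0.0, np.sqrt(rho), n_clusters)
    noise = rng.normal(0.0, np.sqrt(1.0 - rho), ids.size)
    return cluster_effect[ids] + noise, ids


y, ids = simulate()
naive_se, cluster_se = mean_standard_errors(y, ids)
print(f"naive SE:   {naive_se:.4f}")
print(f"cluster SE: {cluster_se:.4f}")
print(f"ratio:      {cluster_se / naive_se:.2f}")
```

Under these assumptions the variance inflation from clustering is roughly 1 + (m - 1) * rho for m observations per cluster, so even with three observations per cluster the naive standard error is noticeably too small.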