Resampling Inference With Complex Survey Data

Abstract
Methods for standard errors and confidence intervals for nonlinear statistics, such as ratios, regression coefficients, and correlation coefficients, have been extensively studied for stratified multistage designs in which the clusters are sampled with replacement, in particular for the important special case of two sampled clusters per stratum. These methods include the customary linearization (or Taylor) method and resampling methods based on the jackknife and balanced repeated replication (BRR). Unlike the jackknife or the BRR, the linearization method is applicable to general sampling designs, but it requires a separate variance formula for each nonlinear statistic and hence additional programming effort. Both the jackknife and the BRR use a single variance formula for all nonlinear statistics, but they are computationally more intensive. The resampling methods developed here retain these features of the jackknife and the BRR, yet extend to more complex designs involving sampling without replacement. The sampling designs studied include (a) stratified cluster sampling in which the clusters are sampled with replacement, (b) stratified simple random sampling without replacement, (c) unequal probability sampling without replacement, and (d) two-stage cluster sampling with equal probabilities and without replacement. Our proposed resampling methods may be viewed as extensions of the bootstrap to complex survey samples and, in the case of design (c), of the BRR as well. We obtain and study the properties of variance estimators of, and confidence intervals for, the parameter of interest θ, based on the bootstrap histogram of the t statistic. The variance estimators reduce to the standard ones in the special case of linear statistics. Unlike intervals based on the normal approximation, these confidence intervals take account of the skewness in the distribution of the estimator θ̂.
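To make the contrast between a statistic-specific linearization formula and a single jackknife formula concrete, here is a minimal Python sketch for design (a), stratified cluster sampling with replacement, using the ratio of two stratified means as the nonlinear statistic. The data are hypothetical synthetic values, and all function names (`jackknife_variance`, `linearized_ratio_variance`, and so on) are illustrative, not taken from the paper. The jackknife routine accepts any statistic function, while the linearization routine hard-codes the ratio's residuals y - R̂x.

```python
import numpy as np

def stratified_mean(strata, weights, col):
    # sum_h W_h * (sample mean of column `col` over the sampled clusters in stratum h)
    return sum(w * s[:, col].mean() for s, w in zip(strata, weights))

def ratio_stat(strata, weights):
    # Nonlinear statistic: ratio of the stratified means of y (column 0) and x (column 1)
    return stratified_mean(strata, weights, 0) / stratified_mean(strata, weights, 1)

def jackknife_variance(stat, strata, weights):
    """Delete-one-cluster stratified jackknife: one formula for any `stat`."""
    theta = stat(strata, weights)
    v = 0.0
    for h, s in enumerate(strata):
        n_h = len(s)
        devs = []
        for i in range(n_h):
            reduced = list(strata)
            reduced[h] = np.delete(s, i, axis=0)  # drop cluster i in stratum h only
            devs.append(stat(reduced, weights) - theta)
        v += (n_h - 1) / n_h * np.sum(np.square(devs))
    return v

def linearized_ratio_variance(strata, weights):
    """Linearization variance: a formula specific to the ratio statistic."""
    r = ratio_stat(strata, weights)
    x_bar = stratified_mean(strata, weights, 1)
    v = sum(w**2 * (s[:, 0] - r * s[:, 1]).var(ddof=1) / len(s)
            for s, w in zip(strata, weights))
    return v / x_bar**2

def standard_variance(strata, weights, col):
    # Standard with-replacement estimator for a stratified mean: sum_h W_h^2 s_h^2 / n_h
    return sum(w**2 * s[:, col].var(ddof=1) / len(s) for s, w in zip(strata, weights))

# Hypothetical synthetic sample: 3 strata with 2, 3 and 5 clusters; columns are (y, x).
rng = np.random.default_rng(0)
weights = [0.5, 0.3, 0.2]
strata = []
for n_h in (2, 3, 5):
    x = rng.uniform(5, 15, n_h)
    strata.append(np.column_stack([2.0 * x + rng.normal(0, 1, n_h), x]))

mean_y = lambda s, w: stratified_mean(s, w, 0)
print("jackknife  v(mean_y):", jackknife_variance(mean_y, strata, weights))
print("standard   v(mean_y):", standard_variance(strata, weights, 0))
print("jackknife  v(ratio): ", jackknife_variance(ratio_stat, strata, weights))
print("linearized v(ratio): ", linearized_ratio_variance(strata, weights))
```

For a linear statistic such as the stratified mean, the delete-one-cluster jackknife reduces algebraically to sum_h W_h^2 s_h^2 / n_h, so the first two printed values agree up to rounding; the two ratio variances need not coincide, because the jackknife does not linearize.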
For case (a), the sampled clusters are resampled independently in each stratum by simple random sampling with replacement, and this procedure is replicated many times. The estimate for each resampled cluster is scaled so that the resulting variance estimator of θ̂ reduces to the standard unbiased variance estimator in the linear case. For case (b), the sampled units are resampled in each stratum as in case (a), but with a different scaling. Different resampling procedures and scalings are employed for case (c), and a two-stage resampling procedure is developed for case (d). Results of a simulation study under a stratified simple random sampling design show that the bootstrap intervals track the nominal error rate in each tail better than intervals based on the normal approximation, but the bootstrap variance estimators are less stable than those based on linearization or the jackknife.
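The resample-and-rescale step for cases (a) and (b) can be sketched as follows. This is a minimal sketch assuming the rescaled value ỹ_hi = ȳ_h + sqrt(m_h(1 - f_h)/(n_h - 1)) (y*_hi - ȳ_h), where the y*_hi are m_h draws with replacement from the n_h sampled values in stratum h, ȳ_h is the stratum sample mean, and f_h is the sampling fraction (f_h = 0 for case (a)). Under this form the bootstrap variance of the rescaled stratum mean is (1 - f_h) s_h^2 / n_h, so for a linear statistic the bootstrap variance matches the standard unbiased estimator, which is the property stated above. The default resample size m_h = n_h - 1, the synthetic data, and all function names are hypothetical choices for illustration; cases (c) and (d) use different procedures and are not covered.

```python
import numpy as np

def stratified_mean(strata, weights, col):
    # sum_h W_h * (sample mean of column `col` in stratum h)
    return sum(w * s[:, col].mean() for s, w in zip(strata, weights))

def standard_variance(strata, weights, fpc, col):
    # Standard unbiased estimator: sum_h W_h^2 (1 - f_h) s_h^2 / n_h
    return sum(w**2 * (1 - f) * s[:, col].var(ddof=1) / len(s)
               for s, w, f in zip(strata, weights, fpc))

def rescaled_bootstrap_variance(stat, strata, weights, fpc, m=None,
                                n_boot=2000, seed=0):
    """Monte Carlo bootstrap variance with rescaled resampled values (sketch)."""
    rng = np.random.default_rng(seed)
    # Assumed default resample size m_h = n_h - 1 (a hypothetical choice here).
    m = [len(s) - 1 for s in strata] if m is None else m
    theta_hat = stat(strata, weights)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        rescaled = []
        for s, f_h, m_h in zip(strata, fpc, m):
            n_h = len(s)
            center = s.mean(axis=0)
            idx = rng.integers(0, n_h, size=m_h)  # SRS with replacement within stratum h
            scale = np.sqrt(m_h * (1.0 - f_h) / (n_h - 1))
            rescaled.append(center + scale * (s[idx] - center))
        reps[b] = stat(rescaled, weights)  # same estimator applied to the rescaled data
    return np.mean((reps - theta_hat) ** 2)

# Hypothetical synthetic stratified SRSWOR sample (case (b)); columns are (y, x).
rng = np.random.default_rng(0)
weights = [0.5, 0.3, 0.2]   # stratum weights W_h
fpc = [0.1, 0.05, 0.2]      # sampling fractions f_h (use 0.0 for case (a))
strata = []
for n_h in (2, 3, 5):
    x = rng.uniform(5, 15, n_h)
    strata.append(np.column_stack([2.0 * x + rng.normal(0, 1, n_h), x]))

mean_y = lambda s, w: stratified_mean(s, w, 0)
ratio = lambda s, w: stratified_mean(s, w, 0) / stratified_mean(s, w, 1)

print("standard  v(mean_y):", standard_variance(strata, weights, fpc, 0))
print("bootstrap v(mean_y):", rescaled_bootstrap_variance(mean_y, strata, weights, fpc))
print("bootstrap v(ratio): ", rescaled_bootstrap_variance(ratio, strata, weights, fpc))
```

The bootstrap value for the mean is a Monte Carlo approximation, so it matches the standard estimator only up to simulation error that shrinks as `n_boot` grows; the same routine handles the nonlinear ratio without a statistic-specific formula.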
