Proportional hazards models based on biased samples and estimated selection probabilities

Abstract
In non‐randomized biomedical studies using the proportional hazards model, the data often constitute an unrepresentative sample of the underlying target population, which results in biased regression coefficients. The bias can be avoided by weighting included subjects by the inverse of their respective selection probabilities, as proposed by Horvitz & Thompson (1952) and extended to the proportional hazards setting for use in surveys by Binder (1992) and Lin (2000). In practice, the weights are often estimated and must be treated as such in order for the resulting inference to be accurate. The authors propose a two‐stage weighted proportional hazards model in which, at the first stage, weights are estimated through a logistic regression model fitted to a representative sample from the target population. At the second stage, a weighted Cox model is fitted to the biased sample. The authors propose estimators for the regression parameter and cumulative baseline hazard. They derive the asymptotic properties of the parameter estimators, accounting for the difference in the variance introduced by the randomness of the weights. They evaluate the accuracy of the asymptotic approximations in finite samples through simulation. They illustrate their approach in an analysis of renal transplant patients using data obtained from the Scientific Registry of Transplant Recipients