Statistical methods for the identification and use of prognostic factors

Abstract
This expository paper reviews the rationale for determining prognostic factors and the statistical methods for finding such factors and allowing for them in the design and analysis of clinical studies. Delineating prognostic factors in clinical studies is useful in several ways: it may provide insight into the mechanism of disease; it helps determine how patients should be stratified when planning clinical trials; it facilitates comparison of disease outcomes between different groups of patients; it assists in allocating treatment to an individual patient; and it permits remedial action. A response variable is a measure of the future health or illness of the patient, and its value usually depends on one or more prognostic variables. Statistical methodology for determining prognostic factors is reviewed for cases in which the response is dichotomous, continuous or a measure of time (with the possibility of censored observations) and the predictor variables are discrete, continuous or a combination of both. Methods of allowing for known prognostic variables in the analysis of a study are also reviewed. These include grouping patients by prognostic variables and comparing treatments separately within each group, as well as covariance analysis and maximum likelihood. Knowledge of prognostic factors is also useful in defining a control group to be compared with a treated group in non-randomized studies. Three examples of studies designed to elucidate prognostic factors are described, one each in breast cancer, cancer of the prostate and Hodgkin's disease.