Why do so many prognostic factors fail to pan out?

1 October 1992

journal article
Published by Springer Nature in Breast Cancer Research and Treatment

Vol. 22 (3) , 197-206
https://doi.org/10.1007/bf01840833

Abstract

Although there can be many reasons that one study fails to confirm the results of another, the consequences of data exploration and the potential for spuriously significant results are often overlooked. A series of simulation experiments was designed to mimic the characteristics of relapse-free survival data that might be encountered in a prognostic factor study of node-negative breast cancer patients. Each simulated dataset of 500 or 250 cases was divided into a training set, used to select the "best" prognostic factor cutpoint, and a validation set, used to confirm the cutpoint. Testing multiple cutpoints markedly increased the risk of making a Type I error. The power to detect even small true differences was substantial, and increased as the number of cutpoints increased. Regardless of the number of cutpoints tested on the training sets, the Type I error rate on an independent validation dataset was quite stable, and the power of the validation set to detect true differences was not related to the number of cutpoints. Validation power closely approximated that predicted for a simple two-group comparison. It is therefore recommended that exploratory analyses of prognostic factors formally employ some method of adjusting for increased Type I errors, such as independent validation sets, ad hoc adjustment factors, or other statistical methods of estimating the true risk.
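
The training/validation design described in the abstract can be illustrated with a small simulation. What follows is only a rough sketch under assumed settings, not the authors' actual procedure: the exponential survival model, the fixed follow-up time, the evenly spaced cutpoints, the even training/validation split, and every name and default value (`simulate`, `logrank_p`, `base_rate`, `follow_up`, and so on) are hypothetical choices made for illustration. With `hazard_ratio=1.0` the marker has no true effect, so the fraction of significant results estimates the Type I error rate.

```python
import math
import random


def logrank_p(times, events, in_group):
    """Two-sided log-rank p-value (1 df) for a two-group comparison.

    Assumes event times are untied, which holds for continuous simulated data.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_risk = len(times)
    n1_risk = sum(in_group)
    o_minus_e = 0.0
    var = 0.0
    for i in order:
        if events[i]:
            frac = n1_risk / n_risk          # share of the risk set in group 1
            o_minus_e += in_group[i] - frac  # observed minus expected events
            var += frac * (1.0 - frac)       # hypergeometric variance, one event
        n_risk -= 1
        n1_risk -= in_group[i]
    if var == 0.0:                           # e.g. one group is empty
        return 1.0
    return math.erfc(abs(o_minus_e) / math.sqrt(2.0 * var))


def simulate(n_cases=250, n_cutpoints=9, n_sims=200, hazard_ratio=1.0,
             follow_up=5.0, base_rate=0.1, alpha=0.05, seed=1):
    """Return (training, validation) fractions of significant results."""
    rng = random.Random(seed)
    half = n_cases // 2                      # assumed even split
    cuts = [(j + 1) / (n_cutpoints + 1) for j in range(n_cutpoints)]
    train_hits = valid_hits = 0
    for _ in range(n_sims):
        marker = [rng.random() for _ in range(n_cases)]
        times, events = [], []
        for m in marker:
            # Hypothetical true effect: marker above 0.5 multiplies the hazard.
            rate = base_rate * (hazard_ratio if m > 0.5 else 1.0)
            t = rng.expovariate(rate)
            events.append(t < follow_up)     # censored at end of follow-up
            times.append(min(t, follow_up))

        def p_value(start, stop, cut):
            groups = [int(marker[i] > cut) for i in range(start, stop)]
            return logrank_p(times[start:stop], events[start:stop], groups)

        # Training set: keep the cutpoint with the smallest p-value.
        best_p, best_cut = min((p_value(0, half, c), c) for c in cuts)
        train_hits += best_p < alpha
        # Validation set: test only the selected cutpoint.
        valid_hits += p_value(half, n_cases, best_cut) < alpha
    return train_hits / n_sims, valid_hits / n_sims


if __name__ == "__main__":
    for k in (1, 3, 9):
        train_rate, valid_rate = simulate(n_cutpoints=k)
        print(k, "cutpoints: training", train_rate, "validation", valid_rate)
```

Because the simulated data do not depend on `n_cutpoints`, and the cutpoint sets for 1, 3, and 9 are nested, the training-set rejection rate in this sketch can only rise or stay level as more cutpoints are tried. Setting `hazard_ratio` above 1.0 turns the same fractions into power estimates.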
