Appropriateness of some resampling‐based inference procedures for assessing performance of prognostic classifiers derived from microarray data
- 5 June 2006
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 26 (5), 1102-1113
- https://doi.org/10.1002/sim.2598
Abstract
The goal of many gene‐expression microarray profiling clinical studies is to develop a multivariate classifier to predict patient disease outcome from a gene‐expression profile measured on some biological specimen from the patient. Often some preliminary validation of the predictive power of a profile‐based classifier is carried out using the same data set that was used to derive the classifier. Techniques such as cross‐validation or bootstrapping can be used in this setting to assess predictive power, and if applied correctly, can result in a less biased estimate of predictive accuracy of a classifier. However, some investigators have attempted to apply standard statistical inference procedures to assess the statistical significance of associations between true and cross‐validated predicted outcomes. We demonstrate in this paper that naïve application of standard statistical inference procedures to these measures of association under null situations can result in greatly inflated testing type I error rates. Under alternatives of small to moderate associations, confidence interval coverage probabilities may be too low, although for very large associations coverage probabilities approach their intended values. Our results suggest that caution should be exercised in interpreting some of the claims of exceptional prognostic classifier performance that have been reported in prominent biomedical journals in the past few years. Copyright © 2006 John Wiley & Sons, Ltd.
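As a rough illustration of the null scenario described in the abstract, the sketch below (not the authors' simulation code) generates pure-noise "expression" data, derives leave-one-out cross-validated class predictions, and then applies a two-sided Fisher's exact test naively to the 2x2 table of true versus predicted classes, as if the predictions were independent observations. Every design choice is a hypothetical assumption made for illustration: a balanced two-class outcome, independent standard normal values for all genes, gene selection by the top-k absolute Welch t-statistics recomputed within each training fold, a nearest-centroid classifier, and the helper names `fisher_exact_two_sided`, `loocv_predictions`, and `null_rejection_rate`. Because no gene carries any signal, a valid level-0.05 test should reject in roughly 5% of replicates, so the returned rejection rate can be compared with that nominal value.

```python
import collections
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(k):
        # P(top-left cell == k) given the fixed row and column totals
        return math.comb(row1, k) * math.comb(n - row1, col1 - k) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # small relative tolerance so floating-point ties are counted
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def mean(v):
    return math.fsum(v) / len(v)


def welch_t(x, y):
    # Welch two-sample t statistic (hypothetical gene-ranking choice)
    mx, my = mean(x), mean(y)
    vx = math.fsum((u - mx) ** 2 for u in x) / (len(x) - 1)
    vy = math.fsum((u - my) ** 2 for u in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def loocv_predictions(X, y, k):
    """Leave-one-out predictions; gene selection is redone inside every fold."""
    n, p = len(X), len(X[0])
    preds = []
    for i in range(n):
        g0 = [j for j in range(n) if j != i and y[j] == 0]
        g1 = [j for j in range(n) if j != i and y[j] == 1]
        # rank genes by |t| using the training samples only (sample i excluded)
        ranked = sorted(range(p), reverse=True, key=lambda g: abs(
            welch_t([X[j][g] for j in g1], [X[j][g] for j in g0])))
        top = ranked[:k]
        # hypothetical classifier: nearest centroid on the selected genes
        d0 = sum((X[i][g] - mean([X[j][g] for j in g0])) ** 2 for g in top)
        d1 = sum((X[i][g] - mean([X[j][g] for j in g1])) ** 2 for g in top)
        preds.append(1 if d1 < d0 else 0)
    return preds


def null_rejection_rate(n=20, p=100, k=10, reps=200, alpha=0.05, seed=1):
    """Fraction of pure-noise data sets in which the naive Fisher test rejects."""
    rng = random.Random(seed)
    y = [0] * (n // 2) + [1] * (n - n // 2)  # balanced, fixed class labels
    rejections = 0
    for _ in range(reps):
        # null situation: no gene is associated with the class label
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        pred = loocv_predictions(X, y, k)
        # rows = true class (1, 0), columns = predicted class (1, 0)
        cnt = collections.Counter(zip(y, pred))
        p_val = fisher_exact_two_sided(cnt[(1, 1)], cnt[(1, 0)],
                                       cnt[(0, 1)], cnt[(0, 0)])
        rejections += p_val <= alpha
    return rejections / reps
```

For example, `null_rejection_rate(n=20, p=100, k=10, reps=200)` returns the empirical rejection rate for one configuration; the abstract reports that naive tests of this kind can have type I error rates well above the nominal level, and varying `n`, `p`, and `k` shows how sensitive the result is to the setting. Redoing gene selection inside each fold keeps the cross-validated predictions free of selection bias, but it does not make them independent of one another, which is why treating the 2x2 table as an ordinary sample is questionable.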