Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer
- 11 April 2006
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 103 (15) , 5923-5928
- https://doi.org/10.1073/pnas.0601231103
Abstract
Predicting at the time of discovery the prognosis and metastatic potential of cancer is a major challenge in current clinical research. Numerous recent studies searched for gene expression signatures that outperform traditionally used clinical parameters in outcome prediction. Finding such a signature will free many patients of the suffering and toxicity associated with adjuvant chemotherapy given to them under current protocols, even though they do not need such treatment. A reliable set of predictive genes also will contribute to a better understanding of the biological mechanism of metastasis. Several groups have published lists of predictive genes and reported good predictive performance based on them. However, the gene lists obtained for the same clinical types of patients by different groups differed widely and had only very few genes in common. This lack of agreement raised doubts about the reliability and robustness of the reported predictive gene lists, and the main source of the problem was shown to be the small number of samples that were used to generate the gene lists. Here, we introduce a previously undescribed mathematical method, probably approximately correct (PAC) sorting, for evaluating the robustness of such lists. We calculate for several published data sets the number of samples that are needed to achieve any desired level of reproducibility. For example, to achieve a typical overlap of 50% between two predictive lists of genes, breast cancer studies would need the expression profiles of several thousand early discovery patients.
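The dependence of list overlap on sample size can be illustrated with a small Monte Carlo sketch. This is not the PAC sorting method of the paper, which is analytic. It is a toy simulation under stated assumptions: each gene has a hypothetical true effect drawn from a normal distribution, expression noise is unit-variance Gaussian, outcome groups are balanced, and genes are ranked by the absolute difference in group means rather than by correlation with outcome. All function names and parameter values (`mean_overlap`, `n_genes`, `list_size`, the effect spread of 0.2) are illustrative and do not come from the publication.

```python
import random


def gene_scores(rng, n_samples, effects):
    """Per-gene difference in mean expression between two balanced outcome groups.

    Assumption: expression is Gaussian with unit variance, and the poor-outcome
    group is shifted by the gene's (hypothetical) true effect.
    """
    half = n_samples // 2
    scores = []
    for effect in effects:
        poor = sum(rng.gauss(effect, 1.0) for _ in range(half)) / half
        good = sum(rng.gauss(0.0, 1.0) for _ in range(half)) / half
        scores.append(poor - good)
    return scores


def top_gene_list(scores, list_size):
    """Indices of the list_size genes with the largest absolute score."""
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:list_size])


def mean_overlap(n_samples, n_genes=200, list_size=20, n_trials=5, seed=0):
    """Average fraction of shared genes between top lists from two independent cohorts."""
    rng = random.Random(seed)
    # Hypothetical true effects: most genes are only weakly related to outcome.
    effects = [rng.gauss(0.0, 0.2) for _ in range(n_genes)]
    total = 0.0
    for _ in range(n_trials):
        list_a = top_gene_list(gene_scores(rng, n_samples, effects), list_size)
        list_b = top_gene_list(gene_scores(rng, n_samples, effects), list_size)
        total += len(list_a & list_b) / list_size
    return total / n_trials


if __name__ == "__main__":
    # Overlap between two independently derived gene lists at several cohort sizes.
    for n in (40, 200, 800):
        print(n, round(mean_overlap(n), 2))
```

In this toy setting the overlap between two independently derived lists tends to rise as the cohorts grow, because the sampling noise in each gene's score shrinks relative to the spread of true effects. The numbers it prints depend entirely on the assumed effect distribution and should not be read as estimates for any real data set.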