Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology
Open Access
- 1 December 2009
- journal article
- research article
- Published by Institute of Mathematical Statistics in The Annals of Applied Statistics
- Vol. 3 (4) , 1309-1334
- https://doi.org/10.1214/09-aoas291
Abstract
High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in “forensic bioinformatics” where aspects of raw data and reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate several simple errors that may be putting patients at risk. One theme that emerges is that the most common errors are simple (e.g., row or column offsets); conversely, it is our experience that the most simple errors are common. We then discuss steps we are taking to avoid such errors in our own investigations.