Reproducible Statistical Analysis in Microarray Profiling Studies

Abstract
Objectives: Microarrays are a recent biotechnology that offers the hope of improved cancer classification. A number of publications have presented clinically promising results by combining this new kind of biological data with specifically designed algorithmic approaches. However, reproducing published results in this domain is harder than it may seem. Methods: This paper presents examples, discusses the problems hidden in the published analyses, and demonstrates a strategy for improving the situation that is based on the vignette technology available from the R and Bioconductor projects. Results: The compendium is discussed as a tool for achieving reproducible calculations and for offering an extensible computational framework. A compendium is a document that bundles primary data, processing methods (computational code), derived data, and statistical output with textual documentation and conclusions. It is interactive in the sense that it allows for modifying the processing options, plugging in new data, or inserting further algorithms and visualizations. Conclusions: Owing to the complexity of the algorithms, the size of the data sets, and the limitations of the printed page as a medium, it is usually not possible to report all the minutiae of the data processing and statistical computations. A compendium allows a complete critical assessment of a complex analysis.
