Significance Analysis of ROC Indices for Comparing Diagnostic Markers: Applications to Gene Microarray Data
- 31 December 2004
- journal article
- research article
- Published by Taylor & Francis in Journal of Biopharmaceutical Statistics
- Vol. 14 (4), 985-1003
- https://doi.org/10.1081/bip-200035475
Abstract
A common objective in microarray experiments is to select genes that are differentially expressed between two classes (for example, two treatment groups). Selection of differentially expressed genes involves two steps. The first step is to calculate a discriminatory score that ranks the genes in order of evidence of differential expression. The second step is to determine a cutoff for the ranked scores. Summary indices of the receiver operating characteristic (ROC) curve provide relative measures for ranking differential expression. This article proposes using a hypothesis-testing approach to compute raw p-values and/or adjusted p-values for three ROC discrimination measures. A cutoff p-value can be determined from the ranked raw p-values or the adjusted p-values to select differentially expressed genes. To quantify the degree of confidence in the selected top-ranked genes, the conditional false discovery rate (FDR) over the selected gene set and the "Type I" (false positive) error probability for each selected gene are estimated. The proposed approach is applied to a public colon tumor data set for illustration. The gene sets selected by the three ROC summary indices and by the commonly used two-sample t-statistic are then used for sample classification to evaluate the predictive performance of the four discrimination measures.
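The two-step procedure described in the abstract (score each gene, then choose a cutoff from raw or adjusted p-values) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not name the three ROC indices, so the empirical area under the ROC curve (AUC) stands in for one of them; the raw p-value comes from a simple label-permutation test; and the adjustment shown is the Benjamini-Hochberg step-up procedure. The function names (`auc`, `permutation_p_value`, `bh_adjust`) and the toy data are hypothetical and are not taken from the article.

```python
import random


def auc(pos, neg):
    """Empirical AUC in Mann-Whitney form: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for |AUC - 0.5| under shuffled class labels."""
    rng = random.Random(seed)

    def stat(lbls):
        pos = [v for v, l in zip(values, lbls) if l == 1]
        neg = [v for v, l in zip(values, lbls) if l == 0]
        return abs(auc(pos, neg) - 0.5)

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: 3 hypothetical genes measured on 5 class-1 and 5 class-0 samples.
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    genes = {
        "gene_a": [9.1, 8.7, 9.5, 8.9, 9.3, 5.2, 5.9, 6.1, 5.5, 5.0],
        "gene_b": [5.1, 6.2, 5.8, 6.0, 5.4, 5.3, 6.1, 5.7, 5.9, 5.6],
        "gene_c": [7.0, 7.4, 6.8, 7.9, 7.2, 6.5, 6.9, 7.1, 6.2, 6.6],
    }
    names = list(genes)
    raw = [permutation_p_value(genes[g], labels) for g in names]
    adj = bh_adjust(raw)
    for name, p, q in zip(names, raw, adj):
        print(name, round(p, 4), round(q, 4))
```

Genes whose adjusted p-value falls below a chosen cutoff would form the selected set. With thousands of genes, as in a real microarray study, the number of permutations and the choice of adjustment method matter considerably more than in this toy example.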