Significance Analysis of ROC Indices for Comparing Diagnostic Markers: Applications to Gene Microarray Data

31 December 2004

journal article
research article
Published by Taylor & Francis in Journal of Biopharmaceutical Statistics

Vol. 14 (4) , 985-1003
https://doi.org/10.1081/bip-200035475

Abstract

A common objective in microarray experiments is to select genes that are differentially expressed between two classes (two treatment groups). Selection of differentially expressed genes involves two steps. The first step is to calculate a discriminatory score that ranks the genes in order of evidence of differential expression. The second step is to determine a cutoff for the ranked scores. Summary indices of the receiver operating characteristic (ROC) curve provide relative measures for ranking differential expression. This article proposes using the hypothesis-testing approach to compute the raw p-values and/or adjusted p-values for three ROC discrimination measures. A cutoff p-value can be determined from the (ranked) p-values or the adjusted p-values to select differentially expressed genes. To quantify the degree of confidence in the selected top-ranked genes, the conditional false discovery rate (FDR) over the selected gene set and the “Type I” (false positive) error probability for each selected gene are estimated. The proposed approach is applied to a public colon tumor data set for illustration. The gene sets selected by the three ROC summary indices and by the commonly used two-sample t-statistic are then used for sample classification to evaluate the predictive ability of the four discrimination measures.
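
The two-step selection described in the abstract (score each gene, then choose a cutoff from raw or adjusted p-values) can be illustrated with a minimal Python sketch. It is not the article's procedure: the abstract does not name the three ROC indices or the test used, so this sketch assumes a single index (the empirical area under the ROC curve), a label-permutation test for the raw p-values, and the Benjamini-Hochberg step-up adjustment. The function names `auc_scores`, `permutation_pvalues`, and `bh_adjust` are hypothetical.

```python
import numpy as np


def auc_scores(X, y):
    """Empirical AUC for each gene (row of X) between two classes.

    Assumption: AUC is used as the only ROC summary index here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    pos, neg = X[:, y], X[:, ~y]
    # Compare every class-1 sample with every class-0 sample; ties count 1/2.
    diff = pos[:, :, None] - neg[:, None, :]
    return ((diff > 0) + 0.5 * (diff == 0)).mean(axis=(1, 2))


def permutation_pvalues(X, y, n_perm=1000, seed=0):
    """Two-sided raw p-values for H0: AUC = 0.5, by permuting class labels.

    Assumption: a permutation test stands in for the article's test.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=bool)
    observed = np.abs(auc_scores(X, y) - 0.5)
    exceed = np.zeros(observed.shape[0])
    for _ in range(n_perm):
        permuted = rng.permutation(y)
        exceed += np.abs(auc_scores(X, permuted) - 0.5) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

With an expression matrix `X` (genes in rows, samples in columns) and a 0/1 class vector `y`, the selected set would be the genes whose adjusted p-value falls below the chosen cutoff, for example `np.flatnonzero(bh_adjust(permutation_pvalues(X, y)) <= 0.05)`. The 0.05 threshold is an illustrative value, not one taken from the article.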
