The Genetic Interpretation of Area under the ROC Curve in Genomic Profiling
Open Access
- 26 February 2010
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 6 (2) , e1000864
- https://doi.org/10.1371/journal.pgen.1000864
Abstract
Genome-wide association studies in human populations have facilitated the creation of genomic profiles which combine the effects of many associated genetic variants to predict risk of disease. The area under the receiver operator characteristic (ROC) curve is a well established measure for determining the efficacy of tests in correctly classifying diseased and non-diseased individuals. We use quantitative genetics theory to provide insight into the genetic interpretation of the area under the ROC curve (AUC) when the test classifier is a predictor of genetic risk. Even when the proportion of genetic variance explained by the test is 100%, there is a maximum value for AUC that depends on the genetic epidemiology of the disease, i.e. either the sibling recurrence risk or heritability and disease prevalence. We derive an equation relating maximum AUC to heritability and disease prevalence. The expression can be reversed to calculate the proportion of genetic variance explained given AUC, disease prevalence, and heritability. We use published estimates of disease prevalence and sibling recurrence risk for 17 complex genetic diseases to calculate the proportion of genetic variance that a test must explain to achieve AUC = 0.75; this varied from 0.10 to 0.74. We provide a genetic interpretation of AUC for use with predictors of genetic risk based on genomic profiles. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability (or sibling recurrence risk) available as an online calculator.

Genome-wide association studies in human populations have facilitated the creation of genomic profiles that combine the effects of many associated genetic variants to predict risk of disease. However, genomic profiles are inherently constrained in their ability to classify diseased from non-diseased individuals dictated by the genetic epidemiology of the disease. In this paper, we use a genetic interpretation to provide insight into the constraints on genomic profiles for risk prediction. We provide a strategy to estimate proportion of genetic variance explained on the liability scale from estimates of AUC, disease prevalence, and heritability available as an online calculator.
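The abstract does not reproduce the equation itself, so the following is only a minimal sketch of the kind of relationship it describes. It assumes the standard liability threshold model, with the genetic predictor treated as approximately normal in cases and in controls. The function names, the closed form, and the bisection-based inversion are illustrative assumptions and are not taken from the text above or from the online calculator it mentions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def auc_from_liability_variance(prevalence, h2_explained):
    """Approximate AUC for a predictor explaining `h2_explained` of the
    variance in liability (assumed liability threshold model).

    Passing the full heritability gives the assumed maximum AUC.
    """
    t = _STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold
    z = _STD_NORMAL.pdf(t)                     # normal density at the threshold
    i = z / prevalence                         # mean liability of cases
    v = -z / (1.0 - prevalence)                # mean liability of controls
    # Per-unit variance of the predictor within cases and within controls.
    var_cases = 1.0 - h2_explained * i * (i - t)
    var_controls = 1.0 - h2_explained * v * (v - t)
    d = (i - v) * sqrt(h2_explained) / sqrt(var_cases + var_controls)
    return _STD_NORMAL.cdf(d)


def proportion_explained(auc, prevalence, heritability, tol=1e-10):
    """Invert the sketch above by bisection: the proportion of genetic
    variance a predictor must explain to reach `auc`."""
    if auc > auc_from_liability_variance(prevalence, heritability):
        raise ValueError("target AUC exceeds the maximum for this disease")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if auc_from_liability_variance(prevalence, heritability * mid) < auc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    K, h2 = 0.01, 0.8
    print("max AUC:", auc_from_liability_variance(K, h2))
    print("proportion needed for AUC = 0.75:", proportion_explained(0.75, K, h2))
```

Bisection is used for the inversion because, under these assumptions, AUC increases monotonically with the variance explained; a closed-form inverse may exist, but is not needed for a sketch.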