Proper analysis of secondary phenotype data in case‐control association studies
- 1 December 2008
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 33 (3), 256-265
- https://doi.org/10.1002/gepi.20377
Abstract
Case‐control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case‐control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least‐squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case‐control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case‐control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case‐control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false‐positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website. Genet. Epidemiol. 2009.
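The sampling bias described in the abstract can be illustrated with a small simulation. The sketch below is not the method proposed in the article; it swaps in inverse-probability weighting, a generic design-based correction, and assumes the sampling fractions of cases and controls are known. All names and parameter values (allele frequency 0.3, the logistic disease model, population and sample sizes) are hypothetical choices made for illustration. In the simulated population the genetic variant has no effect on the secondary phenotype, yet both influence disease risk, so the pooled case-control sample shows a spurious association under ordinary least squares.

```python
import numpy as np


def weighted_slope(g, y, w):
    """Weighted least-squares slope of y on g (model includes an intercept)."""
    g_bar = np.average(g, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (g - g_bar) * (y - y_bar)) / np.sum(w * (g - g_bar) ** 2)


def simulate(seed=0, n_pop=500_000, n_per_group=5_000):
    """Return (naive, ipw) slope estimates from one simulated case-control sample.

    All numeric settings are illustrative assumptions, not values from the article.
    """
    rng = np.random.default_rng(seed)

    # Population: genotype g (0/1/2 minor alleles) and secondary phenotype y.
    # The true population effect of g on y is exactly 0.
    g = rng.binomial(2, 0.3, size=n_pop).astype(float)
    y = rng.normal(size=n_pop)

    # Disease status depends on both g and y (hypothetical logistic model).
    p_disease = 1.0 / (1.0 + np.exp(-(-4.0 + 0.7 * g + 1.0 * y)))
    d = rng.random(n_pop) < p_disease

    # Case-control sampling: equal numbers of cases and controls.
    case_idx = rng.choice(np.flatnonzero(d), size=n_per_group, replace=False)
    ctrl_idx = rng.choice(np.flatnonzero(~d), size=n_per_group, replace=False)
    idx = np.concatenate([case_idx, ctrl_idx])

    # Inverse sampling probabilities, assumed known here.
    w_case = d.sum() / n_per_group
    w_ctrl = (~d).sum() / n_per_group
    w = np.concatenate([np.full(n_per_group, w_case),
                        np.full(n_per_group, w_ctrl)])

    naive = weighted_slope(g[idx], y[idx], np.ones(idx.size))  # ordinary least squares
    ipw = weighted_slope(g[idx], y[idx], w)                    # weighted to the population
    return naive, ipw


if __name__ == "__main__":
    naive, ipw = simulate()
    print(f"naive OLS slope: {naive:.3f}")
    print(f"IPW slope:       {ipw:.3f}")
```

Under these assumptions the unweighted slope is pulled away from zero because cases are over-represented and carry both more risk alleles and higher phenotype values, while the weighted slope stays close to the true population value of zero. Weighting is simple but generally less efficient than approaches that model the case-control sampling directly.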