Practical Issues in Imputation-Based Association Mapping
Open Access
- 5 December 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 4 (12) , e1000279
- https://doi.org/10.1371/journal.pgen.1000279
Abstract
Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption—specifically, that difficult-to-impute SNPs tend to have larger effects—and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate—their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html. 
Genotype imputation is becoming a popular approach to comparing and combining results of multiple association studies that used different SNP genotyping platforms. The basic idea is to exploit the fact that, due to correlation among untyped and typed SNPs, genotypes of untyped SNPs in each study can be inferred (“imputed”) from the genotypes at typed SNPs, often with high accuracy. In this paper, we consider several issues that arise when applying these methods in practice, including factors affecting imputation accuracy, the importance of taking account of imputation uncertainty when testing for association between imputed SNPs and phenotype, how imputation accuracy affects power, and how to combine results across studies when only single-SNP summary data can be shared among research groups.
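The posterior-mean approximation described in the abstract can be illustrated with a minimal Python sketch. It assumes each individual's imputed genotype at a SNP is given as three posterior probabilities for carrying 0, 1, or 2 copies of the reference allele, replaces the unknown genotype with its posterior mean, and regresses a quantitative phenotype on that value. The Bayes factor shown is a Wakefield-style normal approximation used here only as a stand-in; it is not the Bayes factor computed by BIMBAM. All function names and the prior standard deviation `prior_sd` are hypothetical choices made for illustration.

```python
import math


def posterior_mean_genotypes(probs):
    """Replace each unknown genotype with its posterior mean.

    probs: list of (p0, p1, p2) tuples, the posterior probabilities of
    carrying 0, 1, or 2 copies of the allele (assumed input format).
    """
    return [p1 + 2.0 * p2 for (_p0, p1, p2) in probs]


def slope_and_se(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


def approx_bayes_factor(beta, se, prior_sd=0.2):
    """Wakefield-style approximate Bayes factor for H1 versus H0.

    Assumes the effect size has a N(0, prior_sd**2) prior under H1.
    prior_sd is an illustrative value, not one taken from the paper.
    """
    v = se ** 2
    w = prior_sd ** 2
    z2 = (beta / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))


if __name__ == "__main__":
    # Toy data: three confidently imputed genotypes and one uncertain one.
    probs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.25, 0.5, 0.25)]
    phenotype = [0.1, 1.0, 2.1, 0.9]
    dosage = posterior_mean_genotypes(probs)
    beta, se = slope_and_se(dosage, phenotype)
    print(dosage, beta, se, approx_bayes_factor(beta, se))
```

Because the estimated effect and its standard error are per-SNP summaries, a calculation of this kind needs only summary data from each study, which is the property the abstract highlights for combining information across studies.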