A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies
Open Access
- 19 June 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Genetics
- Vol. 5 (6), e1000529
- https://doi.org/10.1371/journal.pgen.1000529
Abstract
Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.
Author Summary
Large association studies have proven to be effective tools for identifying parts of the genome that influence disease risk and other heritable traits. So-called “genotype imputation” methods form a cornerstone of modern association studies: by extrapolating genetic correlations from a densely characterized reference panel to a sparsely typed study sample, such methods can estimate unobserved genotypes with high accuracy, thereby increasing the chances of finding true associations. To date, most genome-wide imputation analyses have used reference data from the International HapMap Project. While this strategy has been successful, association studies in the near future will also have access to additional reference information, such as control sets genotyped on multiple SNP chips and dense genome-wide haplotypes from the 1,000 Genomes Project. These new reference panels should improve the quality and scope of imputation, but they also present new methodological challenges. We describe a genotype imputation method, IMPUTE version 2, that is designed to address these challenges in next-generation association studies. We show that our method can use a reference panel containing thousands of chromosomes to attain higher accuracy than is possible with the HapMap alone, and that our approach is more accurate than competing methods on both current and next-generation datasets. We also highlight the modeling issues that arise in imputation datasets.
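The general idea of imputation, extrapolating correlations from a densely typed reference panel to a sparsely typed study sample, can be illustrated with a toy copying model. The sketch below is not the IMPUTE v2 algorithm, whose details are not given in this record; it is a minimal haploid hidden Markov model in which a study haplotype is treated as an imperfect mosaic of reference haplotypes. The function name `impute_haplotype` and the parameters `switch` (per-SNP probability of jumping to another reference haplotype) and `err` (allele mismatch probability) are assumptions made for illustration only.

```python
def impute_haplotype(ref, study, switch=0.05, err=0.01):
    """Toy copying-model imputation for one study haplotype (illustrative only).

    ref    : list of K reference haplotypes, each a list of L alleles (0 or 1)
    study  : list of L entries, 0/1 at typed SNPs and None at untyped SNPs
    returns: list of L posterior probabilities that the allele is 1
    """
    K, L = len(ref), len(study)

    def emit(l):
        # Likelihood of the observed allele under each copied reference haplotype;
        # untyped SNPs carry no information, so every haplotype gets weight 1.
        if study[l] is None:
            return [1.0] * K
        return [1 - err if ref[k][l] == study[l] else err for k in range(K)]

    def step(v):
        # Keep copying the same haplotype with probability 1 - switch,
        # otherwise jump to a haplotype chosen uniformly at random.
        total = sum(v)
        return [(1 - switch) * x + switch * total / K for x in v]

    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward pass: evidence from SNPs at or to the left of each position.
    fwd = [normalise([e / K for e in emit(0)])]
    for l in range(1, L):
        prev = step(fwd[-1])
        fwd.append(normalise([e * p for e, p in zip(emit(l), prev)]))

    # Backward pass: evidence from SNPs to the right of each position.
    bwd = [[1.0] * K for _ in range(L)]
    for l in range(L - 2, -1, -1):
        nxt = [e * b for e, b in zip(emit(l + 1), bwd[l + 1])]
        bwd[l] = normalise(step(nxt))

    # Posterior mass on reference haplotypes that carry allele 1 at each SNP.
    probs = []
    for l in range(L):
        post = normalise([f * b for f, b in zip(fwd[l], bwd[l])])
        probs.append(sum(p for p, h in zip(post, ref) if h[l] == 1))
    return probs


# Hypothetical data: three reference haplotypes, study typed at SNPs 0, 2 and 4.
reference = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [0, 1, 0, 1, 0]]
sparse_study = [1, None, 1, None, 1]
print(impute_haplotype(reference, sparse_study))
```

In this toy setting the typed alleles match the second reference haplotype, so the untyped positions inherit most of their probability from it. Real methods additionally handle diploid genotypes, phasing uncertainty, recombination maps, and multiple reference panels, none of which this sketch attempts.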