Practical considerations for imputation of untyped markers in admixed populations
- 13 November 2009
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 34 (3) , 258-265
- https://doi.org/10.1002/gepi.20457
Abstract
Imputation of genotypes for markers untyped in a study sample has become a standard approach to increase genome coverage in genome‐wide association studies at practically zero cost. Most methods for imputing missing genotypes extend previously described algorithms for inferring haplotype phase. These algorithms generally fall into three classes based on the underlying model for estimating the conditional distribution of haplotype frequencies: a cluster‐based model, a multinomial model, or a population genetics‐based model. We compared BEAGLE, PLINK, and MACH, representing the three classes of models, respectively, with specific attention to measures of imputation success and selection of the reference panel for an admixed study sample of African Americans. Based on analysis of chromosome 22 and after calibration to a fixed level of 90% concordance between experimentally determined and imputed genotypes, MACH yielded the largest absolute number of successfully imputed markers and the largest gain in coverage of the variation captured by HapMap reference panels. Following the common practice of performing imputation once, the Yoruba in Ibadan, Nigeria (YRI) reference panel outperformed other HapMap reference panels, including (1) African ancestry from Southwest USA (ASW) data, (2) an unweighted combination of the Northern and Western Europe (CEU) and YRI data into a single reference panel, and (3) a combination of the CEU and YRI data into a single reference panel with weights matching estimates of admixture proportions. For our admixed study sample, the optimal strategy involved imputing twice with the HapMap CEU and YRI reference panels separately and then merging the data sets. Genet. Epidemiol. 34: 258–265, 2010.
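The abstract evaluates imputation by the concordance between experimentally determined and imputed genotypes. As a minimal sketch of that measure, and not the authors' pipeline, the Python function below computes the fraction of matching calls. The function name `concordance`, the 0/1/2 allele-count coding, the use of `None` for missing calls, and the toy data are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def concordance(
    observed: Sequence[Optional[int]],
    imputed: Sequence[Optional[int]],
) -> float:
    """Return the fraction of genotype calls that agree.

    Genotypes are assumed to be coded as allele counts (0, 1, 2).
    Positions where either call is missing (None) are skipped.
    """
    if len(observed) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    # Keep only positions where both an observed and an imputed call exist.
    pairs = [
        (o, i) for o, i in zip(observed, imputed)
        if o is not None and i is not None
    ]
    if not pairs:
        raise ValueError("no comparable genotype calls")
    return sum(o == i for o, i in pairs) / len(pairs)


# Toy data (hypothetical): one marker typed in ten individuals.
observed = [0, 1, 2, 1, None, 0, 2, 1, 0, 1]
imputed = [0, 1, 2, 0, 1, 0, 2, 1, 0, 1]
rate = concordance(observed, imputed)
print(f"concordance = {rate:.3f}")
```

In this toy example nine calls are comparable and eight agree, so the marker falls just below a 90% concordance level.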