Methods to impute missing genotypes for population data
- 13 September 2007
- journal article
- Published by Springer Nature in Human Genetics
- Vol. 122 (5) , 495-504
- https://doi.org/10.1007/s00439-007-0427-y
Abstract
For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, even if the missing rate per marker is low. This compromises association analyses, because varying numbers of subjects contribute to single-marker and multi-marker analyses. In this paper, we consider eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project, under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE, but performs better than the other methods.
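To make the k-nearest neighbor idea named in the abstract concrete, here is a minimal sketch of unweighted KNN genotype imputation. It is not the paper's implementation: the 0/1/2 genotype coding, the use of `None` for missing calls, the mismatch-fraction distance, the tie-breaking rule, and the function name `knn_impute` are all assumptions made for illustration.

```python
from collections import Counter


def knn_impute(genotypes, k=3):
    """Fill None entries by majority vote among the k most similar subjects.

    `genotypes` is a list of subjects, each a list of marker genotypes coded
    0, 1 or 2 (assumed coding), with None marking a missing call.
    The input is left unchanged; a filled copy is returned.
    """
    n_markers = len(genotypes[0])
    imputed = [row[:] for row in genotypes]

    for i, row in enumerate(genotypes):
        for j in range(n_markers):
            if row[j] is not None:
                continue

            # Candidate neighbors: other subjects with marker j observed.
            candidates = []
            for h, other in enumerate(genotypes):
                if h == i or other[j] is None:
                    continue
                # Compare only markers observed in both subjects, excluding j.
                shared = [
                    m for m in range(n_markers)
                    if m != j and row[m] is not None and other[m] is not None
                ]
                if not shared:
                    continue
                # Distance: fraction of shared markers with differing genotypes.
                dist = sum(row[m] != other[m] for m in shared) / len(shared)
                candidates.append((dist, h))

            candidates.sort()
            votes = Counter(genotypes[h][j] for _, h in candidates[:k])
            if votes:
                # Most frequent genotype wins; ties go to the smaller code.
                imputed[i][j] = min(votes, key=lambda g: (-votes[g], g))

    return imputed


# Toy data: subject 2 is missing marker 2 and resembles subjects 0 and 1.
data = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, None, 0],
    [2, 1, 0, 2],
    [2, 0, 0, 2],
]
filled = knn_impute(data, k=2)
print(filled[2])
```

A weighted variant in the spirit of wtKNN would let closer neighbors count for more in the vote. An evaluation like the one the abstract describes would mask known genotypes, impute them, and report the fraction imputed incorrectly.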