Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies

14 July 2004

journal article
Published by Springer Nature in European Journal of Human Genetics

Vol. 12 (10), 805-812
https://doi.org/10.1038/sj.ejhg.5201233

Abstract

Haplotype frequency estimation in population data is an important problem in genetics, and several methods, including expectation-maximisation (EM) methods, have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, some genotypes are likely to be missing, so it is of interest to investigate how the method behaves with incomplete genotype observations. We propose an extension of the EM method that handles missing genotypes, and we compare it with commonly used approaches (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions, then compared the haplotype frequencies estimated from these incomplete data sets by each method with those obtained from the complete data. The proposed method provides better estimates, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes, and we propose guidelines for handling missing data in routine analysis.
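
As an illustration of the general idea only, and not the authors' implementation, the following Python sketch shows one common way an EM haplotype estimator can accommodate a missing genotype: the missing locus is treated as compatible with every allele pair observed at that locus, and the E-step sums over all resulting haplotype pairs. The function names (`compatible_pairs`, `em_haplotype_frequencies`), the data layout (`None` for a missing locus) and the stopping rule are assumptions made for this example; the sketch also assumes Hardy-Weinberg proportions and that genotypes are missing at random.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype, alleles_per_locus):
    """List ordered haplotype pairs (h1, h2) consistent with an unphased genotype.

    genotype: one entry per locus, either a pair of alleles or None if missing.
    alleles_per_locus: the alleles considered possible at each locus.
    """
    per_locus = []
    for locus, observed in enumerate(genotype):
        if observed is None:
            # Missing genotype: any ordered allele pair at this locus is possible.
            options = set(product(alleles_per_locus[locus], repeat=2))
        else:
            a, b = observed
            # Unknown phase: both orderings (a single option if homozygous).
            options = {(a, b), (b, a)}
        per_locus.append(sorted(options))
    pairs = []
    for combo in product(*per_locus):
        h1 = tuple(first for first, _ in combo)
        h2 = tuple(second for _, second in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, summing over missing genotypes.

    Assumes every locus is observed in at least one individual.
    """
    n_loci = len(genotypes[0])
    alleles = [
        sorted({a for g in genotypes if g[l] is not None for a in g[l]})
        for l in range(n_loci)
    ]
    pair_lists = [compatible_pairs(g, alleles) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    # Start from equal frequencies for every haplotype that could be present.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}

    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        counts = defaultdict(float)
        for pairs in pair_lists:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: each individual carries two haplotypes.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy input: two loci, the third individual is missing the second locus.
toy_genotypes = [
    (("A", "A"), ("B", "B")),
    (("a", "a"), ("b", "b")),
    (("A", "A"), None),
]
toy_frequencies = em_haplotype_frequencies(toy_genotypes)
```

Under the same assumed data layout, the simpler strategy of ignoring incomplete individuals amounts to filtering the input first, for example `[g for g in toy_genotypes if None not in g]`. The number of compatible haplotype pairs grows quickly with the number of alleles at a missing locus, which is consistent with the increased computation time noted above.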
