Handling missing values in population data: consequences for maximum likelihood estimation of haplotype frequencies
- 14 July 2004
- journal article
- Published by Springer Nature in European Journal of Human Genetics
- Vol. 12 (10), 805-812
- https://doi.org/10.1038/sj.ejhg.5201233
Abstract
Haplotype frequency estimation in population data is an important problem in genetics and different methods including expectation maximisation (EM) methods have been proposed. The statistical properties of EM methods have been extensively assessed for data sets with no missing values. When numerous markers and/or individuals are tested, however, it is likely that some genotypes will be missing. Thus, it is of interest to investigate the behaviour of the method in the presence of incomplete genotype observations. We propose an extension of the EM method to handle missing genotypes, and we compare it with commonly used methods (such as ignoring individuals with incomplete genotype information or treating a missing allele as any other allele). Simulations were performed, starting from data sets of haematopoietic stem cell donors genotyped at three HLA loci. We deleted some data to create incomplete genotype observations in various proportions. We then compared the haplotype frequencies obtained on these incomplete data sets using the different methods to those obtained on the complete data. We found that the method proposed here provides better estimations, both qualitatively and quantitatively, but increases the computation time required. We discuss the influence of missing values on the algorithm's efficiency and the advantages and disadvantages of deleting incomplete genotypes. We propose guidelines for missing data handling in routine analysis.
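The abstract does not spell out the proposed extension, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes Hardy-Weinberg proportions and treats a missing allele as compatible with every allele listed for that locus, so that the E-step sums over all haplotype pairs consistent with each (possibly incomplete) genotype. The names `MISSING`, `locus_options`, `compatible_pairs` and `em_haplotype_frequencies`, and the input layout, are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product

MISSING = None  # hypothetical marker for an untyped allele


def locus_options(pair, alleles):
    """Ordered allele pairs compatible with one (possibly incomplete) locus genotype."""
    a, b = pair
    first = alleles if a is MISSING else [a]
    second = alleles if b is MISSING else [b]
    options = set()
    for x, y in product(first, second):
        options.add((x, y))
        options.add((y, x))  # phase is unknown, so both orders are possible
    return list(options)


def compatible_pairs(genotype, alleles_per_locus):
    """All ordered haplotype pairs (h1, h2) consistent with a multi-locus genotype."""
    per_locus = [locus_options(pair, alleles)
                 for pair, alleles in zip(genotype, alleles_per_locus)]
    pairs = []
    for combo in product(*per_locus):
        h1 = tuple(x for x, _ in combo)
        h2 = tuple(y for _, y in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, alleles_per_locus, max_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies, assuming Hardy-Weinberg proportions."""
    expansions = [compatible_pairs(g, alleles_per_locus) for g in genotypes]
    haplotypes = {h for pairs in expansions for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(max_iter):
        counts = defaultdict(float)
        # E-step: share each individual over its compatible haplotype pairs
        for pairs in expansions:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy usage: one locus, three typed individuals and one untyped individual.
alleles = [["A", "B"]]
genotypes = [(("A", "A"),), (("A", "A"),), (("B", "B"),), ((MISSING, MISSING),)]
estimates = em_haplotype_frequencies(genotypes, alleles)
```

Enumerating every compatible pair grows quickly with the number of missing alleles and the number of alleles per locus, which is consistent with the abstract's remark that handling missing values increases computation time.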