Modeling Informatively Missing Genotypes in Haplotype Analysis
- 15 September 2009
- journal article
- research article
- Published by Taylor & Francis in Communications in Statistics - Theory and Methods
- Vol. 38 (18), 3445-3460
- https://doi.org/10.1080/03610920802696588
Abstract
It is common to have missing genotypes in practical genetic studies. The majority of the existing statistical methods, including those on haplotype analysis, assume that genotypes are missing at random; that is, at a given marker, different genotypes and different alleles are missing with the same probabilities. In our previous work, we demonstrated that the violation of this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We proposed a general missing data model to simultaneously characterize missing data patterns across a set of two or more biallelic markers. We proved that, under the general missing data model, haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies on the analysis of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias in haplotype frequency estimates caused by incorrect assumptions about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set from a study of scleroderma.
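To make the bias argument concrete, the following is a minimal sketch for a single biallelic marker, not the haplotype model proposed in the article. It assumes Hardy-Weinberg genotype proportions and computes the allele frequency that a naive analysis of the observed (non-missing) genotypes would converge to. The function name `naive_allele_freq` and the missing probabilities used are hypothetical, chosen only for illustration.

```python
def naive_allele_freq(p, miss):
    """Large-sample allele-A frequency estimated from non-missing genotypes.

    p    -- true frequency of allele A (assumption: Hardy-Weinberg proportions)
    miss -- hypothetical missing probabilities for genotypes (AA, Aa, aa)
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)  # AA, Aa, aa
    # Probability that each genotype occurs and is actually observed.
    observed = [g * (1.0 - m) for g, m in zip(genotype_freqs, miss)]
    total = sum(observed)
    # Each AA carries two A alleles, each Aa carries one.
    return (observed[0] + 0.5 * observed[1]) / total


if __name__ == "__main__":
    true_p = 0.3
    # Missing at random: every genotype is missing with the same probability.
    print(naive_allele_freq(true_p, (0.05, 0.05, 0.05)))
    # Informative missingness: genotype aa is missing more often.
    print(naive_allele_freq(true_p, (0.02, 0.02, 0.20)))
```

When the three missing probabilities are equal, the naive estimate recovers the true frequency. When genotype aa is missing more often, allele A is over-represented among the observed genotypes and the estimate is biased upward, which is the single-marker analogue of the haplotype-level bias the abstract describes.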