Identification of gene‐gene interactions in the presence of missing data using the multifactor dimensionality reduction method
- 24 February 2009
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 33 (7) , 646-656
- https://doi.org/10.1002/gepi.20416
Abstract
Gene‐gene interaction is believed to play an important role in understanding complex traits. Multifactor dimensionality reduction (MDR) was proposed by Ritchie et al. [2001. Am J Hum Genet 69:138–147] to identify multiple loci that simultaneously affect disease susceptibility. Although the MDR method has been widely used to detect gene‐gene interactions, few studies have examined MDR analysis when there are missing data. Currently, four approaches are available in MDR analysis to handle missing data. The first approach uses only complete observations that have no missing data, which can cause a severe loss of data. The second approach treats missing values as an additional genotype category, but the results may then be hard to interpret and the conclusions may be misleading. Furthermore, it performs poorly when the missing rates are unbalanced between the case and control groups. The third approach is a simple imputation method that replaces each missing genotype with the most frequent genotype, which may also produce biased results. The fourth approach, Available, uses all data available for the given loci to increase power. In real data analysis, it is not clear which MDR approach one should use when there are missing data. In this article, we consider a new EM Impute approach to handle missing data more appropriately. Through simulation studies, we compared the performance of the proposed EM Impute approach with the current approaches. Our results showed that the Available and EM Impute approaches perform better than the three other current approaches in terms of power and precision. Genet. Epidemiol. 33:646–656, 2009.
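To make the four existing approaches concrete, the following is a minimal Python sketch of how each one might prepare a genotype matrix before MDR is run. It is an illustration under stated assumptions, not the authors' implementation: the 0/1/2 genotype coding, the use of `None` for a missing genotype, the extra category code `3`, and all function names are hypothetical. Case/control labels and the EM Impute approach are not sketched, since the abstract does not describe them in detail.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing genotype (not from the article)


def complete_case(rows):
    """Approach 1: keep only individuals with no missing genotype at any locus."""
    return [r for r in rows if all(g is not MISSING for g in r)]


def missing_as_category(rows, extra=3):
    """Approach 2: recode missing values as an additional genotype category."""
    return [[extra if g is MISSING else g for g in r] for r in rows]


def impute_most_frequent(rows):
    """Approach 3: replace each missing genotype with the locus-wise mode."""
    n_loci = len(rows[0])
    modes = []
    for j in range(n_loci):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if g is MISSING else g for j, g in enumerate(r)]
            for r in rows]


def available_case(rows, loci):
    """Approach 4 (Available): for the loci under evaluation, keep every
    individual that is fully observed at those loci only."""
    return [[r[j] for j in loci] for r in rows
            if all(r[j] is not MISSING for j in loci)]


# Toy data: 5 individuals, 3 loci, genotypes coded 0/1/2.
rows = [
    [0, 1, 2],
    [0, None, 1],
    [1, 1, None],
    [0, 1, 0],
    [None, 2, 1],
]

cc = complete_case(rows)
cat = missing_as_category(rows)
imp = impute_most_frequent(rows)
av = available_case(rows, [0, 1])
print(len(cc), len(cat), len(imp), len(av))
```

On this toy matrix the complete-case filter retains two of five individuals, while the Available filter retains three for the locus pair (0, 1), which illustrates why using all data available for the given loci can preserve sample size.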
This publication has 22 references indexed in Scilit:
- Association analysis of sphingomyelinase 2 polymorphisms for the extrinsic type of atopic dermatitis in Koreans. Journal of Dermatological Science, 2007
- A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genetic Epidemiology, 2007
- Odds ratio based multifactor-dimensionality reduction method for detecting gene–gene interactions. Bioinformatics, 2006
- A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. Journal of Theoretical Biology, 2006
- Parallel multifactor dimensionality reduction: a tool for the large-scale analysis of gene–gene interactions. Bioinformatics, 2006
- A Fast and Flexible Statistical Model for Large-Scale Population Genotype Data: Applications to Inferring Missing Genotypes and Haplotypic Phase. American Journal of Human Genetics, 2006
- Power of multifactor dimensionality reduction for detecting gene‐gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genetic Epidemiology, 2003
- Partition-Ligation–Expectation-Maximization Algorithm for Haplotype Inference with Single-Nucleotide Polymorphisms. American Journal of Human Genetics, 2002
- Atopic dermatitis: A genetic-epidemiologic study in a population-based twin sample. Journal of the American Academy of Dermatology, 1993
- On the Convergence Properties of the EM Algorithm. The Annals of Statistics, 1983