An ensemble learning approach jointly modeling main and interaction effects in genetic association studies
- 18 January 2008
- journal article
- research article
- Published by Wiley in Genetic Epidemiology
- Vol. 32 (4), 285-300
- https://doi.org/10.1002/gepi.20304
Abstract
Complex diseases are presumed to result from interactions among several genes and environmental factors, with each gene having only a small effect on the disease. Thus, methods that account for gene‐gene interactions in order to search for a set of marker loci in different genes or across the genome, and to analyze these loci jointly, are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for “base learners” and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to a data set, it yields a final model, an overall P‐value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners, and we know which base learners represent main effects and which represent interaction effects. The importance measure of each base learner or marker indicates its relative importance in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single‐marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi‐locus methods in almost all cases.
In an application to a large‐scale case‐control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi‐locus effect (P‐value = 0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two‐locus combinations showed significant two‐locus interaction effects.
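The idea of combining base learners by a linear model can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the search procedure, the P‐value computation, or the importance measure, so none of them is shown. The sketch assumes genotypes coded 0/1/2, a product term as the interaction base learner, and an ordinary least‐squares fit on a quantitative toy trait; the names `main_effect`, `interaction`, `solve`, and `fit_linear_combination` are hypothetical.

```python
def main_effect(j):
    """Hypothetical base learner: main effect of marker j (genotype coded 0/1/2)."""
    return lambda g: float(g[j])


def interaction(j, k):
    """Hypothetical base learner: two-locus interaction as a product of genotype codes."""
    return lambda g: float(g[j] * g[k])


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_combination(learners, genotypes, trait):
    """Least-squares fit of trait = b0 + sum_i b_i * learner_i(genotype)."""
    design = [[1.0] + [h(g) for h in learners] for g in genotypes]
    p = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, trait)) for i in range(p)]
    return solve(xtx, xty)


# Toy data (made up for illustration): all 9 genotype combinations at two markers,
# with a trait driven by marker 0 and by the marker 0 x marker 1 interaction.
genotypes = [(a, b) for a in range(3) for b in range(3)]
trait = [1.0 + 0.5 * a + 2.0 * a * b for a, b in genotypes]

learners = [main_effect(0), main_effect(1), interaction(0, 1)]
coefficients = fit_linear_combination(learners, genotypes, trait)
print(coefficients)  # intercept first, then one coefficient per base learner
```

Because each coefficient is tied to one named base learner, the fitted model can be read directly as a list of main effects and interaction effects, which is the interpretability property the abstract describes.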