The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases
Open Access
- 21 April 2006
- journal article
- editorial
- Published by Springer Nature in BMC Genomic Data
- Vol. 7 (1) , 23
- https://doi.org/10.1186/1471-2156-7-23
Abstract
Genetic epidemiologists have taken up the challenge of identifying genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with the available methods for assessing their association with complex diseases. Statistical methods have been developed for analyzing the relation of large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis; neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN); and several non-parametric methods, which include the set association approach, the combinatorial partitioning method (CPM), the restricted partitioning method (RPM), the multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach for selecting and modeling important predictors, but its ability to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach can handle a large number of predictors and are useful in reducing these predictors to a subset with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable.
Because the non-parametric methods have different strengths and weaknesses, we conclude that, for genetic association studies using the case-control design, applying a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy for finding the important genes and interaction patterns involved in complex diseases.
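To make the combinatorial idea behind MDR more concrete, the following is a minimal sketch, not the authors' implementation. It assumes genotypes are coded as small integers and disease status as 1 (case) or 0 (control). The function name `mdr_best_pair` and the toy data are hypothetical, and the sketch omits the cross-validation and permutation testing that a full MDR analysis relies on. For every pair of SNPs, each genotype combination is labelled high-risk when its case-to-control ratio reaches the overall ratio in the sample, and the pair whose labelling classifies the most individuals correctly is returned.

```python
from collections import Counter
from itertools import combinations, product


def mdr_best_pair(genotypes, status):
    """Simplified two-locus MDR-style search (illustrative only).

    genotypes: list of tuples, one tuple of genotype codes per individual.
    status:    list of 1 (case) / 0 (control), same length as genotypes.
    Returns (accuracy, (snp_i, snp_j), high_risk_cells) for the best pair.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    # Overall case:control ratio, used as the high-risk threshold.
    threshold = n_cases / n_controls

    best = None
    for i, j in combinations(range(len(genotypes[0])), 2):
        cases, controls = Counter(), Counter()
        for g, s in zip(genotypes, status):
            cell = (g[i], g[j])  # two-locus genotype combination
            (cases if s == 1 else controls)[cell] += 1

        # A cell is high-risk if its case:control ratio reaches the threshold
        # (written as a product to avoid dividing by an empty control count).
        high_risk = {
            cell
            for cell in set(cases) | set(controls)
            if cases[cell] >= threshold * controls[cell]
        }

        # Cases in high-risk cells and controls in low-risk cells are correct.
        correct = sum(cases[c] for c in high_risk) + sum(
            n for c, n in controls.items() if c not in high_risk
        )
        accuracy = correct / len(status)
        if best is None or accuracy > best[0]:
            best = (accuracy, (i, j), high_risk)
    return best


# Toy data (hypothetical): status depends only on an interaction between
# SNP 0 and SNP 1 (exclusive-or pattern); SNP 2 is uninformative.
toy_genotypes = list(product((0, 1), repeat=3))
toy_status = [g[0] ^ g[1] for g in toy_genotypes]

accuracy, pair, high_risk = mdr_best_pair(toy_genotypes, toy_status)
print(pair, accuracy, sorted(high_risk))
```

In this toy example neither SNP 0 nor SNP 1 shows a main effect on its own, which is the kind of pattern the combinatorial methods are meant to expose. The exhaustive search over pairs also shows why such methods become computationally demanding as the number of predictors grows.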