Data mining

8 December 2005

journal article
gaw14 workshop-summary
Published by Wiley in Genetic Epidemiology

Vol. 29 (S1), S103–S109
https://doi.org/10.1002/gepi.20117

Abstract

Group 14 used data-mining strategies to evaluate a number of issues, including appropriate diagnosis, haplotype estimation, genetic linkage and association studies, and type I error. Methods ranged from exploratory analyses, to machine learning strategies (neural networks, supervised learning, and tree-based methods), to false discovery rate control of type I errors. The general motivations were to find the “story” in the data and to summarize information from a multitude of measures. Several methods illustrated strategies for better trait definition, using summarization of related traits. In the few studies that sought to identify genes for alcoholism, there was little agreement among the different strategies, likely reflecting the complexities of the disease. Nevertheless, Group 14 found that these methods offered strategies to gain a better understanding of the complex pathways by which disease develops. Genet. Epidemiol. 29(Suppl. 1):S103–S109, 2005.
