Data mining of the GAW14 simulated data using rough set theory and tree-based methods

Open Access

30 December 2005

journal article
Published by Springer Nature in BMC Genomic Data

Vol. 6 (S1), S133
https://doi.org/10.1186/1471-2156-6-s1-s133

Abstract

Rough set theory and decision trees are data mining methods for handling vagueness and uncertainty. They have been used to uncover hidden patterns in complex datasets collected from industrial processes. The Genetic Analysis Workshop 14 (GAW14) simulated data were generated by a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information from one layer was blocked and uncertainty was introduced into the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case) could not easily be detected directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify susceptibility genes for the disease trait. In the first stage, decision trees were built to predict trait values from the phenotypes of subjects and their parents. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed in the second stage to map susceptibility genes. Our results showed that the first-stage decision trees predicted the disease trait with an accuracy of about 99%. In the second stage, however, neither the decision trees nor rough set theory identified the true disease-related loci.
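The "minimal subsets of genes" sought in the second stage correspond to what rough set theory calls reducts: the smallest attribute sets that classify the decision as well as the full attribute set does. The following is a minimal Python sketch of that idea, not the authors' implementation. The marker names g1 to g3, the column name affected, the toy table, and the brute-force search are all assumptions made for illustration; none of them comes from the GAW14 data.

```python
from itertools import combinations, product


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Share of rows in the positive region: rows whose indiscernibility
    class over attrs carries a single value of the decision attribute."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def reducts(rows, attrs, decision):
    """Brute-force search for minimal attribute subsets that keep the
    dependency degree of the full attribute set (exponential in len(attrs),
    so only suitable for small illustrative tables)."""
    target = dependency(rows, attrs, decision)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, subset, decision) == target:
                found.append(subset)
    return found


# Hypothetical toy table: the trait is present only when both g1 and g2
# carry the risk genotype (coded 1); g3 is an unrelated marker.
rows = [
    {"g1": a, "g2": b, "g3": c, "affected": int(a == 1 and b == 1)}
    for a, b, c in product([0, 1], repeat=3)
]

print(reducts(rows, ["g1", "g2", "g3"], "affected"))  # [('g1', 'g2')]
```

On this toy table the only reduct is the pair (g1, g2), because neither marker alone separates affected from unaffected rows. With noisy or inconsistent data, as in the simulated dataset described above, the full attribute set may itself have a dependency degree below 1, and the reducts found need not contain the true loci.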
