Cell and tumor classification using gene expression data: Construction of forests
- 17 March 2003
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 100 (7) , 4168-4172
- https://doi.org/10.1073/pnas.0230559100
Abstract
The advent of gene chips has led to a promising technology for cell, tumor, and cancer classification. We exploit and expand the methodology of recursive partitioning trees for tumor and cell classification from microarray gene expression data. To improve classification and prediction accuracy, we introduce a deterministic procedure to form forests of classification trees and compare their performance with extant alternatives. When two published and commonly used data sets are used, we find that the deterministic forests perform similarly to the random forests in terms of the error rate obtained from the leave-one-out procedure, and all of the forests are far better than the single trees. In addition, we provide graphical presentations to facilitate interpretation of complex forests and compare our findings with the current biological literature. In addition to numerical improvement, the main advantage of deterministic forests is reproducibility and scientific interpretability of all steps in tree construction.
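The leave-one-out error rate mentioned in the abstract can be illustrated with a small sketch. This is not the authors' procedure: one-split "stumps" stand in for full classification trees, the forest is simply the best few single-gene stumps chosen without any randomness, and the expression values and labels are invented toy data. All function names (`best_stump`, `fit_forest`, `predict`, `loo_error`) are hypothetical.

```python
from collections import Counter


def best_stump(X, y, gene):
    """Best single split on one gene: (errors, threshold, left_label, right_label)."""
    values = sorted(set(row[gene] for row in X))
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[gene] <= t]
        right = [label for row, label in zip(X, y) if row[gene] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(l != left_label for l in left)
                  + sum(r != right_label for r in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best  # None if the gene takes a single value


def fit_forest(X, y, n_trees):
    """Deterministic toy forest: the n_trees stumps with the fewest training errors."""
    stumps = []
    for gene in range(len(X[0])):
        s = best_stump(X, y, gene)
        if s is not None:
            errors, t, left_label, right_label = s
            stumps.append((errors, gene, t, left_label, right_label))
    stumps.sort()  # ties broken by gene index, so no randomness is involved
    return stumps[:n_trees]


def predict(forest, row):
    """Majority vote over the stumps in the forest."""
    votes = Counter(ll if row[gene] <= t else rl
                    for _, gene, t, ll, rl in forest)
    return votes.most_common(1)[0][0]


def loo_error(X, y, n_trees):
    """Leave-one-out error: refit without sample i, then predict sample i."""
    wrong = 0
    for i in range(len(X)):
        forest = fit_forest(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_trees)
        wrong += predict(forest, X[i]) != y[i]
    return wrong / len(X)


# Invented toy data: 6 samples, 2 "genes"; gene 0 separates the classes.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 5.5]]
y = ["A", "A", "A", "B", "B", "B"]

print("single stump LOO error:", loo_error(X, y, 1))
print("forest LOO error:", loo_error(X, y, 3))
```

Because nothing in the sketch depends on a random seed, rerunning it gives the same forest and the same error estimate, which is the reproducibility property the abstract attributes to deterministic forests.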