Proteomic mass spectra classification using decision tree based ensemble methods
Open Access
- 12 May 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 21 (14), 3138-3145
- https://doi.org/10.1093/bioinformatics/bti494
Abstract
Motivation: Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids such as serum, saliva or urine. These measurements can be used in many medical applications to diagnose the current state of a disease or to predict its evolution. Recent developments in machine learning make it possible to exploit such datasets, which are characterized by small numbers of very high-dimensional samples.
Results: We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.
Supplementary information: Additional tables, appendices and datasets may be found at http://www.montefiore.ulg.ac.be/~geurts/Papers/Proteomic-suppl.html
Contact: p.geurts@ulg.ac.be
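The abstract does not spell out the algorithm, so the following is only a generic, hedged illustration of how a decision tree ensemble can rank candidate biomarkers when there are few samples and many features; it is not the authors' pipeline. It assumes a random-forest-style ensemble of depth-1 trees (stumps), each grown on a bootstrap sample with a random subset of about sqrt(p) candidate features, and scores features by their total decrease in Gini impurity. The dataset generator `make_spectra`, the informative peak index 17 and all parameter values are hypothetical choices made for the sketch.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def fit_stump(X, y, features):
    """Best single-threshold split among the candidate features.

    Returns (feature, threshold, impurity_decrease), or None if no candidate
    feature has two distinct values in this sample.
    """
    n = len(y)
    parent = gini(y)
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - children
            if best is None or decrease > best[2]:
                best = (f, t, decrease)
    return best


def rank_features(X, y, n_trees=200, seed=0):
    """Random-forest-style ensemble of stumps (hypothetical sketch).

    Returns one normalized mean-decrease-in-impurity score per feature.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))  # common default: sqrt(p) features per split
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        stump = fit_stump(Xb, yb, rng.sample(range(p), mtry))
        if stump is not None:
            f, _, decrease = stump
            importance[f] += decrease
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def make_spectra(n=30, p=100, peak=17, shift=4.0, seed=1):
    """Hypothetical toy data: n 'spectra' with p peak intensities each.

    Only the feature at index `peak` differs between the two classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[peak] += shift * label  # raise one peak in class-1 samples
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_spectra()
    scores = rank_features(X, y)
    top = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:5]
    for f in top:
        print(f"feature {f}: importance {scores[f]:.3f}")
```

Stumps keep the sketch short; with deeper trees and majority voting over the ensemble, the same loop would also produce a classifier. Any accuracy estimate built on such a ranking should repeat the ranking inside each cross-validation fold, since selecting features on the full dataset makes the estimate optimistic.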