Discriminant models for high‐throughput proteomics mass spectrometer data
- 9 September 2003
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 3 (9), 1699-1703
- https://doi.org/10.1002/pmic.200300518
Abstract
We use several multivariate analysis methods to discriminate between diseased and healthy patients using protein mass spectrometer data provided by Duke University. The university presented two problems: one in which the responses (diseased or healthy) of the patients were not known, and a second in which the responses were known. In the latter case, the data can be used as a ‘training’ set. We attempted both problems. In particular, we use principal component analysis along with clustering methods to discriminate in the first problem, and partial least squares coupled with logistic and discriminant methods when the responses were known. In addition, we were able to detect regions of interest in the spectrum where the protein patterns differed between healthy and diseased patients. Considerable effort went into preprocessing the data. We used a binning approach, rather than peak heights or peak areas, to reduce the number of variables. We performed a square root transformation on the data to help stabilize the variance; this in turn significantly improved the clustering results.
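The preprocessing steps named in the abstract (binning followed by a square root transformation) can be illustrated with a minimal sketch. The abstract does not state the bin width or how intensities within a bin were combined, so the fixed-width bins, the summation rule, the clipping of negative values, and the function names `bin_spectrum` and `sqrt_transform` are all assumptions made for illustration, not the authors' procedure.

```python
import math


def bin_spectrum(mz, intensity, bin_width):
    """Sum intensities into fixed-width m/z bins.

    Assumption: bins start at the smallest m/z value and intensities
    within a bin are summed; the paper's actual rule is not given.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    lo = min(mz)
    n_bins = int((max(mz) - lo) // bin_width) + 1
    binned = [0.0] * n_bins
    for m, y in zip(mz, intensity):
        binned[int((m - lo) // bin_width)] += y
    return binned


def sqrt_transform(values):
    """Square root transform, intended to help stabilize the variance.

    Assumption: negative values (e.g. after baseline subtraction)
    are clipped to zero before taking the root.
    """
    return [math.sqrt(max(v, 0.0)) for v in values]


# Toy spectrum (made-up numbers, not data from the study).
mz = [1000.0, 1000.4, 1001.1, 1001.9, 1002.5]
intensity = [4.0, 5.0, 16.0, 0.0, 25.0]

binned = bin_spectrum(mz, intensity, bin_width=1.0)
features = sqrt_transform(binned)
print(binned)    # one value per m/z bin
print(features)  # variance-stabilized features
```

Each spectrum then yields one feature per bin, so the number of variables is set by the bin width rather than by a peak-detection step. The resulting feature vectors are the kind of input that principal component analysis or partial least squares would take.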