Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum
- 1 December 2003
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 10 (6) , 925-946
- https://doi.org/10.1089/106652703322756159
Abstract
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
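The PCA-then-LDA pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Q5 code: the function names `fit_pca_lda` and `predict_proba`, the default `n_components`, the equal class priors, and the shared-variance Gaussian model on the discriminant axis are all assumptions made for illustration only.

```python
import numpy as np


def fit_pca_lda(X, y, n_components=5):
    """Fit PCA followed by two-class Fisher LDA (hypothetical helper).

    X: (n_samples, n_features) array of spectra; y: array of 0/1 labels.
    n_components should stay well below n_samples so the within-class
    scatter matrix remains invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via thin SVD: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T              # (n_features, k) projection
    Z = Xc @ W                           # samples in the reduced space
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Closed-form Fisher discriminant: w = Sw^{-1} (m1 - m0).
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    # Assumption: model each class as a 1-D Gaussian with pooled variance.
    s0, s1 = Z0 @ w, Z1 @ w
    ss = np.sum((s0 - s0.mean()) ** 2) + np.sum((s1 - s1.mean()) ** 2)
    var = ss / (len(y) - 2)
    return {"mean": mean, "W": W, "w": w,
            "mu0": s0.mean(), "mu1": s1.mean(), "var": var}


def predict_proba(model, X):
    """Posterior probability of class 1, assuming equal class priors."""
    X = np.asarray(X, dtype=float)
    s = (X - model["mean"]) @ model["W"] @ model["w"]
    ll0 = -0.5 * (s - model["mu0"]) ** 2 / model["var"]
    ll1 = -0.5 * (s - model["mu1"]) ** 2 / model["var"]
    # Clip the log-odds so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(np.clip(ll0 - ll1, -50.0, 50.0)))
```

Both steps are noniterative: the projection comes from a single SVD and the discriminant from one linear solve, which mirrors the closed-form character the abstract emphasizes. The number of retained components is a free parameter in this sketch and would need to be chosen per dataset.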