Integrated data management and validation platform for phosphorylated tandem mass spectrometry data
Open Access
- 8 September 2010
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 10 (19) , 3515-3524
- https://doi.org/10.1002/pmic.200900727
Abstract
MS/MS is a widely used method for proteome‐wide analysis of protein expression and PTMs. The thousands of MS/MS spectra produced from a single experiment pose a major challenge for downstream analysis. Standard programs, such as MASCOT, provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false‐positive identifications. In phosphoproteomic experiments, only a single peptide assignment is typically available to support identification of each phosphorylation site, and hence minimizing false positives is critical. Thus, tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open‐source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers with an AUC value of 97% and PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives as compared with MASCOT while retaining 97% of true positives. This algorithm was able to classify an independent phosphoserine/threonine data set with AUC value of 93% and PPV of 91%, demonstrating the applicability of this method for all types of phospho‐MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval.
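The abstract evaluates a naïve Bayes classifier by AUC and PPV. As an illustration of what those quantities mean, the following is a minimal sketch of a three-feature Gaussian naïve Bayes classifier with both metrics computed by hand. It is not the phoMSVal implementation: the Gaussian likelihood, the three feature columns (read here as a search-engine score, a fraction of matched fragment ions, and a relative intensity), and all numbers are assumptions made up for this example.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        stats = []
        for column in zip(*members):
            mean = sum(column) / n
            # Small constant guards against a zero variance.
            var = sum((v - mean) ** 2 for v in column) / n + 1e-9
            stats.append((mean, var))
        model[cls] = (n / len(rows), stats)
    return model

def log_posterior(model, row, cls):
    """Unnormalized log posterior, treating features as independent."""
    prior, stats = model[cls]
    total = math.log(prior)
    for value, (mean, var) in zip(row, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
    return total

def score(model, row):
    """Log-odds that an assignment is correct (1) rather than incorrect (0)."""
    return log_posterior(model, row, 1) - log_posterior(model, row, 0)

def ppv(predicted, actual):
    """Positive predictive value: TP / (TP + FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    return tp / (tp + fp) if tp + fp else float("nan")

def auc(scores, actual):
    """Chance that a random correct assignment outscores a random incorrect one."""
    pos = [s for s, a in zip(scores, actual) if a == 1]
    neg = [s for s, a in zip(scores, actual) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic toy data (hypothetical features, not from the paper).
train_rows = [
    (62.0, 0.80, 0.9), (55.0, 0.72, 0.8), (70.0, 0.85, 1.0), (48.0, 0.66, 0.7),
    (21.0, 0.30, 0.2), (18.0, 0.25, 0.4), (30.0, 0.41, 0.3), (25.0, 0.35, 0.1),
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = manually validated as correct
test_rows = [(58.0, 0.75, 0.85), (20.0, 0.28, 0.30), (66.0, 0.79, 0.95), (27.0, 0.38, 0.25)]
test_labels = [1, 0, 1, 0]

model = fit_gaussian_nb(train_rows, train_labels)
scores = [score(model, r) for r in test_rows]
predicted = [1 if s > 0 else 0 for s in scores]

print("PPV:", ppv(predicted, test_labels))
print("AUC:", auc(scores, test_labels))
```

The toy data are deliberately well separated, so the printed values say nothing about performance on real spectra; the sketch only shows how PPV and AUC are derived from classifier scores and curated labels.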