Automatic Validation of Phosphopeptide Identifications from Tandem Mass Spectra
- 13 January 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 79 (4) , 1301-1310
- https://doi.org/10.1021/ac061334v
Abstract
We developed and compared two approaches for automated validation of phosphopeptide tandem mass spectra identified using database searching algorithms. Phosphopeptide identifications were obtained through SEQUEST searches of a protein database appended with its decoy (reversed sequences). Statistical evaluation and iterative searches were employed to create a high-quality data set of phosphopeptides. Automation of postsearch validation was approached with two different strategies. In the first, we use statistical multiple testing to calculate a p value for each tentative peptide phosphorylation. In the second, we use a support vector machine (SVM; a machine learning algorithm) binary classifier to predict whether a tentative peptide phosphorylation is true. We show good agreement (85%) between postsearch validation of phosphopeptide/spectrum matches by multiple testing and validation by support vector machines. The automatic methods agree closely with manual expert validation in a blinded test. Additionally, the algorithms were tested on the identification of synthetic phosphopeptides. We show that phosphate neutral losses in tandem mass spectra can be used to assess the correctness of phosphopeptide/spectrum matches. An SVM classifier with a radial basis function kernel achieved classification accuracies of 95.7% to 96.8% on the positive data set, depending on the search algorithm used. Establishing the validity of an identification is a necessary step before further postsearch interrogation of the spectra for complete localization of phosphorylation sites. Our current implementation validates phosphoserine/phosphothreonine-containing peptides with one or two phosphorylation sites from data acquired on an ion trap mass spectrometer. The SVM-based algorithm has been implemented in the software package DeBunker. We illustrate the application of DeBunker on a large phosphorylation data set.
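The abstract notes that phosphate neutral losses help judge whether a phosphopeptide/spectrum match is correct. Losing phosphoric acid (H3PO4, monoisotopic mass 97.9769 Da) from a precursor of charge z shifts its m/z down by 97.9769/z per lost phosphate. The following is a minimal sketch of how such a neutral-loss feature might be computed; it is not DeBunker's implementation, and the function names, the 0.5 m/z tolerance, and the relative-intensity scoring are illustrative assumptions.

```python
# Hypothetical sketch: neutral-loss features for a phosphopeptide spectrum.
# Not the DeBunker implementation; names, tolerance and scoring are assumptions.

# Monoisotopic mass of phosphoric acid (H3PO4), in daltons.
H3PO4_MASS = 97.9769


def neutral_loss_mz(precursor_mz, charge, n_losses=1):
    """Expected m/z after losing n_losses x H3PO4 from a precursor of the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - n_losses * H3PO4_MASS / charge


def neutral_loss_features(peaks, precursor_mz, charge, max_losses=2, tol=0.5):
    """Relative intensity (0..1) of the strongest peak near each neutral-loss m/z.

    peaks: list of (mz, intensity) tuples.
    tol: m/z matching tolerance (0.5 is an assumed, ion-trap-like value).
    Returns one value per loss count 1..max_losses; 0.0 where no peak matches.
    """
    if not peaks:
        return [0.0] * max_losses
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return [0.0] * max_losses
    features = []
    for n in range(1, max_losses + 1):
        target = neutral_loss_mz(precursor_mz, charge, n)
        matched = [inten for mz, inten in peaks if abs(mz - target) <= tol]
        features.append(max(matched) / base if matched else 0.0)
    return features


# Usage: a doubly charged precursor at m/z 600.0 with a dominant -H3PO4 peak.
peaks = [(551.0, 1000.0), (300.2, 250.0), (502.1, 100.0)]
print(neutral_loss_features(peaks, precursor_mz=600.0, charge=2))  # [1.0, 0.1]
```

Features like these could serve as inputs to a binary classifier such as an SVM. Any choice of tolerance or normalization here is a placeholder rather than a value taken from the paper.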