Semisupervised Model-Based Validation of Peptide Identifications in Mass Spectrometry-Based Proteomics
- 27 December 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 7 (1), 254-265
- https://doi.org/10.1021/pr070542g
Abstract
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative, simpler approach for validation of peptide assignments is based on the addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even for the most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to improved control of the false discovery rate in the classification of peptide assignments.
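To make the semisupervised idea in the abstract concrete, the following is a minimal sketch, not the published PeptideProphet implementation. It assumes a one-dimensional match score and a two-component mixture in which both the "correct" and "incorrect" components are Gaussian; that choice of distribution is a simplification made here for illustration and is not stated in the abstract. Decoy matches are treated as known-incorrect: their posterior probability of being correct is fixed at zero, so they contribute only to the incorrect component, while target matches are unlabeled and receive posteriors in the E-step. The function names, the initialization, and the iteration count are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(values, weights, min_sd=1e-3):
    """Weighted mean and standard deviation, floored to avoid collapse."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, max(math.sqrt(var), min_sd)


def semisupervised_em(target_scores, decoy_scores, n_iter=200):
    """Fit a two-Gaussian mixture to target scores, using decoys as
    labeled examples of the incorrect component (illustrative sketch)."""
    # Initialise the incorrect component from decoys alone and the
    # correct component from the upper half of the target scores.
    mu0, sd0 = weighted_mean_sd(decoy_scores, [1.0] * len(decoy_scores))
    upper = sorted(target_scores)[len(target_scores) // 2:]
    mu1, sd1 = weighted_mean_sd(upper, [1.0] * len(upper))
    pi1 = 0.5  # prior probability that a target match is correct

    post = [0.0] * len(target_scores)
    for _ in range(n_iter):
        # E-step: posterior probability of "correct" for each target match.
        # Decoys are not updated; their posterior stays fixed at 0.
        post = []
        for x in target_scores:
            a = pi1 * normal_pdf(x, mu1, sd1)
            b = (1.0 - pi1) * normal_pdf(x, mu0, sd0)
            post.append(a / (a + b) if (a + b) > 0.0 else 0.0)

        # M-step: the mixing proportion is estimated from targets only.
        pi1 = min(max(sum(post) / len(post), 1e-6), 1.0 - 1e-6)
        mu1, sd1 = weighted_mean_sd(target_scores, post)
        # The incorrect component pools targets (weight 1 - p) with
        # decoys (weight 1), which is the semisupervised part.
        neg_values = list(target_scores) + list(decoy_scores)
        neg_weights = [1.0 - p for p in post] + [1.0] * len(decoy_scores)
        mu0, sd0 = weighted_mean_sd(neg_values, neg_weights)

    params = {"pi1": pi1, "mu1": mu1, "sd1": sd1, "mu0": mu0, "sd0": sd0}
    return params, post
```

Calling `semisupervised_em(target_scores, decoy_scores)` returns the fitted parameters and one posterior probability per target match. Under these assumptions, the expected number of incorrect identifications above a score threshold can be approximated by summing `1 - p` over the accepted target matches, which gives a model-based estimate of the false discovery rate.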