Statistical Validation of Peptide Identifications in Large-Scale Proteomics Using the Target-Decoy Database Search Strategy and Flexible Mixture Modeling

14 December 2007

journal article
research article
Published by American Chemical Society (ACS) in Journal of Proteome Research

Vol. 7 (1) , 286-292
https://doi.org/10.1021/pr7006818

Abstract

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry-based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications at the same false discovery rate. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
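
As a rough illustration of how decoy identifications support error estimation, the sketch below computes a simple decoy-based false discovery rate at a score threshold. It is not the variable component or semiparametric mixture model described in the abstract; it is the basic counting estimate that the target-decoy strategy makes possible. The function name `decoy_fdr`, the example scores, and the assumption that target and decoy databases are of equal size (so each decoy hit stands in for one incorrect target hit) are illustrative and not taken from the paper.

```python
from typing import Sequence


def decoy_fdr(scores: Sequence[float],
              is_decoy: Sequence[bool],
              threshold: float) -> float:
    """Estimate the FDR among target matches scoring >= threshold.

    Assumption (hypothetical setup): target and decoy databases have equal
    size, so the number of decoy matches above the threshold approximates
    the number of incorrect target matches above it.
    """
    if len(scores) != len(is_decoy):
        raise ValueError("scores and is_decoy must have the same length")

    # Count accepted matches separately for targets and decoys.
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)

    if targets == 0:
        return 0.0  # nothing accepted, so there are no false discoveries
    # Cap at 1.0 because the ratio can exceed 1 at very permissive thresholds.
    return min(1.0, decoys / targets)


# Made-up search scores; True marks a match to a decoy sequence.
example_scores = [5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5]
example_is_decoy = [False, False, False, True, False, True, False, True]

fdr_at_3 = decoy_fdr(example_scores, example_is_decoy, threshold=3.0)
```

With these made-up values, four target matches and one decoy match score at least 3.0, so the estimate at that threshold is 1/4. A mixture-model approach such as the ones in the abstract would go further and assign a probability of correctness to each individual identification, using the decoy scores to inform the distribution of incorrect matches.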
