Statistical Calibration of the SEQUEST XCorr Function
- 10 March 2009
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 8 (4) , 2106-2113
- https://doi.org/10.1021/pr8011107
Abstract
Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the Sequest score function Xcorr, the inability to discriminate between correct and incorrect PSMs is due in part to spectrum-specific properties of the score distribution. In other words, some spectra score well regardless of which peptides they are scored against, and other spectra score well because they are scored against a large number of peptides. We describe a protocol for calibrating PSM score functions, and we demonstrate its application to Xcorr and the preliminary Sequest score function Sp. The protocol accounts for spectrum- and peptide-specific effects by calculating p values for each spectrum individually, using only that spectrum’s score distribution. We demonstrate that these calculated p values are uniform under a null distribution and therefore accurately measure significance. These p values can be used to estimate the false discovery rate, thereby eliminating the need for an extra search against a decoy database. In addition, we show that the p values are better calibrated than their underlying scores; consequently, when ranking top-scoring PSMs from multiple spectra, p values are better at discriminating between correct and incorrect PSMs. The calibration protocol is generally applicable to any PSM score function for which an appropriate parametric family can be identified.
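The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Weibull distribution as the parametric family, a method-of-moments fit, fitting to every candidate score except the top one, a correction of the form 1 - (1 - p)^n for the number of candidates scored, and Benjamini-Hochberg q values as the false discovery rate estimate. The function names are hypothetical.

```python
import math


def fit_weibull_moments(scores):
    """Fit Weibull (shape, scale) to positive scores by method of moments."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    target_cv2 = var / mean ** 2

    def cv2(shape):
        # Squared coefficient of variation of a Weibull with this shape.
        g1 = math.gamma(1.0 + 1.0 / shape)
        g2 = math.gamma(1.0 + 2.0 / shape)
        return g2 / (g1 * g1) - 1.0

    # cv2 decreases as shape grows, so bisection finds the matching shape.
    lo, hi = 0.1, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target_cv2:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


def top_match_p_value(scores):
    """p value for one spectrum's top score, using only that spectrum's scores.

    Assumption: the null distribution is fit to all candidate scores except
    the top one, after shifting them so that they are positive.
    """
    if len(scores) < 3:
        raise ValueError("need at least three candidate scores")
    ordered = sorted(scores)
    top, rest = ordered[-1], ordered[:-1]
    shift = rest[0]
    shifted = [s - shift + 1e-9 for s in rest]
    shape, scale = fit_weibull_moments(shifted)
    # Probability that a single null candidate scores at least as well.
    single = math.exp(-(((top - shift) / scale) ** shape))
    if single >= 1.0:
        return 1.0
    # Assumed correction for the number of candidates: 1 - (1 - single)^n.
    n = len(scores)
    p = -math.expm1(n * math.log1p(-single))
    return min(1.0, max(0.0, p))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

In this sketch, calling `top_match_p_value` once per spectrum yields one p value per top-scoring PSM, and passing those p values to `benjamini_hochberg` ranks PSMs across spectra without a decoy search, in the spirit of the protocol described above.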