Improvements to the Percolator Algorithm for Peptide Identification from Shotgun Proteomics Data Sets
- 23 April 2009
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 8 (7) , 3737-3745
- https://doi.org/10.1021/pr801109k
Abstract
Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator’s heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.
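The quantity the abstract refers to, the number of identified spectra at a specified q value, can be illustrated with a minimal sketch. It assumes a common simple target-decoy estimator (false discovery rate approximated as decoys divided by targets above a score threshold), which is not necessarily the estimator used by Percolator or Q-ranker. The function names `q_values` and `count_identified` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List


def q_values(scores: List[float], is_decoy: List[bool]) -> List[float]:
    """Estimate a q value for each PSM, returned in input order.

    Assumption: FDR at a score threshold is approximated as
    (#decoys at or above threshold) / (#targets at or above threshold).
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q value is the minimum FDR at this threshold or any looser one,
    # so sweep from the worst-scoring PSM back to the best.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = min(running_min, 1.0)
    return qvals


def count_identified(scores: List[float], is_decoy: List[bool],
                     q_threshold: float) -> int:
    """Count target PSMs accepted at the given q value threshold."""
    q = q_values(scores, is_decoy)
    return sum(1 for qi, d in zip(q, is_decoy) if not d and qi <= q_threshold)


# Toy data (made up for illustration): six PSMs, two of them decoys.
scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
is_decoy = [False, False, True, False, False, True]
print(count_identified(scores, is_decoy, 0.01))
```

A score function that ranks correct PSMs above decoys more consistently raises this count at a fixed threshold, which is the sense in which methods of this kind are compared.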