A Ranking-Based Scoring Function for Peptide−Spectrum Matches
- 21 February 2009
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 8 (5), 2241-2252
- https://doi.org/10.1021/pr800678b
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data that is generated these days relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide−spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state-of-the-art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as proteogenomic search of a six-frame translation of the human genome (in which we achieve a reduction of the running time by a factor of 15 and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
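To make the idea of treating PSM scoring as a ranking task more concrete, the following is a minimal sketch of a RankBoost-style learner over pairs of candidate peptides. It is not the PepNovo+ implementation, and the abstract does not specify the algorithm at this level of detail. The two features (fraction of matched fragment ions, normalized mass error), the toy values, and the function names `stump`, `train_rankboost`, and `score` are all assumptions made for illustration.

```python
import math

def stump(x, feature, threshold):
    """Binary weak ranker: 1.0 if the chosen feature exceeds the threshold."""
    return 1.0 if x[feature] > threshold else 0.0

def train_rankboost(pairs, n_rounds=5):
    """Train on (better, worse) pairs of feature vectors.

    Returns a list of (alpha, feature, threshold) weighted stumps.
    """
    n_features = len(pairs[0][0])
    weights = [1.0 / len(pairs)] * len(pairs)  # distribution over pairs
    # Candidate stumps: every observed value of every feature.
    candidates = sorted({(j, x[j]) for pair in pairs for x in pair
                         for j in range(n_features)})
    model = []
    for _ in range(n_rounds):
        best_r, best = 0.0, None
        for j, theta in candidates:
            # r measures how well the stump orders the weighted pairs.
            r = sum(w * (stump(b, j, theta) - stump(wo, j, theta))
                    for w, (b, wo) in zip(weights, pairs))
            if abs(r) > abs(best_r):
                best_r, best = r, (j, theta)
        if best is None:
            break  # no stump separates any pair
        # Clamp to avoid division by zero when a stump is perfect.
        r = max(min(best_r, 1 - 1e-6), -(1 - 1e-6))
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        j, theta = best
        model.append((alpha, j, theta))
        # Up-weight pairs that the chosen stump orders incorrectly.
        weights = [w * math.exp(alpha * (stump(wo, j, theta) - stump(b, j, theta)))
                   for w, (b, wo) in zip(weights, pairs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    """Final ranking score: weighted sum of the selected stumps."""
    return sum(alpha * stump(x, j, theta) for alpha, j, theta in model)

# Hypothetical toy data: [fraction of matched ions, normalized mass error].
# Each pair is (correct peptide features, incorrect peptide features).
pairs = [
    ([0.9, 0.1], [0.4, 0.5]),
    ([0.9, 0.1], [0.6, 0.2]),
    ([0.8, 0.3], [0.5, 0.1]),
    ([0.8, 0.3], [0.3, 0.4]),
]
model = train_rankboost(pairs)
candidates = [[0.4, 0.5], [0.9, 0.1], [0.6, 0.2]]
ranked = sorted(candidates, key=lambda x: score(model, x), reverse=True)
```

In this sketch the learner only needs to order candidates for the same spectrum correctly; the absolute score values carry no meaning on their own, which is the property that distinguishes a ranking objective from a classification or regression one.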