Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies

1 January 2008

journal article
research article
Published by American Chemical Society (ACS) in Journal of Proteome Research

Vol. 7 (1) , 245-253
https://doi.org/10.1021/pr070540w

Abstract

Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using SEQUEST, Mascot, and X! Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.
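
The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the search engines are conditionally independent given whether the assignment is correct, it assumes every per-engine probability was computed under the same prior, and it omits the expectation maximization step entirely. The function name `combine_probabilities`, the `prior` parameter, and the example values are hypothetical.

```python
def combine_probabilities(probs, prior=0.5):
    """Combine per-engine probabilities that one peptide assignment is correct.

    Illustrative naive-Bayes combination (an assumption, not the paper's model):
    each engine's posterior odds are divided by the shared prior odds to get a
    likelihood ratio, and the ratios are multiplied onto the prior odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    eps = 1e-12  # keeps odds finite when a probability is exactly 0 or 1
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        likelihood_ratio = (p / (1.0 - p)) / prior_odds
        odds *= likelihood_ratio
    return odds / (1.0 + odds)


# Hypothetical probabilities for one spectrum from three search engines.
per_engine = {"SEQUEST": 0.80, "Mascot": 0.70, "X! Tandem": 0.60}
combined = combine_probabilities(per_engine.values())
print(f"combined probability: {combined:.3f}")
```

Under these assumptions, agreement between engines raises the combined probability above any single engine's value, while a probability equal to the prior leaves the result unchanged. Real search engine scores are correlated, so a direct product like this tends to overstate confidence.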
