Central Limit Theorem as an Approximation for Intensity-Based Scoring Function

24 November 2005

Journal article (research article), published by the American Chemical Society (ACS) in Analytical Chemistry

Vol. 78 (1), pp. 89-95
https://doi.org/10.1021/ac051206r

Abstract

In this paper, we present an intensity-based probability function for identifying peptides by searching tandem mass spectra against amino acid sequence databases. The function is an approximation based on the central limit theorem, and it depends explicitly on the cumulative product ion intensities, the number of product ions of a peptide, and the expectation value of the cumulative intensity. We compare database search results obtained with the new scoring function against scoring functions from earlier algorithms, which implement hypergeometric probability, Poisson models, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from a mixture of five known proteins), we generate receiver operating characteristic (ROC) curves for all scoring schemes. The ROC curves show that probability methods based on shared-peak counts (such as the Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive for matching low-quality tandem mass spectra, in which the number of shared peaks is too small to identify a peptide correctly. Cross-correlation methods show a small advantage over the intensity-based probability method.
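The abstract describes the scoring function only qualitatively, so the following is a minimal sketch of how a central-limit approximation of this kind can be set up; it is not the paper's exact formula. It assumes a hypothetical null model: the spectrum is split into fixed-width m/z bins (empty bins have intensity 0), and each of a peptide's n predicted product ions lands in a uniformly random bin. The per-ion intensity then has mean mu and variance var over the bins, so by the central limit theorem the cumulative matched intensity S is approximately Normal(n*mu, n*var), where n*mu plays the role of the expectation value of the cumulative intensity. The function names (`bin_spectrum`, `matched_intensity`, `clt_score`, `poisson_shared_peaks`), the binning scheme, and the toy peaks are all illustrative assumptions. A Poisson shared-peak count is included only as a contrast with the count-based models the abstract mentions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating a CLT-based intensity score; this is NOT
# the published algorithm. A predicted ion "matches" if its bin has a peak.


def bin_spectrum(peaks: Sequence[Tuple[float, float]],
                 lo: float, hi: float, width: float) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins covering [lo, hi)."""
    n = math.ceil((hi - lo) / width)
    bins = [0.0] * n
    for mz, intensity in peaks:
        if lo <= mz < hi:
            bins[min(int((mz - lo) // width), n - 1)] += intensity
    return bins


def matched_intensity(bins: Sequence[float], ion_mzs: Sequence[float],
                      lo: float, width: float) -> float:
    """Cumulative intensity of the bins hit by a peptide's predicted product ions."""
    total = 0.0
    for mz in ion_mzs:
        k = int((mz - lo) // width)
        if 0 <= k < len(bins):
            total += bins[k]
    return total


def clt_score(observed: float, n_ions: int,
              bins: Sequence[float]) -> Tuple[float, float]:
    """Return (z, p) for an observed cumulative intensity.

    Assumed null model: each predicted ion lands in a uniformly random bin, so
    its intensity has mean mu and variance var over all bins; by the central
    limit theorem the sum over n_ions is ~ Normal(n_ions * mu, n_ions * var).
    """
    m = len(bins)
    if m == 0 or n_ions <= 0:
        raise ValueError("need at least one bin and one predicted ion")
    mu = sum(bins) / m
    var = sum((x - mu) ** 2 for x in bins) / m
    if var == 0:
        raise ValueError("all bins have equal intensity; score undefined")
    expected = n_ions * mu                     # expected cumulative intensity
    sd = math.sqrt(n_ions * var)               # std. dev. of the sum under the null
    z = (observed - expected) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2.0))    # upper tail: P(S >= observed)
    return z, p


def poisson_shared_peaks(k: int, n_ions: int, bins: Sequence[float]) -> float:
    """Count-only contrast: P(K >= k) shared peaks under a Poisson approximation."""
    q = sum(1 for x in bins if x > 0) / len(bins)  # chance a random ion hits a peak
    lam = n_ions * q
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    lo, hi, width = 100.0, 110.0, 1.0
    peaks = [(101.2, 50.0), (103.5, 10.0), (105.1, 80.0), (108.7, 20.0)]
    bins = bin_spectrum(peaks, lo, hi, width)
    candidates = [("A", [101.5, 105.4, 108.2]), ("B", [100.5, 102.5, 106.5])]
    for name, ions in candidates:
        s = matched_intensity(bins, ions, lo, width)
        hits = sum(1 for mz in ions if bins[int((mz - lo) // width)] > 0)
        z, p = clt_score(s, len(ions), bins)
        p_count = poisson_shared_peaks(hits, len(ions), bins)
        print(name, s, hits, round(z, 3), round(p, 4), round(p_count, 4))
```

In this toy run, candidate A hits the three most useful bins and receives a small upper-tail probability from the intensity score, while its shared-peak count alone (3 of 3) is judged against how many bins contain any peak at all. This mirrors, in a very simplified form, why an intensity-based score can separate candidates when the number of shared peaks carries little information. A real implementation would also need mass tolerances, ion-type rules, and intensity normalization, none of which are specified here.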
