A Statistical Model for Identifying Proteins by Tandem Mass Spectrometry
- 15 July 2003
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 75 (17) , 4646-4658
- https://doi.org/10.1021/ac0341261
Abstract
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.
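The apportioning step described in the abstract can be pictured with a small sketch. This is a simplified illustration, not the published implementation: it assumes each peptide already carries a probability of being a correct assignment, combines them as one minus the product of the weighted miss probabilities, and repeatedly re-splits each shared peptide among its candidate proteins in proportion to their current probabilities. The function name, argument names, and toy data are hypothetical, and any further adjustments made in the paper are left out.

```python
from math import prod


def protein_probabilities(peptide_probs, peptide_to_proteins, n_iter=200):
    """Iteratively estimate protein probabilities (illustrative sketch only).

    peptide_probs:        {peptide: probability the peptide assignment is correct}
    peptide_to_proteins:  {peptide: list of proteins containing that peptide}
    """
    proteins = sorted({q for prots in peptide_to_proteins.values() for q in prots})

    # Assumption: start by splitting every peptide evenly among its proteins.
    weights = {
        (pep, q): 1.0 / len(prots)
        for pep, prots in peptide_to_proteins.items()
        for q in prots
    }

    probs = {}
    for _ in range(n_iter):
        # Protein probability: chance that at least one of its weighted
        # peptide assignments is correct.
        probs = {
            q: 1.0 - prod(
                1.0 - weights[(pep, q)] * peptide_probs[pep]
                for pep, prots in peptide_to_proteins.items()
                if q in prots
            )
            for q in proteins
        }
        # Re-apportion each peptide in proportion to current protein probabilities.
        for pep, prots in peptide_to_proteins.items():
            total = sum(probs[q] for q in prots)
            for q in prots:
                weights[(pep, q)] = probs[q] / total if total > 0 else 1.0 / len(prots)
    return probs


# Toy data (made up): P1 has two unique peptides; P2 is supported only by
# a peptide it shares with P1.
example = protein_probabilities(
    {"pepA": 0.9, "pepB": 0.8, "pepC": 0.9},
    {"pepA": ["P1"], "pepB": ["P1"], "pepC": ["P1", "P2"]},
)
```

In this toy case the shared peptide is progressively credited to P1, so P2 drops out of the list, which mirrors the idea of a minimal protein list that still accounts for every observed peptide.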