Statistical Model for Large-Scale Peptide Identification in Databases from Tandem Mass Spectra Using SEQUEST
- 2 November 2004
- journal article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 76 (23) , 6853-6860
- https://doi.org/10.1021/ac049305c
Abstract
Recent technological advances have made multidimensional peptide separation techniques coupled with tandem mass spectrometry the method of choice for high-throughput identification of proteins. These advances create a need for software tools that perform large-scale, fully automated, unambiguous peptide identification. In this work, we use the nuclear proteome from Jurkat cells as a model and present a processing algorithm that allows accurate predictions of random matching distributions, based on the two SEQUEST scores Xcorr and DeltaCn. Our method permits a very simple and precise calculation of the probabilities associated with individual peptide assignments, as well as of the false discovery rate among the peptides identified in any experiment. A further mathematical analysis demonstrates that the score distributions are highly dependent on database size and precursor mass window and suggests that the probability associated with SEQUEST scores depends on the number of candidate peptide sequences available for the search. Our results highlight the importance of adjusting the filtering criteria used to discriminate between correct and incorrect peptide sequences according to the circumstances of each particular experiment.
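The two statistical ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that random candidate matches are independent and that per-assignment probabilities are already available; the function names `best_match_p_value` and `estimated_fdr` are hypothetical. The first function shows why the probability of a random high score grows with the number of candidate sequences, and the second gives a conservative false discovery rate estimate that treats every assignment as potentially incorrect.

```python
import numpy as np


def best_match_p_value(p_single, n_candidates):
    """Probability that at least one of n independent random candidates
    reaches a score that a single random candidate reaches with
    probability p_single (independence is an assumption of this sketch)."""
    return 1.0 - (1.0 - p_single) ** n_candidates


def estimated_fdr(p_values, threshold):
    """Conservative FDR estimate among assignments with p <= threshold.

    Expected false positives are taken as threshold * (number of
    assignments), i.e. all assignments are assumed to be potentially
    random matches with uniformly distributed p-values.
    """
    p = np.asarray(p_values, dtype=float)
    accepted = int(np.sum(p <= threshold))
    if accepted == 0:
        return 0.0
    expected_false = threshold * p.size
    return min(1.0, expected_false / accepted)


# The same single-candidate probability becomes far less significant
# when the search considers more candidates (larger database or wider
# precursor mass window).
p_small_search = best_match_p_value(1e-4, 100)
p_large_search = best_match_p_value(1e-4, 10_000)

# Illustrative, made-up p-values for five peptide assignments.
fdr = estimated_fdr([0.001, 0.002, 0.003, 0.5, 0.9], threshold=0.01)
```

In this toy setting a fixed score cutoff corresponds to different probabilities depending on the search space, which is one way to read the abstract's recommendation to adjust filtering criteria to each experiment.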