Rapid and Accurate Peptide Identification from Tandem Mass Spectra
- 28 May 2008
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 7 (7), pp. 3022-3027
- https://doi.org/10.1021/pr800127y
Abstract
Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed freely, with source code, to noncommercial users.
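The abstract's peptide indexing scheme retrieves candidates by precursor mass. A minimal sketch of that idea, assuming an in-memory list of unique peptides sorted by neutral mass, fully tryptic digestion with no missed cleavages, unmodified residues, and a 3 Da window: a precursor lookup then reduces to two binary searches. `PeptideIndex`, `tryptic_peptides`, and the parameter defaults are hypothetical illustrations, not Crux's actual data structures or settings.

```python
import bisect

# Monoisotopic residue masses in daltons (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_peptides(protein, min_len=6, max_len=50):
    """Hypothetical digest: cut after K/R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            pep = protein[start:i + 1]
            if min_len <= len(pep) <= max_len:
                peptides.append(pep)
            start = i + 1
    return peptides


class PeptideIndex:
    """Hypothetical in-memory index: unique peptides sorted by neutral mass."""

    def __init__(self, proteins):
        unique = {p for prot in proteins for p in tryptic_peptides(prot)}
        self.entries = sorted((peptide_mass(p), p) for p in unique)
        self.masses = [m for m, _ in self.entries]

    def candidates(self, precursor_mass, tol=3.0):
        """Peptides with mass in [precursor_mass - tol, precursor_mass + tol]."""
        lo = bisect.bisect_left(self.masses, precursor_mass - tol)
        hi = bisect.bisect_right(self.masses, precursor_mass + tol)
        return [seq for _, seq in self.entries[lo:hi]]


index = PeptideIndex(["AAAAAAKPEPTIDERGGGGGGK"])
print(index.candidates(488.23, tol=0.5))  # ['GGGGGGK']
```

Sorting once makes each spectrum's lookup logarithmic in the database size instead of a linear scan; a production index would also need to handle modifications, missed cleavages, and databases too large for memory.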
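The abstract's on-the-fly decoys and decoy-based false discovery rate estimate can be sketched as follows. This assumes the decoy keeps the target's C-terminal residue and shuffles the rest (so composition, and therefore precursor mass, is unchanged), one decoy per target, and the simplest estimator, FDR ≈ (decoys above threshold) / (targets above threshold). The function names `shuffled_decoy`, `with_decoys`, and `estimate_fdr` are hypothetical; Crux's exact shuffling rule and estimator may differ.

```python
import random


def shuffled_decoy(peptide, rng, max_tries=10):
    """Shuffle every residue except the C-terminal one (an assumption of this sketch).

    The decoy has the same composition, hence the same precursor mass, as its target.
    """
    if len(peptide) < 3:
        return peptide
    body = list(peptide[:-1])
    decoy = peptide
    for _ in range(max_tries):  # reshuffle if we happen to reproduce the target
        rng.shuffle(body)
        decoy = "".join(body) + peptide[-1]
        if decoy != peptide:
            break
    return decoy


def with_decoys(candidates, seed=0):
    """Pair each target candidate with a decoy made at search time (no stored decoy database)."""
    rng = random.Random(seed)
    for pep in candidates:
        yield pep, shuffled_decoy(pep, rng)


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Decoy-based FDR estimate at a score threshold, assuming one decoy per target."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return min(1.0, n_decoy / n_target) if n_target else 0.0


print(estimate_fdr([5.0, 4.0, 3.0, 2.0], [3.5, 1.0, 0.5, 0.2], threshold=3.0))
# 0.3333333333333333
```

Because each decoy shares its target's mass, it lands in the same precursor window and competes under the same conditions, which is what makes the decoy score distribution a usable null model.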
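The Weibull-based p value mentioned in the abstract can be written out under an assumption: a standard three-parameter Weibull with shape $k > 0$, scale $\lambda > 0$, and location $\mu$, fit to the observed scores. Which scores are fit and how the parameters are estimated are not stated in the abstract, so this only shows how a p value follows from a fitted distribution.

```latex
\begin{aligned}
F(x) &= 1 - \exp\!\left[-\left(\frac{x-\mu}{\lambda}\right)^{k}\right], \qquad x \ge \mu,\\
p(x) &= \Pr(X \ge x) = 1 - F(x) = \exp\!\left[-\left(\frac{x-\mu}{\lambda}\right)^{k}\right],
\end{aligned}
```

with $p(x) = 1$ for $x < \mu$. A match whose score lies far in the fitted right tail therefore receives a small p value.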