Rapid and Accurate Peptide Identification from Tandem Mass Spectra

28 May 2008

journal article
research article
Published by American Chemical Society (ACS) in Journal of Proteome Research

Vol. 7 (7) , 3022-3027
https://doi.org/10.1021/pr800127y

Abstract

Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed freely, with source code, to noncommercial users.
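
The target-decoy idea mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not Crux's C implementation. The function names `shuffle_decoy` and `estimate_fdr`, the choice to keep the first and last residues fixed while shuffling, and the simple decoy-to-target ratio used as the false discovery rate estimate are all assumptions made for illustration; the abstract does not specify these details.

```python
import random


def shuffle_decoy(peptide, rng):
    """Return a decoy with the interior residues shuffled.

    Assumption for illustration: the first and last residues stay in
    place, so the decoy keeps the same termini, length, and mass.
    """
    if len(peptide) <= 3:
        # Zero or one interior residue: shuffling cannot change anything.
        return peptide
    interior = list(peptide[1:-1])
    rng.shuffle(interior)
    return peptide[0] + "".join(interior) + peptide[-1]


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    Assumption for illustration: one decoy per target, so the number of
    decoy matches above the threshold approximates the number of
    incorrect target matches above it.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(shuffle_decoy("LVNELTEFAK", rng))

    # Hypothetical scores, not taken from the paper.
    targets = [5.0, 4.0, 3.0, 2.0, 1.0]
    decoys = [3.0, 1.0, 0.5, 0.2, 0.1]
    print(estimate_fdr(targets, decoys, threshold=2.0))
```

With the hypothetical scores above, four target matches and one decoy match reach the threshold of 2.0, so the estimate is 1/4. Because a shuffled decoy has the same amino acid composition as its target, it has the same precursor mass and competes for the same spectra, which is what makes it a reasonable null model.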
