Randomized Sequence Databases for Tandem Mass Spectrometry Peptide and Protein Identification
- 1 December 2005
- journal article
- research article
- Published by Mary Ann Liebert Inc in OMICS: A Journal of Integrative Biology
- Vol. 9 (4) , 364-379
- https://doi.org/10.1089/omi.2005.9.364
Abstract
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward and then a randomized database, and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
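The combined-search approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `DECOY_` prefix, the `(protein_name, score)` hit format, and the score threshold are all hypothetical. The estimator used here, twice the number of randomized-database hits divided by all hits passing the threshold, is a common convention for appended-database searches and assumes that incorrect matches fall on forward and randomized entries with equal probability; the abstract itself does not state the formula used in the paper.

```python
import random


def reverse_decoy(seq):
    """Return the reversed sequence (one of the two randomization schemes)."""
    return seq[::-1]


def shuffle_decoy(seq, rng):
    """Return a reshuffled sequence with the same amino acid composition."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def build_combined_database(forward, seed=0, prefix="DECOY_"):
    """Append one reshuffled entry per forward entry (hypothetical naming)."""
    rng = random.Random(seed)  # fixed seed keeps the decoy database reproducible
    combined = dict(forward)
    for name, seq in forward.items():
        combined[prefix + name] = shuffle_decoy(seq, rng)
    return combined


def estimate_false_positive_rate(hits, threshold, prefix="DECOY_"):
    """Estimate the false positive rate among hits scoring >= threshold.

    hits: iterable of (protein_name, score) pairs from a combined search.
    Assumption: wrong matches land on decoy and forward entries equally often,
    so each decoy hit implies roughly one wrong forward hit.
    """
    passing = [name for name, score in hits if score >= threshold]
    if not passing:
        return 0.0
    decoys = sum(1 for name in passing if name.startswith(prefix))
    return min(1.0, 2 * decoys / len(passing))
```

In use, one would build the combined database from the forward sequences, run the search engine against it, and then scan candidate thresholds with `estimate_false_positive_rate` until the estimate drops to the desired error rate.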