Prediction of Error Associated with False-Positive Rate Determination for Peptide Identification in Large-Scale Proteomics Experiments Using a Combined Reverse and Forward Peptide Sequence Database Strategy
- 18 November 2006
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 6 (1), 392-398
- https://doi.org/10.1021/pr0603194
Abstract
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.
Keywords: Peptide Identification • False-Positive Rate • False Discovery Rate • Proteomics • Data Analysis • Mass Spectrometry • Reversed Database • Decoy Database
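The abstract does not give the model itself, so the following is only a minimal sketch of how an uncertainty on a decoy-based false-positive rate could be estimated. It assumes the common concatenated-database convention that an incorrect match is equally likely to land on a forward or a reverse sequence, so the number of reverse hits is treated as binomial with p = 0.5. The function name `estimate_fp_rate` and its parameters are hypothetical, and this is not necessarily the authors' formulation.

```python
import math


def estimate_fp_rate(n_forward, n_reverse):
    """Estimate the false-positive rate and its standard deviation.

    Assumptions (not taken from the paper):
      * incorrect matches hit forward and reverse sequences with equal
        probability, so reverse hits R ~ Binomial(F, 0.5) for F false matches;
      * the total number of accepted matches is treated as fixed.
    """
    total = n_forward + n_reverse
    if total <= 0:
        raise ValueError("at least one accepted match is required")

    # Each reverse hit implies roughly one undetected false forward hit.
    est_false = 2 * n_reverse
    fp_rate = est_false / total

    # Var(R) = F * 0.25, so Var(2R) = F, estimated here by 2R.
    sd_false = math.sqrt(est_false)
    sd_fp_rate = sd_false / total
    return fp_rate, sd_fp_rate


# Hypothetical counts, for illustration only.
fp, sd = estimate_fp_rate(n_forward=980, n_reverse=20)
print(f"FP rate = {fp:.3f} +/- {sd:.4f}")
```

Under these assumptions the relative uncertainty scales as one over the square root of the estimated false-match count, so it grows as the number of reverse hits shrinks. That is one reason splitting a data set by charge state before false-positive determination, as the abstract mentions, can make each subset's estimate noisier.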