Prediction of Error Associated with False-Positive Rate Determination for Peptide Identification in Large-Scale Proteomics Experiments Using a Combined Reverse and Forward Peptide Sequence Database Strategy

18 November 2006

journal article
research article
Published by American Chemical Society (ACS) in Journal of Proteome Research

Vol. 6 (1), 392-398
https://doi.org/10.1021/pr0603194

Abstract

In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.

Keywords

Peptide Identification • False-Positive Rate • False Discovery Rate • Proteomics • Data Analysis • Mass Spectrometry • Reversed Database • Decoy Database
