Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines
- 27 February 2009
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 9 (5) , 1220-1229
- https://doi.org/10.1002/pmic.200800473
Abstract
LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets, so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search-engine-independent score, based on FDR, called the FDR Score, which allows peptide identifications from different search engines to be combined. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
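The two ideas the abstract relies on, estimating FDR from decoy hits and grouping identifications by the set of search engines that reported them, can be illustrated with a minimal Python sketch. This is not the paper's FDR Score formula, which the abstract does not give. The decoy/target ratio, the monotone (q-value-style) smoothing, the function names and the engine labels `engine_a` and `engine_b` are all assumptions made for illustration.

```python
from collections import defaultdict


def decoy_fdr_curve(hits):
    """Estimate an FDR value for each hit from target/decoy counts.

    hits: list of (score, is_decoy) pairs; a higher score is assumed better.
    Returns (score, is_decoy, fdr) tuples sorted by descending score.
    Assumption: FDR is estimated as decoys / targets among all hits
    scoring at least as well, then made monotone from the bottom up.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    decoys = targets = 0
    raw = []
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        raw.append(decoys / max(targets, 1))

    # Walk from the worst score upwards so that a better-scoring hit
    # never receives a larger estimate than a worse-scoring one.
    running_min = float("inf")
    smoothed = [0.0] * len(raw)
    for i in range(len(raw) - 1, -1, -1):
        running_min = min(running_min, raw[i])
        smoothed[i] = running_min

    return [(s, d, q) for (s, d), q in zip(ranked, smoothed)]


def group_by_engine_set(identifications):
    """Group peptide identifications by the set of engines that made them.

    identifications: iterable of (spectrum_id, peptide, engine) triples.
    Returns a dict mapping frozenset(engines) to a list of
    (spectrum_id, peptide) pairs reported by exactly that set of engines.
    """
    engines_for = defaultdict(set)
    for spectrum_id, peptide, engine in identifications:
        engines_for[(spectrum_id, peptide)].add(engine)

    groups = defaultdict(list)
    for key, engines in engines_for.items():
        groups[frozenset(engines)].append(key)
    return dict(groups)


# Hypothetical toy data, not taken from the paper.
example_hits = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
example_ids = [
    ("s1", "PEPA", "engine_a"),
    ("s1", "PEPA", "engine_b"),
    ("s2", "PEPB", "engine_a"),
]
curve = decoy_fdr_curve(example_hits)
groups = group_by_engine_set(example_ids)
```

In a workflow of the kind the abstract describes, an FDR estimate would presumably be computed separately within each group returned by `group_by_engine_set`, since the observed FDR is reported to differ between identifications made by one, two or all three search engines.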