Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics

Open Access

31 August 2007

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 8 (1), 323
https://doi.org/10.1186/1471-2105-8-323

Abstract

Background: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. Each assignment of a peptide to an MS/MS spectrum by SEQUEST is characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, and matched ion count. A filtering criterion built from several of these scores is used to separate correct identifications from random assignments. However, this filtering criterion has not been well optimized to date.

Results: In this study, we implemented a machine learning approach known as a predictive genetic algorithm (GA) to optimize the filtering criteria so as to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which represents a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach achieved similar performance in distinguishing true from false assignments in only 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results, as it does not rely on any assumptions about the data.

Conclusion: Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples to improve proteome coverage.
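The idea in the abstract (search thresholds that maximize accepted identifications while a decoy-based FDR stays under a fixed limit) can be illustrated with a minimal sketch. This is not the authors' software: it uses a plain elitist GA rather than the predictive GA named above, filters on only two scores (Xcorr and ΔCn), and runs on synthetic data. The FDR estimator 2 × decoys / accepted is one common choice for a concatenated target-decoy search and is an assumption here, as are all names (`PSM`, `estimate_fdr`, `fitness`, `optimize`, `synthetic_psms`), score bounds, and GA settings.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical, simplified record)."""
    xcorr: float
    delta_cn: float
    is_decoy: bool


def accept(thresholds, psms):
    """Keep PSMs whose Xcorr and dCn both meet the minimum thresholds."""
    min_xcorr, min_dcn = thresholds
    return [p for p in psms if p.xcorr >= min_xcorr and p.delta_cn >= min_dcn]


def estimate_fdr(accepted):
    """Assumed estimator: FDR = 2 * decoy hits / all accepted hits."""
    if not accepted:
        return 0.0
    decoys = sum(p.is_decoy for p in accepted)
    return 2.0 * decoys / len(accepted)


def fitness(thresholds, psms, max_fdr):
    """Number of accepted target PSMs, or 0 if the FDR limit is exceeded."""
    accepted = accept(thresholds, psms)
    if estimate_fdr(accepted) > max_fdr:
        return 0
    return sum(not p.is_decoy for p in accepted)


def optimize(psms, max_fdr=0.01, pop_size=30, generations=40, seed=0):
    """Plain elitist GA over (min Xcorr, min dCn); bounds are illustrative."""
    rng = random.Random(seed)
    bounds = [(0.0, 6.0), (0.0, 0.5)]

    def random_individual():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def crossover(a, b):
        # Uniform crossover: each gene comes from one parent at random.
        return tuple(rng.choice(pair) for pair in zip(a, b))

    def mutate(ind):
        # Gaussian perturbation, clipped to the allowed range.
        return tuple(
            min(hi, max(lo, g + rng.gauss(0.0, 0.05 * (hi - lo))))
            for g, (lo, hi) in zip(ind, bounds)
        )

    def score(ind):
        return fitness(ind, psms, max_fdr)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # best half survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=score)


def synthetic_psms(n=2000, seed=1):
    """Made-up data: 40% correct target hits, 60% random hits (half decoy)."""
    rng = random.Random(seed)
    psms = []
    for _ in range(n):
        kind = rng.random()
        if kind < 0.4:
            psms.append(PSM(rng.gauss(3.5, 0.7), rng.gauss(0.30, 0.08), False))
        else:
            psms.append(PSM(rng.gauss(1.8, 0.5), rng.gauss(0.08, 0.05), kind < 0.7))
    return psms


if __name__ == "__main__":
    data = synthetic_psms()
    best = optimize(data, max_fdr=0.01)
    kept = accept(best, data)
    print("thresholds (Xcorr, dCn):", best)
    print("accepted targets:", fitness(best, data, 0.01))
    print("estimated FDR:", estimate_fdr(kept))
```

Because the fitness function only needs a decoy flag and a set of scores per match, the same loop could in principle be pointed at the output of another search engine by changing the fields of `PSM` and the bounds, which mirrors the extensibility claim in the abstract.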
