Improving Sensitivity in Shotgun Proteomics Using a Peptide-Centric Database with Reduced Complexity: Protease Cleavage and SCX Elution Rules from Data Mining of MS/MS Spectra
- 13 January 2006
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 78 (4) , 1071-1084
- https://doi.org/10.1021/ac051127f
Abstract
Correct identification of a peptide sequence from MS/MS data is still a challenging research problem, particularly in proteomic analyses of higher eukaryotes where protein databases are large. The scoring methods of search programs often generate cases where incorrect peptide sequences score higher than correct peptide sequences (referred to as distraction). Because smaller databases yield less distraction and better discrimination between correct and incorrect assignments, we developed a method for editing a peptide-centric database (PC-DB) to remove unlikely sequences and strategies for enabling search programs to utilize this peptide database. Rules for unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11 849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion to generate subset databases. When used to search a well-annotated test data set of MS/MS spectra, we found no loss of critical information using PC-DBs, validating the methods for generating and searching against the databases. On the other hand, improved confidence in peptide assignments was achieved for tryptic peptides, measured by changes in ΔCN and RSP. Decreased distraction was also achieved, consistent with the 3−9-fold decrease in database size. Data mining identified a major class of common nonspecific proteolytic products corresponding to leucine aminopeptidase (LAP) cleavages. Large improvements in identifying LAP products were achieved using the PC-DB approach when compared with conventional searches against protein databases. These results demonstrate that peptide properties can be used to reduce database size, yielding improved accuracy and information capture due to reduced distraction, but with little loss of information compared to conventional protein database searches.
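As an illustration of the general idea of a peptide-centric database, the following is a minimal sketch and not the authors' implementation. It digests protein sequences in silico at lysine and arginine, allows a limited number of missed cleavages, and stores each distinct peptide once with the proteins it maps to. The function names, the `max_missed` and `min_len` defaults, and the `is_unlikely` filter are assumptions made for this example; the actual cleavage and SCX elution rules are not given in the abstract, so the filter is only a placeholder for them.

```python
import re


def tryptic_digest(protein, max_missed=1, min_len=6):
    """Return the set of fully tryptic peptides of one protein sequence.

    Simplification (assumption): cleave after every K or R, with no
    special handling of a following proline.
    """
    # Each piece ends at K/R, except possibly the C-terminal remainder.
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = set()
    for i in range(len(pieces)):
        # Join up to max_missed additional pieces to model missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            pep = "".join(pieces[i:j + 1])
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides


def build_pc_db(proteins, max_missed=1, min_len=6, is_unlikely=None):
    """Map each distinct peptide to the set of protein IDs containing it.

    `is_unlikely` is a hypothetical predicate standing in for editing
    rules that remove improbable sequences; peptides for which it
    returns True are dropped.
    """
    db = {}
    for prot_id, seq in proteins.items():
        for pep in tryptic_digest(seq, max_missed, min_len):
            if is_unlikely is not None and is_unlikely(pep):
                continue
            db.setdefault(pep, set()).add(prot_id)
    return db
```

Keying the database by peptide rather than by protein means a sequence shared by several proteins is stored once, and any rule that rejects a peptide shrinks the search space directly.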