DBDigger: Reorganized Proteomic Database Identification That Improves Flexibility and Speed
- 5 March 2005
- journal article
- research article
- Published by American Chemical Society (ACS) in Analytical Chemistry
- Vol. 77 (8), 2464-2474
- https://doi.org/10.1021/ac0487000
Abstract
Database search identification algorithms, such as Sequest and Mascot, constitute powerful enablers for proteomic tandem mass spectrometry. We introduce DBDigger, an algorithm that reorganizes the database identification process to remove a problematic bottleneck. Typically such algorithms determine which candidate sequences can be compared to each spectrum. Instead, DBDigger determines which spectra can be compared to each candidate sequence, enabling the software to generate candidate sequences only once for each HPLC separation rather than for each spectrum. This reorganization also reduces the number of times a spectrum must be predicted for a particular candidate sequence and charge state. As a result, DBDigger can accelerate some database searches by more than an order of magnitude. In addition, the software offers features to reduce the performance degradation introduced by posttranslational modification (PTM) searching. DBDigger allows researchers to specify the sequence context in which each PTM is possible. In the case of CNBr digests, for example, modified methionine residues can be limited to occur only at the C-termini of peptides. Use of “context-dependent” PTM searching reduces the performance penalty relative to traditional PTM searching. We characterize the performance possible with DBDigger, showcasing MASPIC, a new statistical scorer. We describe the implementation of these innovations in the hope that other researchers will employ them for rapid and highly flexible proteomic database search.
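The reorganization described in the abstract can be illustrated with a minimal Python sketch. It is not DBDigger's implementation, which the abstract does not show. It only demonstrates the inverted loop: candidate peptides are generated once per run, and each candidate is looked up against a mass-sorted list of spectra. The function names, the simplified tryptic digestion rule, the mass tolerance, and the toy residue-mass table are all assumptions made for illustration, and scoring is omitted entirely.

```python
from bisect import bisect_left, bisect_right

# Monoisotopic residue masses (Da) for a handful of amino acids (toy subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the sum of residue masses


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def tryptic_peptides(protein):
    """Hypothetical simplified digest: cleave after K or R, no missed cleavages."""
    start = 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            yield protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield protein[start:]


def match_spectra_to_candidates(proteins, spectra, tol=0.5):
    """Assign spectra to each candidate peptide (candidate-centric loop).

    spectra: list of (spectrum_id, neutral precursor mass in Da).
    Returns a dict mapping spectrum_id to the candidate peptides within tol.
    """
    # Sort the spectra by precursor mass once for the whole run.
    ordered = sorted(spectra, key=lambda s: s[1])
    masses = [mass for _, mass in ordered]
    matches = {sid: [] for sid, _ in spectra}

    # Each candidate is generated once, then compared to every spectrum
    # whose precursor mass falls inside the tolerance window.
    for protein in proteins:
        for peptide in tryptic_peptides(protein):
            m = peptide_mass(peptide)
            lo = bisect_left(masses, m - tol)
            hi = bisect_right(masses, m + tol)
            for sid, _ in ordered[lo:hi]:
                matches[sid].append(peptide)  # a real search would score here
    return matches
```

In a spectrum-centric search, the digestion loop would sit inside a loop over spectra and be repeated for each one. Here the database is traversed a single time, and the binary search keeps the per-candidate lookup cheap.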