Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification
Open Access
- 1 July 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 24 (13) , i348-i356
- https://doi.org/10.1093/bioinformatics/btn189
Abstract
Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms.
Results: We use a hybrid dynamic Bayesian network (DBN)/support vector machine (SVM) approach to address these two problems. We train a set of DBNs on high-confidence peptide-spectrum matches. These DBNs, known collectively as Riptide, comprise a probabilistic model of peptide fragmentation chemistry. Examination of the distributions learned by Riptide allows identification of new trends, such as prevalent a-ion fragmentation at peptide cleavage sites C-term to hydrophobic residues. In addition, Riptide can be used to produce likelihood scores that indicate whether a given peptide-spectrum match is correct. A vector of such scores is evaluated by an SVM, which produces a final score to be used in peptide identification. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate.
Availability: Python and C source code are available upon request from the authors. The curated training sets are available at http://noble.gs.washington.edu/proj/intense/. The Graphical Model Tool Kit (GMTK) is freely available at http://ssli.ee.washington.edu/bilmes/gmtk.
Contact: noble@gs.washington.edu
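The abstract reports its gain as the number of positive identifications at a 1% false discovery rate, but it does not say how that rate is estimated. The following is a minimal sketch, assuming a simple target-decoy estimate in which the false discovery rate at a score cutoff is the number of decoy matches divided by the number of target matches at or above that cutoff. The function name `count_at_fdr` and the score lists are hypothetical illustrations and are not taken from the Riptide source code.

```python
from typing import List


def count_at_fdr(target_scores: List[float],
                 decoy_scores: List[float],
                 fdr_threshold: float = 0.01) -> int:
    """Count target matches accepted at the given estimated FDR.

    Assumption (not from the abstract): FDR at a cutoff is estimated as
    (#decoys >= cutoff) / (#targets >= cutoff). The function returns the
    largest number of targets for which this estimate stays within the
    threshold.
    """
    # Tag each score as target (True) or decoy (False), best score first.
    labeled = [(s, True) for s in target_scores]
    labeled += [(s, False) for s in decoy_scores]
    labeled.sort(key=lambda pair: pair[0], reverse=True)

    best_count = 0
    targets = 0
    decoys = 0
    i = 0
    while i < len(labeled):
        # Consume every match tied at this score before evaluating the cutoff.
        score = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == score:
            if labeled[i][1]:
                targets += 1
            else:
                decoys += 1
            i += 1
        if targets > 0 and decoys / targets <= fdr_threshold:
            best_count = targets
    return best_count


if __name__ == "__main__":
    # Made-up scores, purely to show the call shape.
    targets = [float(i) for i in range(1, 101)]
    decoys = [90.5, 50.5, 50.4, 50.3, 50.2, 50.1, 50.05]
    print(count_at_fdr(targets, decoys, fdr_threshold=0.05))
```

Comparing two scoring methods under this assumption amounts to calling the function once per method on the same spectra and taking the relative difference in the returned counts.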