ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data
- 1 October 2002
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 2 (10), 1406-1412
- https://doi.org/10.1002/1615-9861(200210)2:10<1406::aid-prot1406>3.0.co;2-9
Abstract
With the recent rapid expansion of DNA and protein sequence databases, intensive efforts are underway to interpret the linear genetic information of DNA in terms of function, structure, and control of biological processes. The systematic identification and quantification of expressed proteins has proven particularly powerful in this regard. Large‐scale protein identification is usually achieved by automated liquid chromatography‐tandem mass spectrometry of complex peptide mixtures and sequence database searching of the resulting spectra [Aebersold and Goodlett, Chem. Rev. 2001, 101, 269–295]. As generating large numbers of sequence‐specific mass spectra (collision‐induced dissociation, CID, spectra) has become a routine operation, research has shifted from the generation of sequence database search results to their validation. Here we describe in detail a novel probabilistic model and score function that ranks the quality of the match between tandem mass spectral data and a peptide sequence in a database. We document the performance of the algorithm on a reference data set and in comparison with another sequence database search tool. The software is publicly available for use and evaluation at http://www.systemsbiology.org/research/software/proteomics/ProbID.
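To make the idea of ranking candidate peptides against a tandem mass spectrum concrete, the following is a minimal sketch only. It is not ProbID's probabilistic model or score function, which the abstract does not specify. It assumes singly charged b- and y-ions, standard monoisotopic residue masses, and a simple shared-peak count as a stand-in score; the function names and the 0.5 m/z tolerance are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses in Da (a subset, enough for the example).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O, added to y-ions
PROTON = 1.00728   # mass of the charge-carrying proton


def fragment_mz(peptide: str) -> List[float]:
    """Return singly charged b- and y-ion m/z values for every cleavage site."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions: List[float] = []
    for i in range(1, len(peptide)):
        b_ion = sum(masses[:i]) + PROTON          # N-terminal fragment
        y_ion = sum(masses[i:]) + WATER + PROTON  # C-terminal fragment
        ions.extend([b_ion, y_ion])
    return ions


def count_matches(peaks: List[float], peptide: str, tol: float = 0.5) -> int:
    """Count theoretical ions that have an observed peak within `tol` m/z."""
    return sum(
        1 for mz in fragment_mz(peptide)
        if any(abs(mz - peak) <= tol for peak in peaks)
    )


def rank_candidates(
    peaks: List[float], candidates: List[str], tol: float = 0.5
) -> List[Tuple[str, int]]:
    """Rank candidate peptides by shared-peak count, best match first."""
    scored = [(pep, count_matches(peaks, pep, tol)) for pep in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Simulate an ideal spectrum from one peptide, then search two candidates.
    observed = fragment_mz("PEPTIDE")
    for peptide, score in rank_candidates(observed, ["GAVLKR", "PEPTIDE"]):
        print(peptide, score)
```

A shared-peak count ignores peak intensities and the chance of random matches; a probabilistic score of the kind the abstract describes would be expected to account for such factors when ranking candidates.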
This publication has 14 references indexed in Scilit:
- An Automated Multidimensional Protein Identification Technology for Shotgun Proteomics. Analytical Chemistry, 2001
- SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database. Bioinformatics, 2001
- A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry. Journal of Computational Biology, 2001
- Mass Spectrometry in Proteomics. Chemical Reviews, 2001
- Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proceedings of the National Academy of Sciences, 2000
- Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999
- Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnology, 1999
- De Novo Peptide Sequencing via Tandem Mass Spectrometry. Journal of Computational Biology, 1999
- Sequence database searches via de novo peptide sequencing by tandem mass spectrometry. Rapid Communications in Mass Spectrometry, 1997
- An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry, 1994