Semi-supervised learning for peptide identification from shotgun proteomics datasets
- 21 October 2007
- journal article
- Published by Springer Nature in Nature Methods
- Vol. 4 (11) , 923-925
- https://doi.org/10.1038/nmeth1113
Abstract
Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
This publication has 10 references indexed in Scilit:
- Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 2003
- A New Algorithm for the Evaluation of Shotgun Peptide Sequencing in Proteomics: Support Vector Machine Classification of Peptide MS/MS Spectra and SEQUEST Scores. Journal of Proteome Research, 2002
- Evaluation of Multidimensional Chromatography Coupled with Tandem Mass Spectrometry (LC/LC−MS/MS) for Large-Scale Protein Analysis: The Yeast Proteome. Journal of Proteome Research, 2002
- Empirical Statistical Model To Estimate the Accuracy of Peptide Identifications Made by MS/MS and Database Search. Analytical Chemistry, 2002
- Probability-Based Validation of Protein Identifications Using a Modified SEQUEST Algorithm. Analytical Chemistry, 2002
- Qscore: An algorithm for evaluating SEQUEST database search results. Journal of the American Society for Mass Spectrometry, 2002
- DTASelect and Contrast: Tools for Assembling and Comparing Protein Identifications from Shotgun Proteomics. Journal of Proteome Research, 2002
- Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnology, 2001
- Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis, 1999
- An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry, 1994