Clustering Millions of Tandem Mass Spectra
- 8 December 2007
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Proteome Research
- Vol. 7 (1) , 113-122
- https://doi.org/10.1021/pr070361e
Abstract
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
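The abstract does not describe the clustering procedure itself, so the following is only a generic illustration of the idea of grouping redundant spectra and keeping one representative per group. It is a minimal sketch, not the MS-Clustering algorithm: the greedy single-pass strategy, the cosine similarity on binned peaks, the function names (`bin_spectrum`, `cosine`, `cluster_spectra`), and the parameters `bin_width`, `mass_tol`, and `min_similarity` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Convert (m/z, intensity) peaks into a unit-length sparse vector.

    Intensities are square-root scaled (an assumed, commonly used choice)
    so that a few dominant peaks do not swamp the comparison.
    """
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra, mass_tol=2.0, min_similarity=0.7):
    """Greedy single-pass clustering (hypothetical thresholds).

    spectra: list of (precursor_mz, peaks), peaks being (m/z, intensity) pairs.
    Each spectrum joins the most similar existing cluster whose precursor m/z
    lies within mass_tol, or starts a new cluster if none is similar enough.
    """
    clusters = []
    for idx, (precursor_mz, peaks) in enumerate(spectra):
        vec = bin_spectrum(peaks)
        best, best_sim = None, min_similarity
        for cluster in clusters:
            if abs(cluster["precursor_mz"] - precursor_mz) > mass_tol:
                continue
            sim = cosine(vec, cluster["representative"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({
                "precursor_mz": precursor_mz,
                "members": [idx],
                "sum": dict(vec),
                "representative": vec,
            })
            continue
        # Update the representative as the renormalized sum of member vectors.
        best["members"].append(idx)
        for k, v in vec.items():
            best["sum"][k] = best["sum"].get(k, 0.0) + v
        norm = math.sqrt(sum(v * v for v in best["sum"].values()))
        best["representative"] = {k: v / norm for k, v in best["sum"].items()}
    return clusters


if __name__ == "__main__":
    # Toy data: spectra 0 and 1 are near-duplicates, 2 and 3 are distinct.
    toy = [
        (500.3, [(100.1, 10), (200.2, 50), (300.3, 20)]),
        (500.4, [(100.2, 12), (200.1, 45), (300.4, 22)]),
        (800.0, [(150.0, 30), (250.0, 30)]),
        (500.3, [(120.0, 10), (220.0, 10)]),
    ]
    for c in cluster_spectra(toy):
        print(c["precursor_mz"], c["members"])
```

In this sketch only the representative vector of each cluster would be passed on to a database search, which is where the reduction in search time described above would come from. A comparison against every existing cluster, as done here, would not scale to millions of spectra without further indexing, for example by sorting spectra on precursor m/z.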