Improving large‐scale proteomics by clustering of mass spectrometry data
- 23 March 2004
- journal article
- research article
- Published by Wiley in Proteomics
- Vol. 4 (4) , 950-960
- https://doi.org/10.1002/pmic.200300652
Abstract
Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.
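The abstract does not describe how Pep-Miner actually clusters spectra, so the following is only a minimal sketch of the general idea of grouping similar MS/MS spectra. It is not Pep-Miner's algorithm. It assumes a greedy single-pass scheme, cosine similarity on binned peak intensities, and a precursor m/z tolerance; the function names (`bin_spectrum`, `cosine`, `cluster_spectra`), the parameters (`bin_width`, `mass_tol`, `min_sim`) and the toy spectra are all hypothetical choices made for illustration.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed preprocessing)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra (dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra, mass_tol=1.5, min_sim=0.8):
    """Greedy single-pass clustering of (precursor_mz, peaks) tuples.

    A spectrum joins the most similar existing cluster whose precursor m/z
    lies within mass_tol and whose similarity is at least min_sim;
    otherwise it starts a new cluster. Thresholds are illustrative guesses.
    """
    clusters = []
    for idx, (precursor, peaks) in enumerate(spectra):
        binned = bin_spectrum(peaks)
        best, best_sim = None, min_sim
        for cluster in clusters:
            if abs(cluster["precursor"] - precursor) > mass_tol:
                continue
            sim = cosine(binned, cluster["sum"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(
                {"precursor": precursor, "sum": dict(binned), "members": [idx]}
            )
        else:
            # Accumulate intensities so the cluster keeps a summed spectrum.
            for key, value in binned.items():
                best["sum"][key] = best["sum"].get(key, 0.0) + value
            best["members"].append(idx)
    return clusters


# Toy data (invented): two near-identical spectra and one unrelated spectrum.
toy_spectra = [
    (500.3, [(175.1, 10.0), (262.1, 50.0), (375.2, 30.0)]),
    (500.4, [(175.1, 12.0), (262.2, 45.0), (375.2, 33.0)]),
    (612.8, [(147.1, 40.0), (300.2, 20.0), (489.3, 60.0)]),
]
toy_clusters = cluster_spectra(toy_spectra)

# Data reduction reported in the abstract: 517 000 spectra to 20 900 clusters.
reduction_factor = 517_000 / 20_900
```

On the toy data, the two similar spectra fall into one cluster and the third forms its own. For the project cited in the abstract, `reduction_factor` works out to roughly 25 spectra per cluster on average. Summing member spectra into a single cluster spectrum is one plausible way to obtain the kind of spectrum quality improvement the abstract mentions, though the source does not say how Pep-Miner does this.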