MS1, MS2, and SQT—three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications
- 13 August 2004
- journal article
- research article
- Published by Wiley in Rapid Communications in Mass Spectrometry
- Vol. 18 (18), 2162-2168
- https://doi.org/10.1002/rcm.1603
Abstract
As the speed with which proteomic labs generate data increases along with the scale of projects they are undertaking, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor leading to many of these problems is the practice of storing the spectra and the database identifications for a given spectrum as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machines and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.
Copyright © 2004 John Wiley & Sons, Ltd.